Using Few-Shot Learning Materials of Multiple SPOCs to Develop Early Warning Systems to Detect Students at Risk

Early warning systems (EWSs) have been successfully used in online classes, especially in massive open online courses, where it is nearly impossible for students to interact face-to-face with their teachers. Although teachers in higher education institutions typically have smaller class sizes, they also face the challenge of being unable to have direct contact with their students during distance teaching. In this research, we examined the online learning trajectories of students participating in four small private online courses that were all taught by one teacher. We collected relevant data of 1,307 students from the campus learning management system. Subsequently, we constructed 18 prediction models, one for each week of the course, to develop an EWS for identifying students in online asynchronous learning who are at risk of failing (i.e., students who fail their final examination). Our results indicated that the models could identify at-risk students from the fifth week of the course onward, with prediction accuracy exceeding 83% from the eighth week onward.


Introduction
Learning management systems (LMSs) are used to quantify the learning behavior of students, enabling teachers to obtain data that are unavailable through face-to-face teaching in physical classrooms. Teachers can model or predict students' behaviors by using data mining or analysis (Papamitsiou & Economides, 2014). Massive open online courses (MOOCs) are particularly suitable for learning analytics and for building prediction models because they accumulate large amounts of student data, which is helpful for the early detection of students who may be unable to complete such an online course (He et al., 2015) or for predicting academic results (Li et al., 2017). An early warning system (EWS) for online teaching is a precision teaching tool. Institutions of higher education have achieved digital transformation through the value-added application of learning data. Teachers have consequently become adept at running online courses, which may include setting up decision support systems (Kotsiantis, 2011), conducting instructional interventions at the most appropriate time by using EWSs (Howard et al., 2018), and predicting academic failure (Costa et al., 2017). Research in this field has focused on collecting data on students who are "at risk" or "off track" and determining why they failed or ceased learning; however, studies have focused on the period following the completion of courses, which is too late to provide adequate support to these students (Hu et al., 2014). Related research has also revealed that teachers can use the LMS data of single online courses on platforms such as Moodle (Cerezo et al., 2016; Romero et al., 2008) and Blackboard (Morris et al., 2005; Tempelaar et al., 2015) to build effective predictive models as warning systems (Hu et al., 2014; Macfadyen & Dawson, 2010).

However, for small private online courses (SPOCs) in universities, little empirical research has addressed teachers' need to build EWSs for asynchronous distance teaching courses from the small samples that SPOCs provide. This scarcity may be due to the limitations of having fewer students in a class or the convenience of face-to-face consultations between teachers and students on campus. Our review of the literature also revealed that few teachers are able to use the data from multiple courses in the LMS of their institution to successfully develop portable prediction models as warning systems. Researchers have argued that this may be due to the differences between the courses and their instructional designs (Gašević et al., 2016; Macfadyen & Dawson, 2010). Even if the data learning models of various courses within an institution are designed with high prediction accuracy, substantial differences may remain in the accuracy of the models (Conijn et al., 2017; Gašević et al., 2016).
Therefore, although more institutions of higher education are offering SPOCs, research analyzing the use of few-shot learning materials for developing warning systems for SPOCs remains limited. Few-shot learning involves making predictions on the basis of only a few training examples. Currently, teachers are facing dropout rates for online courses that are 10%-20% higher than those for face-to-face courses (Bawa, 2016). Thus, teachers require tools to help them identify struggling students before those students drop out or fail. In this study, we collected small-sample data from different courses taught by the same teacher, while the courses were running, to build a portable student learning prediction model that can act as a warning system in SPOCs.
Students who are at risk can be identified by analyzing data from the students' online learning trajectories that are accumulated and entered weekly into the LMS. "Students at risk" in the current paper refers to students who scored lower than 60 points on the course's final assessment. In this study, we addressed the primary research question-How can teachers use few-shot learning materials from multiple SPOCs to develop an EWS to detect students at risk?-as well as the following two related research questions:
1. How far in advance can the model predict a student's academic performance?
2. Can the model be used to predict academic performance in other courses taught by the same teacher?

Educational Data Mining
Data mining is widely used in educational institutions. The goal of educational data mining (EDM) is generally to explore the meaning behind data to improve the teaching process (Saa et al., 2019). In EDM, statistical models, mathematical algorithms, and machine learning methods are employed to analyze large data sets and reveal the correlations between learning behavior patterns and results. EDM enables teachers to gain an overview of the effective learning and behavior of students in the learning process (Ramaswami & Bhaskaran, 2009). Baradwaj and Pal (2011) summarized common data mining algorithms, including classification, clustering, regression, association rules, neural networks, decision trees, and the nearest neighbor method. Numerous researchers have applied these EDM techniques to predict student performance (Francis & Babu, 2019; Okubo et al., 2017; Sana et al., 2019).
EDM involves several steps. The first step is to determine the purpose of the research and collect data from an appropriate educational environment. The second step is to perform data preprocessing procedures. Subsequently, a prediction model is trained. After the model or pattern is established, the EDM results can provide the teacher with feedback for decision making or intervention. EDM has several applications, such as predicting student performance; providing feedback for supporting instructors; offering personalization or recommendations to students; creating alerts for stakeholders; and performing student modeling, domain modeling, and student grouping and profiling (Baker et al., 2012; Romero & Ventura, 2013).

Along with the popularization of distance education, EDM research on LMS databases has also increased. For example, Chen et al. (2018) analyzed students' learning behavior data in short online courses and predicted students' learning performance at an early stage, i.e., after the first week of class (area under the curve ≥ 0.7). Kim et al. (2018) used deep learning to predict the results of students enrolled in online courses. Another study analyzed the LMS data of 658 students from nine courses in the first week and found that the online learning behaviors of students who passed the course differed significantly from those of students who did not pass (Milne et al., 2012).
As mentioned, EDM can be used to predict student learning performance, which then enables teachers to intervene early to improve student learning effectiveness. Currently, teachers can apply EDM technology first to establish a predictive model and subsequently to determine students' actual behavior in the LMS; teachers can then apply a data-driven teaching intervention. This process involves teachers establishing a scientific EWS to help students succeed.

EWSs in Education
EWSs have been used by educational institutions to identify students who are at risk or off track (Barry & Reschly, 2012). An EWS helps teachers understand students' behavior and performance through the collection of student behavioral data and the building of a prediction model based on an algorithm. For example, researchers analyzed the behavioral data of students in distance courses at the Open University in the United Kingdom to predict their participation rate (Hussain et al., 2018). Teachers of distance courses can improve their students' learning and participation by establishing monitoring and guidance strategies on the basis of information from an EWS (Rodrigues et al., 2016) and providing timely interventions and remedies, especially in situations where a student is unable to satisfy specific indicators (Howard et al., 2018). One of Europe's largest distance education institutions, the Open University, developed four prediction models to identify students at risk of failure at an early stage of a course; these results are provided to teachers every week in the form of a feedback dashboard (Wolff et al., 2014). Baker et al. (2015) built a model to make early predictions regarding the success and failure of students by analyzing students' online course activity data. The accuracy rate of the model in identifying students most likely to perform poorly was 59.5%. Other research used the EWS plug-in on Moodle to build prediction models, and the accuracy rate was 60.8% (Jokhan et al., 2018). The model developed by Conijn et al. (2016) for predicting whether students would be able to pass their courses achieved an overall accuracy rate of 68.7%. Related research has revealed that EWS prediction models differ in terms of their accuracy in various distance courses. However, the key to a successful EWS remains whether teachers are able to obtain a highly accurate prediction model.
To enable the wider use of prediction models, researchers have considered the portability of such models (Gašević et al., 2016; Jayaprakash et al., 2014). For example, in the 2011 Open Academic Analytics Initiative, an open-source model for predicting student success was developed (Lauría et al., 2012). Subsequently, these researchers performed a cross-institution practical test with data from Purdue University and Marist College (N = 18,968 and 27,276, respectively) to assess the portability of the student performance prediction model. The results revealed that although the LMS as well as the teaching methods and types differed between these two institutions, similarities could be found in the student performance prediction models and related analyses. Another study investigated the portability of prediction models among various courses in the same institution, revealing poorer results than those obtained in the aforementioned research. The researchers suggested that the poor results were due to the difference in instructional design between the courses. Thus, if highly dissimilar instructional designs are used in different courses, considerable disparities might also appear in the degree of use of the LMS modules.
To enable regular teachers to use small samples from multiple SPOCs to promote precision education, scholars have expanded empirical research to consider the portability of prediction models. In the current research, we collected small-sample data from four asynchronous distance courses offered through an LMS at a public university of science and technology in central Taiwan; the courses were all taught by the same teacher. The data were used to build a prediction model that was then developed into an EWS for identifying students at risk of failing the course; the EWS was subsequently tested on a new course. Because the courses were all taught by the same teacher, their instructional designs were highly similar. This mitigated the effect of instructional design differences on the model.

Methodology

Participants and Data Collection
The LMS used in this research recorded every student's detailed learning activities in a database, including platform logins; page clicks; test completions; the opening, closing, and downloading of course materials; the upload of assignments; assignment grades; and browsing and posting behavior in the discussion area.
Data on student activities were saved in a log file format, which meant that a record was generated whenever an activity occurred. We used an application programming interface (API) to gather the necessary information for the prediction and analysis model. The four selected courses were all asynchronous online courses with a total of 1,278 students. The courses and their assignments were designed in accordance with the Taiwanese Ministry of Education's digital course certification. Although the courses and their content differed, they were similar in their instructional design and course requirements, such as the weighting of grades, examinations, discussion topics, and number of assignments. Each asynchronous online course lasted 18 weeks. A summary of the online instructional design is provided in Table 1.
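To illustrate how such per-activity log records can be turned into weekly learning statistics, the following minimal Python sketch counts activities per student per week and accumulates them across the semester. The file name, column names, and semester start date are hypothetical placeholders, not the institution's actual API output.

```python
import pandas as pd

# Hypothetical log export: one row per activity record (student_id, activity, timestamp).
logs = pd.read_csv("lms_activity_log.csv", parse_dates=["timestamp"])

# Assign each record to a semester week (1-18); the start date is illustrative.
semester_start = pd.Timestamp("2020-09-14")
logs["week"] = ((logs["timestamp"] - semester_start).dt.days // 7) + 1

# Count activities per student, per week, per activity type ...
weekly = logs.pivot_table(
    index=["student_id", "week"], columns="activity", aggfunc="size", fill_value=0
)

# ... and accumulate over weeks, because the courses were self-paced.
cumulative = weekly.groupby(level="student_id").cumsum()
print(cumulative.head())
```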

Data Preprocessing
Data preprocessing, including data integration and data aggregation, was conducted on data from the LMS database to build the prediction model. The preprocessing stage of this research involved four steps. The first step was to screen possible features from the database. We used analysis of variance (ANOVA) as the basis for filtering learning features, with the R 3.6.3 data mining software. Twenty features were generated (Table 2). The focus of this research's prediction model was on predicting whether a student would be able to pass the final examination. Every student was assigned a specific label, namely pass or fail. If the student obtained a score of ≥60% on the examination, they received the pass label; otherwise, the fail label was applied. We collected the information of 1,278 students, among whom 1,135 passed and 143 failed.
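The ANOVA-based screening was performed in R 3.6.3; as a sketch only, the equivalent step can be expressed in Python with scikit-learn's f_classif (a one-way ANOVA F-test). The file and column names below are hypothetical.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical export of the candidate learning features from the LMS database.
df = pd.read_csv("lms_raw_features.csv")
X = df.drop(columns=["passed"])  # candidate behavioral features
y = df["passed"]                 # 1 = pass (score >= 60), 0 = fail

# Keep the 20 features whose ANOVA F statistic best separates pass from fail.
selector = SelectKBest(score_func=f_classif, k=20).fit(X, y)
selected_features = X.columns[selector.get_support()]
print(selected_features.tolist())
```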
The second step of the preprocessing stage was to collate statistical information that represented every week's cumulative learning progress. We gathered this cumulative learning progress information because the distance courses were all asynchronous; the teacher allowed the students to set their own pace for completing the online learning tasks within the 18 weeks of the semester. Because this research used an unsupervised learning algorithm, an autoencoder was set up. Therefore, the third step involved using [0,1] normalization to normalize the characteristic variables; that is, the range of each characteristic was converted to the 0-1 range. Apart from the data used for classification (i.e., academic performance), all other variables were normalized. The calculation is expressed in Formula (1).
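Given the description above, Formula (1) is presumably the standard min-max normalization, where $x$ is a raw feature value and $x_{\min}$ and $x_{\max}$ are the minimum and maximum of that feature across all students:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$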
The fourth step was to divide the information into a training set and a test set. We randomly split the information into the training set and test set at a ratio of 7:3. The information in the training set was used to train the model, and the test set was used to evaluate the model and guard against overfitting.
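A minimal Python sketch of the labeling, normalization, and 7:3 split described in these steps is shown below; the file name, the "final_score" column, and the random seed are illustrative assumptions, not the authors' exact pipeline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical table of the 20 selected weekly cumulative features plus scores.
df = pd.read_csv("lms_weekly_features.csv")

# Label each student: pass if the final-examination score is >= 60, else fail.
df["label"] = (df["final_score"] >= 60).astype(int)

# [0, 1] min-max normalization of every feature except the classification label.
features = df.drop(columns=["final_score", "label"])
features = (features - features.min()) / (features.max() - features.min())

# Random 7:3 split into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    features, df["label"], test_size=0.3, random_state=42
)
print(len(X_train), len(X_test))
```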

Building the Model
Machine learning involves the automatic identification of a complex pattern according to the features extracted from a given data set and the making of an intelligent decision regarding new data (Kotsiantis et al., 2004). We employed a convolutional neural network (CNN) to build the prediction model.
We designed the prediction and analysis model in Python (Bowles, 2015) and used the PyTorch deep learning framework. A total of 18 predictive models were obtained in this research. Each forecasting model was based on 1 week (7 days) of data. When selecting training samples for the weekly predictive model, we selected the data set of students who had actual learning records in the LMS that week. Students who did not exhibit learning behavior that week were excluded from the training model sample for that week. To verify the model, we only selected 70% of each week's student samples for each week's model training. The remaining 30% was retained as the test data set of the predictive models.
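As an illustration of such a weekly model, the following minimal PyTorch sketch applies a 1-D CNN to the 20 normalized weekly features and outputs pass/fail logits. The layer sizes and kernel width are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class WeeklyCNN(nn.Module):
    """Binary pass/fail classifier over one week's 20 cumulative features."""

    def __init__(self, n_features: int = 20):
        super().__init__()
        # Treat the feature vector as a length-n_features signal with 1 channel.
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * (n_features // 2), 2),  # two classes: pass / fail
        )

    def forward(self, x):  # x: (batch, n_features)
        x = x.unsqueeze(1)            # -> (batch, 1, n_features)
        return self.fc(self.conv(x))  # -> (batch, 2) class logits

model = WeeklyCNN()
logits = model(torch.rand(4, 20))     # 4 students, 20 features in [0, 1]
print(logits.shape)                   # torch.Size([4, 2])
```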
Finally, to verify the portability of the prediction models, we gathered data from the Introduction to Artificial Intelligence distance course (N = 59) from the 2019-2020 academic year. That course was selected for verifying predictive models because it was taught by the same teacher and included a similar teaching design and similar course requirements as the courses used for the training models. Moreover, the course was offered at the same institution and used the same LMS as the other four courses.

CNN Performance Evaluation
We used a confusion matrix to verify the prediction model's classification performance. The confusion matrix for binary classification is displayed in a two-by-two table that summarizes the performance of the trained network. The confusion matrix for each week is listed separately, and its format is presented in Table 3. True passed (TP) indicates that a student was predicted to pass and eventually did pass. True negative (TN) indicates the number of failing students who were classified accurately. False passed (FP) refers to the number of students who failed the course but had been predicted to pass. False negative (FN) denotes students who were predicted to fail but eventually passed.
The accuracy, sensitivity, specificity, and precision values were calculated from the confusion matrix (Saito & Rehmsmeier, 2015). The relevant values for each model were calculated using Equations 2 to 5.
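In terms of the confusion-matrix counts defined above, these four measures take their standard forms:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{4}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{5}$$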
The Fβ measure (F score) was obtained using the precision and sensitivity (recall) values (Toraman et al., 2019). A β value of 0.5, 1, or 2 is typically used (Goutte & Gaussier, 2005). Equation 6 was used to obtain the F score. In this study, β was 2.
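In terms of precision and recall, the standard form of the Fβ measure is:

$$F_\beta = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}} \tag{6}$$

With β = 2, the score weights recall more heavily than precision.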
A commonly used metric when performing classification is accuracy (Hanley & McNeil, 1982; He & Garcia, 2009). Precision is equivalent to the positive predictive value, specificity is equal to 1 minus the false positive rate, and sensitivity (the true positive rate, TPR) is equivalent to the recall rate.
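As a concrete illustration, the following minimal Python sketch computes the confusion-matrix counts and the derived metrics for one weekly model. The labels are illustrative examples (1 = pass, 0 = fail, matching the labeling described above), not actual course data.

```python
from sklearn.metrics import confusion_matrix, fbeta_score

y_true = [1, 1, 0, 1, 0, 1, 1, 0]  # actual outcomes: 1 = pass, 0 = fail
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]  # model predictions

# With labels=[1, 0], rows/columns are ordered (pass, fail),
# so the matrix unpacks as TP, FN, FP, TN.
tp, fn, fp, tn = confusion_matrix(y_true, y_pred, labels=[1, 0]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)              # recall of the "pass" class
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
f2 = fbeta_score(y_true, y_pred, beta=2)  # F score with beta = 2

print(accuracy, sensitivity, specificity, precision, f2)
```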

Descriptive Statistics and Data Preprocessing
We selected four courses for creating predictive models and one course for verifying the portability of the predictive models. The descriptive statistics are shown in Table 4.

Prediction Model
To create an early-stage prediction model, we obtained data on the features from the training set each week. We created a total of 18 prediction models based on each week's accumulated data. The confusion matrix was used to determine the specificity, precision, sensitivity, F measure, and accuracy of the models. The results presented in Table 6 indicate that training accuracy ranged from 59% in the 2nd week to 84% in the 18th week, whereas test accuracy ranged from 57% in the 7th week to 84% in the 18th week. Notably, training accuracy rose from 59% in the 7th week to 80% in the 8th week, and test accuracy rose from 57% in the 7th week to 77% in the 8th week. Altogether, these results suggest that we could predict whether students would fail by the middle of the 18-week course.

Portability of the Prediction Model
We verified the prediction model accuracy against the learning data gathered from the students taking the Introduction to Artificial Intelligence distance course in the 2019-2020 academic year. The prediction model was assessed in terms of its accuracy in predicting the academic performance of the students in this new course; the results revealed an accuracy rate of ≥81% from the eighth week onward. The verification results of the prediction model are displayed in Figure 2.

Figure 2
Weekly Accuracy of the Verified Course

Discussion and Conclusions
Developing an EWS and identifying students at risk in a timely manner is one of the strategies of precision education that schools and teachers have been advocating. Compared with face-to-face classes, distance courses enable the collection of more student learning information. However, for teachers who do not run MOOCs, gathering sufficient training information to build a usable prediction model themselves is a considerable challenge. The proportion of students who fail a SPOC is often higher than that of students in face-to-face classes, especially for distance courses that use asynchronous teaching over the long term or during periods of special restrictions (e.g., contact restrictions during a pandemic). Teachers' successful collection of small-sample learning information from multiple SPOCs and training of a portable prediction model would greatly benefit the development of an EWS, enabling teachers to employ precision education. This research is based on few-shot learning, in which a predictive model is fed a very small amount of training data to discover patterns that support accurate predictions. In this research, we gathered learning information from one teacher's multiple SPOCs on an LMS platform to create an EWS for identifying students at risk of failing. Our results revealed that students at risk can be correctly identified from the fifth week of the course onward on the basis of their online learning behavior (accuracy of 69%).
The model's accuracy reached ≥80% for weeks 8, 10, 13, 14, 17, and 18. In this study, we used the accuracy derived from the confusion matrix to verify the predictive models' performance. Additionally, we obtained the sensitivity, specificity, precision, and F measure for each week to help teachers make comprehensive judgments when choosing among the weekly models on the basis of their early warning plan.
The main purpose of this research was to collect small-sample information from multiple SPOCs that had a similar instructional design and were taught by a single teacher to build a usable prediction model. Our findings expand knowledge on the portability of prediction models and corroborate previous research indicating that differences in instructional design between courses negatively affect the accuracy of student performance prediction. Therefore, teachers may use this prediction model in other distance courses that have similar online instructional designs and apply instructional interventions for the students who are identified. Through instructional intervention, the online learning behavior of students taking SPOCs can be modified and their online learning experience enriched, such as through self-regulated learning. We endeavor to expand this research project by integrating automated data collection, feature selection, and model update mechanisms into the prediction model to enhance its adaptability and usability.

Practical Implications
In this study, we attempted to address a problem in EWS design: the necessity of first collecting big data on student learning before the development of early warning models. As a possible supporting technology, artificial intelligence has emerged in many industries. However, because of the lack of large data sets, educational institutions have yet to widely adopt this technology. In this context, teachers also miss the opportunity to develop predictive models for their SPOCs and cannot establish an EWS. Because teachers cannot directly supervise students' online learning behaviors as they would in the classroom, students who take online asynchronous courses are at an increased risk of failure.
The findings of this research may be of value to those who teach asynchronous distance courses, educational authorities, and information technology (IT) directors of academic institutions.

Teachers
Teachers should consider other factors in addition to online teaching design and regard the online learning environment as a sustainable and circular ecosystem. For example, in this study, we used former students' learning data sets and a CNN to establish an early warning model that reduces future students' learning risk. This system is sustainable because new data can be integrated into the early warning model to improve its accuracy. In this manner, teachers can offer precision education through data-driven interventions. Such a system can support teachers in realizing the digital transformation of education and enables them to devote more energy to supporting students' success in a timely and personalized manner.

Educational Authorities
Educational authorities should fine-tune their vision, draft policies, and provide funding for the development of learner-oriented artificial intelligence (AI) to enrich students' distance learning experiences and teacher effectiveness in SPOCs. For example, educational authorities could organize seminars to promote dialogue among university teachers, data analysts, and IT specialists. Administrators could also use case studies of successful AI applications in teaching as the basis for training materials to develop AI applications in distance education. Finally, relevant authorities could host conferences or workshops on the ethics of applying AI in education to enhance the knowledge of teachers and related personnel.

IT Directors
IT directors of academic institutions should establish systems that enable teachers to rapidly obtain LMS course data. For example, this could be done by establishing a learning data warehouse where online course data could be stored or providing an automatic access mechanism that gives teachers timely access to data (e.g., through an API). IT directors should also organize and publish descriptions of the data set, such as in a codebook.

Recommendations for Further Research
The data sets we used to build the EWS were all derived from a university in Taiwan. This research also preliminarily verified that the early warning model could be transferred to another course if its instructional design was similar to that of the source course. However, we did not further examine the uncertainty factors that may cause model transfer to fail because of bias in training data collection; such bias may arise in courses with multicultural learners or when the model is transferred to students at other levels (e.g., K-12).