Internationalizing Professional Development: Using Educational Data Mining to Analyze Learners’ Performance and Dropouts in a French MOOC

This paper uses data mining from a French project management MOOC to study learners’ performance (i.e., grades and persistence) based on a series of variables: age, educational background, socioprofessional status, geographical area, gender, self- versus mandatory enrollment, and learning intentions. Unlike most studies in this area, we focus on learners from the French-speaking world: France and French-speaking European countries, the Caribbean, North Africa, and Central and West Africa. Results show that the largest gaps in MOOC achievements occur between 1) learners from partner institutions versus self-enrolled learners, 2) learners from European countries versus low- and middle-income countries, and 3) learners who are professionally active versus inactive learners (i.e., with available time). Finally, we used the CHAID data-mining method to analyze the main characteristics and discriminant factors of MOOC learner performance and dropout.


Introduction
the "grammar of schooling." For example, people who register for a MOOC may quit after a few weeks because they have acquired the skills or knowledge they wanted and were thus satisfied. Such cases can be considered a learning success from the participant's point of view, as they benefitted from an informal type of learning. However, the participant can also be considered a dropout from the "grammar of schooling" standpoint. Perhaps herein lies the difference between attrition and dropout.
Since the reasons for attrition in MOOCs are complex, in this study we decided to measure persistence in course assessments; this variable is easy to measure empirically and can be considered a type of attrition. This approach was possible because the format of instruction and assessment in the PM MOOC includes weekly evaluations, as well as a final exam, and follows a schedule with weekly deliverables as opposed to self-paced learning. Students can certainly audit the course without completing the evaluations, or shift from active to passive participation. However, missing the weekly evaluations, especially after completing the first ones, can indicate a process of attrition in certain cases, particularly in light of the learning goals that the participants initially set.

The Context of the Study: The French PM MOOC
The French PM MOOC launched its first edition in September 2013 and has been hosted on the Open edX Learning Management System (LMS) since 2018. The common core curriculum of the MOOC consists of four units within the first 4 weeks and an evaluation at the end of each weekly unit. Learners' global grade is calculated as follows:
• Pre-MOOC mind-mapping module: 1%
• First four weekly evaluations: 19%
• Final exam: 80%
Another distinguishing characteristic of this MOOC is the salience of academic cohorts (AC) among its learners. Half of the 6,400 active learners in the September 2018 session were students from partner institutions and had enrolled in the MOOC through their professor. AC students come from French higher-education institutions, and the weight of the MOOC in their curriculum is a powerful incentive for their success. The participation of this "captive" audience is one of the reasons for the high learner completion (success/active learner) rate (56%) in the 12th PM MOOC. According to Jordan (2013), the completion rate in the first edition of the PM MOOC was 50.7%.

Research Questions
Our research questions focus on the relationship between MOOC learners' demographic backgrounds (age, gender, geographical area or region, education, and socio-professional status, SPS) as independent variables, and student performance (MOOC final grades and dropout rates) as dependent variables (Table 1).

Study Sample
Participants in our study were registered in the 12th edition of the French PM MOOC. MOOC registration characteristics were as follows:
• 18,302 learners were enrolled in the MOOC.
• 6,449 were active learners (i.e., completed at least one weekly evaluation).
• 3,602 of the learners achieved a passing grade.
The study questionnaire was posted on the MOOC platform one week before the beginning of the course.
It was then made available to all enrolled registrants. Of the 18,302 learners enrolled in the MOOC, 1,792 responded to the questionnaire; 42.2% of the respondents were female, and 155 respondents were AC students (Table 2).
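The rates implied by these counts can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names and the `pct` helper are ours, and the questionnaire response rate is derived from the reported counts rather than stated in the text.

```python
# Counts reported for the 12th edition of the PM MOOC.
enrolled = 18302      # learners enrolled
active = 6449         # completed at least one weekly evaluation
passed = 3602         # achieved a passing grade
respondents = 1792    # answered the questionnaire


def pct(part, whole):
    """Return part/whole as a percentage (hypothetical helper)."""
    return 100 * part / whole


# Completion rate as defined in the text: successes over active learners.
completion_rate = pct(passed, active)        # about 56%, as reported
# Derived figures, not reported in the text.
activity_rate = pct(active, enrolled)
response_rate = pct(respondents, enrolled)

print(f"completion: {completion_rate:.1f}%")
print(f"active among enrolled: {activity_rate:.1f}%")
print(f"questionnaire response: {response_rate:.1f}%")
```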

Procedure
The questionnaire was created using Google Forms, and a link for its online completion was posted during Week 0 of the MOOC (i.e., the introductory week). The second source of data was learners' performance, which was obtained from file extractions through the edX platform.

Questionnaire
Demographic background. We asked demographic and sociological questions regarding participants' age (measured in years), gender, country of residence (the countries were merged into regions), SPS, and prior education.
MOOC certificate as a goal. The question on achievement goal, or intention, was measured using a Likert scale from 1 to 4 for the item: "I want to achieve the course certificate."

Learner Performance Data
Final grade. We determined the final grade by adding the final exam score (80%), the average score obtained in the four weekly evaluations (19%), and the score obtained in the pre-MOOC test (1%).
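The weighting described above can be written as a short function. This is a minimal sketch, assuming every component is expressed as a percentage between 0 and 100 and that a missed weekly evaluation counts as 0; the function and argument names are hypothetical.

```python
# Weights taken from the grading scheme described in the text.
WEIGHTS = {"pre_mooc": 0.01, "weekly": 0.19, "final_exam": 0.80}


def final_grade(pre_mooc, weekly_scores, final_exam):
    """Return the global grade on a 0-100 scale.

    Assumption: all inputs are percentages (0-100), and weekly_scores
    holds the four weekly evaluation scores (0 for a missed one).
    """
    weekly_avg = sum(weekly_scores) / len(weekly_scores)
    return (WEIGHTS["pre_mooc"] * pre_mooc
            + WEIGHTS["weekly"] * weekly_avg
            + WEIGHTS["final_exam"] * final_exam)


# A learner with perfect coursework who skips the final exam.
print(final_grade(100, [100, 100, 100, 100], 0))
```

Under these assumptions, a learner who skips the final exam cannot exceed a grade of about 20, however well the weekly evaluations went.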
Dropout rate. Some researchers have attempted to calculate a context-based dropout rate or to take the participants' perspective into account (Liyanagunawardena, Parslow, & Williams, 2014). For example, Henderikx, Kreijns, and Kalz (2017) measured the gap between intention to complete a MOOC and actual behavior (i.e., intention-behavior gap). Using a small sample, they measured this gap to study dropout rates based on intention, but they did not provide a single indicator for dropout.
This paper builds on this approach to calculate a context- and intention-based dropout rate. We first calculated persistence rather than the number of participant dropouts, since the term dropout, in the context of a MOOC, can have multiple definitions. It is indeed difficult to assess why certain participants quit after a few days or a few weeks, since their reasons for registering are often disparate. We calculated the assessment persistence index, based on the scoring system of the PM MOOC, and the number of assessments completed as presented in Table 3 below.
Table 3. Assessment Persistence Scoring System (column headings: Persistence score; Week 1 assessment; Week 2 assessment; Week 3 assessment; Week 4 assessment)
This detailed scoring system measures not the gross dropout rate (i.e., the number of learners who dropped out before the final exam), but the number of weeks validated by the participants. Scoring student persistence by considering their completed assessments introduces a bias, since it excludes auditing participants who may be active but are not interested in passing exams or obtaining a certificate. Nevertheless, we sought to define a dropout variable that is as close as possible to the academic definition of dropout (as mentioned earlier in the Attrition and Dropout section) by including the course certificate as a goal. As mentioned above, student dropout rates should be measured in context, which requires considering participants' initial achievement goals or intentions. Hence, we calculated the difference between learners' achievement intention scores (i.e., the drive to obtain the course certificate) and their assessment persistence scores.
The dropout rate was defined as the distance between the students' formal learning goals, set at the beginning of the course, and students' actual achievements. It was computed by expressing each participant's achievement intention score out of 5 and subtracting their assessment persistence score, which was also measured out of 5 (4 weeks + final exam, Table 3). The resulting variable, which ranges from -5 to +5, is a new continuous variable for measuring student dropout based on 1) the achievement intention set at the beginning of the course (i.e., course certification), and 2) assessment persistence up to and including the final exam. Using our model, we can relate the minimum dropout score (-5) to an underestimated forecast of achievement, the 0 score to an accurate forecast of achievement, and the maximum score (+5) to an overestimated forecast of achievement.
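The dropout score defined above can be sketched as a simple difference. The example assumes both scores are already expressed on a 0 to 5 scale; how the 1 to 4 Likert item was converted to a score out of 5 is not specified here, so no conversion is attempted. Function names are hypothetical.

```python
def dropout_score(intention, persistence):
    """Intention score minus persistence score, both assumed on a 0-5 scale."""
    for name, value in (("intention", intention), ("persistence", persistence)):
        if not 0 <= value <= 5:
            raise ValueError(f"{name} must lie between 0 and 5")
    return intention - persistence


def interpret(score):
    """Map a dropout score to the forecast labels used in the text."""
    if score > 0:
        return "overestimated forecast of achievement"
    if score < 0:
        return "underestimated forecast of achievement"
    return "accurate forecast of achievement"


# A learner who strongly intended to certify but validated only two assessments.
score = dropout_score(intention=5, persistence=2)
print(score, interpret(score))
```

A positive score thus flags a learner whose initial goal exceeded what they achieved, which is the sense of dropout used in the rest of the analysis.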

Participants' Final Grades and Demographic Backgrounds
The assumption of normality in participants' final grade scores was not met (DK-S = .239; p < .001).
Hence, non-parametric tests were used. Examining the final grade distribution, we observed a U shape, displaying high concentrations on both ends (Figure 1).

RQ1: Final Grades and Age
Analysis shows a negative Spearman rank correlation between final grade and age (rs = -.250; p < .001).
Controlling for AC bias, the partial correlation is weaker but has the same direction and significance (rs = -.137; p < .001).
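For readers unfamiliar with the statistic, Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below uses only the standard library and made-up data; it does not compute the p-value or the partial correlation reported above, and it is not the software used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical ages and final grades, for illustration only.
ages = [19, 22, 25, 31, 38, 45]
grades = [88, 91, 70, 64, 72, 15]
print(round(spearman_rho(ages, grades), 3))
```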

RQ2: Final Grades and Gender
Testing the effect of gender on the final grade with the Mann-Whitney U-test did not reveal any significant differences based on gender.
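As a reminder of what the test compares, the Mann-Whitney U statistic counts, over all pairs drawn from the two groups, how often a value from one group exceeds a value from the other (ties count one half). The sketch below computes the statistic only, on made-up grades; it gives no p-value and is not the software used in the study.

```python
def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two Mann-Whitney U statistics."""
    # U for group_a: pairs where a > b count 1, ties count 0.5.
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)


# Hypothetical final grades for two groups, for illustration only.
grades_group_1 = [12, 55, 71, 83, 90]
grades_group_2 = [10, 48, 74, 85, 92]
print(mann_whitney_u(grades_group_1, grades_group_2))
```

A value of U near half the number of pairs, as in this made-up example, is what one expects when the two distributions overlap heavily.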

RQ4: Final Grades and Education
We observed a significant relationship between participants' final grades and education (H (6) (2015) finding that "the higher the prior educational attainment, the greater the completion" (p. 7). As such, the enrollment system and participants' demographic backgrounds must be considered when analyzing the relationship between education and performance in a MOOC.

RQ5: Final Grades and Socio-Professional Status (SPS)
Significant differences were found between participants' final grades and socio-professional status

Participant Dropout Rates and Demographic Backgrounds
Dropout score distribution violated normality (DK-S = .151; p < .001). As a result, the statistical analyses were non-parametric.

RQ6: Dropout Rates and Age
Age was weakly correlated with dropout rate (rs = .172; p < .001). Excluding AC participants (who were forced to enroll and were less prone to attrition) from the analysis removed the significant relationship between the two variables: in the remaining sample, age had no effect on dropout rates. This finding contradicts Guo and Reinecke's (2014) and Morris et al.'s (2015) finding that older learners are less prone to attrition.

RQ8: Dropout Rates and Region
France Spanish-speaking developing countries, but they confirm Kizilcec et al.'s (2017) finding that MOOC completion is higher on average in more- versus less-developed countries. Could the nature of the course and the use of the French language, which is a second language in FSDC countries (Ngalasso, 1992), explain this difference? The dropout rate of FFSE participants is close to 0, which indicates a good forecast of achievement, whereas FSDC participants display a relatively high score of overestimated forecast of achievement.
Since, as results show, the geographical variable influences MOOC performance, we examined whether gender differences in dropout rates could be observed and better explained by dividing gender groups into FFSE and FSDC subsamples. We found no significant gender differences in dropout rates in either the FFSE or the FSDC subsample. We could argue that the only gender imbalance in dropout rates was caused by AC participants. Indeed, when considering the entire sample, male participants' dropout rates were significantly higher than those of female participants, but when analyzing the geographical subsamples, the gender effect on dropout rates was no longer significant.

RQ9: Dropout Rates and Education
Controlling for AC bias, no significant relationships were found between dropout rates and prior education. Previous research found different relationships between prior education and MOOC completion. For example, Breslow et al. (2013) found only a marginal association between them, but Morris et al. (2015) found a significant link between higher degrees and MOOC completion. Our results are in line with the overall mixed results regarding educational attainment and MOOC dropout rates.

RQ11: Characteristics of the Best MOOC Performers
Overall, the geographical factor was found to be a determinant of MOOC achievement and dropout in separate analyses. To answer the question on the most discriminant characteristics of the best performers, we conducted a tree analysis, with CHAID (Chi-square Automatic Interaction Detection) as an educational data mining method to examine the predictive variables of MOOC success using SPSS.
We used this method to determine whether the previous results, which were obtained separately by subsampling, could be verified through an automatic data mining method, such as CHAID analysis. As previously demonstrated, the demographic variables were strong indicators of MOOC performance.
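To illustrate the idea behind CHAID, the sketch below computes the Pearson chi-square statistic that the method uses to compare candidate predictors when choosing a split. It is a deliberately reduced illustration under our own assumptions, not the SPSS procedure used in the study: it omits category merging, p-values, and Bonferroni adjustment, it handles only a categorical outcome, and all data and names are made up.

```python
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic and degrees of freedom.

    pairs is a list of (predictor_category, outcome_category) tuples.
    """
    n = len(pairs)
    row_totals = Counter(p for p, _ in pairs)
    col_totals = Counter(o for _, o in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(row, col)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical learners: region and gender against a pass/fail outcome.
by_region = ([("FFSE", "pass")] * 30 + [("FFSE", "fail")] * 10
             + [("FSDC", "pass")] * 10 + [("FSDC", "fail")] * 30)
by_gender = ([("F", "pass")] * 20 + [("F", "fail")] * 20
             + [("M", "pass")] * 20 + [("M", "fail")] * 20)

for name, pairs in (("region", by_region), ("gender", by_gender)):
    stat, dof = chi_square(pairs)
    print(f"{name}: chi2 = {stat:.1f}, dof = {dof}")
```

In this made-up data both predictors are binary, so the statistics are directly comparable and region would be chosen as the first split; with predictors of different sizes, CHAID compares adjusted p-values rather than raw statistics.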
Our predictive variables were region, gender, age, education, and professional status. We excluded the AC participants from our analyses as their presence in the sample would constitute a bias, since they were forced to enroll in the MOOC. If we had included them, the results would have been overly unbalanced between learners from FSDC and FFSE countries for MOOC performance and dropout rates, as our previous results demonstrate. Results show that the main discriminant factor (the first node) of final grades and dropout rates is the region variable (Figures 3 and 4): FFSE participants had higher achievement scores than FSDC participants.
Finally, we verified learner performance based on the MOOC scoring system and instructional design (i.e., pass or fail). Our goal was to analyze only the achievement or non-achievement factor, without considering grade means. The final grades were mathematically divided into three categories (Figure 5).
• Group 1 (final grade between 0 and 19.99) is the dropout category: Fewer than four weekly evaluations were completed.
• Group 2 (final grade between 20 and 69.99) is the middle category: Weekly evaluations were completed and participants failed the final exam.
• Group 3 (final grade between 70 and 100) is the passing group: Weekly evaluations were completed and the final exam was passed.
We transformed the final grade data into three discrete grade groups respecting this grading structure (Table 5).
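The three-way split described above amounts to binning the final grade at 20 and 70. A minimal sketch, with a hypothetical function name and made-up grades:

```python
def grade_group(final_grade):
    """Return 1 (dropout), 2 (middle), or 3 (passing) for a 0-100 grade."""
    if not 0 <= final_grade <= 100:
        raise ValueError("final_grade must lie between 0 and 100")
    if final_grade < 20:
        return 1  # dropout category
    if final_grade < 70:
        return 2  # middle category: final exam failed
    return 3      # passing group


# Hypothetical final grades, for illustration only.
grades = [4.5, 19.99, 20, 55, 69.99, 70, 96]
print([grade_group(g) for g in grades])
```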

Discussion and Conclusion

Main Results
We found that the biggest gap in MOOC achievement, if we omit students who were forced to enroll in an institutional context, occurred between learners from European countries and learners from low- and middle-income countries (LMIC). A U-shaped grade curve was observed in all of our samples. Moreover, the better performance of students and job seekers among FFSE participants highlights the importance of time availability. The results regarding MOOC completion and performance and AC students show that formal for-credit learning is a key driver of MOOC success among participants from FFSE countries. These learners had higher achievement levels than learners who enrolled for professional development reasons, whether they were European or from LMIC.
The definition of dropout must also be considered in context. We chose to consider dropout rates in the context of achieving the learning goal to obtain a certificate, set at the beginning of the course. For other purposes, we could have chosen to weigh dropout rates against other learning intentions. This perspective underscores the multifactorial aspect of online course achievement: Motivation and time availability are necessary but not sufficient for success. The lower grades and higher dropout rates of learners from LMIC emphasize the significance of social and economic determinants of achievement (e.g., learning environment and technology access). The CHAID analyses led us to predict that a specific subsample will underachieve compared to the global sample: participants above the age of 27 years from LMIC. Based on results from this EDM method, we propose that instructional design for international professional development MOOCs should address the issues that this specific group encounters.
The Importance of Context for MOOC Design
Elias (2011) highlights the challenges inherent to mobile learning in Africa. It is important to consider the access and connectivity problems African learners face (Kaliisa & Picard, 2017)  phones display a higher attrition rate (-28%) than European mobile connections (-23%) between the first and fourth week of class (Figure 7). One way to intervene effectively would be, for example, to plan lighter and more mobile-responsive online courses. Another aspect of interest is the content delivered. There is a lack of local and contextualized content in MOOCs and in online education in general, as many studies point out (Czerniewicz, Deacon, Small, & Walji, 2014; King, Luan, & Lopes, 2018; King, Pegrum, & Forsey, 2018; Nkuyubwatsi, 2014; Nti, 2015, as cited in Launois et al., 2019). The digital divide concerns not only access but also use (Zillien & Hargittai, 2009). Liyanagunawardena, Williams and Adams (2013) note that even when there is access to good Internet connectivity, poor digital literacy skills pose a barrier. As Richter and McPherson (2012) assert regarding open educational resources, MOOCs are "produced in Western industrialized countries [and] may not necessarily fit the needs of learners in developing countries" (p. 203). MOOCs are "primarily organized by universities and address topics on an academic level" (Rohs & Ganz, 2015, p. 9).

Study Limitations
Our conclusions draw upon student results in one session of the French PM MOOC. This is the main limitation of this research, although we included a relatively large and heterogeneous sample.
Nevertheless, this study can pave the way to broader studies involving comparative analyses among different geographical areas within the French-speaking world, since, as noted in the introduction, such studies are limited. Furthermore, we analyzed MOOC success through the prism of formal success (i.e., learners' final grade). It would be relevant to include among learning benefits participation itself, taking into consideration the cultural and economic context of the participants and their points of view (e.g., on their reasons for participating and self-assessed learning), as some researchers propose (Gamage, Perera, & Fernando, 2016; Guàrdia, Maina, & Sangrà, 2013; Liyanagunawardena et al., 2014).

Implications for Practice and Research
Many studies (Castillo, Lee, Zahra, & Wagner, 2015; Daniel, Vázquez Cano, & Gisbert, 2015; Nkuyubwatsi, 2014) suggest adapting online learning content to the local contexts of developing countries (Murugesan et al., 2017) and providing guidance and support to the learners (Patru & Balaji, 2016). In order to adapt the French PM MOOC to local contexts, we have implemented a set of interventions, including:
• Sharing project management tools on dedicated social network groups (e.g., Facebook; Figure 8), where African learners can share contextualized productions on a familiar platform.
• Setting up a discussion forum related to each course video, in which African participants can discuss local issues.
• Establishing a dedicated team track for each session. The GdP-Lab hosts five to 10 team projects, mostly from Africa.
• Encouraging student-to-student feedback (e.g., peer review of deliverables from a case study on the advanced track).
Finally, one third of the MOOC tutoring team is based in Africa. These methods could contribute to the high completion rate of African participants in this MOOC compared to most others. In conclusion, further research is needed to address the technology learners use to access MOOCs, learners' geographical and cultural context, and learners' demographic backgrounds in order to enhance the achievement rate of specific audiences, such as "older" participants from LMIC, as our empirical results show.