Exploring Demographics and Students’ Motivation as Predictors of Completion of a Massive Open Online Course

This paper investigates the degree to which different variables affect the completion of a Massive Open Online Course (MOOC). Data on those variables, such as age, gender, English proficiency, education level, and motivation for course enrollment, were first collected through a pre-course survey. Next, course completion records were collected via the Coursera database. Finally, multiple binomial logistic regression models were used to identify factors related to MOOC completion. Although students were grouped according to their preferences, working in groups did not affect students’ likelihood of MOOC completion. Other variables, such as age, the institution hosting the MOOC, academic program alignment with students’ needs, and students’ intention to complete the course, all affected their probability of MOOC completion. This study contributes to the literature by indicating the factors that influence the probability of MOOC completion. Results show that older participants (age > 50 years old) have a higher probability of completing the MOOC. The probability of completion also increases when the MOOC provides experiences that add to students’ current academic backgrounds and when it is hosted by an institution with a strong academic reputation. Based on these factors, this study contributes to research methods in MOOCs by proposing a model that is aligned with the most important factors predicting completion as recommended by the current MOOC literature. For the next phase of assigning learners to work in groups, findings from this study also suggest that MOOC instructors should provide assistance for group work and monitor students’ collaborative processes.



Introduction
The Massive Open Online Course (MOOC) is a popular online learning platform in which millions of students enroll. MOOCs offer educational opportunities for people who otherwise could not afford a formal education (Dillahunt, Wang, & Teasley, 2014). Unfortunately, MOOCs are also known for their high attrition rates (Ho et al., 2014; Lim, Coetzee, Hartmann, Fox, & Hearst, 2014; Malan, 2013). According to Liyanagunawardena, Adams, and Williams (2013), most MOOCs have a completion rate of less than 10%.
Many studies have been conducted to identify factors that contribute to MOOC completion; however, the findings vary across those studies. Cisel (2014) indicated that learner performance in MOOCs was highly correlated with the learner's geographic location, employment status, and time constraints, and that unemployed learners from high Human Development Index (HDI) countries were more likely to complete the course. Other variables that have been examined for their effects on MOOC completion include years of education (Guo & Reinecke, 2014; Schulze, 2014), friends' performance in a MOOC (Brown et al., 2015), prior online learning experience (Morris, Hotchkiss, & Swinnerton, 2015), English proficiency (Engle, Mankoff, & Carbrey, 2015; Konstan, Walker, Brooks, Brown, & Ekstrand, 2015; Schulze, 2014), number of posts and number of videos watched (Bonafini, 2017; Bonafini, Chae, Park, & Jablokow, 2017), gender (Bayeck, Hristova, Jablokow, & Bonafini, 2018; Breslow et al., 2013; Konstan et al., 2015; Schulze, 2014), and age (Breslow et al., 2013; Guo & Reinecke, 2014; Konstan et al., 2015; Morris et al., 2015; Schulze, 2014; Zhang et al., 2016). Most of the studies agree that there is a positive relationship between age and MOOC completion rates. Zhang et al. (2016) concluded that learners over 40 years of age who intended to complete the course achieved higher MOOC completion rates. In addition, Morris, Hotchkiss, and Swinnerton (2015) found that unemployed and older learners who had higher levels of education and previous online learning experience tended to achieve higher course completion rates.
Although numerous studies have examined learner motivations for enrolling in MOOCs (Belanger & Thornton, 2013; Gil-Jaurena, Callejo-Gallego, & Agudo, 2017; Konstan et al., 2015; Macleod, Haywood, Woodgate, & Alkhatnai, 2014; Radford, Coningham, & Horn, 2015; Zhong, Zhang, Li, & Liu, 2016), only a few have investigated the influence of motivation on MOOC completion. Konstan, Walker, Brooks, Brown, and Ekstrand (2015) concluded that most of the reasons that learners enrolled in a MOOC, such as university/instructor-related reasons or access to educational institutions-related reasons, did not affect course completion, but that learners' self-reported intention of completing the MOOC was a significant predictor of course completion.
In consideration of the inconsistent findings for predictors of MOOC completion from the existing literature, this paper presents a MOOC completion model that includes relevant variables to identify the most useful predictors pertaining to MOOC completion. This model identifies relevant characteristics of MOOC completers and non-completers, which can further inform the design and development of future MOOCs.

Literature Review
Students' motivation for taking a MOOC has been identified as a crucial factor for course engagement, which keeps learners persisting in the course (Xiong et al., 2015). Motivation factors, which contribute to sustained student engagement, include interest in the topics (Dillahunt et al., 2014; Hew & Cheung, 2014; Kizilcec & Schneider, 2015), curiosity about MOOCs (Hew & Cheung, 2014; Zheng, Rosson, Shih, & Carroll, 2015), current job needs (Christensen et al., 2013), the opportunity to connect with others (Belanger & Thornton, 2013), preparation for future jobs (Kizilcec & Schneider, 2015; Zheng, Rosson, Shih, & Carroll, 2015), relevance to current academic programs, interest in earning a certificate, and interest in the professor or the institution that offers the MOOC (Kizilcec & Schneider, 2015). Xiong et al. (2015) categorize these motivations as intrinsic motivations (interest related), extrinsic motivations (external rewards related, e.g., earning a course completion certificate), and social motivations (taking the course with friends and connecting with others). Having found that intrinsic and extrinsic motivations are significant predictors of learner engagement, and that learner engagement correlates positively with retention, Xiong et al. (2015) propose forming a student learning community and providing incentives (e.g., certificates) as motivation factors to enhance learner engagement and retention.
Many studies suggest that the use of group work could improve learning interaction and engagement, and that it has the potential to enhance learning in MOOCs (Arendale & Hane, 2014; Berger & Wild, 2016; Hiltz, 1998; Jones, 1997; Williams, Duray, & Reddy, 2006; Wen, 2016). By working with others in a MOOC, students could learn from and assist one another in the learning process (Yuan & Powell, 2013). In their study, Guàrdia, Maina, and Sangrà (2013) found that collaborative work and peer assistance and assessment were effective MOOC design principles. Kulkarni, Cambre, Kotturi, Bernstein, and Klemmer (2015) found that "the more geographically diverse the discussion group, the better the students performed" (p. 1126).
A number of grouping approaches in MOOCs have been implemented in recent years. These approaches can be summarized into two categories: random grouping and criteria-based grouping. Random grouping assigns learners to groups randomly (Zheng, Vogelsang, & Pinkwart, 2015), whereas criteria-based grouping relies on different grouping mechanisms. For instance, Wen (2016) formed teams based on the transactive discussion within a large community and further deployed an automated agent to support team discussion. Zheng, Vogelsang, and Pinkwart (2015) created MOOC groups based on learners' preferred collaboration media and demographic information, including gender, time zone, and language. Sinha (2014) proposed assigning MOOC students to teams based on their connections with other learners in a social network.
In addition to motivation and grouping factors, learners' intention to complete a MOOC has been identified as a significant predictor of their actual completion (Koller, Ng, Do, & Chen, 2013; Konstan et al., 2015). For instance, Koller, Ng, Do, and Chen (2013) concluded that learners with the intention of completing a MOOC achieved higher completion rates when compared to those who did not. Bonafini, Chae, Park, and Jablokow (2017) found that students' desire for certification had an amplifying effect on students' MOOC completion, as well as on the number of videos watched by the students. These studies inform us that a learner's commitment to completing the course plays an important role in improving learner engagement and retention in MOOCs. In this sense, it seems that the higher the level of goal commitment learners set for themselves when tasks are achievable, the better their performance will be (Locke, 1982).
The existing literature lays the foundation for incorporating pertinent variables, such as learner demographics, motivation for enrollment, intention of completion, and working in groups, to build a completion model for a particular MOOC.

Methodology
The purpose of this paper is to develop a multiple binomial logistic regression model that distinguishes significant variables affecting MOOC completion. The completion level is treated as a binary dependent variable with the result of either completing the course or not. Independent variables include age, gender, education level, motivation for taking the MOOC, working in groups, and intentions of completing the course. Participants were recruited to work in small online groups by matching their grouping preferences, such as their preferred language, media to communicate, and intention of completing the course. Students whose grouping preferences could not be matched were placed in the control group. This study investigates the following research questions:
1. What are the characteristics of MOOC learners who participated in this study?
2. What are the learners' preferences related to working in groups?
3. What are the learners' motivations for taking this MOOC?
4. Which demographics and motivational factors predict the probability of MOOC completion?

Participants
Participants in this study were recruited from a MOOC offered through the Coursera platform from July to August 2014 (Jablokow, Matson, & Velegol, 2014). Prior to the beginning of the course, an invitation to participate in online groups was sent out to MOOC learners. Learners who responded with interest in working in online groups received a pre-course survey, which inquired about their demographic information, reasons for taking this course, and grouping preference, among other questions. Participants were assigned to groups by matching, in order, their preferred language to communicate within a team, intention of completion, and mode of communication (synchronous text, asynchronous text, or synchronous video and audio). Some of the synchronous groups were formed based on converted time zones. Participants whose grouping preferences, such as preferred language to speak in an online team or preferred time to work with others, could not be satisfied or matched with others were assigned to a control group. Students who were assigned to the control group received no instructional guidance or monitoring for group work.
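The matching procedure described above can be illustrated with a minimal sketch. It assumes exact matching on all three preferences and a hypothetical minimum group size of two; the field names (`language`, `intention`, `mode`) and the function `assign_groups` are illustrative and are not taken from the study, which also used time zones for some synchronous groups.

```python
from collections import defaultdict

def assign_groups(participants, min_size=2):
    """Bucket participants by (language, intention, mode).

    Buckets smaller than min_size (a hypothetical threshold) cannot be
    matched, so their members are placed in the control group.
    """
    buckets = defaultdict(list)
    for p in participants:
        key = (p["language"], p["intention"], p["mode"])
        buckets[key].append(p["id"])

    groups, control = {}, []
    for key, members in buckets.items():
        if len(members) >= min_size:
            groups[key] = members
        else:
            control.extend(members)
    return groups, control
```

In this sketch, a participant who is the only one with a given combination of preferences ends up in the control group, mirroring the unmatched participants described in the text.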
After the online groups were formed, a general group work instruction email was sent out to the participants. In consideration of the large number of Chinese participants who volunteered for this grouping study, the email instruction was also translated into Chinese. Various online tools were suggested for different types of group communication such as MOOC discussion forums and email for learners' asynchronous communication, and Skype and QQ (a Chinese instant messaging tool) for learners' synchronous communication. Additionally, ZOOM (a video conferencing tool for large group discussions) was offered by the research team to learners for free use.

Data Sources
Pre-course survey. At the beginning of this course, a pre-course survey was sent to participants to collect their demographic information, such as gender, age, level of education, level of English proficiency, previous online learning experience, and employment status.
Post-course survey. At the end of this course, a post-course survey was sent to participants to gather feedback on their experiences of working in online groups in this MOOC.
Completion data. Learners in this MOOC were required to submit at least six assignments in order to obtain a certificate of completion. For learners who opted to earn a certificate of completion with distinction, twelve additional peer reviews were required. Learners who failed to meet these requirements were not awarded a completion certificate. Original course completion data was retrieved from Coursera with three levels of completion: none, normal, and distinction. These three levels of completion were recoded as a binary variable showing two levels of course completion: Complete (the combination of normal completion and completion with distinction) and Non-Complete.
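The recoding step can be sketched as follows. The three level names come from the text; the function name `recode_completion` and the 0/1 coding are assumptions made for illustration.

```python
def recode_completion(level):
    """Collapse the three Coursera completion levels into a binary outcome.

    1 = Complete (normal completion or completion with distinction)
    0 = Non-Complete
    """
    mapping = {"none": 0, "normal": 1, "distinction": 1}
    return mapping[level.strip().lower()]
```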

Data Analysis
The pre-course survey data was exported from Qualtrics, students' completion records were collected through Coursera, and the various data sets were retrieved and combined in an SQL database. The data analysis and its graphical representation were computed using ArcMap, SPSS, and RStudio. RStudio was used to run multiple binomial logistic regression models in order to identify the predictors that affect learners' MOOC completion. Within the model, MOOC completion is defined as a binary dependent variable, and all the independent variables are defined as categorical variables.
Independent variables were drawn from existing literature as shown in Table 1. We included learner demographics (age, gender, education level, English proficiency, and employment status) in our model, as suggested by the research of Bayeck, Hristova, Jablokow, and Bonafini (2018), Breslow et al. (2013), Cisel (2014), and Engle, Mankoff, and Carbrey (2015). We also included learners' motivations for taking MOOCs as predictors, as suggested in the research of Belanger and Thornton (2013), Brown et al. (2015), Dillahunt, Wang, and Teasley (2014), and Kizilcec and Schneider (2015).
Motivations for taking MOOCs included interest in the subject, interest in the institution and professor that provides the course, building social connection with others, employment opportunities, earning a certificate, and friends taking the course. Other variables identified from the literature included the intention of completing the course (Engle et al., 2015; Koller et al., 2013; Konstan et al., 2015) and participation in online groups (Kulkarni, Cambre, Kotturi, Bernstein, & Klemmer, 2015; Sinha, 2014; Wen, 2016; Zheng, Vogelsang, & Pinkwart, 2015).
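As a reading aid, a binomial logistic regression with categorical predictors of the kind described in this section can be written in the following general form. The notation is an assumption introduced here, not the authors': \pi_i is the probability that learner i completes the course, and x_{ijk} is an indicator equal to 1 when learner i falls in level k of predictor j, with level 1 taken as the reference category.

```latex
\[
\log\frac{\pi_i}{1-\pi_i}
  = \beta_0 + \sum_{j=1}^{J}\sum_{k=2}^{K_j} \beta_{jk}\, x_{ijk},
\qquad
\pi_i = \Pr(\text{Complete}_i = 1)
\]
```

Under this coding, each coefficient \beta_{jk} is the log of the odds ratio comparing level k of predictor j with its reference level, holding the other predictors fixed.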

Learner Characteristics
To address our first research question, "What are the characteristics of MOOC learners who participated in this study?", we analyzed participants' demographics. Demographics show that students who participated in this study (n = 655) came from all over the world (see Figure 1 for participants' locations on a world map).

Figure 1. Location of the participants.

Learner Grouping Preferences
To address our second research question, "What are the learners' preferences related to working in groups?", we used the grouping preference question in the pre-course survey that asked participants to rank their preferences regarding working in groups by marking the most important factor as 1 and the least important as 9. Results synthesized in Table 3 show that participants' first preference for participating in online groups was to work with people whose native language was the same as theirs. Participants' second grouping preference was to be grouped with others who had similar intentions of completing the course (e.g., complete the whole course, most of the course modules and assignments, or none of those). Their third preference was to be grouped with others who had similar availability to join group meetings. Although the researchers grouped learners according to their identified preferences, the post-course survey indicated that many participants had difficulties arranging online meetings due to time zone differences and schedule conflicts.

To address our third research question, "What are the learners' motivations for taking this MOOC?", we used the motivation question from the pre-course survey. The motivation question was stated as follows: "Please rate the importance of the following reasons for you to enroll in this course on a scale of 1-5 (1 as not at all important, 5 as absolutely critical) in the statements below." Statements listed included: "I am interested in taking a course from this particular institution;" "I am interested in taking a course from this particular professor(s);" "I am interested in earning a certificate;" "I am interested in connecting with other students;" "I have friends taking this course;" "The course relates to my current academic program;" "The course relates to my current job;" and "The course will be helpful for me to get a new job." Results show that participants rated taking this course because of their friends as the most important reason, with a mean score of 4.2, as shown in Table 4. Other important factors that emerged from participants' responses were the MOOC professors (x̄ = 3.03), the institution offering the MOOC (x̄ = 2.37), and participants' personal interest (x̄ = 2.27). Table 4 seems to suggest that learners tend to be more socially and extrinsically motivated, since they enrolled in the course because their friends were also taking it.

Demographics and Motivation Factors Predicting the Probability of MOOC Completion
Stepwise binomial logistic regression was used to answer the fourth research question: "Which demographics and motivational factors predict the probability of MOOC completion?" In this procedure, an iterative process was used for variable selection. The investigators started by fitting a saturated model to map out which factors may affect the probability of MOOC completion. Then, parameter estimates were removed when identified as nonsignificant (p-value greater than 0.05). After excluding these nonsignificant parameters, the model was refitted and the p-values of the remaining parameter estimates were rechecked to ensure that all variables with significant p-values were retained in the model. The lowest Akaike Information Criterion (AIC) (Akaike, 1973) was used to select the model that contained the best predictor subset.
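The model comparison step can be sketched with the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The helper names and the log-likelihood values used in the usage note are hypothetical and do not reproduce the fitted models in this study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def select_by_aic(candidates):
    """Return the name of the candidate model with the lowest AIC.

    candidates maps a model name to a (log_likelihood, n_params) pair.
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))
```

For example, with hypothetical fits `{"full": (-350.0, 12), "reduced": (-351.0, 9)}`, the reduced model is preferred because its small loss in log-likelihood is outweighed by having three fewer parameters.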
The saturated model contains demographics parameter estimates such as education level (Education), age (Age), gender (Gender), employment status (Employment: full time/part time/not working), and English proficiency (English_Level). The model also includes the parameter estimates: students assigned to work in groups according to their preferences (Groups), students' motivation for taking the MOOC such as personal interest (Personal_Int), interest in connecting with others (Connect_w_Others), course offered by a certain institution (Institution) or professor they like (Professor), relationship of MOOC content to their academic program (Academic_Pgm), relationship of MOOC content to their current job responsibilities (Current_Job), MOOC fostering a potential skill participants might need in their future job (Future_Job), intention of completion (Intent_Completion), participants' desire to earn a certificate (Earn_Certificate), and friends' participation in the same MOOC (Friends).
Results from the saturated model (Model 1) are displayed in Table 5. The researchers reran the model with only the significant predictors, labeled as Model 2 in Table 5. Results show all variables as significant (AIC = 718.12 and G² = 750.12) with the exception of the variable Professor (p > 0.05), indicating that learners' desire to take this MOOC with a specific professor is not a significant factor affecting course completion when compared to other factors such as student age, the institution hosting the MOOC, MOOC content related to the student's current academic program, and the student's intention to complete the course.
The investigators removed the variable Professor from the model and reran the analysis (Model 3). Results from multiple binomial logistic regression on Model 3 (Table 6)

Intent_Completion5 (p = 0.03448) as statistically significant when considering MOOC completion (AIC = 717.45 and G² = 750.12). Model 3 also presents an improvement of fit, with a lower AIC than the previous models shown in Table 5. As shown in Table 7, the ratio of the odds of completing a MOOC for participants at Age5 (50 to 59 years old) to the odds for participants at Age1 (up to 19 years old) is exp(1.17022) = 3.22, meaning that the odds of MOOC completion increase by a multiplicative factor of 3.22 for participants between the ages of 50 and 59 in comparison to participants up to 19 years old. Likewise, the ratio of the odds of completing a MOOC for participants at Age6 (above 60 years old) to the odds for participants at Age1 (up to 19 years old) is exp(1.40091) = 4.06, meaning that for older participants the increase in the odds of MOOC completion is even larger, a multiplicative factor of 4.06. The ratio of the odds of completing a MOOC for students who perceive that it is moderately important for the MOOC to be aligned with their academic program (Academic_Pgm3) to the odds for students who perceive that it is not important at all for the MOOC to be aligned with their academic program (Academic_Pgm1) is exp(0.53921) = 1.71. This means that the odds of MOOC completion are 1.71 times higher for students who rate alignment with their academic program as moderately important than for students who rate it as not important at all.
The ratio of the odds of completing a MOOC for students who strongly agree with the statement of intention to complete (Intent_Completion5) to the odds for students who indicated no intention to complete (Intent_Completion1) is exp(1.40575) = 4.08. This means that the odds of MOOC completion increase by a factor of 4.08 for participants who are initially strongly committed to completing the course. The researchers also explored the interaction effects between the independent variables; however, none of these interactions were significant.
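The odds ratios quoted in this section are exponentiated regression coefficients, which can be checked with a short sketch. The coefficients are those reported in the text; the function names and the baseline probability used in the usage note are hypothetical and serve only to illustrate how an odds ratio translates into a probability.

```python
import math

# Coefficients (log odds ratios) as reported for Model 3 in the text.
coefficients = {
    "Age5": 1.17022,
    "Age6": 1.40091,
    "Academic_Pgm3": 0.53921,
    "Intent_Completion5": 1.40575,
}

def odds_ratio(beta):
    """Odds ratio relative to the reference category: exp(beta)."""
    return math.exp(beta)

def probability_after(baseline_prob, beta):
    """Completion probability implied by applying exp(beta) to the baseline odds."""
    odds = baseline_prob / (1.0 - baseline_prob) * math.exp(beta)
    return odds / (1.0 + odds)

# Rounded odds ratios for each reported coefficient.
odds_ratios = {name: round(odds_ratio(b), 2) for name, b in coefficients.items()}
```

Note that an odds ratio multiplies the odds, so the implied change in probability depends on the baseline. With a hypothetical baseline completion probability of 0.10 in the reference group, `probability_after(0.10, 1.17022)` gives roughly 0.26, which is less than 3.22 times the baseline.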

Discussion and Conclusion
This study shows that age, the institution hosting the MOOC, alignment with students' academic needs, and students' intention to complete the course can affect the probability of students' completion of a MOOC.
The results are in line with the literature (e.g., Morris et al., 2015; Schulze, 2014; Zhang et al., 2016) in showing that older participants tend to achieve a higher course completion rate. This study extends the literature by indicating that participants' age relates to MOOC completion, and that older students (age > 50 years old) present a higher probability of completing a MOOC when compared with younger ones.
It also sheds light on the importance of MOOCs providing experiences that add to students' current academic experiences as well as the importance of MOOCs being hosted by institutions with a high academic reputation. As the majority of MOOC students are college degree holders (Christensen et al., 2013; Despujol, Turró, Busquéis, & Cañero, 2014), it makes sense that when students expect that a MOOC's content will add knowledge to their current academic experiences, their probability of completing the MOOC increases. This result adds to the literature that points out that students tend to register in MOOCs to learn new things, gain understanding of the subject matter, and develop professional skills (Belanger & Thornton, 2013; Christensen et al., 2013).
In order to fulfill students' desire to register in a MOOC that is aligned with their academic needs, this study suggests a focus on making MOOC goals and content as clear as possible for its audience. By doing so, a MOOC can attract students who are looking for an experience that is aligned with their academic expectations and discourage enrollments driven by mere curiosity, which may reduce subsequent student dropout.
With this, MOOC providers should explicitly inform their potential students about the characteristics of MOOC content and how students may use the knowledge that will be acquired in that MOOC.

Based on the idea that "MOOCs enable learning with the best" (Davis et al., 2014, p. 6), it is intuitively known that an institution's reputation may motivate students' enrollment in MOOCs. However, this study advances the field by showing how much the reputation of an institution has the potential to affect the probability of students completing a MOOC. From an alternate perspective, it is also possible that the creation of a MOOC may enhance an institution's reputation as reported by Jansen and Schuwer (2015).
In addition to discussing the variables that relate to MOOC completion, it is also important to discuss the examined variables that did not influence the likelihood of completion. Results from the multiple binomial logistic regression model show that variables such as gender, student personal interest, connection with others, friends, and groups did not play a role in the probability of students' completion of this MOOC. Although "taking the course because of friends who also took it" was rated by learners as the most important motivation factor for enrolling in this course, it did not appear as a significant predictor of MOOC completion. This factor may boost MOOC registration, as suggested by Schulze (2014), but not MOOC completion, as reported in this study.
Another surprising result from this study is the lack of effect on completion when students work in online groups. This result contradicts the literature (e.g., Kulkarni et al., 2015; Williams et al., 2006), and to understand this result it is important to look at the design of the group work implemented in this MOOC.
The lack of support and monitoring of students' group work process may explain why group work failed to increase MOOC completion rates. Moreover, students' group work activities were not facilitated or assessed by MOOC instructors or researchers.
Assigning learners to work in groups in MOOCs presents many challenges because of the heterogeneity in learner population, such as differences in education levels, cultural backgrounds, and study schedules.
There seems not to be a perfect grouping mechanism that satisfies the needs of all learners. In the next stage of our research, we hope to record learners' interactions and learning behaviors as they engage in group activities in various social media applications (e.g. Skype and discussion forums) and use these data to understand how learners could benefit from MOOCs. These additional data could also help us to improve the grouping interventions, and eventually provide a better MOOC experience to the learners.
It is also worth noting that some learners did not meet online with others despite being assigned to groups (as reported by participants in the post-course survey), a factor which may have contributed to the lack of effect that group work had on course completion. Feedback provided by students in the post-course survey also pointed to the lack of monitoring of students' group activity. We hypothesize that this could be one of the reasons why assigning learners to work in groups did not work in this study. Thus, this study suggests that MOOC instructors assign teaching assistants (TAs) and/or group leaders to student groups as a way of providing assistance and monitoring their work process. These TAs could be recruited from learners who have completed the MOOC previously and are willing to assist others in taking the course. Another way to foster participants' group work would be to assign roles, such as group leader and meeting coordinator, to group members. Meanwhile, data on the communication and interaction among team members in both synchronous and asynchronous media will be collected and analyzed to inform the design and facilitation of group work in the next phase of this grouping research. In the end, the authors hope that MOOC instructors and MOOC providers will be aware of students' motivations for enrollment and the demographics that impact the likelihood of students' MOOC completion so that this information may be used to shape the course content and format to better support learners.

Limitations
Although this study identifies variables that impact MOOC completion, it is not possible to infer the reasons why those variables are significant. We can speculate that age plays an important role in affecting MOOC completion since older people may have more time to take the course and may have better time management and self-regulation skills. However, more investigation is needed to gain a deeper understanding of the reasons why older people have a higher probability of completing MOOCs.
Another limitation of this study is its small sample size compared to the large number of students who enrolled in this MOOC. Because of the sample size and the subject taught in this MOOC, the findings may not be generalizable to other MOOCs. This limitation could be addressed with studies comprising multiple MOOC cohorts. In the next phase of this study, the investigators aim to conduct follow-up interviews with students to collect feedback about their group work process and suggestions on how to improve their group work experiences. Future plans also include researching indicative variables of course completion as described in Pursel, Zhang, Jablokow, Choi, and Velegol (2016), such as course activities, number of videos watched, and number of posts made in the discussion forum, as predictors of MOOC completion.