Initial Trends in Enrolment and Completion of Massive Open Online Courses

The past two years have seen rapid development of massive open online courses (MOOCs) with the rise of a number of MOOC platforms. The scale of enrolment and participation in the earliest mainstream MOOC courses has garnered a good deal of media attention. However, data about how the enrolment and completion figures have changed since the early courses is not consistently released. This paper seeks to draw together the data that has found its way into the public domain in order to explore factors affecting enrolment and completion. The average MOOC course is found to enroll around 43,000 students, 6.5% of whom complete the course. Enrolment numbers are decreasing over time and are positively correlated with course length. Completion rates are consistent across time, university rank, and total enrolment, but negatively correlated with course length. This study provides a more detailed view of trends in enrolment and completion than was available previously, and a more accurate view of how the MOOC field is developing.



Introduction
In the past two years, massive open online courses (MOOCs) have entered the mainstream via the establishment of several high-profile MOOC platforms (primarily Coursera, EdX, and Udacity), offering free courses from a range of elite universities and receiving a great deal of media attention (Daniel, 2012). The year 2012 has been referred to as 'the year of the MOOC' (Pappano, 2012; Siemens, 2012), and some herald this as a significant event in shaping the future of higher education, envisioning a future where MOOCs offer full degrees and 'bricks and mortar' institutions decline (Thrun, cited in Leckart, 2012).
There are clearly great potential individual and societal benefits to providing university-level education free of some of the traditional barriers to participation in elite education, such as cost and academic background. However, the extent to which MOOCs provide these benefits in practice is not clear. MOOCs may favour those who are already educationally privileged; Daphne Koller of Coursera has stated that the majority of their students are already educated to at least undergraduate degree level, with 42.8% holding a bachelor's degree, and a further 36.7% and 5.4% holding master's and doctoral degrees respectively. A further study of Coursera students enrolled in courses provided by the University of Pennsylvania indicates a greater dominance of highly educated students, 83.0% of respondents being graduates and 44.2% being educated at the postgraduate level (Emanuel, 2013). The author concludes that MOOCs are failing in their goal to reach disadvantaged students who would not ordinarily have access to educational opportunities (Emanuel, 2013). In order to succeed in a MOOC environment, higher digital literacy may be required of students (Yuan & Powell, 2013), potentially exacerbating pre-existing digital divides. In theory MOOCs remove geographical location as a boundary to access, although a lack of internet access may prevent this from being realized in practice (Guzdial, 2013).
Although smaller-scale, connectivist MOOCs have existed for several years, the development of larger-scale MOOCs offered by elite institutions has propelled MOOCs into the mainstream. The earliest and perhaps most highly cited example is the Stanford AI class, which attracted 160,000 students (20,000 of whom completed the course) when it ran in autumn 2011 (Rodriguez, 2012). However, while this example is often used, it is unlikely to be representative of how the field is developing. A survey undertaken by The Chronicle of Higher Education in February 2013 suggested that the average MOOC enrolment is 33,000 students, with an average of 7.5% completing the course (Kolowich, 2013). Detailed studies of particular courses have emphasized that those who enroll on courses have a wide variety of motivations for doing so (Breslow et al., 2013; Koller, Ng, Do, & Chen, 2013); however, motivation does not predict whether a student will complete a course (Breslow et al., 2013). In examining completion and engagement with courses, studies have focused upon characterizing types of learners (Kizilcec, Piech, & Schneider, 2013). Limitations of these studies are that they focus upon a small number of early MOOCs, and ascribe course completion primarily to student choice and motivation. There is a gap in the research literature about what could be learnt from the characteristics of courses themselves and their effect upon enrolment and completion, which this study sought to explore.
Six-figure enrolment statistics have generated a good deal of interest in MOOCs in the higher education sector, and are frequently conflated with active participation or completion. However, the earliest courses are the most frequently cited examples and may not be representative of how the phenomenon is developing, and the extent to which enrolment numbers are indicative of completion has not been explored comprehensively.
These issues are obscured to an extent by a lack of consistent data being made open to those outside of the MOOC platforms. For example, the Coursera data export policy gives individual institutions control over the data that is released about courses (Coursera, 2012), and in practice the extent of data sharing is highly variable and ad hoc. Now, over 18 months on from the advent of the large MOOC platforms, this paper seeks to synthesise the data that has found its way into the public domain in order to address some of the very basic questions associated with MOOCs. How massive is 'massive' in this context? Completion rates are reputedly low, but how low? From the available data, can we learn anything about factors which might affect enrolment numbers and completion rates?

Methods
The approach taken here drew together a variety of publicly available online sources of data to aggregate information about enrolment and completion for as many MOOCs as possible. Given the media attention which MOOCs have garnered, and their 'massive' nature, there is a good deal of publicly available information to be found online, including news stories, university reports, conference presentations, and MOOC student bloggers. Issues of reliability associated with using this data are addressed below. Enrolment and completion figures were selected as the data to be collected for the courses, as these are the metrics which are most commonly available. Completion in this sense was defined as the percentage of students who had satisfied the course's criteria in order to gain a certificate. The exact activities required to achieve this vary according to course. Where possible, data was also recorded about the number of 'active users' in courses. Information about the number of active users was available for 33 courses, although some did not provide any definition of the term. Those courses that did define active users characterized them as students who actively engaged with the course material to some extent (as opposed to those who enrolled but did not use the course at all). For example, this includes having logged in to a course, attempted a quiz, or viewed at least one video. Data was also collected about the date a course began, the course length in weeks, and university ranking (using the Times Higher Education World Rankings; THE, 2013) in order to explore whether these factors affect enrolment and completion.
The enrolment and completion data was collected in two ways: via internet searches, and by crowdsourcing information from students who participated in courses, through appeals on social media. Students contributed data which had been shared with them by the course instructor to the author's blog (Jordan, 2013). This yielded information about enrolment numbers for a total of 91 courses (32.6% of total potential sample), and completion for 42 courses (15.1% of total). For transparency, the sources used for all data items are included here. Details of courses for which only enrolment data was available are shown in Table 1; details of courses for which completion data was found are shown in Table 2.
Data analysis was conducted using linear regression carried out with Minitab statistical software. Linear regression was chosen as the approach to analysis because at this stage the aim of the research was exploratory: to identify potential trends, rather than to be explanatory and fit a model. Fitting an explanatory model would be a valuable goal for follow-up research, particularly if more consistent data became available for MOOCs more broadly.
Linear regression analyses were carried out individually according to different factors of interest rather than as a single multiple regression due to issues of data consistency and availability; that is, data is not available for every field in Tables 1 and 2 for every course, so n varies according to different tests (see Results and Analysis section). Rather than discarding courses for which the full spectrum of data was not available and in order to gain the greatest insight possible into the different factors, a series of individual regression analyses were carried out.
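The per-factor approach described above can be illustrated with a minimal Python sketch. It is not the Minitab procedure used in the study; the record layout, the field names (`enrolled`, `length_weeks`, `rank`), and the sample values are all hypothetical and are not taken from Tables 1 and 2. The sketch fits one simple regression per factor and, for each factor, keeps only the courses that have a value for it, which is why n differs between tests.

```python
from math import log

def simple_ols(xs, ys):
    """Ordinary least squares fit of y = a + b*x.

    Returns (intercept, slope, r_squared, n).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared, n

def fit_each_factor(courses, response, factors):
    """Fit ln(response) against each factor separately.

    Courses missing a given factor are skipped for that factor only,
    rather than being discarded from every analysis.
    """
    results = {}
    for factor in factors:
        rows = [c for c in courses
                if c.get(factor) is not None and c.get(response) is not None]
        xs = [c[factor] for c in rows]
        ys = [log(c[response]) for c in rows]
        results[factor] = simple_ols(xs, ys)
    return results

# Hypothetical records for illustration only.
courses = [
    {"enrolled": 50000, "length_weeks": 6, "rank": 10},
    {"enrolled": 80000, "length_weeks": 10, "rank": None},
    {"enrolled": 30000, "length_weeks": None, "rank": 50},
    {"enrolled": 60000, "length_weeks": 8, "rank": 20},
    {"enrolled": 40000, "length_weeks": 5, "rank": None},
]

fits = fit_each_factor(courses, "enrolled", ["length_weeks", "rank"])
for factor, (a, b, r2, n) in fits.items():
    print(f"{factor}: intercept={a:.3f} slope={b:.5f} R2={r2:.3f} n={n}")
```

In this toy data set the course-length fit uses four courses and the ranking fit uses three, mirroring how sample size varies between the individual analyses.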

Limitations
There are a number of limitations which must be borne in mind with the approach taken by this study, including issues of validity of data and reliability of the research instruments used.
In terms of validity, it should be noted that the accuracy of figures varies according to source, with some institutions releasing highly accurate figures and others (particularly when releasing enrolment data through the press) releasing rounded figures. This reflects the fact that MOOC courses do not consistently release this information into the public domain, and most of the courses that would have been eligible for inclusion (67.4%) have not released any data. Of the institutions or instructors choosing to make data available, bias may be introduced according to their motivations for publicizing this information, which are unknown. There is also a degree of trust involved in the information provided by student informants via the blog.
It should be emphasized that the study sought to be exploratory in nature, identifying trends of interest in the data as a starting point for further research but not seeking to explain or model the phenomenon. Reliability of the approach is less contentious, as the data were collected via several rounds of internet searches during the data collection period (February 13th to July 22nd, 2013) and are shown in full in Tables 1 and 2 should others wish to reproduce the tests or carry out alternative analyses. Collating data 'in the open' at the author's blog (Jordan, 2013) offered a platform for others (including course leaders) to scrutinize the data and provide more accurate figures in some cases.

Trends in Total Enrolment Figures
The analysis of total enrolment numbers draws upon the data in both Tables 1 and 2. A regression analysis was carried out, prior to which the data was subject to a Box-Cox transformation as the residuals did not follow a normal distribution. Regression analysis showed that date significantly predicted total enrolment figures at the 95% confidence level by the following formula: ln(Enrolled) = 104.249 - 0.00226915*StartDate (R² = 0.1719, p < 0.001). The relationship is a negative correlation, indicating that as time has progressed, enrolment figures have decreased. The relationship is relatively weak (time as a factor accounts for 17.2% of the variance observed, as R² is a measure of the fraction of variance explained by the model; Grafen & Hails, 2002), although the sample is sufficiently large that this is statistically significant (critical R² values decrease as sample size increases, with an n of 91 being relatively large; Siegel, 2011). This highlights that a focus upon figures from early courses is misleading and not representative of how the field is developing.
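As a hedged illustration of what the fitted formula implies, the sketch below defines the standard Box-Cox transform (the ln on the left-hand side of the formula is consistent with a Box-Cox parameter of 0) and converts the reported slope into a multiplicative change in predicted enrolment. The function names are illustrative. The text does not state the unit of StartDate; the yearly figure assumes that one unit is one day, as in a spreadsheet-style date serial number.

```python
from math import exp, log

def box_cox(y, lam):
    """Box-Cox transform of a positive value y.

    For lam == 0 the transform is the natural logarithm, which is the
    limit of (y**lam - 1) / lam as lam approaches 0.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return log(y)
    return (y ** lam - 1.0) / lam

# Slope as reported in the text for ln(Enrolled) against StartDate.
SLOPE_PER_UNIT = -0.00226915

def change_factor(units):
    """Multiplicative change in predicted enrolment over `units` of StartDate."""
    return exp(SLOPE_PER_UNIT * units)

# ASSUMPTION: one StartDate unit is one day, so 365 units is one year.
yearly = change_factor(365)
print(f"predicted enrolment after one year: {yearly:.2f} x the starting value")
```

Under that assumption, the model predicts that a course starting a year later enrols somewhat under half as many students, although with R² = 0.1719 most of the variation between courses is left unexplained.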
The relationship between course length and total enrolments was also considered, and a positive correlation was found (Figure 3). Following a Box-Cox transformation, regression analysis showed that course length significantly predicted (at the 95% confidence level) total enrolment figures by the following formula: ln(Enrolled) = 10.2248 + 0.0491206*Length (R² = 0.0545, p = 0.029). The correlation between the variables is positive, indicating that longer courses attract a greater number of enrolments. The relationship is relatively weak, accounting for 5.5% of the variance observed, although the sample size is sufficiently large for this to be a statistically significant relationship. This positive correlation may suggest that prospective MOOC students prefer more substantial courses (however, see also the relationship between course length and completion rates).
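To make the size of this effect concrete, the following sketch back-transforms the reported formula. The coefficients are those given in the text; the function and constant names are illustrative, and Length is taken to be in weeks, as stated in the Methods. Because the model is fitted on the log scale, the back-transformed value is better read as a typical (median-like) enrolment than as an arithmetic mean.

```python
from math import exp

# Coefficients as reported in the text for ln(Enrolled) against Length.
INTERCEPT = 10.2248
SLOPE_PER_WEEK = 0.0491206

def predicted_enrolment(length_weeks):
    """Back-transformed prediction from the log-linear model."""
    return exp(INTERCEPT + SLOPE_PER_WEEK * length_weeks)

# Relative increase in predicted enrolment for each additional week.
weekly_growth = exp(SLOPE_PER_WEEK) - 1.0

for weeks in (4, 8, 12):
    print(f"{weeks:2d} weeks: about {predicted_enrolment(weeks):,.0f} enrolments")
print(f"each extra week: about {weekly_growth:.1%} more enrolments")
```

Each additional week corresponds to roughly a 5% increase in predicted enrolment, which is modest next to the spread between individual courses (R² = 0.0545).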
In addition, the relationship between university ranking and enrolment figures was considered, although it was not found to be significant at the 95% confidence level.

Trends in Completion Rates
Completion rates were calculated as the percentage of students (out of the total enrolment for each course) who satisfied the criteria to gain a certificate for the course. This information was available for 39 courses in the sample. Completion rates range from 0.9% to 36.1%, with a median value of 6.5% (Figure 4). The data is skewed, so the higher completion rates are not representative, with completion rates of 5% being typical.
Figure 4. Histogram of completion rates for the sampled courses (n = 39).
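The calculation can be sketched in a few lines of Python. The first example uses the Stanford AI class figures quoted in the Introduction (20,000 completions out of 160,000 enrolments); the list of rates that follows is hypothetical and only illustrates why the median is the more representative summary for right-skewed data.

```python
from statistics import mean, median

def completion_rate(certificates, enrolled):
    """Completion rate as a percentage of total enrolment."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return 100.0 * certificates / enrolled

# Figures quoted in the Introduction for the Stanford AI class.
stanford_ai = completion_rate(20000, 160000)

# Hypothetical, right-skewed set of completion rates (percent).
rates = [2.0, 4.0, 5.0, 6.5, 8.0, 12.5, 36.1]

print(f"Stanford AI class: {stanford_ai:.1f}%")
print(f"median = {median(rates):.1f}%, mean = {mean(rates):.1f}%")
```

In the hypothetical list a single high value pulls the mean well above the median, which is the pattern the skew in Figure 4 would produce.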
As the residuals were not normally distributed, a Box-Cox transformation was again carried out before conducting regression analysis. No significant relationships were found between completion rate and date, university ranking, or the total number of students enrolled. Completion rates remained consistent across these factors. A significant negative correlation was found, however, between completion rate and course length, shown in Figure 5. Regression analysis showed that course length significantly predicted completion rate by the following formula: ln(PercentTotalCompleted) = 2.64802 - 0.100461*CourseLength (R² = 0.2373, p = 0.002). The correlation in this case is negative, indicating that a lower proportion of students complete longer courses. Course length accounts for 23.7% of the variance observed, and the correlation is significant at the 95% confidence level.
While completion rate as the percentage of the total enrolment that completes the course is the type of data that is most readily available, a criticism of this characterization is that many students may enroll without even starting the course, and that completion rates would be better characterized as the proportion of active students who complete. This level of information is available for a subset of the sampled courses (39 courses with a number of active students and total enrolment; 33 courses with data about the proportion of active students who complete).
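A short sketch can show what the course-length formula implies for completion. It uses the coefficients reported in the text; the names are illustrative, and CourseLength is taken to be in weeks, as in the Methods. The predictions are back-transformed from the log scale and are indicative only.

```python
from math import exp

# Coefficients as reported in the text for
# ln(PercentTotalCompleted) against CourseLength.
INTERCEPT = 2.64802
SLOPE_PER_WEEK = -0.100461

def predicted_completion_percent(length_weeks):
    """Back-transformed completion rate (percent of total enrolment)."""
    return exp(INTERCEPT + SLOPE_PER_WEEK * length_weeks)

# Multiplicative change in predicted completion rate per additional week.
weekly_factor = exp(SLOPE_PER_WEEK)

for weeks in (5, 10):
    rate = predicted_completion_percent(weeks)
    print(f"{weeks:2d} weeks: about {rate:.1f}% predicted completion")
print(f"each extra week multiplies the rate by about {weekly_factor:.3f}")
```

On this reading, each additional week lowers the predicted completion rate by a little under 10% in relative terms, so a ten-week course is predicted to have a noticeably lower rate than a five-week one.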
The number of active students is remarkably consistent as a proportion of the total enrolment of the course (with approximately 50% of the total enrolment becoming active students). This is shown graphically in Figure 6. Regression analysis showed that total enrolment significantly predicted the number of active students by the following formula: Active = 0.543336*Enrolled (R² = 0.9556, p < 0.001). The correlation is strong (accounting for 95.6% of the variance) and positive, showing a consistent relationship.
When calculating completion rate as the percentage of active students who complete the course, completion rates range from 1.4% to 50.1%, with a median value of 9.8% (Figure 7). While completion rates as a percentage of active students span a wider range than completion rates as a percentage of total enrolments, there remains a strong skew towards lower values. These differences would be worth exploring in further detail to identify features of course design that may account for the wider variation observed.
Figure 7. Histogram of completion rates as a proportion of active students for the sampled courses (n = 39).
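The reported formula has no intercept term, which suggests a fit constrained to pass through the origin; that reading is an assumption, as the text does not describe the fitting options used. Under that assumption, the slope is the ratio of the sum of cross-products to the sum of squares, as sketched below. The sample values are hypothetical; the enrolment figure of 42,844 is the median quoted in the Conclusions.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope b for the no-intercept model y = b*x."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

# Slope as reported in the text for Active against Enrolled.
ACTIVE_SHARE = 0.543336

def expected_active(enrolled):
    """Expected number of active students under the reported relationship."""
    return ACTIVE_SHARE * enrolled

# Hypothetical (enrolled, active) pairs for illustration only.
enrolled = [10000, 40000, 100000]
active = [5000, 22000, 55000]
fitted_share = slope_through_origin(enrolled, active)

print(f"fitted share on the toy data: {fitted_share:.3f}")
print(f"expected active students at 42,844 enrolled: {expected_active(42844):,.0f}")
```

A through-origin fit is a natural choice here because a course with no enrolments cannot have active students, but it also means the reported R² may not be directly comparable with the R² values of the models that include an intercept.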
No significant relationships were found between completion rate as a proportion of active users and date, university ranking, total enrolment, or (in contrast to completion rate as a percentage of total enrolment) course length. This may suggest that enrolled students are put off starting longer courses, but that length is less of an issue for those who do become actively engaged in the course.

Conclusions
The findings here demonstrate changes in the field since the concept of MOOCs entered the mainstream and the inception of the major MOOC platforms. It is misleading to invoke early enrolment and completion figures as representative of the phenomenon; six-figure enrolments are atypical, with the median enrolment being 42,844 students, and decreasing over time as the number of courses available continues to increase. Although this is lower than the earliest examples, it emphasizes that it is inappropriate to compare completion rates of MOOCs to those in traditional bricks-and-mortar institution-based courses.
The majority of courses have been found to have completion rates of less than 10% of those who enroll, with a median average of 6.5%. The definition of completion rate used here is the percentage of enrolled students who satisfied the courses' criteria in order to earn a certificate, and this definition was used because it is the type of information that is most frequently available. There are potentially many ways in which MOOC students may participate in and benefit from courses without completing the assessments. The wider range of completion rates (while still remaining quite low overall, with a median of 10%) observed when defining completion as a percentage of active learners in courses is interesting and warrants further work to better understand the reasons why those who become engaged initially do or do not complete courses.
This is not to say, however, that completion rates should be ignored entirely. Looking at completion rates is a starting point for better understanding the reasons behind them, and how courses could be improved for both students and course leaders. For example, the relationship between enrolments, completion, and course length is an interesting issue for MOOC course design, balancing the higher enrolments with the lower completion rates of longer courses. Figures about how many students achieved certificates obscure how many students attempted to gain a certificate but did not meet the criteria. Given that MOOCs are offered free of educational prerequisites, striving to improve teaching on courses so that students who wish to complete are assisted in doing so is an important pedagogical issue. The extent of understanding that can be gained outside of running a MOOC will continue to be constrained however as long as the release of detailed data about courses is limited.
This study has only considered relationships between enrolment and completion and a small number of general factors for which data is available publicly; various other factors would be worthwhile to explore. For example, in terms of the underlying pedagogy, it would be useful to examine whether differences emerge based on how transmissive (so-called 'xMOOCs') or connectivist ('cMOOCs') courses are. The impact of different assessment types, being necessarily linked to the criteria for achieving a certificate of completion, would also be a worthwhile area to consider in further detail. Along with the studies discussed in the introduction which focus upon links between student demographics or behaviours and completion (Breslow et al., 2013; Kizilcec et al., 2013), a limitation of the approach used here is that the data neglects the student voice. While these approaches can identify trends and patterns, they are unable to explore in detail the reasons behind the trends observed.