A Systematic Analysis and Synthesis of the Empirical MOOC Literature Published in 2013–2015

A deluge of empirical research became available on MOOCs in 2013–2015, and this research is available in disparate sources. This paper addresses a number of gaps in the scholarly understanding of MOOCs and presents a comprehensive picture of the literature by examining the geographic distribution, publication outlets, citations, data collection and analysis methods, and research strands of empirical research focusing on MOOCs during this time period. Results demonstrate that (a) more than 80% of this literature is published by individuals whose home institutions are in North America and Europe, (b) a select few papers are widely cited while nearly half of the papers are cited zero times, and (c) researchers have favored a quantitative, if not positivist, approach to the conduct of MOOC research, preferring the collection of data via surveys and automated methods. While some interpretive research was conducted on MOOCs in this time period, it was often basic, and only a minority of studies were informed by methods traditionally associated with qualitative research (e.g., interviews, observations, and focus groups). Analysis shows that there is limited research reported on instructor-related topics, and that even though researchers have attempted to identify and classify learners into various groupings, very little research examines the experiences of learner subpopulations.



Introduction
The term Massive Open Online Course (MOOC) describes an evolving ecosystem of online learning environments that encompass a spectrum of course designs (Rodriguez, 2012). Between 2013 and 2015 a deluge of empirical research became available on the topic, and this research is available in disparate sources ranging from professional journals in a variety of disciplines, through annual conference proceedings, to the proceedings of newly-developed conferences and workshops focusing specifically on MOOCs (e.g., the Learning at Scale conferences). While the MOOC phenomenon has been subject to numerous interpretations in the mass media (Kovanović, Joksimović, Gašević, Siemens, & Hatala, 2015; Selwyn, Bulfin, & Pangrazio, 2015) and some analyses of the state of the field have been published (e.g., Bayne & Ross, 2014; Ebben & Murphy, 2014), researchers currently lack a systematic synthesis of the empirical literature published on the topic. A collective research effort is required to fully understand the impact of MOOCs (Reich, 2015), and researchers will benefit from analyses and syntheses of the literature, especially because (a) the field has developed rapidly and sparked extensive conversations pertaining to education and educational technology, (b) meaningful research results appear to be sparse (Jona & Naidu, 2014), and (c) researchers studying MOOCs reside in diverse disciplines (Veletsianos & Shepherdson, 2015).
The goal of this paper is to address a number of gaps in the scholarly understanding of MOOCs and present a comprehensive picture of the literature by examining the geographic distribution, publication outlets, citations, data collection and analysis methods, and research strands of empirical research focusing on MOOCs. We tackle this goal by reviewing relevant literature and situating this study in the context of prior literature; presenting our research questions and describing the methods used to collect data; describing the data analysis methods used to answer each research question; and presenting our results. We conclude by discussing findings and making recommendations for researchers studying MOOC-related topics.

Review of Relevant Literature
We divide past literature into two sections. The first section examines and summarizes previous syntheses of the MOOC literature. The second section identifies specific gaps in the literature that are addressed by this paper.

Past Systematic Analyses of the MOOC Literature
A number of other researchers have attempted to analyze the MOOC literature, most notably, Ebben and Murphy (2014), Hew and Cheung (2014), Jacoby (2014), Kennedy (2014), and Liyanagunawardena, Adams, and Williams (2013). These reviews have focused on diverse aspects of the literature. For example, Hew and Cheung (2014) examined students' and instructors' perspectives, while Jacoby (2014) focused on the evidence for the role of MOOCs as a disruptive force. Despite this diversity between individual reports, a number of themes emerge across some or all of these reviews. We summarize the most salient of these themes: distinctions between cMOOCs and xMOOCs, impacts of MOOCs on education, demographics of MOOC users, and challenges for MOOCs.

Distinctions between cMOOCs and xMOOCs.
Each of the five previous reviews of the MOOC literature we identified made note of the distinction between two strands of MOOCs: cMOOCs and xMOOCs. cMOOCs are described as being "based on principles of connectivism, openness, and participatory teaching" (Jacoby, 2014, p. 76), and "[emphasizing] human agency, user participation, and creativity through a dynamic network of connections afforded by online technology" (Ebben & Murphy, 2014, p. 333). By contrast, xMOOCs are described as "follow[ing] a cognitivist-behaviorist approach" (Hew & Cheung, 2014, p. 50) and resemble "traditional teacher-directed course[s], yet automated, massive, and online" (Kennedy, 2014, p. 8). Early MOOCs tended to follow the cMOOC model, whereas more recently the number of xMOOCs delivered has been growing rapidly. This chronological categorization of cMOOCs and xMOOCs led Ebben and Murphy (2014) to describe them as two distinct phases of the MOOC phenomenon, while Kennedy (2014) noted that most of the nascent research on MOOCs had necessarily focused on the cMOOC variety. Most of the reviews focus on the philosophical (e.g., different approaches to openness; see next theme) and practical (e.g., different uses of technology; different forms of assessment) differences in the creation and delivery of cMOOCs and xMOOCs.
One distinction between cMOOCs and xMOOCs that was discussed in a number of the reviews was the concept of openness. Jacoby (2014) and Kennedy (2014) both discuss openness, and the reviews distinguish between open course materials, open access to courses, and openness in the "locus and practice of knowledge acquisition and production" (Ebben & Murphy, 2014, p. 337). These authors suggest that the last of these is a hallmark of cMOOC philosophy, whereas the first two are common to both cMOOCs and xMOOCs.
However, the review literature also indicates that there are common elements cMOOCs and xMOOCs share. For instance, instructors in both cMOOCs and xMOOCs tend to provide course outlines describing the general structure of the course. Where they differ in this respect is in the fact that the content to fill this structure is generally provided by the instructor for xMOOCs, but by the students for cMOOCs (Hew & Cheung, 2014). Liyanagunawardena et al. (2013) found that the literature published in 2008-2012 did not clearly define the different kinds of MOOCs, and Ross, Sinclair, Knox, and Macleod (2014) note that the differences between cMOOCs and xMOOCs are unclear. For the purposes of this paper, we will view MOOCs as an evolving ecosystem of online learning environments featuring open enrollment, characterized by a spectrum of course designs ranging from networks of distributed online resources (cMOOCs) to structured learning pathways centralized on digital platforms (xMOOCs).

Impacts of MOOCs on education.
A further issue discussed in many of the reviews is the potential impact of MOOCs on education more broadly. Jacoby (2014) specifically focuses on the disruptive potential of MOOCs, and so a large portion of her review is relevant to this issue. For instance, she identifies characteristics of some MOOCs such as their size, automation in grading, and their openness (particularly with regard to cMOOCs) as factors with the potential to affect approaches to teaching and learning. The size and openness of MOOCs are also highlighted by Kennedy (2014) in terms of their potential to " [disrupt] conventional thinking about the role, value, and cost of higher education" (p. 9). Ebben and Murphy (2014) discuss semantic shifts in the discourse around MOOCs (e.g., referring to students as participants) and suggest these could imply a diminution of the authority and importance of the educational leader (now instructor or facilitator rather than professor). On an institutional level, Jacoby describes impacts the spread of MOOCs may have on the business models of universities. She discusses the potential for new entrants to the higher education market to provide a product that is a suitable substitute for existing models of educational delivery, but also suggests that the collaboration of traditional institutions in creating and disseminating MOOCs may undermine this substitution.
Demographics of MOOC users. Early research on MOOCs was in the form of institutional reports, and these frequently reported learner enrollments and demographics (Gasevic, Kovanovic, Joksimovic, & Siemens, 2014). Some of the literature reviews we identified presented information about the demographic characteristics of MOOC users as well. Ebben and Murphy (2014), for instance, highlight research emphasizing the global reach of MOOCs, with learners enrolling from many countries beyond the United States. As a counterpoint to this, Liyanagunawardena et al. (2013, p. 217) state that in the research that has presented demographic information, "a large majority of participants were from North America and Europe," with a small minority being from Asia, South East Asia, or Africa. These authors suggest that the reasons for this may be technological and linguistic.
Challenges for MOOCs. All of the reviews identify challenges or potential challenges for MOOCs to overcome. One of the most salient of these relates to course completion. Ebben and Murphy (2014) review research suggesting that completion rates in MOOCs are less than 10%. They suggest that this may result from the fact that participation in MOOCs is free, leading users to participate in activities that are of interest to them without necessarily completing all parts required to complete a course.
However, Hew and Cheung (2014) are less sanguine about participants' reasons for non-completion, identifying the following list of reasons as relevant: "a lack of incentive, insufficient prior knowledge (e.g., lack of math skills), a lack of focus on the discussion forum (e.g., off-track posts), failure to understand the content and [having] no one to turn to for help, ambiguous assignments and course expectations, and a lack of time due to having other priorities and commitments to fulfil" (p. 49). Other than course non-completion, challenges mentioned in the reviews include economic challenges (e.g., the high cost of running a MOOC, or lack of a business model; Hew & Cheung, 2014; Jacoby, 2014), limitations of mass teaching methods (Kennedy, 2014), accreditation (Liyanagunawardena et al., 2013), and the assessment of complex writing such as essays (Ebben & Murphy, 2014).

Gaps in the Literature
While prior reviews of the literature provide a useful survey of the field, none has focused exclusively on empirical literature in MOOCs. This paper represents the first effort to review the empirical literature on MOOCs for a particular time period to understand various structural components of it, and as such fills a gap in the literature overall. While early writing on MOOCs was primarily conceptual (Kennedy, 2014), this is no longer the case, and the field will benefit from an early review of the status of the evidence-based literature. Furthermore, this research addresses five specific gaps that we identified in our examination of the current scholarly literature. These gaps can be resolved by a systematic review of the literature, and they are described next.
Geographic distribution. MOOCs have often been proposed as democratizing vehicles intended to provide free or inexpensive education. Liyanagunawardena et al. (2013), however, indicate that research arising predominantly from Western authors will largely serve countries where learners have access to digital technologies, understand the language, and identify with a Western learning culture. Gasevic et al. (2014) found that the geographic distribution of authors was heavily concentrated in North America, Europe, and Asia, with 96% of accepted proposals originating from those three regions. These results are similar to findings pertaining to distance education in general. For example, in a review of the research published in distance education journals between 2000 and 2008, Zawacki-Richter, Bäcker, and Vogt (2009) found that more than 80% of the articles were authored by individuals from five countries (USA, Canada, UK, Australia, and China). We were unable to identify further research examining the geographic location of MOOC authors.
Publication outlets. Even though the three ways that all disciplines predominantly use to disseminate research findings appear to be journal articles, book chapters, and conferences (Sparks & House, 2005), the choice of publication outlets varies significantly across disciplines (Kling & McKim, 2000). Because the MOOC phenomenon has gained interest from academics in numerous disciplines, we were interested in examining which communicative forums were used by researchers conducting MOOC research. Because research in emerging fields often appears first in conference proceedings (Saebø, Rose, & Flak, 2008), and because some journals and conferences published special issues that focused on MOOCs, we expect to see (a) conference proceedings, and (b) journals that have published special issues on MOOCs, featuring prominently.
Citations. Scholars use various tools to estimate the reach and impact of scholarship. Paper citation counts are frequently used for this purpose, and even though they are imperfect metrics of impact (e.g., they do not distinguish between positive and negative citations [Smeyers & Burbules, 2011; Togia & Tsigilis, 2006]), they are helpful as a way to begin examining the literature in the field and identifying papers that, for one reason or another, are popular. In analyzing the distance education literature, Bozkurt et al. (2015) also argued that citation analyses may provide a reference guide and reading list for examining the field. Gasevic et al. (2014) examined the papers cited most frequently in the MOOC Research Initiative submissions and suggested that the most cited papers appear to be those that were most relevant to the call for awards. This investigation fills a gap in the scholarly literature by presenting the most highly cited papers in the field at the time of writing.
Data collection and analysis methods. The data collection and data analysis methods of MOOC research are areas that are poorly investigated in prior literature. The one paper (Gasevic et al., 2014) that examined research methods in the area categorized MOOC research proposals submitted for funding as using mixed (42.3%), quantitative (33.3%), or qualitative (24.4%) methods. While such a categorization provides a useful picture of the research in the field, it does not distinguish between data collection and data analysis methods. This is problematic because the type of data collected does not necessarily reflect the analytic methods used. For instance, qualitative data may be converted to quantitative data (e.g., counting the number of times a participant expresses a negative feeling about a course). Distinguishing between data collection and data analysis methods will add nuance to our understanding of how researchers have come to know what they know about MOOCs. While Bates (2014) highlighted the diversity of research methodologies that existed in a special issue on MOOCs, Veletsianos, Collier, and Schneider (2015, p. 574) argued that "ease of access to large data sets from xMOOCs offered through an increasing number of centralized platforms has shifted the focus of MOOC research primarily to data science and computational methodologies" and claimed that "the MOOC phenomenon experienced a surge of research using quantitative, clickstream and observational data." By investigating data collection and data analysis methods, this research will be able to empirically validate this claim.
Research strands. Liyanagunawardena et al. (2013) and Gasevic et al. (2014) examined the focus of MOOC-related literature, though those reviews did not focus exclusively on empirical research.
By identifying the major research strands in the empirical literature, this paper will enable researchers to ascertain the areas that have attracted the (a) greatest attention, and (b) least attention in the field. By identifying these two areas, we can describe the interests of researchers, the areas that may need greater attention, and the areas in which scholarly understanding of MOOCs is grounded on an evidence base.
Liyanagunawardena and colleagues, for example, categorized the papers they identified into eight categories.

Research Questions
To address these gaps in the literature, we pose the following research questions (RQ), each corresponding to a particular gap:

RQ1. How is empirical MOOC research geographically distributed?
RQ2. Is empirical MOOC research usually published in journals or conference proceedings? In which journals and conference proceedings is MOOC research currently being published?
RQ3. Which empirical MOOC studies are cited the most?
RQ4. What data collection methods and data analysis methods are used in empirical studies of MOOCs?
RQ5. What are the research strands of empirical MOOC research?

Methods
In this section we describe the approaches we took to answer the research questions. We describe the systematic methods we used to gather literature (data collection) and the analytic methods we used to examine the literature corpus we gathered (data classification and analysis).

Data Collection
Literature discovery searches were conducted using the keywords "MOOC" or "Massive Open Online Course." To be included in the corpus, each identified document had to focus on MOOCs and to be (1) empirical; (2) published in a peer-reviewed journal, in conference proceedings, or in Educause Review; (3) published, or available online as in press, between January 2013 and January 2015; and (4) written in English. We defined empirical papers as those that gathered and analyzed primary or secondary data in their investigation. Under this definition, conceptual and theoretical papers did not meet the inclusion criteria. The majority of the papers that we discovered in the literature search were not empirical (and were thus excluded).
Three trained researchers engaged in the literature discovery process. As each individual encountered a paper, she or he examined its abstract to determine whether it fit the inclusion criteria. If a determination could be made by examining the abstract, the document was added to a shared computer folder. If no determination could be made by examining the abstract, the full paper was downloaded and examined. All identified papers were examined by two researchers to ensure consensus that they fit the inclusion criteria. Even though Educause Review is a professional magazine (i.e., neither a journal nor a conference proceedings), it was included because we observed that it published empirical papers on MOOCs. While the quality of professional magazines may vary widely, Educause Review is highly regarded by educational technology scholars (Perkins & Lowenthal, in press).

Next, we examined four journals that frequently publish research on online and distance education: the Journal of Distance Education, the British Journal of Educational Technology, Distance Education, and the Journal of Online Learning and Teaching. The first three journals are indexed by Scopus, and thus any of their papers that fit the inclusion criteria would have already been identified. The Journal of Online Learning and Teaching was not indexed by Scopus at the time of writing. We therefore examined this journal using the search criteria described above and identified seven papers that fit the inclusion criteria. These were added to our corpus.
Next, we used the Summon search engine. Summon is a one-stop gateway to institutional resources.
While it is difficult to replicate Summon searches (each institution will likely have a different collection of databases feeding into its discovery layer), we were interested in collecting all the available literature on MOOCs and saw Summon as yet another way to discover such literature. Summon generated 1337 results; of those, we exported 505 for further analysis. Of those, 10 new papers fit our inclusion criteria and were added to the corpus.
We followed that search with a Google Scholar search using the same keywords. Unlike Scopus, Google Scholar does not provide a definitive list of what exactly it indexes, but it still provides a source for locating grey literature that may have been difficult to locate through commercial publisher databases (e.g., Scopus), as well as literature beyond the scope of any one library's collection. We were able to locate 11 additional papers via this method. We ended our search at the 200th record as results were becoming increasingly irrelevant or redundant beyond that point.
Next, we searched two stand-alone libraries (EdITLib Digital Library and the Educause Library), both of which focus on educational technology materials. The EdITLib Digital Library provides access to an extensive library of conference proceedings. We identified five relevant papers from the EdITLib Digital Library and four papers from the Educause Library that fit the aforementioned inclusion criteria.
Next, we engaged in a forward referencing process, to identify relevant papers that cited the papers that we had already located. This process was used by Gao, Luo, and Zhang (2012) in their examination of the microblogging literature and by Liyanagunawardena et al. (2013) and worked as follows: We located each one of the papers we already identified (original) in Google Scholar. This service provides information on how many times a paper is cited (Figure 1) and allows researchers to view all papers citing the original.
We took the 120 papers that were in our corpus at the time, located each one in Google Scholar, and examined all the papers that cited each original. If a citing paper fit our inclusion criteria, we included it in our database. This process returned 60 additional documents. We used Google Scholar instead of Scopus for our forward referencing check because the former appears to include more grey literature and provide greater coverage than the latter. These additional documents appeared in a variety of outlets, including journals (17), conference proceedings (13), academic magazines (10), reports (3), and conference workshops (2).

Data Classification and Analysis
The 183 papers collected were classified and analyzed in both quantitative and qualitative ways. We describe each analytic method employed in relation to each particular research question posed.
To determine the geographical distribution of MOOC research (RQ1), we coded the affiliations of authors from our corpus (n = 460) in two ways: by the country in which their institution or organisation was located (or, if unaffiliated, by the country in which the author was located), and by the associated region.
To determine the publication outlets of MOOC research (RQ2), we classified each publication according to whether it was published in a journal or conference proceedings, and counted the times each outlet appeared in our corpus. To determine which studies were cited the most (RQ3), we identified each paper in our corpus on Google Scholar and noted its citation count (Figure 1).
To determine the data collection methods used in the identified corpus (RQ4), two researchers coded the corpus using an eight-item coding scheme. The scheme was based on the six data collection methods identified by Tashakkori and Teddlie (2003): questionnaires and surveys, interviews, focus groups, tests, observations, and secondary data. Based on our understanding of the literature, and because trace data are increasingly used in digital contexts, we differentiated between the automated collection of secondary data (e.g., trace data collected by digital platforms) and the human collection of secondary data (e.g., use of photographs), raising the number of data collection methods to seven. To capture data collection methods that did not fit into any of the above categories, we used an eighth code named Other. Using these codes, each researcher independently examined each paper and classified it according to its data collection methods. Papers were assigned between one and five data collection method codes, and the researchers assigned the codes a total of 642 times. Inter-rater agreement was calculated to be 68.5%. Next, the two coders came together to reconcile differences. Each difference was discussed and resolved until both coders were satisfied that the codes assigned appropriately described the data collection methods used in each manuscript. The final dataset consisted of the eight codes assigned 324 times.
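Percent agreement of the kind reported here is commonly computed as the proportion of matching include/exclude decisions across all paper-code pairs. The following sketch illustrates one way to operationalize that calculation; the function name and data are hypothetical, and the original study may have computed agreement differently.

```python
def percent_agreement(coder_a, coder_b, all_codes):
    """Proportion of paper-code decisions on which two coders agree.

    coder_a, coder_b: dicts mapping a paper id to the set of codes
    that coder assigned to the paper.
    """
    agreements = 0
    decisions = 0
    for paper in coder_a:
        for code in all_codes:
            decisions += 1
            # A decision is an agreement when both coders either
            # assigned the code or both withheld it.
            if (code in coder_a[paper]) == (code in coder_b[paper]):
                agreements += 1
    return agreements / decisions

# Hypothetical example: two coders, three codes, two papers.
codes = {"survey", "interview", "trace"}
a = {"p1": {"survey", "trace"}, "p2": {"interview"}}
b = {"p1": {"survey"}, "p2": {"interview"}}
print(round(percent_agreement(a, b, codes), 3))  # 5 of 6 decisions match
```

Simple percent agreement does not correct for chance agreement; chance-corrected statistics such as Cohen's kappa are an alternative when codes are unevenly distributed.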
To determine the data analysis methods used in the identified corpus (RQ4), two researchers used a coding scheme consisting of 11 categories. These categories were compiled by the researchers after consulting methodological resources (e.g., Merriam, 2002) and prior MOOC research (e.g., Gasevic et al., 2014). Each category described a data analysis method. The categories were: basic qualitative study, grounded theory, phenomenology, ethnography, discourse analysis, experimental and quasi-experimental, correlational, natural language processing, social network analysis, descriptive statistics, and other. Again, each researcher independently examined each paper and classified it according to its data analysis methods. The two researchers assigned the 11 categories a total of 830 times. Inter-rater agreement was calculated to be 78.7%. The two researchers discussed and reconciled differences until consensus was reached on all codes. The final dataset consisted of the 11 codes assigned 439 times.
To determine the research strands in the identified corpus (RQ5), two researchers independently read and assigned emerging codes to each paper. The codes described the focus of each paper, and there were no pre-determined limits set on the number of codes to be assigned to each paper. The first researcher generated 25 codes and the second researcher generated 31 codes. Next, they met to discuss their findings and identify categories describing their codes. They identified five categories describing strands of extant MOOC research: student-focused, instructor-focused, design-focused, context and impact, and other. Next, they returned to the papers and independently assigned papers to each theme. Inter-rater agreement was calculated to be 77.9%. Next, they discussed discrepancies and resolved them, reaching agreement on all papers. The final dataset consisted of the five codes assigned a total of 291 times.

Limitations
This study clarifies the state of the literature published at a particular point in time using a particular methodology. There are four limitations arising from the research context. First, this study draws upon less than three years of data and its findings are only representative of the research on MOOCs during this period; the literature will likely change over time in the same way that it has changed since Liyanagunawardena et al. (2013) published their own analysis. This limitation is inevitable as new work in the area is quickly emerging. Second, the data analysis methods used in this study do not allow us to judge the quality of the research reported. It should be recognized, therefore, that the papers included in our corpus are of mixed quality. For instance, our reporting on the use of grounded theory does not necessarily examine whether the authors used the method correctly, rigorously, or even uniformly. Third, while our data reflect some of the content of the papers analyzed (e.g., what data collection methods were used), they do not reflect a full evaluation of the contents of the papers (e.g., what were the results reported in each research strand identified). Fourth, while non-native English speakers author MOOC papers in English, the choice to exclude papers written in languages other than English may have limited the size and diversity of the sample.

RQ1: How is Empirical MOOC Research Geographically Distributed?
This question was asked to identify whether the empirical research on MOOCs centered in one country or region, or whether it was a global phenomenon. The vast majority of authors came from North America and Europe, which between them accounted for over 82% of the author affiliations (Table 2). Table 3 shows that more than half of the authors were affiliated with institutions from the USA, and between 1% and 10% of the authors came from each of the United Kingdom (10%), Australia (7.7%), China (5.4%), Spain (4.8%), Canada (4.5%), Germany (2.2%), Switzerland (1.3%), and the Netherlands (1.1%).

Note: Decimal values reflect authors with affiliations in multiple regions (e.g., 0.5 added to each region's count for an author with affiliations in two separate regions).
These nine countries represented 87.2% of the author affiliations. The other 12.8% of authors represented 29 other countries that had four or fewer authors each, with each country having less than 1% of the corpus.
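The fractional counting of multi-affiliation authors described above can be sketched as follows. This is an illustrative example only; the function name and data are hypothetical, not drawn from the study's corpus.

```python
from collections import Counter


def region_counts(author_affiliations):
    """Tally author counts per region, splitting multi-affiliation authors.

    author_affiliations: a list with one entry per author, where each
    entry is the list of regions that author is affiliated with. An
    author with k regions contributes 1/k to each of those regions.
    """
    counts = Counter()
    for regions in author_affiliations:
        for region in regions:
            counts[region] += 1 / len(regions)
    return counts


# Hypothetical example: three authors, one with a dual affiliation.
authors = [["North America"], ["Europe", "Asia"], ["Europe"]]
print(region_counts(authors))
```

With this scheme the per-region totals always sum to the number of authors, which keeps percentage calculations consistent.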

RQ2: Is Empirical MOOC Research Usually Published in Journals or Conference Proceedings? In Which Journals and Conference Proceedings is MOOC Research Currently Being Published?
Ninety-eight papers were published in peer-reviewed journals and 85 were published in conference proceedings. Eighty publication outlets published one item each, 16 outlets included two items each, and four outlets included three items each. The remaining outlets included four or more items each and are shown in Table 4. Three of these outlets are peer-reviewed journals that focus on online or distance education and published special issues on MOOCs during the period under investigation. Four of these outlets are conferences, with one focusing specifically on learning at scale and MOOCs (L@S '14).

RQ3: Which Empirical MOOC Studies Are Cited the Most?
At the time of writing, of the 183 papers identified, 87 (47.5%) were cited zero times. Seventy-two papers were cited one to ten times, with the majority being cited once (16), twice (12), or thrice (14).
Ten papers were cited 11 to 20 times. The rest of the papers (13) were cited 25 or more times; these are shown in Table 5. For comparative purposes, we include a column that shows the number of times these papers were cited one year after the data were collected, right before this paper was published. Two of these papers were published in 2014 and the rest were published in 2013. Seven of the thirteen papers were published in conference or workshop proceedings and six were published in journals. While this analysis focuses on a different time period than that of Gasevic et al. (2014; i.e., the grant applications examined by Gasevic et al. were for grants awarded in late 2013, and therefore papers cited would have been published before then) and we limit our corpus to empirical papers, the two most cited papers identified here (Breslow et al., 2013; Kizilcec, Piech, & Schneider, 2013) were also highly cited in Gasevic et al.'s corpus.

RQ4: What Data Collection Methods and Data Analysis Methods Are Used in Empirical Studies of MOOCs?
Data collection methods. The majority of the papers used one (44.8%) or two (38.3%) data collection methods. The rest used three (13.1%), four (2.7%), or five (1.1%) data collection methods. Automated collection of secondary data (e.g., trace data) was the most frequently used method (73.2% of papers). The second most popular data collection method was questionnaires/surveys, which were used in 55.7% of papers. The remaining data collection methods were used much less frequently: interviews (13.7%), secondary data collected by humans (13.7%), tests (8.7%), observations (4.9%), focus groups (4.4%), and other (2.7%).
Automated methods of data collection were used as the sole data collection method in 26.8% of the corpus. They were also used in combination with one (31.7%), two (11.5%), three (2.1%), and four (1%) other methods. Questionnaires and surveys were used as the sole data collection method less frequently than automated methods (10.4%). They were used in conjunction with one (28.9%), two (12.5%), three (2.7%), and four (1%) other methods.
Data analysis methods. The majority of the papers used two (47%) or three (28.4%) data analysis methods. The rest used one (13.1%), four (8.7%), or five (2.7%) analytic methods. Table 6 shows the data analysis methods used in this corpus, ranked in order of frequency. Descriptive statistics were reported in almost all papers, though follow-up analysis showed that they were used as the sole method of analysis in only 7.7% of papers. Correlational, basic qualitative, and experimental or quasi-experimental methods were also used frequently.

RQ5: What Are the Research Strands of Empirical MOOC Research?
We identified five categories describing the research reported in the corpus: student-focused; design-focused; context and impact; instructor-focused; and other (Table 7). Within the student-focused category, researchers found low completion rates (often less than 10% of registrants; e.g., Perna et al., 2014), and Reich (2014) showed that certification rates vary substantially among learners with different intentions. This line of research is often related to researchers' attempts to identify and classify learners into various groupings (e.g., Kizilcec, Piech, & Schneider, 2013). Also common was research that investigated the utility of individual elements of MOOCs.
Other. Papers with content that could not be categorized as student-focused, design-focused, instructor-focused, or context and impact were included in a final theme that we called other (9.8%). This theme included papers examining issues pertaining to MOOCs and institutions of higher education (e.g., O'Connor, 2014) and meta-research papers examining MOOC research and focus areas (e.g., Gasevic et al., 2014). All but two of the papers with content classified as other also contained content belonging to one or more of the four previously described themes.

Discussion

There Is a Paucity of Research Examining Instructor-Related Topics
Analysis shows that there is limited research reported on instructor-related topics. This is a rich area for future research. Topics of interest in this area may include instructor motivations, experiences, and perceptions. Researchers could examine how instructors experience the design and development of these courses, why they choose to teach MOOCs, and how they perceive their relationship with MOOC learners (and whether that relationship differs from traditional student-learner relationships). Given that a number of MOOCs enlist the help of instructional assistants (e.g., Teaching Assistants and course alumni), research in this area could investigate the impact of instructional assistants on learning and support, as well as the experiences that instructional staff have in the delivery of the course.

Understanding Learner Subpopulations
Results show that a number of researchers have attempted to identify and classify learners into various groupings. For instance, the literature suggests that MOOC learners can be described as completing, auditing, disengaging, and sampling (Kizilcec, Piech, & Schneider, 2013), or as no-shows, observers, and drop-ins, among other categories. Despite these classification efforts, very little research examines the experiences of particular learner subpopulations, leaving this a promising area for future study.

cMOOCs vs. xMOOCs
While some of the papers that we identified alluded to distinctions between xMOOCs and cMOOCs, and some authors focused their research on a particular design, the current literature continues to reflect the findings of Liyanagunawardena et al. (2013): the majority of the literature does not clearly define the kinds of MOOCs studied. As the field continues to develop and change, the distinction between xMOOCs and cMOOCs becomes increasingly unclear and problematic. In particular, courses include designs, artifacts, and philosophies that cannot be easily categorized as cMOOC or xMOOC. With continued attempts at exploring alternative MOOC designs, such as the dual-layer MOOC (Dawson et al., 2015) or the MOOC in which a teaching bot contributed automated instructional, procedural, and social support to learners (Bayne, 2015), the cMOOC vs. xMOOC distinction belies substantive differences between individual courses.

The Geography of MOOC Research
Our geographical analysis of author affiliations showed that over half of the authors conducted their research in the USA, and over 80% of authors were affiliated with institutions in North America or Europe. By comparison, Zawacki-Richter et al. (2009) found that over 80% of first authors of papers published in distance education journals hailed from five countries: the USA, Canada, the United Kingdom, Australia, and China. This indicates that these fields of research are similarly concentrated in a few geographic regions. We compared these results to output from the SCImago Country Rankings tool (SCImago, 2007), which summarizes various indicators, including geographic origin, of all papers published in Scopus. According to SCImago, only 18.6% of all citable documents published in 2013 across all disciplines came from the USA, and it takes the top 20 countries (including at least seven from outside North America and Europe) to account for 80% of academic output. As such, while the MOOC literature and the distance education literature appear similar in this regard, this concentration is not simply a reflection of geographical contributions to academic output in general. Future research in this area may examine whether and how research on MOOCs differs according to its geographical origins, especially as prior research indicates that conceptions of education as a discipline differ between regions (e.g., Biesta, 2011). While it is possible that our exclusion of literature authored in languages other than English limited the representation of regions outside North America and Europe, the preponderance of literature from these regions may also be the result of other factors (e.g., filtering algorithms) that make some literature less visible than other literature. Future research may also examine whether the MOOC literature is biased in favor of certain countries or regions (e.g., how visible is literature from other regions in Google Scholar searches?).

Conclusion
We reported the geographic distribution, publication outlets, citations, data collection and analysis methods, and research strands of empirical research focusing on MOOCs in 2013-2015. We hope that this systematic analysis enables researchers to make better sense of the empirical literature on MOOCs, its direction, and its limitations. There are many possibilities for future research in this area. Future systematic reviews of the literature may focus on synthesizing knowledge on particular areas of interest (e.g., completion and retention in MOOCs; learner motivations in MOOCs) or on examining whether research methods used to understand MOOCs follow standard methods of inquiry or instead take advantage of the digital nature of learning and teaching in this context. Further, future research may compare how papers published since this paper was written fit into the picture described herein and engage in further categorization and cross-tabulation of the literature. Finally, we hope that our results highlight the need for critical reflection on the part of researchers as to why they study the topics that they study and why they use the methods that they do.