Investigation of Emerging Trends in the E-Learning Field Using Latent Dirichlet Allocation

E-learning studies have become increasingly important, as they provide alternatives and support for all types of teaching and learning programs. The effect of the COVID-19 pandemic on educational systems has further increased the significance of e-learning. Accordingly, a full understanding of the general topics and trends in e-learning studies is critical for a deeper comprehension of the field. Many studies provide such a picture of the e-learning field, but they do not examine the field as a whole. This study aimed to investigate the emerging trends in the e-learning field by applying a topic modeling analysis based on latent Dirichlet allocation (LDA) to 41,925 peer-reviewed journal articles published between 2000 and 2019. The analysis revealed 16 topics reflecting emerging trends and developments in the e-learning field. Among these, the topics “MOOC,” “learning assessment,” and “e-learning systems” were found to be key topics in the field, with a consistently high volume. In addition, the topics “learning algorithms,” “learning factors,” and “adaptive learning” had the highest overall acceleration, with the first two also showing higher acceleration in recent years. Based on these results, it is concluded that the next decade of e-learning studies will focus on learning factors and algorithms, which will possibly create a baseline for more individualized and adaptive mobile platforms. In other words, once the learning process is better understood through these learning factors and algorithms and a certain level of maturity is reached, the next generation of e-learning systems will be built on individualized and adaptive learning environments. These insights could help e-learning communities direct their research efforts and applications in the field accordingly.


Introduction
Today, e-learning has become a very important topic, with applications in every field as supportive training, lifelong learning modalities, and support tools for all types of educational systems. Due to the effects of the COVID-19 pandemic on teaching and learning environments, research into e-learning has become even more critical. For example, a recent study by Chavarría-Bolaños et al. (2020) reported the importance of e-learning in dental education. E-learning research can be considered multidisciplinary, as several fields contribute to it from different perspectives. The roots of e-learning studies go back to the late 1950s; therefore, a large body of literature details the improvements and achievements in this field over the decades. Furthermore, as highlighted by some researchers, the number of studies conducted on e-learning has increased significantly since 2000 (González, 2010) and will likely accelerate further in the current pandemic situation. Analyzing these studies provides a general overview of the field that can help us understand how it is evolving and where it is heading. Such overviews are critical in guiding future research and developments in all kinds of e-learning studies. In the literature, there have been several attempts to analyze earlier studies and provide a general overview of the field. As defined by Rowley and Slack (2004), systematic reviews aim to facilitate the definition, evaluation, and interpretation of studies in a specific field by examining the concepts, applications, and theories pertaining to it. These studies systematically review the literature to answer research questions and to better understand and examine the key concepts in the field. Some previous studies on e-learning were conducted to provide insights into a specific area of e-learning, such as the Semantic Web for distance learning (Bashir & Warraich, 2020), virtual education (Fermín-González, 2019), educational data mining (Rodrigues et al., 2018), mobile learning in higher education (Krull & Duart, 2017), and machine-learning-based recommendation systems for e-learning (Khanal et al., 2020). Another group of studies examined the implementation of e-learning in specific fields, such as e-learning for corporate workforce training (Kaizer et al., 2020), e-learning in undergraduate dentistry education (Zitzmann et al., 2020), implications of e-learning for universities (Kibuku et al., 2020), and e-learning for mathematics teaching (Klingenberg et al., 2020). Only a limited number of systematic reviews address e-learning studies in general. Among these, a systematic review of 99 e-learning articles published between 2010 and 2018 reported four main themes in the field: educational systems, learning issues, student behaviors, and online learning tools (Rodrigues et al., 2019). Valverde-Berrocoso et al. (2020) also conducted a systematic review of 248 articles published between 2009 and 2018 and found that online students, online teachers, and curriculum-interactive learning environments were the three main nodes of e-learning; that MOOCs were the most researched e-learning modality; that the community of inquiry and the technology acceptance model were the most used theories in the analyzed studies; and that case studies were the most frequently used methodology. As these systematic reviews require considerable researcher effort, they are usually conducted on a limited number of articles.
Another group of studies attempting to provide a bigger picture of e-learning research were undertaken as bibliometric analyses, which examine the properties and recorded information of scientific and research fields based on a number of indicators (Abramo et al., 2009; Patra et al., 2006). As these studies use certain indicators as the basis for analysis, they can be conducted on larger data sets. For instance, Hung (2012) examined 689 articles published between 2000 and 2008 through a bibliometric analysis. Similarly, Asadzandi et al. (2017) descriptively analyzed 23,805 e-learning studies through the categories provided by the Scopus database, such as date of publication, document type, document language, source of articles, subject areas, authors, and their affiliations, concluding that the number of articles on e-learning grew steadily, in parallel with the development of the field. Tibaná-Herrera et al. (2018a, 2018b) likewise conducted bibliometric analyses of Scopus-indexed e-learning publications. Thus, bibliometric analyses are conducted on larger data sets and can provide a bigger picture of the field; however, because the analysis is based on a number of indicators, they miss details in the content of the published studies, which limits their contributions to the field.
All of these earlier studies are valuable in providing a general perspective on the field of e-learning, despite limitations such as a limited number of articles, a narrow scope, or constraints in the analysis methods (Çakiroğlu et al., 2019). As the number of articles in the field of e-learning increases significantly, manual analysis is becoming more difficult (Yang et al., 2016). Moving beyond superficial description toward in-depth analysis requires different methods. In this context, text and data mining methods make it possible to analyze large sets of articles. Today, different types of text analysis of high volumes of documents, such as word frequency analysis, text classification, topic modeling, and n-gram analysis, are being used extensively to gain a deeper understanding of specific domains and fields (Gurcan, 2019; Gurcan et al., 2021). For instance, in the field of distance education, Gurcan and Cagiltay (2020) recently conducted a text-mining-based review of 27,735 peer-reviewed journal articles published between 2008 and 2018 using n-grams and reported 10 main themes of the field. However, they applied a manual classification to the identified topics (Gurcan & Cagiltay, 2020). Recently, with improvements in machine learning and data mining techniques, significant developments have occurred in automatic topic determination, semantic information extraction from texts, and automatic analysis of very large data sets using text mining methodologies (Gürcan, 2009; Gurcan, 2018). These techniques open a wider window into the studies in a field and offer objective analysis methods. Accordingly, the study discussed in this article aimed to provide a wider perspective by analyzing 41,925 e-learning journal articles and reviews published between 2000 and 2019 using the latent Dirichlet allocation (LDA) algorithm (Blei et al., 2003). The methodology of the study was designed to investigate four research questions (RQ1-RQ4), concerning the bibliometric characteristics of e-learning research, the emerging topics in the field, the changes in these topics from 2000 to 2019, and the future trends of the field.

Methods
The literature available on e-learning is very comprehensive. Since journal articles are subject to a peer review process, this study considered only peer-reviewed journal articles. More specifically, only e-learning-oriented journal articles published in English in the last 20 years (between 2000 and 2019) were included. Since e-learning is an interdisciplinary field covering a wide spectrum of topics, an iterative strategy was followed to determine the search string. First, a broad literature review was carried out to identify the synonyms of the term "e-learning" used in the literature.
Then, the opinions of field experts were obtained regarding the extracted terms. The final keywords were determined from the examination by five field experts and the evaluation of the researchers.
The search query that met the search string and the other criteria determined through these processes was as follows:
TITLE-ABS-KEY (( "online learning" OR "e-learning" OR "distance learning" OR "mobile learning" OR "web-based learning" OR "online training" OR "e-training" OR "distance training" OR "mobile training" OR "web-based training" OR "online education" OR "e-education" OR "distance education" OR "mobile education" OR "web-based education" OR "online teaching" OR "e-teaching" OR "distance teaching" OR "mobile teaching" OR "web-based teaching" OR "MOOC" OR "online open course" ) ) AND ( PUBYEAR < 2020 ) AND ( PUBYEAR > 1999 ) AND ( LIMIT-TO ( DOCTYPE , "ar" ) OR LIMIT-TO ( DOCTYPE , "re" ) ) AND ( LIMIT-TO ( LANGUAGE , "English" ) )
The Scopus database was used to obtain articles suitable for the scope of the study since it covers more than 5,000 publishers worldwide (including Elsevier, Emerald, IEEE, Sage, Springer, Taylor & Francis, and Wiley Blackwell), and this number is increasing daily (Gurcan et al., 2021; Mongeon & Paul-Hus, 2016). The query given above was run on April 5, 2020, to retrieve the relevant articles from the Scopus database. The search returned a total of 41,925 articles (2,619 review articles and 39,306 research articles). The title, abstract, and author keyword information of these articles was added to the data set.
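The search string above follows a regular pattern: five delivery modes ("online," "e-," "distance," "mobile," "web-based") crossed with four activities ("learning," "training," "education," "teaching"), plus two MOOC-related terms. The following Python sketch regenerates an equivalent query string to make that structure explicit and easy to adapt, for example to a different year range. It is an illustration, not part of the original study; the function name `build_query` and the `MODES`/`ACTIVITIES` lists are hypothetical, and the output has not been validated against Scopus.

```python
# Hypothetical helper: rebuilds a Scopus advanced-search string equivalent
# to the one reported in the study. Names and layout are illustrative only.
MODES = ["online", "e", "distance", "mobile", "web-based"]
ACTIVITIES = ["learning", "training", "education", "teaching"]


def build_query(first_year: int = 2000, last_year: int = 2019) -> str:
    terms = []
    for activity in ACTIVITIES:
        for mode in MODES:
            # "e" joins with a hyphen ("e-learning"); other modes use a space.
            sep = "-" if mode == "e" else " "
            terms.append(f'"{mode}{sep}{activity}"')
    terms += ['"MOOC"', '"online open course"']
    return (
        f"TITLE-ABS-KEY ( ( {' OR '.join(terms)} ) ) "
        f"AND ( PUBYEAR < {last_year + 1} ) AND ( PUBYEAR > {first_year - 1} ) "
        'AND ( LIMIT-TO ( DOCTYPE , "ar" ) OR LIMIT-TO ( DOCTYPE , "re" ) ) '
        'AND ( LIMIT-TO ( LANGUAGE , "English" ) )'
    )


print(build_query())
```

Generating the 20 mode-activity combinations in a loop keeps the term list consistent if a mode or activity is added later, which is harder to guarantee when editing a long hand-written string.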
To prepare the e-learning corpus for probabilistic topic modeling, preprocessing tasks such as tokenization; removal of meaningless words, symbols, and stop words; and stemming were implemented (Gurcan et al., 2021). Then, an e-learning document-term matrix was created, in which each row represented an article and each column represented a unique word in the e-learning corpus. Afterward, LDA, a probabilistic topic modeling approach (Blei et al., 2003), was used to create and fit a topic model to the e-learning corpus and to analyze this corpus.
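The preprocessing steps and the document-term matrix described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the stop list is a deliberately tiny hypothetical one, and the crude suffix stripper `stem` stands in for a real stemmer such as Porter's, whose exact choice the study does not specify here.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop list; a real study would use a full list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "for", "to",
              "is", "are", "was", "were", "this", "that", "with", "by", "we"}


def stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    # Tokenize on letter runs (keeping inner hyphens, e.g. "e-learning"),
    # lowercase, drop stop words and tokens of two letters or fewer, then stem.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS and len(t) > 2]


def document_term_matrix(texts: list[str]) -> tuple[list[str], list[list[int]]]:
    """Rows are articles, columns are unique (stemmed) words, cells are counts."""
    docs = [Counter(preprocess(t)) for t in texts]
    vocab = sorted(set().union(*docs))
    matrix = [[doc[w] for w in vocab] for doc in docs]
    return vocab, matrix


vocab, matrix = document_term_matrix([
    "E-learning systems support students.",
    "Students use mobile learning apps.",
])
print(vocab)   # ['app', 'e-learn', 'learn', 'mobile', 'student', 'support', 'system', 'use']
print(matrix)  # [[0, 1, 0, 0, 1, 1, 1, 0], [1, 0, 1, 1, 1, 0, 0, 1]]
```

A dense list-of-lists matrix is used only for readability; for a corpus of about 42,000 abstracts, a sparse representation would be the practical choice.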
LDA is a generative approach used to discover hidden semantic patterns in a large, relatively unstructured document corpus (Blei, 2012). Text documents contain hidden semantic patterns called "topics"; each topic is defined by a probability distribution over a fixed set of words, and each document is modeled as a mixture of these topics (Blei et al., 2003). Since LDA is an unsupervised method for topic modeling, it does not require any training set, tags, or metadata, so large numbers of textual documents can be analyzed in a short time. The LDA model is frequently used in content analysis based on topic modeling (Blei et al., 2003; Blei & Lafferty, 2007). For these reasons, LDA was preferred over other models and employed for the topic modeling analysis of the e-learning corpus in this study. The analysis identified 16 topics as the optimal number. The 20 words with the highest probability were identified for each topic and assigned to it. A suitable name was defined for each topic based on its first five words. Furthermore, the volume percentages and temporal trends of the topics that modeled the entire e-learning corpus were revealed by calculating the distribution of topics per document and the distribution of words per topic (Gurcan et al., 2021; Gurcan & Cagiltay, 2020).
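To make the two distributions mentioned above concrete (topics per document and words per topic), the following sketch fits LDA by collapsed Gibbs sampling in plain Python and extracts the highest-probability words of a topic. The study does not state which LDA implementation or inference method it used, so collapsed Gibbs sampling, the hyperparameters `alpha=0.1` and `beta=0.01`, and the function names `lda_gibbs` and `top_words` are all assumptions for illustration; a corpus of this size would normally be handled with an optimized library.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    docs: list of documents, each a list of integer word ids.
    Returns (theta, phi): topic proportions per document and word
    probabilities per topic, as posterior-mean estimates.
    """
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    topics = range(num_topics)
    ndk = [[0] * num_topics for _ in docs]    # document-topic counts
    nkw = [[0] * vocab_size for _ in topics]  # topic-word counts
    nk = [0] * num_topics                     # tokens assigned to each topic
    z = []                                    # topic assignment of every token

    # Randomly assign each token to a topic to initialize the counts.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[t][w] + beta) / (nk[t] + vocab_size * beta) for w in range(vocab_size)]
        for t in topics
    ]
    return theta, phi


def top_words(phi_row, vocab, n=20):
    """The n most probable words of one topic (the study reports n = 20)."""
    ranked = sorted(range(len(vocab)), key=lambda w: phi_row[w], reverse=True)
    return [vocab[w] for w in ranked[:n]]
```

Each row of `theta` is one article's topic mixture and each row of `phi` is one topic's word distribution. Topic volumes and temporal trends can then be derived from `theta`, and topic labels from the `top_words` lists.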

Results
The results of the study are first presented descriptively by considering the number of yearly publications, the top subject areas and journals, and the top countries of the authors.Additionally, the top keywords found in these articles are also mentioned descriptively.Further, a detailed topic modeling analysis is presented to provide an overall picture of e-learning studies.

Descriptive Analysis
To describe the bibliometric characteristics of the e-learning field between 2000 and 2019 (RQ1), a descriptive analysis of the corpus is given below. The total number of articles published between 2000 and 2019 and their yearly distribution are given in Table 1, showing a total of 41,925 articles analyzed in the study. Although the number of articles decreased slightly in 2002 and 2010 compared to the other years, the number of publications increased linearly overall. Figure 1 shows the 10 subject areas addressed by the highest number of articles. The majority of the articles were published in the social sciences, including educational sciences (n = 23,150). As some studies were carried out in more than one discipline, Scopus classified them under each of the corresponding subject areas. The top 20 keywords of the analyzed studies are listed in Table 2, with the top five keywords being "e-learning" (30.68%), "human" (27.35%), "education" (16.42%), "teaching" (12.88%), and "student" (12.01%).
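The keyword percentages above can be read as the share of articles that list a given keyword. The section does not spell out the exact formula, so the following Python sketch assumes that reading: each keyword is counted at most once per article (case-insensitively) and divided by the number of articles. The function name `keyword_shares` and the sample data are hypothetical.

```python
from collections import Counter


def keyword_shares(article_keywords: list[list[str]], top_n: int = 20) -> list[tuple[str, float]]:
    """Percentage of articles listing each keyword (an assumed definition)."""
    counts = Counter()
    for keywords in article_keywords:
        # A set ensures a keyword repeated within one article counts once.
        counts.update({k.strip().lower() for k in keywords})
    total = len(article_keywords)
    return [(k, round(100 * c / total, 2)) for k, c in counts.most_common(top_n)]


sample = [["E-learning", "Student"], ["e-learning", "human"],
          ["Human", "e-learning", "e-learning"], ["teaching"]]
print(keyword_shares(sample)[:2])  # [('e-learning', 75.0), ('human', 50.0)]
```

Under this definition the percentages need not sum to 100%, which is consistent with the overlapping shares reported for the top keywords.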

Topic Modeling Analysis
To reveal the emerging topics in the e-learning field (RQ2), the results of the LDA-based topic modeling analysis are given in this section. Using an LDA-based topic modeling procedure, 16 topics were discovered (see Table 3). The rate (%) of each topic was calculated from its volume, that is, the number of articles published on that topic. The top 20 keywords classified under each topic are also given, taking their volume rates into account. Table 3 shows that the most intensively studied topic was "MOOC" (10.13%), while the least studied topic was "teacher education" (1.76%). Figure 4 shows the volume of each topic among all the articles considered in this study. Accordingly, the topics can be classified as high-volume topics, with a ratio higher than 9.0%; medium-volume topics, with a ratio between 5.4% and 9.0%; and low-volume topics, with a ratio below 5.4%. The topics with the highest ratios were "MOOC" (10.13%), "learning assessment" (9.86%), "distance education" (9.68%), "e-learning systems" (9.05%), and "learning algorithms" (9.02%), while those with the lowest ratios were "teacher education" (1.76%), "language teaching" (3.16%), "mobile learning" (3.74%), "training" (3.80%), and "information resources" (4.34%). Based on these differences, the discovered topics were classified into three groups. Changes in the volume ratios were taken into account in this classification: the ratios showed sharp drops, with clusters forming below 9% and below 5%. These groups were labeled by the researchers as high-volume (n = 5), medium-volume (n = 6), and low-volume (n = 5) topics.
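The volume calculation and the three-group classification described above can be sketched as follows. Two points are assumptions rather than details stated in the study: each article is counted toward its dominant (highest-proportion) topic, and a rate of exactly 9.0% or 5.4%, which the thresholds leave unspecified, falls into the lower group. The function names `topic_volumes` and `volume_group` are hypothetical.

```python
from collections import Counter


def topic_volumes(theta: list[list[float]]) -> list[float]:
    """Percentage of articles whose dominant topic is k (an assumed reading).

    theta: one row of topic proportions per article.
    """
    dominant = Counter(max(range(len(row)), key=row.__getitem__) for row in theta)
    n_topics = len(theta[0])
    return [100 * dominant[k] / len(theta) for k in range(n_topics)]


def volume_group(rate: float, high: float = 9.0, low: float = 5.4) -> str:
    """Assign a topic to a volume group using the study's reported thresholds."""
    if rate > high:
        return "high"
    if rate > low:
        return "medium"
    return "low"


print(topic_volumes([[0.7, 0.3], [0.2, 0.8], [0.6, 0.4], [0.9, 0.1]]))  # [75.0, 25.0]
for name, rate in [("MOOC", 10.13), ("learning algorithms", 9.02),
                   ("information resources", 4.34), ("teacher education", 1.76)]:
    print(name, volume_group(rate))
```

Applied to the reported rates, "MOOC" and "learning algorithms" fall into the high-volume group, and "information resources" and "teacher education" fall into the low-volume group, matching the grouping in the text.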

Percentage Rates of Articles From 2000 to 2019 for Each Topic
To better understand the temporal trends of e-learning topics between 2000 and 2019 (RQ3), the development of these topics was analyzed in four-year periods, as shown in Table 4, by evaluating the average number of articles published under each topic (n) in each period. The percentage of each topic relative to the total number of articles published each year was calculated, and the average of these percentages for each period (%) is also given. Accelerations were calculated as the difference between a topic's current percentage of articles and its percentage in the preceding years (current minus previous), and the average acceleration values (A) for each period are presented in Table 4. Finally, the trend of each topic is presented graphically, both as a volume graph based on the percentages of articles (%) and as an acceleration graph based on the calculated acceleration values (A). Table 4 shows that among the high-volume topics, "MOOC" and "learning assessment" showed steadier behavior, whereas the periodical percentage of articles decreased for some topics, such as "distance education," and increased for others, such as "learning algorithms." In addition, even though "teacher education" had the lowest volume, it showed a steady acceleration, resulting in a number of articles comparable to that of the other topics. The recent trends of the topics and their acceleration values during the last period (2016-2019) are given in Figure 7. "Learning algorithms" had a significantly higher acceleration (1.21), while "e-learning systems" (−0.33) and "distance education" (−0.31) had the lowest accelerations during the same period.
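The procedure above can be sketched as three small steps: yearly percentages, year-over-year accelerations, and per-period averages. The text leaves the exact differencing step open, so this sketch assumes year-over-year differences (current year minus previous year) averaged within each four-year period. The function names and the numbers in the usage example are hypothetical and are not taken from Table 4.

```python
def yearly_percentages(topic_counts: dict[int, int], totals: dict[int, int]) -> dict[int, float]:
    """Share (%) of each year's articles that belong to the topic."""
    return {y: 100 * topic_counts[y] / totals[y] for y in sorted(totals)}


def accelerations(pct: dict[int, float]) -> dict[int, float]:
    """Assumed definition: current year's percentage minus the previous year's."""
    years = sorted(pct)
    return {y: pct[y] - pct[prev] for prev, y in zip(years, years[1:])}


def period_means(values: dict[int, float], start: int, length: int = 4) -> dict[str, float]:
    """Average the yearly values within consecutive periods of `length` years."""
    out = {}
    for s in range(start, max(values) + 1, length):
        window = [values[y] for y in range(s, s + length) if y in values]
        if window:
            out[f"{s}-{s + length - 1}"] = sum(window) / len(window)
    return out


# Hypothetical yearly percentages for one topic.
pct = {2000: 10.0, 2001: 12.0, 2002: 11.0, 2003: 15.0, 2004: 14.0}
acc = accelerations(pct)          # {2001: 2.0, 2002: -1.0, 2003: 4.0, 2004: -1.0}
print(period_means(pct, 2000))    # period averages of the percentages (%)
print(period_means(acc, 2000))    # period averages of the accelerations (A)
```

Because year-over-year differences telescope, the mean of all accelerations equals (last percentage minus first percentage) divided by the number of differences, here (14.0 - 10.0) / 4 = 1.0. An overall acceleration therefore summarizes the net change in a topic's share per year.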

Figure 7
Acceleration of Topics From 2016 to 2019

Discussion
In this study, the main trends in e-learning during the last 20 years (between 2000 and 2019) were determined by analyzing articles published in the field using topic modeling, and 16 main topics were discovered through the LDA-based analysis. The number of articles in this field increased linearly over the years (see Table 1), a result parallel to earlier work reporting that e-learning studies have increased and become widespread especially since the early 2000s (Tibaná-Herrera et al., 2018a). The results revealed that the top five subject areas were social sciences; computer science; engineering; medicine; and business, management, and accounting. Considering that educational science falls under the social sciences, our results align with those of Tibaná-Herrera et al. (2018a), who identified educational science as the major subject area for e-learning studies. Additionally, with "medical education" among the discovered topics (see Table 3), the results of the current study support earlier work suggesting that in recent years, e-learning studies in the field of medicine have been in first place (Barteit et al., 2020). According to the results, the largest number of articles in the e-learning corpus (975) was published in the journal Computers & Education, which indicates that this journal provides a large space for e-learning studies (see Figure 2). An examination of the origins of the articles showed that the United States was in the lead (12,024 articles; see Figure 3), which supports the findings of Tibaná-Herrera et al. (2018b). In addition to these contributions, the results of the current study offer insights into e-learning studies, which are summarized under three main headings as follows:

Emergence of New Topics
Table 4 reveals that during the early years of e-learning publications (2000-2003), "distance education" (21.59%) had the highest volume ratio and can be considered the main and oldest topic of e-learning studies. In contrast, during this period, "mobile learning" (0.97%) and "training" (1.89%) had lower volume ratios in terms of the percentage of articles; thus, they can be classified as very young, newly emerging topics in those years. When the acceleration values of these topics were analyzed, as seen in Figure 6, "distance education" had the lowest acceleration value (−1.01), suggesting that the emergence of younger topics such as "mobile learning" and "training" decreased the volume percentages of older topics like "distance education."

Major Topics
The results of this study indicate that "learning algorithms," "learning factors," and "adaptive learning" were the major topics having the highest overall acceleration values (0.57, 0.28, and 0.22, respectively; Figure 6).
Additionally, Table 4 shows that the topic "MOOC" had the highest average volume (n = 849.74). These results seem to confirm the expectation of Graf et al. (2010) that MOOCs would occupy an important place in the future. In addition, Chiappe and Lee (2017) supported the view that MOOCs had an important place in e-learning, which is also consistent with the findings of Valverde-Berrocoso et al. (2020) that reported MOOCs as being the most researched e-learning modality.

Future of the Field
The analysis of the topic accelerations revealed that after 2008, "learning algorithms" and "learning factors" also became dominant topics, with higher overall (0.57 and 0.28, respectively; Figure 6) and recent (1.21 and 0.30, respectively; Figure 7) acceleration values. As large amounts of data are now being collected from e-learning activities, studies on "learning algorithms" and "learning factors" will offer an understanding of the learning process, which will also create a baseline for its adaptation and individualization. As it is difficult to create fully adaptive e-learning systems without appropriate learning algorithms and a deeper understanding of learning factors, the acceleration of the topic "adaptive learning" has recently dropped from an overall value of 0.22 (Figure 6) to −0.11 (Figure 7). However, after further developments in topics such as "learning algorithms" and "learning factors," the acceleration of "adaptive learning" can be expected to increase in the following decades, with a similar trend for "mobile learning."

Conclusion
In this study, 16 main topics of e-learning studies were identified, and the results are important for determining the trends in the field of e-learning. Based on the results, it can be concluded that "learning algorithms," "learning factors," "training," "language teaching," and "educational management" have been highly accelerating topics during the last four years; in the near future, they are expected to have an even greater impact on the field and to create a baseline for more individualized and adaptive mobile platforms. Accordingly, although the field is moving toward more adaptive e-learning systems, the developments supporting adaptive e-learning platforms are not yet sufficiently mature, and during the next few years, these five topics are expected to dominate.
However, after these five topics reach a level of maturity, "adaptive learning" and "mobile learning" can be expected to show higher acceleration. The results of the current study can support researchers working in this field, as well as decision-makers and practitioners. In future studies, similar analyses can be conducted to determine changes in this field and to perform comparative studies. Furthermore, the results obtained from this work can lead to more comprehensive studies on subtopics based on both high-volume and fast-accelerating issues.
In this study, an LDA-based topic modeling technique was applied to 41,925 peer-reviewed journal articles. Although this technique makes it possible to analyze large data sets, LDA cannot currently support deeper analyses such as those carried out in systematic reviews. In the future, with improvements in topic modeling algorithms, deeper analyses of large data sets may also become possible, which could provide important insights for researchers in this field.
In a related bibliometric analysis, Tibaná-Herrera et al. (2018a) categorized e-learning as an emerging discipline consisting of 64 descriptors and 219 journals and congresses indexed by Scopus between 2012 and 2014. Another bibliometric analysis was conducted by Tibaná-Herrera et al. (2018b) on 39,244 documents published between 2003 and 2016 that were indexed by Scopus and SCImago Institutional Rankings. They reported that the majority of these studies were published by authors from the United States, that the University of Hong Kong was the most productive institution, and that the National Taiwan University of Science and Technology had the greatest collaboration.

RQ1. What have been the bibliometric characteristics of e-learning research during the period between 2000 and 2019?
RQ2. What have been the emerging topics in the e-learning field in the period between 2000 and 2019?
RQ3. How have the topics of interest in e-learning studies changed from 2000 to 2019?
RQ4. What are the future trends in the e-learning field?

Table 1
Yearly Distribution of the Articles

Table 2
Top 20 Keywords Addressed by E-Learning Articles

Table 4
Volume and Acceleration of Articles for Each Discovered Topic in Four-Year Periods