June – 2015

Are the Most Highly Cited Articles the Ones that are the Most Downloaded? A Bibliometric Study of IRRODL


Raidell Avello Martínez¹ and Terry Anderson²
¹University of Cienfuegos, Cuba; ²Athabasca University, Canada

Abstract

Publication of research, innovation, challenges and successes is of critical importance to the evolution of more effective distance education programming. Publication in peer reviewed journal format is the most prestigious and the most widespread form of dissemination in education and most other disciplines; hence the importance of understanding what is published and its impact on both researchers and practitioners. In this article we identify and classify the leading articles in arguably the leading peer reviewed journal in this discipline.

The International Review of Research in Open and Distance Learning (IRRODL) is a peer reviewed academic journal that has been published since 2000. The journal has published between 3 and 6 issues annually with between 50 and 111 research articles per volume. In order to assess the general and the particular impact of highly cited articles, this work describes the main bibliometric indicators of the IRRODL journal and compares them with the total galley views in all formats that IRRODL publishes (PDF, HTML, EPUB and MP3). In addition to identifying characteristics of the most widely cited articles, this research determines whether there is a correlation between the articles most highly cited by other publishing researchers and the number of views, indicating interest from both practitioner and research communities. The results show a significant and positive relationship between the total number of citations and the number of views received by articles published in the journal, indicating that the impact of the journal extends beyond active publishers to practitioner consumers.

Keywords: International Review of Research in Open and Distance Learning, IRRODL; highly cited papers; Google Scholar; research trends; open and distance learning; altmetrics; webmetrics

Introduction

The International Review of Research in Open and Distance Learning (IRRODL) is a peer reviewed academic journal that has been published continuously since 2000. During that time the journal has published 812 research articles. The focus of IRRODL is international open and distance learning, with some special issues having a regional focus and other topical issues such as connectivism, right to education, mobile learning etc. Since 2006 the journal has been published using Open Journal Systems and thus all the views and download data from the published documents are logged as historical data and used in this research.

After a six-year battle for acceptance (Anderson & McConkey, 2009), IRRODL was indexed by Thomson/Reuters Social Science Citation Index (SSCI) with an Impact Factor of 0.687 (2010). IRRODL is extensively indexed by a number of services including abstracts and full and open source access in Elsevier SciVerse, Scopus, Directory of Open Access Journals (DOAJ), Open Archives Initiative, Cabell Publishing Inc., ERA Online, Ulrich’s Web, Genamics, Index Copernicus, ERIC, H.W. Wilson Company and Google Scholar.

This work describes the main bibliometric indicators of the IRRODL journal and then discusses and calculates the relationship between the impact as judged by citations by other researchers and by practitioners as judged by the number of total galley views (all formats: PDF, HTML, EPUB and MP3). Finally, a correlation between the total citations and views (downloads) as an indicator of altmetrics received by articles published in the journal is presented.

Citation Analysis

Citation analysis is a research method used to assess the impact of contributions of individuals, institutions, research groups and academic journals. This kind of analysis is applied in most scientific fields including social science and it allows us, as Kinshuk, Huang, Sampson, and Chen (2013) pointed out, to observe how frequently a document has been cited by other authors, providing one way to calculate the relevance and importance of an author, an idea or a particular document.

The use of citation analysis has grown into the most popular way to evaluate the impact and importance of published papers, books and other academic documents using the number of citations in peer-reviewed journals (Bornmann & Daniel, 2008) as an indicator. Nevertheless, this topic has been controversial and developing objective “scientific” measures of impact remains elusive. Thus, different ways to measure the number of citations and more fundamentally the value to a discipline and profession of different types of research have been developed.

In the literature we found many citation studies in different fields (Lee, Driscoll, & Nelson, 2004; Allen, Jacobs, & Levy, 2006; Aylward, Roberts, Colombo, & Steele, 2008; Blessinger & Hrycaj, 2010; Rezaei, Navidi, Rokni, & Pourmand, 2012; Kinshuk et al., 2013; Avello Martínez, Rodríguez, & Rodríguez, 2014) that offer an overview of the most important issues, trends, patterns, needs and areas of priority in their field of research. Moreover, these studies identify principal publications and the most influential papers and authors. Finally these studies propose questions and guidelines for future research.

Specifically, in the educational technology and distance education field, many related studies were found as well. For example, Klein (1997) studied 100 articles published in the journal Educational Technology Research and Development between 1989 and 1997. Rourke and Szabo (2002) performed a content analysis of 235 articles from the Journal of Distance Education, focusing on item type, topic, research method, and biographical information about first authors. The main finding to highlight here is that only one trend was found for the category item type, an uncertain ascending trend in the proportion of empirical items, thus indicating a broad diversity of item types.

Zawacki-Richter, Anderson and Tuncay (2010) evaluated 12 distance education journals and carried out a systematic review of the number of citations per article (N = 1,123) and per journal issue between 2003 and 2008. They found that the number of citations per journal and per article indicated little difference between open and non-open access journals; however, the articles in open access journals were cited earlier than those in non-open access journals. Kinshuk, Huang, Sampson, and Chen (2013) analysed the research type, research topic, first author’s country, international collaboration, participant levels, learning domain, research method, and frequently appearing keywords in the top 20 highly cited papers in the Educational Technology and Society journal (ET&S) during 2003-2010. Halverson et al. (2014) determined the most frequently cited books, edited book chapters, and papers on blended learning. This identification of key articles serves as a filter to help others prioritize their reading and research and also helps users identify key players and ideas to investigate further. Furthermore, these studies illustrate a consistent interest in this field and in the value of such bibliometric studies of journals.

Google Scholar

The release of Google Scholar (GS) in 2004 generated much media coverage and academic debate (Giustini, 2005). Google Scholar indexes a wide range of information including peer-reviewed articles, theses, books, chapters, conference proceedings, preprints and other documents from academic publishers (Gehanno, Rollin, & Darmoni, 2013). Thus, the materials it uses to calculate citation indexes are much broader in scope, though less exclusive, than those of other bibliographic systems, which index only selected peer-reviewed journals. As Beckmann and Wehrden (2012) argue, “the coverage of GS is increasing and, despite the fact that it is said to be not exhaustive, it is exhaustive enough for the studies that are considered of enough quality or relevance for systematic reviews”.

Google Scholar does not offer the authority structure or transparency of coverage that many librarians and bibliometricians expect from a scientific information resource. However, as Torres-Salinas, Ruiz-Pérez, and Delgado-López-Cózar (2009) propose, it might well be of considerable use for individual academics interested in citation analysis, as well as for higher level bibliometric analyses. Further, its broader coverage may make it more useful to practitioners, professionals and others who wish a broader perspective on their citizen science than that provided by tools designed for full time academics.

Although some features of Google Scholar (notably the lack of transparency in how Google selects items for inclusion) have been widely criticized, it has been shown that the journals listed in Google Scholar are more likely to have a greater number of citations received compared to its main competitor, Thomson/Reuters Web of Science. The main complaints against Google Scholar include poor standardization and/or duplication of retrieved articles, lack of control for self-citation and possibilities of “gaming the system”. However, the extensive coverage of Google Scholar, especially in emergent disciplines such as distance education, where new and open access journals are more common, has resulted in its wide use by both academics and professionals. For example, of the more than 15 peer reviewed journals focused on distance education or elearning, only two are indexed by Web of Science, whereas all are indexed by Google Scholar. Finally, Harzing (2010) and Ebrahim et al. (2014) calculated strong and positive correlations between Google Scholar and Thomson/Reuters Web of Science citation ratings.

Objectives and Research Questions

The purpose of this research is to undertake a cross-sectional study of the papers published in the journal IRRODL in the period 2008 to October 2013, taking into account both the number of citations per article from Google Scholar and the total galley views (TGV) supplied by Open Journal Systems. We identify the papers with the most citations by other researchers and examine these by year of publication, principal authors and authors’ country of origin. Moreover, the correlation between citations and total galley views is calculated, and the Pareto Principle (80-20 rule) is tested to see if it applies to both samples.

Three questions guided the study:

Sample and Method

The data were extracted from all articles published in IRRODL from 2008 to October 2013. IRRODL was selected because of its reputation as one of the most important and recognized journals in the field of open and distance learning (Zawacki-Richter, Bäcker, & Vogt, 2009). We also believe that journals that are published under open access licenses are accessible to both researchers and practitioners, whereas closed publications are usually accessible only to researchers through their institutional accounts. IRRODL has been published since 2000 and all the articles are available at http://www.irrodl.org. The journal published 447 articles from 2008 to 2013, the time frame covered by this research, as shown in Table 1.

Table 1

The search period was arbitrarily set from 2008 to October 2013 so as to show recent activity but also to allow a few years for articles to be cited and viewed. The total number of times an article is cited and downloaded is related to the length of time since its publication, since totals are cumulative. A total of 401 (92.1%) of the articles published between 2008 and October 2013 (Vol 9 to Vol 14_3) were retrieved from Google Scholar, confirming the fairly extensive coverage of Google Scholar (Harzing, 2010).

To examine IRRODL article citations in Google Scholar we used the software Publish or Perish (www.harzing.com/pop.htm) version 4.4.6 using the Journal Impact Analysis option (Figure 1) and selecting the date range 2008 to 2013; the queries were performed on October 19, 2013. The exact title of the journal “International Review of Research in Open and Distance Learning” was used in the journal option text box. Overall, the h-index (h papers that have at least h citations each) in the journal was 30, which is a rough indicator of the impact of the journal as a whole.
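The h-index definition given above (h papers that have at least h citations each) can be illustrated with a short sketch. This is only an illustration of the definition, not the procedure Publish or Perish uses internally; the function name `h_index` and the sample counts are hypothetical and are not IRRODL data.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the paper at this rank still has enough citations
        else:
            break  # every later paper has fewer citations than its rank
    return h


# Hypothetical citation counts for six papers (illustrative only).
example_counts = [10, 8, 5, 4, 3, 0]
print(h_index(example_counts))
```

Under this definition, a journal h-index of 30 means that 30 of its papers have each been cited at least 30 times.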

Publish or Perish is freeware created by Anne-Wil Harzing (http://www.harzing.com/) that retrieves and analyses citations using Google Scholar and Microsoft Academic Search. This tool presents the results in an organized and friendly manner that can be exported to other applications such as MS Excel or SPSS (Harzing & van der Wal, 2008).

Figure 1

During the period of time selected for this study, the number of citations per paper ranged from a low of zero to a high of 134 with an average of 8.38 citations/paper. In order to identify the most cited papers by year of publication, principal authors and authors’ country, we selected papers that were cited at least 30 times; this resulted in a selection of 33 papers.

Finally, the total galley views (all formats: PDF, HTML, EPUB and MP3) data published by IRRODL were correlated with the number of citations per article. For this task the sample was enlarged to the 100 most cited articles in order to include other articles with substantial citation data, which widened the range of citations to 9 – 134. This additional gauge is introduced as a way to determine if the interest in an article by other researchers (thus the citation in a peer reviewed journal) is correlated with interest from practitioners as measured by the number of times the article was downloaded. We acknowledge that the download numbers include requests from both researchers and practitioners, whereas citation counts represent interest by that subset of distance educators who are active researchers. If the articles are meeting the needs of both groups, we would predict strong correlations between them. In addition, the Pareto principle was used to test if 20% of the papers published by IRRODL are responsible for 80% of the citations.

Results and Discussion

General Bibliometric Indicators

The main general bibliometric indicators yielded from the query using Publish or Perish software from the years 2008 to October 2013 are detailed in Table 2. It is important to note that IRRODL’s h-index of 30 is relatively high in comparison with other open access journals in distance education such as the Turkish Online Journal of Distance Education (12), Asian Journal of Distance Education (5) and European Journal of Open, Distance and E-Learning (EURODL, 5). Similar results from the period of 2003-2008 were published by Zawacki-Richter, Anderson and Tuncay (2010) and these included h-factors for both open and subscription based distance education journals. Given that citation analysis is often perceived as the most important measure of impact of a journal (at least by researchers), we can conclude that IRRODL has an important influence in the open and distance education field.

Table 2

Table 3

To complement these results and illustrate the growth of the journal in other, more selective databases, we consulted Elsevier’s Scopus, where IRRODL has an h-index of 21 and an average of 1.14 citations per article. Moreover, other indicators that Google Scholar does not provide, such as the average number of references per document (32.4), are close to the general mean of all Scopus indexed journals (33.1). Also, Figure 2 shows graphically the incremental growth of the citation count over the previous 2, 3 and 4 years. All these data and charts can be freely accessed at the SCImago Journal & Country Rank portal, which offers journal and country scientific indicators established from information in Scopus.

Figure 2

Identification of the Main Characteristics in the Highly Cited and Viewed Articles

After selecting the 33 papers with 30 or more citations, we found that the citation counts of these articles ranged between 30 and 134. The total galley views of the selected papers ranged between 8,020 and 70,441.

There were 64 different authors who contributed to the 33 articles in this study – an average of 1.93 authors per highly cited paper, which is very close to the average of 2.01 authors for all published articles (from Table 2). These highly cited articles came from 11 different countries as seen in Figure 3 based on authors’ affiliations. The most common countries identified in the highly cited articles were: United States (24), United Kingdom (12) and Canada (11) with more than 10 authors, followed by Germany, Israel, Turkey, Norway, Italy, Denmark, Bahrain and Australia. There were three contributors each with two articles in the highly cited selection: David Wiley from the United States, Olaf Zawacki-Richter from Germany and Rita Kop from Canada.

Figure 3

Five of the 33 (15%) highly cited articles were co-authored by researchers from multiple countries: United Kingdom and Canada (2008); Canada, Germany and United States (2009); Denmark and Norway (2009); Turkey and Canada (2009); and Bahrain and United Kingdom (2009).

Figure 4 shows the different research methodologies used in the highly cited articles; this classification was extracted from the metadata provided at the IRRODL site. Among the highly cited articles there were 12 case studies, 8 literature reviews, 6 theoretical studies, 2 surveys, 2 mixed methods studies, 2 historical analyses, and one technical evaluation report.

Figure 4

Tables 4 and 5 show the 10 most cited and the 10 most viewed articles. Seven articles appear in both tables, indicating that the most cited articles are also among those with the most total galley views. A more detailed relationship is calculated in the next section.

Table 4

Table 5

Figure 5 shows the total number of citations and total galley views (divided by 1,000 to put both on a comparable scale) by year of publication for the 100 most cited articles; citations and views follow the same pattern across all years. The top years were 2009, with 970 citations (432,638 TGV), and 2008, with 837 citations (330,928 TGV), which together represent 72% of all citations, followed by 2010, 2011 and 2012, which together represent 27%.

Figure 5

As expected, Figure 5 shows that older articles are generally both cited and viewed more times in total than more recent articles.

Is There a Relationship Between Cites Received and Total Galley Views?

The relationship between the two variables, citations and total galley views, can be examined visually in Figure 6. The scatterplot enables us to assess graphically the degree of relationship between the characteristics being measured, and in this case we can see a moderate to high relationship between the number of citations and the total galley views.

Figure 6

Using the Pearson product-moment correlation, a statistically significant correlation was found between citations received and total galley views (r = 0.621, p = 0.01); this indicates a positive and strong relationship between the two measures. Starting from the premise that most coefficients of correlation in social research are around +0.50 or less (Cohen, Manion, & Morrison, 2007), the relationship could be considered high.
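For readers who wish to reproduce this kind of analysis on their own data, the Pearson product-moment coefficient can be computed as in the following minimal sketch. It uses only the Python standard library; the function name `pearson_r` and the sample values are hypothetical and are not the IRRODL data, and the sketch does not compute the significance level reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical citations and galley views for five articles (illustrative only).
citations = [40, 25, 18, 12, 9]
views = [30000, 12000, 15000, 7000, 8000]
print(round(pearson_r(citations, views), 3))
```

A value near +1 indicates that articles with more citations also tend to have more views; a value near 0 indicates no linear relationship.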

The Pareto Principle: 80-20 Rule

Finally, we were interested in investigating whether citations per paper were very unequally distributed across articles and whether they followed the Pareto principle. The Pareto principle (also known as the 80–20 rule, the law of the vital few, and the principle of factor sparsity) states that, “for many events, roughly 80% of the effects come from 20% of the causes” (Newman, 2005).

Figure 7

In our case the Pareto principle applies, as seen in Figure 7: 80 of the 401 papers published (20%), drawn from the 245 with at least one citation, account for 2,714 of the 3,371 citations (80.5%), in line with Newman (2005). This means that, as in many other contexts, roughly 80% of the citations received by the IRRODL journal are accounted for by only 20% of the articles published. The total galley views follow the Pareto principle as well: 177 of 812 (21.7%) articles, reviews, notes and editorials published account for 1,973,864 of 2,466,137 (80.0%) views. Fortunately, however, the long tail of Internet access (Anderson, 2004), coupled with extensive online search services, makes it possible to identify and retrieve all of the hundreds of articles produced, even those with few citations or downloads.
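The Pareto check described above amounts to ranking articles by citations (or views) and computing the share of the total contributed by the top 20%. The sketch below shows one way to do this; the function name `top_share` and the sample list are hypothetical, and only the four totals in the final lines are taken from the figures reported in this section.

```python
def top_share(values, fraction=0.20):
    """Share of the total contributed by the top `fraction` of items."""
    if not values:
        raise ValueError("values must not be empty")
    ranked = sorted(values, reverse=True)
    # Number of items in the top group, rounded to a whole item (at least one).
    top_n = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:top_n]) / total


# Hypothetical citation counts for five articles (illustrative only).
print(top_share([80, 10, 5, 3, 2]))

# Percentages recomputed from the totals reported in the text.
share_of_articles = 100 * 80 / 401       # top articles as a share of all papers
share_of_citations = 100 * 2714 / 3371   # their citations as a share of all citations
print(round(share_of_articles, 1), round(share_of_citations, 1))
```

A result close to 0.80 for `fraction=0.20` is what the 80-20 rule predicts; markedly lower values would indicate a more even distribution.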

Conclusion

According to Shih, Feng and Tsai (2008), “articles with more citation frequencies are usually those that are better recognized by others in related fields. They probably present more fundamental ideas about the issues for future research” (p. 960). Thus, this research identifies these most cited articles as being important works as measured by their citation by other researchers. As importantly, the data shows a strong correlation between this rate of citations by researchers and interest and popularity as shown by number of downloads by both researchers and practitioners. Given the large (and growing) interest in distance education globally and the importance of research in this field to both researchers and to practitioners, it is both interesting and reassuring to note the strong correlation between the two measures of importance.

We used Google Scholar for this study rather than the more established commercial indexes such as Social Science Citation Index or Scopus, as we feel that the broader coverage of Google Scholar into literature and conference proceedings not indexed by these others represents real interest by our practitioner community of distance educators. Moreover, researchers suggest a significant and positive relationship between citation counts in Google Scholar and in Web of Science (Ebrahim et al., 2014). We hope that Google Scholar will make efforts to be more transparent about the ways in which citations are counted, but feel that in this context, it represents the most accurate measure of an article’s impact in both the research and practitioner communities.

Although this study confirmed previous work (Zawacki-Richter, Bäcker, & Vogt, 2009), showing that the majority (73%) of papers were produced in a few relatively wealthy nations, the results also indicate that scholars from many different countries are being cited in the literature and, further, are being viewed internationally. In addition, the results show that influential articles are of many types and use a variety of research methods, suggesting the eclectic nature of research in this discipline. Finally, adherence to the 80/20 or Pareto principle is confirmed by this research. Twenty percent of the articles account for roughly 80% of both downloads and citations. Researchers are also increasingly involved in international collaborative projects, as demonstrated by five influential papers which represent 15% of the most cited papers. We hope this article helps authors from all countries to recognize the type, format, topic and data collection and analysis methods of the most influential papers, so that the quality of all articles can be improved.

Acknowledgment

Advice and comments from Dr. Tony Bates, President and CEO of Tony Bates Associates Ltd, are acknowledged with appreciation.

We want to thank the editors of the IRRODL journal for providing the internal data of article statistics.

References

Allen, M., Jacobs, S., & Levy, J. (2006). Mapping the literature of nursing: 1996-2000. Journal of the Medical Library Association, 94(2), 206-220.

Anderson, C. (2004). The Long Tail. Wired, 12(10). Retrieved from http://www.wired.com/wired/archive/12.10/tail.html

Anderson, T., & McConkey, B. (2009). Development of disruptive open access journals. The Canadian Journal of Higher Education, 39(3), 71-87.

Avello Martínez, R., Rodríguez, P., & Rodríguez, M. (2014). Nivel de citación en Google Académico de las investigaciones pedagógicas publicadas en la revista Medisur, período 2008 a octubre 2013. Medisur, 12(1). Retrieved from http://www.medisur.sld.cu/index.php/medisur/article/view/2661/1371

Aylward, B. S., Roberts, M. C., Colombo, J., & Steele, R. (2008). Identifying the classics: An examination of articles published in the Journal of Pediatric Psychology from 1976-2006. Journal of Pediatric Psychology, 33(6), 576-589.

Beckmann, M., & Wehrden, H. (2012). Where you search is what you get: literature mining – Google Scholar versus Web of Science using a data set from a literature search in vegetation science. Journal of Vegetation Science, 23(6), 1197–1199.

Blessinger, K., & Hrycaj, P. (2010). Highly cited articles in library and information science: An analysis of content and authorship trends. Library and Information Science Research, 32, 156-162.

Bornmann, L., & Daniel, H. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45-80.

Cohen, L., Manion, L., & Morrison, K. (2007). Research methods in education. Abingdon, Oxon: Routledge.

Ebrahim, N.A., Salehi, H., Embi, M.A., Danaee, M., Mohammadjafari, M., Zavvari, A., Shakiba, M. & Shahbazi-Moghadam, M. (2014). Equality of Google Scholar with Web of Science citations: Case of Malaysian engineering highly cited papers. Modern Applied Science, 8(5), 63-69.

Gehanno, J., Rollin, L., & Darmoni, S. (2013). Is the coverage of Google Scholar enough to be used alone for systematic reviews. Medical Informatics and Decision Making, 13(7). Retrieved from http://www.biomedcentral.com/1472-6947/13/7

Giustini, D. (2005). How Google is changing medicine. BMJ, 33(1), 1487–1488.

Halverson, L., Graham, C., Spring, K., Drysdale, J., & Henrie, C. (2014). A thematic analysis of the most highly cited scholarship in the first decade of blended learning research. Internet & Higher Education, 20, 20-34.

Harzing, A. (2010). The publish or perish book: Your guide to effective and responsible citation analysis. Melbourne, Australia: Tarma Software Research Pty Ltd.

Harzing, A., & van der Wal, R. (2008). Google Scholar as a new source for citation analysis. Ethics in Science and Environmental Politics, 8(1), 62-73.

Kinshuk, Huang, H.-W., Sampson, D., & Chen, N.-S. (2013). Trends in educational technology through the lens of the highly cited articles published in the journal of educational technology and society. Educational Technology and Society, 16(2), 3-20.

Klein, J. (1997). ETR&D - Development: An analysis of content and survey of future direction. Educational Technology Research and Development, 45(3), 57-62.

Lee, Y., Driscoll, M., & Nelson, D. (2004). The past, present, and future of research in distance education: Results of a content analysis. The American Journal of Distance Education, 18(4), 225-241.

Newman, M. (2005). Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46.

Rezaei, E., Navidi, I., Rokni, M., & Pourmand, M. (2012). Assessing the effect of highly cited papers on the impact factor of journals in the field of public health. Iranian Journal of Public Health, 41(12), 84-85.

Rourke, L., & Szabo, M. (2002). A content analysis of the journal of distance education 1986-2001. Journal of Distance Education, 17(1), 63-74.

Shih, M., Feng, J., & Tsai, C. C. (2008). Research and trends in the field of e-learning from 2001 to 2005: A content analysis of cognitive studies in selected journals. Computers & Education, 51, 955-967.

Torres-Salinas, D., Ruiz-Pérez, R., & Delgado-López-Cózar, E. (2009). Google Scholar como herramienta para la evaluación científica. El profesional de la información, 18(5).

Wikipedia. (n.d.). Pareto principle. Retrieved December 21, 2013, from http://en.wikipedia.org/Pareto_principle

Zawacki-Richter, O., Anderson, T., & Tuncay, N. (2010). The growing impact of open access distance education journals: A bibliometric analysis. Journal of Distance Education, 24(3).

Zawacki-Richter, O., Bäcker, E., & Vogt, S. (2009). Review of distance education research (2000 to 2008): Analysis of research areas, methods, and authorship patterns. International Review of Research in Open and Distance Learning, 10(6).

Appendix

List of top 33 most cited articles

Please see Supplementary files on the right side of the screen under the heading, Article Tools.

© Avello Martínez and Anderson