Not All Rubrics Are Equal: A Review of Rubrics for Evaluating the Quality of Open Educational Resources

The rapid growth in Internet technologies has led to a proliferation in the number of Open Educational Resources (OER), making the evaluation of OER quality a pressing need. In response, a number of rubrics have been developed to help guide the evaluation of OER quality; these, however, have had little accompanying evaluation of their utility or usability. This article presents a systematic review of 14 existing quality rubrics developed for OER evaluation. These quality rubrics are described and compared in terms of content, development processes, and application contexts, as well as, the kind of support they provide for users. Results from this research reveal a great diversity between these rubrics, providing users with a wide variety of options. Moreover, the widespread lack of rating scales, scoring guides, empirical testing, and iterative revisions for many of these rubrics raises reliability and validity concerns. Finally, rubrics implement varying amounts of user support, affecting their overall usability and educational utility. Keyword: Rubrics, Quality, Open Educational Resources, Content, Development Process, Application Context, Support


Introduction
Open Educational Resources (OER) are online teaching, learning, and research resources that can be freely accessed, adapted, used, and shared to support education (U.S. DoE, 2010). Fueled in part by the rapid growth in Internet technologies, a broad range of OER has become widely availability, providing a content infrastructure with the potential for greatly enhancing teaching and learning (Atkins, Brown, & Hammond, 2007;Borgman et al., 2008;Porcello & Hsi, 2013). For example, OER can support teachers in gaining and sharing content and pedagogical knowledge, and can provide learners access to a wide variety of resources for extending their knowledge and skills (Haughey & Muirhead, 2005;Kay & Knaack, 2007;Khanna & Basak, 2013). In addition, OER can radically change the way information is presented and the way learners engage with information. Some OER contain images, videos, or interactive content, for instance, helping to make abstract concepts more concrete, while other OER can be adapted to fit learners' different needs (Kay & Knaack, 2007).
In response, several researchers and educational organizations have been developing rubrics to help guide the judgment of OER quality. As described below, these rubrics vary widely along a number of critical dimensions. For example, the Learning Object Review Instrument (LORI) from Nesbit, Belfer, and Leacock (2007) is designed to evaluate a wide variety of OER, while the Learning Object Evaluation Instrument (LOEI) from Haughey and Muirhead (2005) is designed for school contexts. The Educators Evaluating the Quality of Instructional Products (EQuIP) from Achieve (2014) focuses on the alignment of OER with educational standards, while the rubric from Fitzgerald and Byers (2002) is targeted at inquirybased science resources. As a final example, the Achieve organization, which developed the rubric for Evaluating Open Education Resource Objects (OER rubric) in 2011 and the EQuIP rubric in 2014, provides extensive training materials for users of these two rubrics, while developers of some other rubrics do not.
In sum, different rubrics possess different characteristics and emphasize different aspects, which can lead to confusion when deciding which rubric to use for OER evaluation. Therefore, in an attempt to synthesize the state of the field, this article provides a review of existing quality rubrics for OER evaluation, and compares them along key characteristics and the kinds of support provided to users.

Defining Rubrics
A rubric provides a scoring scheme to help guide a user in judging products or activities (Moskal, 2000).
For example, rubrics are widely used in education to help guide people's evaluation of a variety of constructs, including students' writing performances, the quality of research projects, and the quality of educational resources (Bresciani et al., 2009;Custard & Sumner, 2005;Rezaei & Lovorn, 2010). Among them, a number of rubrics have been developed to evaluate the quality of OER, as people increasingly need assistance in identifying high-quality resources available on the Internet (e.g., Custard & Sumner, 2005;Haughey & Muirhead, 2005;Porcello & Hsi, 2013).
However, the use of rubrics does not always lead to improved evaluation (Rezaei & Lovorn, 2010). Indeed, an important consideration is a rubric's validity and reliability (e.g. Bresciani, et al., 2009;Jonsson & Svingby, 2007;Rezaei & Lovorn, 2010). Validity is the extent to which a rubric measures what it is purported to measure, while reliability is the extent to which the results from a rubric are consistent over time and across different raters (Kimberlin & Winterstein, 2008;Moskal & Leydens, 2000). Further, utility is another measure of the quality of a rubric, and has been promoted by many researchers (e.g., Ross, 2006;Willner, 1997).
Researchers and developers have taken different approaches to improving the performance of rubrics, such as evaluating the validity and reliability of rubrics through empirical testing, and improving the utility of rubrics by providing user support (Colton et al., 1997;Moskal & Leydens, 2000;Wolfe, Kao, & Ranney, 1998). Thus, to better understand the performance of different rubrics for evaluating the quality of OER, a review and synthesis of these rubrics along these different dimensions is needed.

Characterizing Rubrics
A rubric typically focuses on specific content, follows a particular development process, and targets at a particular application context (e.g., Arter & Chappuis, 2006;Moskal, 2000;Moskal & Leydens, 2000). Thus, we analyze OER quality rubrics following these three aspects.
The content aspect focuses on how the rubric deconstructs overall OER quality. In some rubrics, OER quality is defined in terms of multiple dimensions, where, in turn, each quality dimension can be comprised of one or more quality indicators. For example, the rubric from Pérez-Mateo, Maina, Guitert, and Romero (2011) first identifies three dimensions of OER quality -content, format, and process, and then proposes 42 indicators for these three dimensions. In contrast, other rubrics only identify quality indicators (e.g., the LORI). The content aspect also addresses whether rubrics have a ratings scale, and/or provide a detailed scoring guide.
Second, the rubric development processes examines whether rubrics have reported empirical testing results, and whether they have been iteratively improved. Third, the application context aspect examines whether rubrics apply to a variety of OER, or are specific to a particular website or discipline, and whether the rubrics were designed for human or machine use.

Exploring Support for Rubric Users
Simply creating a rubric is not sufficient. A rubric is also expected to be usable and to improve evaluation.
The utility and effectiveness of a rubric depends on a variety of factors. For example, Colton et al. (1997) and Wolfe et al. (1998) noted that appropriate training about features of the rubric and strategies for using a rubric can improve its validity and reliability. Similarly, Rezaei and Lovorn (2010) argued that without appropriate user support, the use of rubrics may not necessarily improve the reliability or validity of assessment.

Methods
We conducted a search for rubrics designed to evaluate OER over a six-month period, ending in April 2014. In particular, we searched ERIC, Education Full Text, Library Literature and Information Sciences, and Google Scholar, using different combinations of the following descriptors: quality, open educational resources (OER), rubrics, evaluation, judgment, and assessment. We also identified articles by reviewing the reference lists of existing rubrics, and getting recommendations from other researchers. These different strategies led to more than 200 articles. However, as many resulting articles did not actually propose a rubric and some resulting rubrics were not designed for evaluating OER quality, we established the following inclusion criteria.
To qualify for inclusion in this review, each quality rubric had to meet four criteria: 1) it evaluated quality instead of relevance, or other features; 2) it focused on the OER instead of associated programs, services, or instruction; 3) it included a concrete rubric; 4) it provided an introduction or an explanation of the rubric; and 5) it was published later than 2000. Fourteen rubrics were ultimately included in our analysis.
The 14 rubrics were analyzed in two stages. The first stage examined rubrics in terms of the content, development process, application context framework. The second stage examined how rubrics provided user support.

Rubric Characteristics
The Appendix shows a summary of the 14 selected rubrics following the three aspects described above, and Table 1 shows frequencies of the various aspects in the selected rubrics. These results are explained in more detail below: Second, all rubrics choose to deconstruct the quality construct into many dimensions and/or indicators. Specifically, as shown in Table 1, about one third of the rubrics (36%) deconstruct quality into dimensions  Third, these rubrics share some common indicators. For example, many rubrics include "content quality" as a dimension for evaluating OER. Several rubrics elaborate this construct into indicators such as 22 completeness, clarity, and accuracy (Fitzgerald et al., 2003;Nesbit et al., 2007;Leary, Giersch, Walker, & Recker, 2009). In contrast, some rubrics also contain unique indicators. For example, the rubric from Haughey and Muirhead (2005) uniquely includes "value" as an important quality indicator of OER, emphasizing that OER should be appropriate for students with diverse needs, languages, and cultures. In addition, the rubric from Wetzler et al. (2013) is the only rubric that considers "sponsor" of OER as an important factor comprising OER quality.
Fourth, the use of rating scales in these rubrics differs. Among the 14 rubrics, five do not specify a rating scale at all. For the remaining nine rubrics, two adopt a binary (yes -no) rating scale. The other seven rubrics adopt a four-point or five-point rating scale, which may provide users with more discriminating power when rating OER.
Fifth, only the OER rubric from Achieve (2011), the EQuIP rubric from Achieve (2014), and the LORI from Nesbit et al. (2007) provide detailed scoring guides. These scoring guides list the steps to carry out the assessment, identify different requirements for different points on the scale, and/or provide examples in order to help users provide more accurate ratings. Note that the provision of rating scales and scoring guides can facilitate users' application of these rubrics and thereby improve the performance (e.g., validity, reliability) of these rubrics (Barkaoui, 2010;Yuan, Recker, & Diekema, , 2015).

Rubric development process.
The rubrics were all developed through reviewing existing materials and prior studies, and thus not designed in a void. For example, Vargo, Nesbit, Belfer, and Archambault (2003) design and propose the LORI by reviewing work aimed at evaluating the quality of textbooks, courses, and instructional programs. However, among the 14 rubrics, only six have reported empirical results regarding their validity, reliability, or utility. Such studies can help increase the credibility of these rubrics as well as provide data for iterative improvement of the rubrics.
Finally, three rubrics -the OER rubric from Achieve (2011), the LORI from Nesbit et al. (2007), and the EQuIP rubric from Achieve (2014) -have been revised several times, while the remaining have not.
Application context. The 14 rubrics can be categorized in terms of their application contexts.
As shown in Figure 1, seven rubrics (e.g., the OER rubric from Achieve (2011)) are very generic, and thus can be used to evaluate a variety of OER across different resource types and different subject domains. Another five rubrics are specific to particular websites (e.g., the rubric from Leary et al. (2009) applies to the Instructional Architect website). Two rubrics are specific to particular subject domains. In particular, the rubric from Fitzgerald and Byers (2002) is specific to the science domain, while the EQuIP rubric from Achieve (2014) is specific to math, literacy, and science.
Another important consideration is the target user: human or machine. Two rubrics -the rubrics from

23
In summary, despite of some commonalities, these rubrics show a wide diversity in terms of their content, development processes, and application contexts, which give users many choices. However, a lack of rating scales, scoring guides, empirical testing, and iterative revisions for many rubrics calls into question issues surrounding rubric reliability, validity, and utility.

Rubric Support for Users
Our analysis revealed that the rubrics provide a variety of support structures for users. First, many rubrics make themselves easily accessible. For example, after its release at no cost, the LORI was cited by many educational organizations, and consequently used by more teachers (Akpinar, 2008;Nesbit & Li, 2004).
Further, the LORI was referred to and studied by many researchers, which in turn lead to suggestions for revisions of this rubric, as well as a series of new rubrics (Haughey & Muirhead, 2005).
Second, many rubric developers solicited user input. For example, the rubric from Pérez-Mateo et al.
(2011) asked potential rubric users to suggest a set of indicators that can be used to evaluate OER quality. In particular, based on 114 participants' responses to an online survey, and validated with existing literature, Pérez-Mateo et al. (2011) identified 42 quality indicators, including indicators such as adequacy, consistency, and effectiveness. As another example, the EQuIP rubric from Achieve (2014) asked teachers to submit teaching resources, and a review panel reviewed and provided feedback using their rubric.
Third, some rubrics provide training materials. The Achieve organization provides training materials in PDF, PowerPoint, and video format for both the OER and the EQuIP rubrics. For example, the training materials associated with the OER rubric include a detailed handbook, which introduces the rubric, lists the steps for using the rubric, and provides links to examples. Additionally, rubric developers provide videos explaining the rating scale and showing how to apply rubrics in authentic situations.
Fourth, some rubric developers received support from government and other educational organizations.
The Achieve organization, for example, collaborated with several U.S. states (e.g., California, Illinois). In particular, the Achieve organization introduced the basic concepts of OER to different states, developed recommendations for states on the use of rubrics for evaluating OER, helped them develop relevant strategies, and assisted them in implementing these strategies. In this way, the Achieve increased the awareness of OER in these states, which made the use of its rubrics very popular.

Conclusion
This article first reviewed existing quality rubrics in terms of their content, development processes, and application contexts. For content, even though these rubrics were all comprised of a set of quality indicators and shared similar indicators, different rubrics had different emphases and some contained unique indicators. Over 35% of the reviewed rubrics did not provide a rating scale, and only a small 24 proportion offered detailed scoring guides. In terms of development processes, over 50 % of the reviewed rubrics had not reported results from empirical testing, and only a small proportion of the rubrics had been revised several times. Lastly, the application contexts of rubrics differed in that some rubrics were more generic than others, and some rubrics were designed to support automated evaluation.
Thus, the research revealed a complex picture. On the positive side, rubrics showed great heterogeneity in the three aspects, providing users with multiple options and for diverse educational applications. On the negative side, only some rubrics provided rating scales, scoring guides, or empirical testing results. As shown in previous studies, the use of rating scales and scoring guides can improve users' evaluation reliability, and empirical testing and the iterative revision of rubrics allowed rubric developers to increasingly improve rubrics ' validity, reliability, and usability (Akpinar, 2008;Barkaoui, 2010;Moskal & Leydens, 2000;Vargo et al., 2003). Thus, the absence of rating scales, scoring guides, empirical testing, and iterative revisions in many rubrics raises concerns about their overall utility.
Additionally, this article reviewed the kinds of user support provided by these rubrics, and revealed that these supports came in various forms, such as providing training materials and soliciting user input. These supports were intended to increase the application scope and educational value of these rubrics, and thus are recommended as an important component of future rubric development.
The significance of this research lay in the following aspects. First, it revealed the current state of existing rubrics for evaluating OER quality, showed the characteristics of these rubrics, and identified what supporting structures were provided to users. Thus, it provided a basis for future research on rubrics.
Second, this review indicated that the quality and educational utility of OER depended not only on the content quality (e.g., accuracy, clarity) of resources, but also on the pedagogical values contained in these OER. In particular, these rubrics highlighted some pedagogical guidelines, such as aligning with standards, identifying appropriate learners, and showing the potential to engage learners. Thus, when using these rubrics to choose appropriate OER, users need to consider these pedagogical guidelines.
Third, the findings could help users in selecting appropriate rubrics for OER evaluation tasks. In particular, users can choose rubrics whose content, development processes, and application contexts align most with the purpose of their evaluation. For example, users may consider whether they intend to evaluate resources for a particular discipline or for multiple disciplines, whether they wish to focus more on the reusability of resources or on the alignment of resources with educational standards, whether they need detailed scoring guides or more freedom to make choices, and whether they need training and/or support from rubric developers.
Fourth, this research could facilitate the future development of rubrics by clarifying what is common and what is lacking in existing rubrics. For example, rubric developers should consider testing and iteratively revising their rubrics as well as provide a rating scale, scoring guide, and other training material.

25
Finally, this research contributed to the improvement of OER quality. In particular, our research distilled common indicators for evaluating quality in OER, and helped illustrate what dimensions/aspects of quality have not been met by a certain resource, thereby providing suggestions on how to improve the OER.
Rubric to 8 quality • The content of the learning object is accurate.
• The use of technology is appropriate for this content.
• The content is presented clearly and professionally (spelling/grammar, et cetera).
• Appropriate academic references are provided.
• Credits to creators are provided.
• There are clear learning objectives.
• The learning object meets the stated learning objectives. •