Cross Validating a Rubric for Automatic Classification of Cognitive Presence in MOOC Discussions

As large-scale, sophisticated open and distance learning environments expand in higher education globally, so does the need to support learning at scale in real time. Valid, reliable rubrics of critical discourse are an essential foundation for developing artificial intelligence tools that automatically analyse learning in educator-student dialogue. This article reports on a validation study in which discussion transcripts from a target massive open online course (MOOC) were categorised into phases of cognitive presence to cross validate an adapted rubric with a larger dataset and more coders. Our results indicate that the adapted rubric remains stable, to some extent, for categorising the target MOOC discussion transcripts. However, the proportion of disagreements between coders increased compared with the previous experimental study, which used less data and fewer coders. The informal writing styles in MOOC discussions, which are less prevalent in for-credit courses, created ambiguities for the coders. We also found that most disagreements occurred between adjacent phases of cognitive presence, especially the middle phases. The results suggest that additional phases may exist adjacent to the current categories of cognitive presence when the educational context changes from traditional, smaller-scale courses to MOOCs. Other researchers can use these findings to build automatic analysis applications that support online teaching and learning in broader open and distance learning contexts. We propose refinements to methods of analysing cognitive presence and suggest adaptations to certain elements of the Community of Inquiry (CoI) framework when it is used in the context of MOOCs.


Introduction
The Problem
In this paper, we argue that existing empirical inquiries and theoretical frameworks for analysing learning engagement in conventional for-credit university courses are limited when applied to massive open online courses (MOOCs), because of the differences in their educational contexts.
MOOCs differ from traditional, smaller-scale online courses in their course design, with shorter course durations, limited direct educator involvement, a wide range of learner profiles, and diverse learner motivations (Alario-Hoyos et al., 2017). During the COVID-19 pandemic, MOOCs gained considerably more attention as a way of addressing the limitations of remote learning, as they provide learners with a diverse range of educational experiences (Buchem et al., 2020; Cha & So, 2020). MOOC educators need support to monitor and moderate learner progress across these massive audiences, and MOOC learners need responsive, high-quality feedback and remediation from educators. Automatic classification of cognitive presence (to supplement the automated feedback that is currently the norm) can provide such pedagogical support. A vital foundation for such a classification system is a reliable theoretical basis.
The asynchronous discussion forum in MOOCs offers a virtual space for participants to interact through written dialogue. These written conversations, or messages, give educators and researchers meaningful insights into learners' critical discourse (i.e., critical thinking, higher-order thinking, and cognitive presence). However, experimental studies are required to validate and improve methods of critical discourse analysis in online discussion forums for broader educational contexts, such as MOOCs (Amemado & Manca, 2017; Kaul et al., 2018). Garrison et al. (1999, 2001) proposed the community of inquiry (CoI) framework and its coding rubrics to evaluate cognitive presence and two other dimensions in online transcripts using content and textual analysis methods. Over the past two decades, this framework has been broadly used to assess students' learning and guide learning designs in traditional, online, for-credit, smaller-scale courses (Liu & Yang, 2014; Sadaf & Olesova, 2017). However, the CoI coding rubrics still have shortcomings for reliably assessing critical discourse in MOOCs. For example, Garrison et al. (2001) did not elucidate clear instances of the cognitive presence phases; therefore, researchers have had to revise the rubric each time they used it (Rourke & Kanuka, 2009). Also, the coding rubric was initially developed as a descriptive, qualitative analysis method for smaller-scale courses rather than as a quantitative, inferential procedure (Garrison, 2007). Moreover, online discussion has an informal, conversational flow that is relatively chaotic and does not fit the coherent patterns of the CoI framework (Xin, 2012).

The Significance
The validation of a cognitive presence rubric in MOOCs is a crucial foundation for developing automatic approaches for real time learning support and remediation. Preparing reliable and valid machine learning data sets is an essential prerequisite for training automatic classifiers (Ullmann, 2019). Analysing the common language patterns in the data sets is beneficial for selecting appropriate machine learning algorithms and predictive features (Mladenić, 2010). Automatic evaluation of learners' cognitive presence in MOOCs can help educators monitor the learners' progress in real time and provide personalised feedback at scale. From a learner's perspective, effective and efficient feedback from both educators and peers encourages high participation in MOOCs, assisting students to achieve their learning goals (Phan et al., 2016).

Purpose and the Research Questions
This study aims to cross validate the use of an adapted coding rubric (Hu et al., 2020) to categorise online discussion transcripts from a target MOOC into phases of cognitive presence. A larger dataset and more coders were used to examine whether inter-rater reliability could still reach excellent agreement, as reported in the previous study with less data and fewer coders (Hu et al., 2020). Disagreements between coders were also analysed in depth to gain insight into the feasibility of the cognitive presence phases in the CoI framework. Our main research question was: Is the adapted coding rubric a reliable tool for classifying cognitive presence in MOOC discussions? The following sub-questions guided the main research question:

SQ1. What are the inter-rater reliability values when we classify the discussion messages from a target MOOC with more coders and a larger dataset than in the previous study (Hu et al., 2020)?
SQ2. What is the proportion of disagreements across all cognitive phases between coders, and what are possible causes of the disagreements?

Cognitive Presence
The Community of Inquiry (CoI) framework proposed by Garrison et al. (1999) has been most widely used to analyse learning in online discussions for over two decades. It describes the educational experience that occurs in a learning community, in which "a group of individuals who collaboratively engage in purposeful critical discourse and reflection construct personal meaning and confirm mutual understanding" (Garrison & Anderson, 2011). The CoI framework has three dimensions, called presences: cognitive, social, and teaching presence. Cognitive presence, a primary dimension of the CoI, represents the critical reflection of knowledge (re)construction and problem-solving processes in the learning community (Garrison et al., 2001). Social presence manifests as social communication and emotional interaction between participants, which enriches learning outcomes. Teaching presence, the third dimension, describes the purposeful activities that direct and intervene with the learner's knowledge construction.
This study focused on analysing cognitive presence in MOOC discussion transcripts because cognitive presence is the "primary issue" in the evidence of students' learning and should be explored before the other dimensions (Rourke & Kanuka, 2009). The other two presences of the CoI will be investigated in further research. We adopted the four associated phases of cognitive presence, (a) triggering event, (b) exploration, (c) integration, and (d) resolution, from Garrison et al. (1999, 2001). The four phases (Figure 1), known as the practical inquiry model, were borrowed from Dewey (1933), who originally described the steps of a complete thought. The definitions of the four cognitive phases, as applied to the discussions in our target MOOC, are explained below.

Analysis of Cognitive Presence Phases
Scholars have applied the classification rubric developed by Garrison et al. (2001) and its revised version (Park, 2009) to assess the quality of critical discourse in online courses across many disciplines. More recent studies have used transcript data from for-credit, online university courses, but few so far have used transcript data to analyse cognitive phases in MOOCs. Garrison et al. (2001) first proposed their CoI model and reported their manual classification rubric with data from two online graduate-level courses, one on workplace learning and the other on health promotion. The former dataset (21 messages) was used to fine-tune the measurement rubric, and the latter (24 messages) was used to report inter-rater reliability. The two coders reached an agreement of 83.33% and a Cohen's κ coefficient (Cohen, 1960) of 0.74, indicating that the rubric could be used to evaluate the quality of cognitive presence. However, the sample size was very small (i.e., under 100 messages), and the rubric would need further verification with larger, more diverse learner datasets to be sufficiently generalisable.
Note. a The Park study included two trials.
Two studies used the message as the unit of measurement, following Garrison et al.'s (2001) method.
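To make the reliability statistics above concrete, the following minimal Python sketch computes Cohen's κ for two coders from their label lists. The function name and the toy labels are illustrative assumptions, not data or code from Garrison et al. (2001); in practice, scikit-learn's `sklearn.metrics.cohen_kappa_score` should give the same value.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # degenerate case: both coders used one identical category
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for six messages (T = triggering event, E = exploration,
# I = integration, R = resolution).
coder_1 = ["T", "E", "E", "I", "R", "E"]
coder_2 = ["T", "E", "I", "I", "R", "E"]
print(round(cohen_kappa(coder_1, coder_2), 3))
```

Here the coders agree on 5 of 6 messages (83.3%), yet κ is lower (about 0.77) because it discounts the agreement expected by chance from each coder's label frequencies.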

Differences Between the Prior Work and This Study
Most of the studies we reviewed analysed cognitive presence in smaller-scale, for-credit university courses. Garrison et al.'s (2001) classification rubric preceded MOOCs by seven years; the first MOOC was developed in 2008 (Siemens, 2013). MOOCs differ from traditional university courses mainly in their wide range of learner demographics and diverse learner motivations. Typical MOOC learners are mature adults who are employed and have tertiary qualifications (Dillahunt et al., 2014). Their motivations for learning include updating knowledge, personal curiosity, and professional upskilling (Alario-Hoyos et al., 2017). These differences may also affect the language they use in online discussions. For instance, students tend to write formally when participating in discussion forums in smaller-scale courses and in credit-bearing professional development MOOCs. In contrast, many MOOC learners use a more conversational writing style when they engage with MOOCs for less formal purposes than professional development or accreditation. We wondered whether these differences in educational context would affect the analysis of cognitive presence in discussion transcripts.
The reliability reported in most of the reviewed studies was based on two coders. Although six coders were employed in McKlin's (2004) study, they labelled only the same 100 messages. Similarly, in Neto et al.'s (2018) study, a third coder was responsible only for resolving disagreements (129 messages) between the other two coders. Although the previous study (Hu et al., 2020) reported excellent inter-rater reliability between two coders, that agreement was reported after the coders' negotiations. We therefore asked whether the coding rubric of cognitive presence could be reliably applied to MOOC discussions when we enlarged the dataset, involved more coders, and reported the outcomes before the coders' negotiations.

Data Description
The MOOC discussion data used in our study came from an archived run of the Logical and Critical Thinking (LCT) MOOC on the FutureLearn platform (University of Auckland, n.d.). It was an introductory undergraduate philosophy course designed and taught by a course design team at our university. The course taught basic concepts in logical and critical thinking (e.g., premises and arguments) and linked those concepts with life experiences. The average number of enrolled users per course run was approximately 11,000, and the discussion transcripts (comprising posts and their replies) included approximately 12,000 messages per course run. Each course run had eight weekly topics with learning tasks. First, sixteen tasks (two for each week) were selected randomly and evenly across the weeks. Then, a sample of approximately 100 to 200 messages was randomly selected from each of the 16 tasks. We kept the entire sequential structure of each selected conversation rather than segmenting conversations to reach an exact number of messages. In total, 1,917 messages were selected for this study.
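The two-stage sampling described above can be sketched in Python as follows. This is a hypothetical reconstruction, not the study's actual script: it assumes each week's tasks are stored as a list and each task's discussion as a mapping from a thread ID to its ordered messages, and the stopping rule (add whole threads until at least `min_messages` is reached) is our assumption about how "approximately 100 to 200 messages" was met without splitting conversations.

```python
import random

def sample_tasks(tasks_by_week, per_week=2, rng=None):
    """Stage 1 (hypothetical): pick `per_week` tasks at random from every week."""
    rng = rng or random.Random()
    picked = []
    for week in sorted(tasks_by_week):
        picked.extend(rng.sample(tasks_by_week[week], per_week))
    return picked

def sample_threads(threads, min_messages=100, rng=None):
    """Stage 2 (hypothetical): add whole threads in random order until at least
    `min_messages` messages are collected. Threads are never split, so the
    final count can overshoot the target."""
    rng = rng or random.Random()
    order = list(threads)
    rng.shuffle(order)
    selected, total = [], 0
    for thread_id in order:
        if total >= min_messages:
            break
        selected.append(thread_id)
        total += len(threads[thread_id])
    return selected, total

# Hypothetical course layout: 8 weeks with 5 tasks each.
rng = random.Random(42)
tasks = {week: [f"w{week}t{t}" for t in range(1, 6)] for week in range(1, 9)}
print(len(sample_tasks(tasks, per_week=2, rng=rng)))  # 16
```

Sampling whole threads keeps the reply context that coders rely on, at the cost of an inexact message count per task.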
The three coders were postgraduate students from the Philosophy Department who were also teaching assistants for the LCT MOOC. They were trained in rounds (50 new messages per round) until they independently reached over 80% agreement without negotiation. They reached 81% agreement in the third round and were then allocated to classify the 1,917 messages manually and independently based on the adapted rubric (Hu et al., 2020). The overall study proposal received ethical approval from the University Human Participants Ethics Committee.
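The training criterion can be expressed as a stopping rule over rounds. The sketch below assumes that "agreement" means all three coders assigning the same label to a message, the same definition used for the overall agreement reported in the results; the helper names and the per-round data layout are hypothetical.

```python
def all_agree_rate(coder_labels):
    """Share of items on which every coder assigned the same label.

    `coder_labels` is a list of equal-length label lists, one per coder.
    """
    items = list(zip(*coder_labels))
    if not items:
        raise ValueError("no items to compare")
    return sum(len(set(item)) == 1 for item in items) / len(items)

def rounds_until_threshold(rounds, threshold=0.80):
    """Return the 1-based index of the first training round whose agreement
    exceeds `threshold`, or None if no round does.

    Each element of `rounds` holds the coders' labels for that round's
    new messages (50 per round in the study).
    """
    for i, round_labels in enumerate(rounds, start=1):
        if all_agree_rate(round_labels) > threshold:
            return i
    return None
```

For example, rounds with 30, 38, and 41 of 50 messages agreed (60%, 76%, 82%) would stop at round 3.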

Definition of Cognitive Presence Phases in This Research
Table 2 lists the five categories of cognitive presence: the four processing phases and the "other" phase. We provide a brief definition and a message example from the LCT MOOC for each category.
These definitions are derived from Garrison et al. (2001), Hu et al.'s (2020) study, and learner messages in the LCT MOOC, and are therefore influenced by the disciplinary context of this course, which is philosophy.
More details about the definitions can be found in Hu et al.'s (2020) study.

Classification Process
After training, the three coders used our rubric of cognitive phases to classify the sample data (1,917 messages) independently. The unit of analysis was the message, since classification at the theme or sentence level may ignore contextual information before and after the segment (i.e., the theme or sentence) within the whole message. Multiple phases of cognitive presence sometimes appeared in one message, for example, when a learner stated, diagnosed, and resolved a question in a single post. Our coders labelled each message with the highest cognitive phase present, even when lower phases were also represented. An example of such a message is: Initially, I too felt the conclusion might be that a revolt was required. However, the letter states that a revolt would be a way to get the council to listen. It's the same as cider vinegar would be a way to get rid of your dogs fleas. Therefore, it's a statement rather than an argument.
The first sentence indicates the learner's difficulty, which can be categorised as a triggering event. The second sentence shows the learner providing more information to diagnose the difficulty, which corresponds to the exploration phase. The last two sentences draw an analogy and reach a conclusion supported by the reasons stated previously, which can be labelled integration. The message was therefore classified into the highest phase, integration, rather than the other two phases. More details about the coding rubric and message examples can be found in the adapted rubric of cognitive presence in MOOCs (Hu et al., 2020).
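The highest-phase rule amounts to taking a maximum over an ordered set of categories. In this hypothetical Python sketch, sentence-level phase judgements are supplied by hand to mirror the worked example above; in the study, coders judged the message as a whole. Ranking "other" below the triggering event, and returning "other" for an empty message, are assumptions made only for illustration.

```python
from enum import IntEnum

class Phase(IntEnum):
    """Cognitive presence categories, ordered from lowest to highest.

    Placing OTHER at the bottom is an assumption so that any processing
    phase outranks it.
    """
    OTHER = 0
    TRIGGERING_EVENT = 1
    EXPLORATION = 2
    INTEGRATION = 3
    RESOLUTION = 4

def message_label(sentence_phases):
    """Label a whole message with the highest phase found in any of its sentences."""
    if not sentence_phases:
        return Phase.OTHER  # assumption: nothing identifiable falls into "other"
    return max(sentence_phases)

# The four sentences of the example message, as analysed above.
example = [Phase.TRIGGERING_EVENT, Phase.EXPLORATION, Phase.INTEGRATION, Phase.INTEGRATION]
print(message_label(example).name)  # INTEGRATION
```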

The Overall Analysis of Cognitive Phases
The overall percentage agreement was 77.15%: all three coders independently agreed on the labels for 1,479 of the 1,917 messages. The average Fleiss' κ (a statistical measure of agreement on categorical ratings among more than two coders) was 0.763, as shown in Table 3. Across the five categories of cognitive presence, the triggering event phase had the highest agreement (κ = 0.828), followed by the "other" and exploration phases. There was less agreement among the coders on the higher cognitive phases of integration and resolution. Table 4 shows the proportion of the five cognitive phases in the messages on which the coders agreed.
The phase of exploration accounted for most of the messages (55.46%), which far surpassed triggering event and integration. The highest phase, resolution (2.43%), and the lowest "other" phase (5.75%) had the smallest proportion of messages.
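Fleiss' κ extends chance-corrected agreement to more than two coders. The sketch below implements the standard formulation, including per-category κ values of the kind reported for each phase above; it is an illustrative implementation, not the analysis script used in this study, and the toy ratings are hypothetical. As a cross-check for the overall value, statsmodels provides `fleiss_kappa` in `statsmodels.stats.inter_rater`, which expects an items-by-categories count table.

```python
from collections import Counter

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items each labelled by the same number of coders.

    `ratings` is a list of per-item label lists, e.g. [["E", "E", "I"], ...].
    Returns (overall kappa, {category: per-category kappa}).
    """
    if not ratings:
        raise ValueError("no items to rate")
    N, n = len(ratings), len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of coders")
    if any(label not in categories for r in ratings for label in r):
        raise ValueError("found a label outside `categories`")
    counts = [Counter(r) for r in ratings]  # n_ij: coders giving item i category j
    # Observed agreement: mean proportion of agreeing coder pairs per item.
    p_bar = sum((sum(c[j] ** 2 for j in categories) - n) / (n * (n - 1)) for c in counts) / N
    # Overall share of each category across all N * n assignments.
    p = {j: sum(c[j] for c in counts) / (N * n) for j in categories}
    p_e = sum(v * v for v in p.values())
    if p_e == 1:
        raise ValueError("kappa undefined: all labels fall in one category")
    overall = (p_bar - p_e) / (1 - p_e)
    per_category = {}
    for j in categories:
        if 0 < p[j] < 1:  # undefined for unused or universal categories
            disagreement = sum(c[j] * (n - c[j]) for c in counts)
            per_category[j] = 1 - disagreement / (N * n * (n - 1) * p[j] * (1 - p[j]))
    return overall, per_category

# Hypothetical: four messages, three coders.
toy = [["T", "T", "T"], ["E", "E", "E"], ["E", "E", "I"], ["I", "I", "I"]]
kappa, per_phase = fleiss_kappa(toy, ["T", "E", "I"])
print(round(kappa, 3))  # 0.745
```

Because the chance baseline comes from the overall category shares, a dominant category such as exploration raises the expected agreement that κ must exceed.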

Disagreements Between Coders
In addition to agreements between coders, the distribution of disagreements (i.e., messages labelled differently by the coders) is worth considering. Table 5 shows the proportion of disagreements between the three coders across different combinations of cognitive phases. Most disagreements involved two labels rather than three (three-label disagreements made up less than 1.5% of the total). Disagreements between adjacent cognitive phases (96.13%) far outnumbered those between non-adjacent phases (i.e., exploration and resolution, and "other" and exploration). The five contingency tables in Figure 2, which align with Table 4 and Table 5, offer another view of the distribution of agreements and disagreements. Each table shows the distribution of messages classified by coder 1 and coder 2 for one of coder 3's labels, from the "other" phase to resolution. The blue cell on the diagonal of each table (e.g., 85 on the first table) represents the number of agreements between the three coders in each cognitive phase. The red cells (e.g., 38, 3, and 2 on the first table) show the number of disagreements.

Figure 2. Distribution of Five Cognitive Phases Between Three Coders
Note. Blue indicates the number of agreements among the 3 coders. Red indicates the number of disagreements. The red-colour scale is used to represent the disagreement cells. The larger the number, the darker the cell.
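The distinction between adjacent and non-adjacent disagreements, and the coder 1 × coder 2 panels conditioned on coder 3's label, can be sketched as follows. The category ordering follows the analysis above, in which "other" and triggering event count as adjacent while "other" and exploration do not. Treating three-label cases as a separate group is our assumption, since the text does not say how they were counted; the function names are hypothetical.

```python
from collections import Counter

ORDER = ["other", "triggering event", "exploration", "integration", "resolution"]
RANK = {phase: i for i, phase in enumerate(ORDER)}

def disagreement_type(labels):
    """Classify one message's coder labels."""
    distinct = sorted({RANK[label] for label in labels})
    if len(distinct) == 1:
        return "agreement"
    if len(distinct) == 3:
        return "three labels"  # assumption: counted separately
    return "adjacent" if distinct[1] - distinct[0] == 1 else "non-adjacent"

def summarise(ratings):
    """Count agreement and disagreement types over (coder 1, coder 2, coder 3) label triples."""
    return Counter(disagreement_type(labels) for labels in ratings)

def contingency_given_coder3(ratings, phase3):
    """Coder 1 x coder 2 counts for the messages coder 3 labelled `phase3`
    (one panel of a Figure 2 style layout)."""
    table = Counter()
    for c1, c2, c3 in ratings:
        if c3 == phase3:
            table[(c1, c2)] += 1
    return table
```

In each panel, the cell where coder 1 and coder 2 both match `phase3` counts three-way agreements; every other cell is a disagreement.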

Validation of the Adapted Coding Rubric-SQ1
The results (agreement of 77.15% and Fleiss' κ of 0.763) answer our first sub-question and show that reliable inter-rater agreement was reached between the three coders in this research, as a κ above 0.75 represents excellent agreement (Fleiss et al., 2003). The exploration phase accounts for most of the messages, and the resolution phase for the fewest. This distribution is similar to the proportions of cognitive phases reported in the reviewed studies (Kaul et al., 2018; Kovanović et al., 2014; Neto et al., 2018; Park, 2009). Because these discussion messages were reliably classified through manual categorisation, they provide us with a clean training dataset for developing automatic classifiers in our future work.

Disagreements Between Coders and What Caused the Disagreement-SQ2
Analysing the disagreements between coders (438 messages) can help to answer our second sub-question.
Most of the three coders' disagreements occurred between adjacent phases of cognitive presence (Table 5 and Figure 2), in line with the findings of the previous two-coder study (Hu et al., 2020). We analysed the common language patterns that may have caused these disagreements.

Common Language Patterns of Messages Between Adjacent Phases
After reviewing the 66 messages (Table 5) that the coders labelled as "other" and triggering event, we identified two possible causes of these disagreements. First, messages with incomplete segments or very short sentences may have made it difficult to grasp the writer's purpose and meaning; coders interpreted them differently even after checking the previous and subsequent messages. Examples include (a) "That works for me," and (b) "I am guessing…". Second, sentences using the structure "I like…" may have caused confusion, because like can mean either "I agree with you" or "I appreciate". Examples include (a) "I like your worded comment. Nice!" and (b) "I like the questions you mentioned". The former could have been an indicator of "simple agreement" in a triggering event, whereas the latter could have been a predictor of social expression, which is part of the "other" phase.
The most common pattern in the messages labelled both triggering event and exploration (78 messages, as shown in Table 5) was the use of questions to deliver outside information or make personal claims, which confused our coders. "Ask questions" is a core indicator of the triggering event phase; however, learners can also propose ideas in sentences ending with question marks. In online discussions in smaller-scale, for-credit courses, however, the opposite can be true, and the number of question marks in a message may be reliably used as a predictor when building automatic classifiers (Kovanović et al., 2016).
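A question-mark count is one of the simplest surface features a classifier can use. The following hypothetical sketch extracts it, together with the number of sentences that end in a question mark; it is not Kovanović et al.'s (2016) feature set, and its naive regular-expression sentence splitter is an assumption that will mishandle abbreviations. Given the patterns above, such features are best treated as weak signals in MOOC data, where a question-form sentence may carry a claim rather than a genuine question.

```python
import re

def question_features(message):
    """Surface features: '?' count, sentences ending in '?', and sentence count."""
    # Naive split after ., ! or ? followed by whitespace (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", message.strip()) if s.strip()]
    return {
        "question_marks": message.count("?"),
        "question_sentences": sum(s.endswith("?") for s in sentences),
        "sentences": len(sentences),
    }

print(question_features("Isn't that a fallacy? I think so. Why?"))
# {'question_marks': 2, 'question_sentences': 2, 'sentences': 3}
```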
The central debates in the 227 messages (Table 5) labelled both exploration and integration had two aspects.
First, messages that contained conclusions with reasons raised disputes about whether the supporting ideas were sufficient. A key criterion for differentiating integration from exploration in the classification rubrics is that the message should reach "a coherent conclusion" by offering "sufficient substantiation" (Garrison et al., 2001; Park, 2009). This criterion is subjective and domain specific. Messages that provide solutions and implicit conclusions ending with a tentative phrase imply more of a "suggestion for consideration" (which should be labelled exploration) than sufficiently supported integration. For example, in our study, we saw this message: "I do not think it is an argument because a rates revolt is only a suggestion. The writer states that it is one way to make the councillors listen but does not say this strategy should be adopted". Two coders judged that the message first disagreed with the previous message and then proposed the writer's own opinion ("not an argument", with the supporting reasons in the rest of the message), and therefore labelled it integration. However, the third coder judged it to be a personal opinion offered as a suggestion for consideration without sufficient support, and labelled it exploration.
Second, misleading language patterns may have influenced coders' decisions. Phrases such as "consequently" or "both sides of" might indicate a conclusion or a "convergence", denoting the integration phase. However, such patterns could also be interpreted as a "leap to a conclusion" or a claim without supporting ideas, in which case the message should be classified in the exploration phase. An example of such a message is, "Consequently, both sides of the arguments are equally compelling but have their share of fallacies. It depends on each person's confirmation bias to weigh a particular argument heavier than the other." These misleading language patterns suggest that some phrases and expressions can serve only as possible predictors, not as definitive evidence, for classifying cognitive phases.
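One way to respect this caution when building a classifier is to encode cue phrases as input features rather than as deterministic labelling rules, so that a model can learn how much weight they deserve alongside other evidence. The cue list below contains only the two phrases discussed above and is a hypothetical starting point, not a validated lexicon; the function name is likewise illustrative.

```python
import re

# Hypothetical cue list, taken only from the examples discussed above.
CONCLUSION_CUES = ("consequently", "both sides of")

def cue_features(message, cues=CONCLUSION_CUES):
    """Return a 0/1 flag per cue phrase, intended as classifier inputs,
    not as rules that assign the integration phase on their own."""
    text = message.lower()
    return {
        cue: int(re.search(r"\b" + re.escape(cue) + r"\b", text) is not None)
        for cue in cues
    }

print(cue_features("Consequently, both sides of the arguments are equally compelling."))
# {'consequently': 1, 'both sides of': 1}
```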
Most of the messages labelled both integration and resolution raised the question of whether the supporting ideas for new constructs were sufficient. This debate closely resembles the debates about distinguishing integration from exploration discussed above.

Understanding the Reasons for the Disagreements
We found that most disagreements occurred between the exploration and integration phases. This may be because: (a) the proportion of messages in these two categories was much larger than in the other categories; (b) exploration and integration occur in the middle of a critical thinking activity, which tends to involve greater uncertainty than the beginning (awareness of a question) or the conclusion (outcomes after evaluation); or (c) the criteria and instances of these two categories are ambiguous in the CoI framework, which is consistent with Rourke and Kanuka's (2009) critique of the lack of clear instances in Garrison et al.'s cognitive presence rubric. These reasons also connect with other critiques of the CoI. Garrison et al. (2001) drew on Dewey's (1933) five steps of reflective thinking to propose the four phases of cognitive presence, but they did not develop or elaborate on the theoretical foundations of Dewey's model (Jézégou, 2010). Garrison et al. (2001) merged the second ("diagnosis of a question") and third ("suggestion of possible solution") steps of Dewey's (1933) model into the exploration phase. They renamed the fourth step ("elaboration of an idea by reasoning") as integration and the last step ("corroboration to form a concluding belief") as resolution. In terms of Dewey's model, disagreements between exploration and integration arose mainly when coders tried to distinguish "a suggestion of possible solution" (assigned to exploration) from "elaboration of an idea by reasoning" (assigned to integration). In contrast, messages in the "diagnosis of a question" step (assigned to exploration) were easier for the coders to identify. We therefore question whether the exploration phase should be split back into the diagnosis step and the suggestion-of-possible-solution step, as defined in Dewey's (1933) model.
Henri and Lundgren-Cayrol (2005) proposed three phases of a collaborative learning approach to knowledge construction (i.e., exploration, elaboration, and evaluation), which intersect with the cognitive presence phases (Jézégou, 2010). The elaboration phase sits between exploration and evaluation, similar to the integration phase in cognitive presence. Henri and Lundgren-Cayrol (2005) also proposed two sub-phases of elaboration: negotiation and validation. The negotiation sub-phase refers to learning processes that consider and collect other people's ideas to form diverse proposals of knowledge, and the validation sub-phase denotes consensus on the knowledge, reflecting multiple views (Henri & Lundgren-Cayrol, 2005). In this regard, the validation sub-phase is equivalent to the integration phase in cognitive presence. Interestingly, most of the ambiguous messages between exploration and integration in our sample data could be assigned to a negotiation sub-phase. For example, one learner compared two opinions from previous comments and generated her/his own statements in a message, but the statements were not supported by sufficient reasoning, so the message was closer to exploration than to integration. This example fits well into negotiation. Therefore, there may be a negotiation sub-phase between the "suggestion of possible solution" step (assigned to exploration) and the "elaboration of an idea by reasoning" step (assigned to integration). This would be an additional phase in the cognitive presence schema.
Apart from ambiguous language patterns in MOOC discussions and insufficiencies in the CoI framework, another significant factor behind the disagreements across all adjacent phases was the increase in sample size from a small scale to a much larger one. Taxonomies for categorising cognitive processes work well at a smaller scale, but the likelihood of outliers increases when researchers apply taxonomies developed from smaller datasets to vastly larger samples (Mayer-Schönberger & Cukier, 2013). Also, online discussions have an informal, conversational flow that is relatively messy and does not fit the ordered phases of the CoI (Xin, 2012). We suggest that investigating the general trend of cognitive processes within the messiness of communication across the myriad MOOC transcripts is more valuable than applying rigid classification methods.
Categorising learners' discussion transcripts into single-label cognitive phases tends to be subjective and inaccurate. One possible solution is to label MOOC discussion messages with multiple cognitive presence phases, each with a confidence level. Another solution would be to label the messages using multiple models of learners' critical discourse simultaneously; for example, Farrow et al.'s (2021) study applied both cognitive presence and the ICAP framework (Chi & Wylie, 2014). These methods could provide a richer portrait for interpreting learners' dialogue by different coders using different frameworks, and would reflect the variation in the discourse more authentically.
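One hypothetical way to derive "multiple phases with confidence levels" from existing annotations is to treat each coder's label as a vote: the share of coders choosing a phase becomes that phase's confidence, and a threshold turns the shares into a multi-label target. The phase list, function names, and default threshold of one vote in three are assumptions for illustration, not a procedure tested in this study.

```python
from collections import Counter

PHASES = ["other", "triggering event", "exploration", "integration", "resolution"]

def soft_label(coder_labels, phases=PHASES):
    """Share of coders choosing each phase, used as a per-phase confidence."""
    counts = Counter(coder_labels)
    n = len(coder_labels)
    return {p: counts[p] / n for p in phases}

def multi_hot(coder_labels, min_share=1 / 3, phases=PHASES):
    """1 for every phase chosen by at least `min_share` of coders, else 0."""
    shares = soft_label(coder_labels, phases)
    return [int(shares[p] >= min_share) for p in phases]

votes = ["exploration", "integration", "integration"]
print(multi_hot(votes))  # [0, 0, 1, 1, 0]
```

Soft labels keep the coders' disagreement as information for training, instead of discarding it through a forced single label.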
In response to our main research question (Is our adapted coding rubric of cognitive presence a reliable tool to classify MOOC discussions?), we conclude that although the adapted rubric of cognitive presence is a statistically reliable tool to classify the discussion messages from the LCT MOOC by three coders, an additional phase (negotiation) could be included to improve the rubric to accommodate the predominant disagreements between coders.

Limitations
We acknowledge the limitation that a classification rubric of cognitive presence developed for one discipline might not be generalisable to other domains. There are disciplinary differences in the expression of critical reflection and its assessment in the pedagogical designs of different courses. The evaluation of cognitive presence that is mainly based on textual information can be highly domain specific. We are aware that the research findings might only be reliable and valid for the classification of cognitive presence in our target MOOC, done by three coders.

Conclusion and Implications
This research offers several theoretical and practical implications. We have reported on a process of classifying cognitive presence using more coders and a larger dataset to cross validate an adapted coding rubric in a target MOOC. The overall result reveals good inter-rater reliability, indicating that the adapted rubric remains stable for classifying cognitive phases in MOOC discussions with more coders and larger datasets. We then examined more closely the messages on which coders disagreed between adjacent cognitive phases. Possible causes of this ambiguous categorisation include theoretical insufficiencies of the CoI, MOOC learners' informal writing styles, and the much larger data sizes in MOOCs. We envisage that negotiation may be an additional phase between exploration and integration, where most disagreements occurred. Our findings can inform the ongoing refinement of the CoI framework and provide a foundation for developing automatic analysis of educator-learner dialogue at scale.
This study also has practical implications. When preparing machine learning datasets, we suggest using multiple-label rather than single-label classification to analyse learners' cognitive presence in MOOC discussions, which takes account of learners' informal language use. Our findings also give learning analytics researchers some hints for choosing algorithms and predictive features in automatic cognitive analysis. For example, better prediction performance may be achieved by using corpora that include both informal speech and formal written texts to train and generate the numeric representations of discussion messages that are fed into machine learning algorithms (e.g., neural networks). Also, computational linguistic tools such as Coh-Metrix, which were created to assess formal essay writing (McNamara et al., 2012), may not be appropriate for analysing MOOC discussion messages.
The application of a reliable, smart automatic classifier for analysing the processes of critical discourse in open and distance learning at scale could potentially (a) enable learners to self-evaluate their learning, complementing automatic learner grading systems, (b) inform the design and adaptation of course content, and (c) support efficient, real-time assessment of educator-learner online dialogue.