A Construct Revalidation of the Community of Inquiry Survey: Empirical Evidence for a General Factor Under a Bifactor Structure

The study revisited the community of inquiry (CoI) instrument for construct revalidation. To that end, the study used confirmatory factor analysis (CFA) to examine four competing models (unidimensional, correlated-factor, second-order factor, and bifactor models) on model fit statistics computed using parameter estimates from a statistical estimator for ordinal categorical data. The CFA identified as the optimal structure the bifactor model where all items loaded on their intended domains and the existence of the general factor was supported, essentially evidence of construct validity for the instrument. The study further examined the bifactor model using mostly model-based reliability measures. The findings confirmed the contributions of the general factor to the reliability of instrument scores. The study concluded with validity and reliability evidence for the bifactor model, supported the model as a valid and reliable representation of the CoI instrument and a fuller representation of the CoI theoretical framework, and recommended its use in CoI-related research and practice in online education.


Introduction

The CoI Theoretical Framework
The community of inquiry (CoI) theoretical framework was first laid out by Garrison et al. (2000). The framework identified elements critical for understanding the dynamics of an online learning experience and for structuring and supporting the process of online teaching and learning as well as related research (Kozan & Caskurlu, 2018; Olpak & Kiliç Çakmak, 2016; Shea & Bidjerano, 2009).
The CoI framework consisted of three interconnected constructs of collaborative constructivist learning: (a) teaching presence (TP); (b) social presence (SP); and (c) cognitive presence (CP). Here, the term presence referred to fidelity: how real the learning and the environment where it occurred were (Dempsey & Zhang, 2019). The greater the presence, the greater the fidelity, and accordingly the more realistic the learning experience was perceived to be. Each presence overlapped with the other two, and all three combined within a community of inquiry (Diaz et al., 2010; Kovanović et al., 2018; Kozan, 2016; Shea & Bidjerano, 2009).

Cognitive Presence
Cognitive presence was described as a developmental model articulating the dynamics of a worthwhile educational experience (Garrison et al., 2010). It referred to the extent to which students in a community of inquiry were able to construct meaning through sustained communication, and it reflected the process of inquiry and learning (Bangert, 2009; Garrison et al., 2000). When operationalizing CP, Garrison et al. (2000) used the practical inquiry (PI) model reflecting the critical thinking process for creating CP (Olpak & Kiliç Çakmak, 2016). They expanded the PI model into a cycle of four phases (subconstructs) in the inquiry process, which the CoI framework subsumed as categories: (a) triggering event, (b) exploration, (c) integration, and (d) resolution.

Social Presence
Social presence has focused on important issues that mold the social climate in the online learning community and on the level of recognition (e.g., ability of learners to identify with the community, purposeful conversation in a trusting environment, development of interpersonal relationships) among learners during the process of communication (Garrison & Arbaugh, 2007; Kovanović et al., 2018). SP played an important role in creating an online learning environment that encouraged critical thinking (Bangert, 2009; Garrison et al., 2000). Studies have consistently shown that SP had a strong influence on students' satisfaction with online courses and with the instructor, and on their perception of learning in online courses (Caskurlu, 2018).

Teaching Presence
Teaching presence referred to the designing, facilitating, and directing of cognitive and social processes by the instructor to create meaningful personal learning and valuable learning outputs; it also focused on learner ratings of those instructor actions (Olpak & Kiliç Çakmak, 2016; Shea & Bidjerano, 2009).
There has been growing recognition of the importance of TP for successful online teaching and learning, especially when critical thinking and discourse were required (Garrison et al., 2000). A community of inquiry provided important support for critical thinking and meaningful learning. With all elements of this community combined, TP served to facilitate critical discussion and learning in this environment. Finally, TP consisted of three subconstructs that the CoI framework subsumed as categories: (a) design and organization, (b) facilitating discourse, and (c) direct instruction.

The CoI Instrument and its Validation
The CoI framework was operationalized by Arbaugh et al. (2008) through developing and validating a CoI instrument which consisted of three subscales, each of which addressed one of the three presence constructs/factors in a domain. Each CoI item measured the extent to which an online course characteristic was present. Their results supported the instrument as being a valid, reliable, and efficient measure of the CoI framework. In their study, the Cronbach's α for TP was .94, that for SP .91, and that for CP .95.
Since the development of the CoI instrument, its refinement has been constantly called for and has therefore continued over the past 10-plus years in various online settings (e.g., Dempsey & Zhang, 2019; Kozan & Caskurlu, 2018; Kozan & Richardson, 2014). In many refinement studies, the original correlated, three-factor structure of the instrument was largely recovered and revalidated through exploratory, confirmatory, or both statistical methods (Kovanović et al., 2018).

Exploratory Approach
Under an exploratory approach, the validation study typically implemented either an exploratory factor analysis (EFA) or a principal component analysis (PCA) with an oblique rotation to allow the extracted factors measuring latent constructs to be correlated. Among such studies were Swan et al. (2008), Shea and Bidjerano (2009), Díaz et al. (2010), Garrison et al. (2010), and Kovanović et al. (2018). Swan et al. (2008) and Garrison et al. (2010) both conducted a PCA of the CoI responses (n = 287 master's and doctoral students and n = 205 master's students, respectively) with an oblimin rotation; the results supported a three-factor solution congruent with Garrison et al. (2000). Similar results were obtained in Shea and Bidjerano (2009) with n = 2,159 online students. Instead of a PCA, they used an EFA under principal axis factoring with an oblimin rotation. Díaz et al. (2010) used an enhanced version of the CoI instrument that provided a second way of rating each item and conducted a PCA with an oblimin rotation of the data from 412 students. The enhanced instrument evaluated both the extent to which an online course characteristic existed (i.e., the first rating responses to the original CoI item statements) and the importance of that characteristic, based on the second rating responses. A new score was created by multiplying the two sets of ratings. After analyzing the multiplicative scores under the PCA, the three CoI factors were successfully recovered.
Finally, Kovanović et al. (2018) proposed tweaks to the original factor structure. They used an EFA under principal axis factoring with an oblimin rotation to analyze a large sample of 1,487 students in a massive open online course setting. They largely recovered the original three-factor structure. In their analysis, item 28 in the CP subscale cross-loaded on the SP subscale. Fortunately, the removal of this item had only a minor impact on the loadings of the other items and the overall model statistics.

Confirmatory Approach
Under a confirmatory approach, the validation study typically implemented a confirmatory factor analysis (CFA) to assess the original correlated-factor model. Among such studies were Caskurlu (2018), Dempsey and Zhang (2019), Kozan (2016), and Ma et al. (2017). Kozan (2016) applied a CFA to validate the CoI instrument using the responses from 338 participants who were mostly online master's students. Kozan used a slightly adapted version of the survey, and the CFA model that contained the three-factor structure was successfully revalidated without any re-specification. Ma et al. (2017) added to the CoI instrument a new construct, learning presence, proposed by Shea and Bidjerano (2010) and measured by an additional set of 14 items. The sample was 325 undergraduate students in a blended learning environment. Their findings supported the fit of the four-factor model after two rounds of CFA, which led to the deletion of one item. Caskurlu (2018) conducted a CFA to investigate the factor structure of each individual presence using a dataset of 310 participants, and established that each presence was itself multidimensional and thus a higher-order construct. The analysis was run separately for each individual subscale without examining the dimensionality of the instrument as a whole.
Dempsey and Zhang (2019) revalidated the CoI instrument using a dataset from 579 online MBA students.
They experimented with multiple structures: (a) the original three-factor structure, (b) a 10-factor structure by allowing each of the three presences to be multidimensional, and (c) a higher-order factor model with three lower-order factors. The study concluded the higher-order factor model provided the best fit to the data.

Both Exploratory and Confirmatory Approaches
There are also validation studies which used both an EFA/PCA and a CFA (Bangert, 2009; Kozan & Richardson, 2014; Olpak & Kiliç Çakmak, 2016; Yu & Richardson, 2015). Such studies typically had a large sample which was randomly split into two subsets, with each one still being large enough for either an EFA/PCA or a CFA. Then, one random subset was analyzed to explore and discover the underlying factor structure of the instrument, and the finding was then further assessed under CFA using the second random subset.
Bangert (2009) used a sample of 1,173 undergraduate and graduate students enrolled in fully online and blended courses to validate the CoI instrument. One half of the sample was randomly selected to conduct a PCA with an oblique rotation which was followed by a CFA using the remaining half of the sample. The EFA process largely recovered the original three-factor solution. Next, this recovered factor structure was confirmed by CFA with all fit statistics being satisfactory. Kozan and Richardson (2014) collected their data from master's and doctoral students who were either fully online, or face-to-face but also taking online courses. They had 219 participants for EFA and 178 participants for CFA. During EFA, they selected the three-factor solution from the promax rotation. Next, the CFA process experimented with multiple models using the second dataset and the final model exhibited a good fit. Except for several correlated errors, this final model concurred with the CoI framework. Yu and Richardson (2015) collected data from 995 undergraduate online students and split them into two approximately equal subsets. During the EFA (promax rotation) process, they experimented with two models before and after removing two items, and ended up selecting the second, three-factor model where each item loaded on its intended subscale under the CoI framework. In the CFA, the fit of the 32-item model was confirmed with excellent values on model fit statistics. Olpak and Kiliç Çakmak (2016) collected the data from 1,150 students enrolled in online courses and randomly split them into two equal groups. Under EFA, they successfully recovered the original three-factor structure. Then, in the CFA, the fit of the three-factor model was assessed multiple times and was confirmed to be excellent after, per the modification indices, allowing several item error covariances to be freely estimated.
Finally, more complete summaries of research on the validation of the CoI instrument can be found in Kozan and Caskurlu (2018) and Stenbom (2018), which provided systematic reviews of such studies.

Issues with Existing Validation Work
Many CoI refinement studies have shared similar limitations. A primary limitation has been that the correlated-factor model, which most studies have relied on, cannot fully describe the CoI framework by Garrison et al. (2000). Furthermore, there is room for improvement in the estimation method by which the estimates of model parameters have been derived.
First, the correlated-factor model has addressed only part of the CoI framework. Although it explicitly allowed the presences to be correlated in pairs, it did not include an intersection of all three presences. This inadequacy was unfortunate because the literature has repeatedly emphasized the importance of the interaction of all three presences, which represents an online learning or educational experience (Caskurlu, 2018; Diaz et al., 2010; Garrison et al., 2000; Garrison & Arbaugh, 2007; Garrison et al., 2010; Kozan, 2016; Kozan & Richardson, 2014; Olpak & Kiliç Çakmak, 2016). The literature has recommended the examination of competing models when applying CFA to scale validation (Gignac & Kretzschmar, 2017; Rodriguez et al., 2016b). There are several competing models for handling multidimensional data: (a) M1: unidimensional, single-factor model; (b) M2: correlated-factor model; (c) M3: second-order factor model; and (d) M4: bifactor model (Chen et al., 2012; Gignac & Kretzschmar, 2017; Reise, 2012; Reise et al., 2010; Reise et al., 2007; Rodriguez et al., 2016a, 2016b). First, corresponding to Figure 1a, the unidimensional model collapsed all items onto a single factor and therefore could not distinguish among the three presences (Garrison et al., 2000; Reise, 2012). Second, corresponding to Figure 1b, which consisted of three separate circles overlapping in pairs but did not include an intersection of all three circles, both the correlated-factor and the second-order models allowed each pair of presences to be correlated either explicitly (M2) or implicitly (M3). However, neither model had a factor underlying all items; therefore, both lacked the area in the CoI framework shared by all three presences which represented an online educational experience.
Finally, corresponding to Figure 1c which consisted of three separate circles overlapping in pairs and an intersection of all three circles, the bifactor model incorporated both the overlap of each pair of presences and the intersection of all three presences into the general factor underlying all items, thus indicating the bifactor structure was more aligned with the CoI framework than were the other models.
Even though it could be of interest to further distinguish the pairwise correlations between presences from the interaction of all three presences, doing so would have caused additional complications to the bifactor model. Should any pair of presences be correlated, this would suggest the existence of additional, unmodeled general factors. Also, many statistics (e.g., model-based reliability statistics) used in this study would have been challenging to implement, and any improvement in model fit would have been offset by losses in model interpretability (Reise, 2012). Therefore, this study specified the bifactor model in the usual way consistent with the literature, where the general and the presence domain factors were all uncorrelated with each other, without attempting to separate the overlap between each pair of presences from the intersection of all three presences by introducing additional pairwise correlations between presences (Chen et al., 2012; Reise, 2012; Rodriguez et al., 2016a, 2016b).
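As a concrete illustration of this orthogonal specification, the model-implied correlation between two standardized items is the product of their general-factor loadings, plus the product of their domain loadings only when the two items share a presence domain. A minimal sketch, using made-up loadings rather than any estimates from this study:

```python
# Model-implied correlation between two standardized items under an
# orthogonal bifactor model: items always share the general factor, and
# additionally share a domain factor only if they belong to the same
# presence. All loadings are made up for illustration.
def implied_corr(item_i, item_j):
    g_i, dom_i, s_i = item_i
    g_j, dom_j, s_j = item_j
    r = g_i * g_j              # contribution of the shared general factor
    if dom_i == dom_j:
        r += s_i * s_j         # contribution of the shared domain factor
    return r

tp1 = (0.7, "TP", 0.4)  # (general loading, domain, domain loading)
tp2 = (0.6, "TP", 0.3)
sp1 = (0.8, "SP", 0.5)

print(round(implied_corr(tp1, tp2), 2))  # same domain: 0.7*0.6 + 0.4*0.3 = 0.54
print(round(implied_corr(tp1, sp1), 2))  # across domains: 0.7*0.8 = 0.56
```

Because the general and domain factors are specified as mutually uncorrelated, no other terms enter the implied correlation; adding pairwise presence correlations would introduce exactly the complications described above.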

Figure 1
Alignment of the CoI Theoretical Framework and Four Competing Models
Second, previous validation studies routinely used statistical methods unable to take into account the rating scale structure of the CoI item responses (i.e., ordinal categorical data). They treated the responses as if they were continuous. Such a practice has been known to have undesirable consequences: an inflated χ² statistic, underestimated standard errors, and so on (Byrne, 2010; Kline, 2016). These problems were exacerbated when the number of categories was small (four/five categories or fewer) and/or the data exhibited serious skewness and kurtosis (outside the range of -1.00 to +1.00 for skewness, -1.50 to +1.50 for kurtosis). Generally, to properly address the rating scale structure of such data as the CoI responses, the robust weighted least squares (WLS) method and its variants have been recommended (Rosseel, 2012).

Purpose of Study
To address these limitations, the study began with a CFA to construct-revalidate the CoI instrument by examining the four competing models (DiStefano & Hess, 2005). After estimating the four models and identifying the one providing the optimal fit as evidence of construct validity, the study computed additional statistics for the optimal structure to complement the construct-revalidation results from the CFA. Therefore, the study proposed and addressed the following two research questions:
1. How well does each of M1 through M4 fit the CoI data as measured by commonly used model fit statistics?
2. What are the psychometric properties (e.g., validity, reliability) of the optimal model as identified above?
When addressing the two questions, the study factored into consideration the rating scale structure of the CoI responses.

Methods
The CoI survey in this study consisted of 34 five-point Likert items (see Table 1; Arbaugh et al., 2008): 1 for strongly disagree, 2 for disagree, 3 for neutral, 4 for agree, and 5 for strongly agree. The 34 items made up three subscales: (a) teaching presence (13 items); (b) social presence (9 items); and (c) cognitive presence (12 items).

Item Item statement Subscale
01 The instructor clearly communicated important course topics. TP
02 The instructor clearly communicated important course goals. TP
03 The instructor provided clear instructions on how to participate in course learning activities. TP
04 The instructor clearly communicated important due dates/time frames for learning activities. TP
05 The instructor was helpful in identifying areas of agreement and disagreement on course topics that helped me to learn. TP
06 The instructor was helpful in guiding the class towards understanding course topics in a way that helped me clarify my thinking. TP
07 The instructor helped to keep course participants engaged and participating in productive dialogue. TP
08 The instructor helped keep the course participants on task in a way that helped me to learn. TP
09 The instructor encouraged course participants to explore new concepts in this course. TP
10 Instructor actions reinforced the development of a sense of community among course participants. TP
11 The instructor helped to focus discussion on relevant issues in a way that helped me to learn. TP
12 The instructor provided feedback that helped me understand my strengths and weaknesses. TP
13 The instructor provided feedback in a timely fashion. TP
14 Getting to know other course participants gave me a sense of belonging in the course. SP

The study specified the four competing models: the unidimensional model (Figure 2a), the correlated-factor model (Figure 2b), the second-order factor model (Figure 2c), and the bifactor model (Figure 2d). To estimate the four models, the study used the lavaan package in R, which offered a WLS estimator for handling ordinal categorical data (Rosseel, 2012).

Figure 2
CoI Dimensionality Analysis Under Four Competing Models Using CFA

After securing the required Institutional Review Board approval, the study obtained a convenience sample from the participating university in the southeastern US. The sample had a total of 909 graduate students taking online courses in the fall semester of 2014. In January 2015, these 909 students were invited by email to participate in the study through Qualtrics.
To address the common issue of low response rates with online surveys, the study contacted research participants multiple times by e-mail. First, a mass pre-study notification e-mail was sent to all 909 students, informing them of an upcoming solicitation to participate in a research project on their online learning experiences in fall 2014. After the data collection started, additional e-mails followed to remind participants to respond to the survey. This continued until the data collection ended in April 2015.
After addressing the missingness in the responses through listwise deletion, there were 238 participants left who provided complete responses to all 34 CoI items, which led to a student-item ratio of about 7:1, satisfying the criterion that, for stable results, the sample size should be at least six times the number of items (Mundfrom et al., 2005).
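The listwise-deletion step and the sample-size check above can be sketched as follows; the response records are fabricated solely to illustrate the mechanics:

```python
# Listwise deletion and the respondents-per-item check described above:
# keep only fully complete response records, then require at least six
# respondents per item (Mundfrom et al., 2005). Records are fabricated.
N_ITEMS = 34
raw = [
    [4] * N_ITEMS,        # complete record -> kept
    [5] * 33 + [None],    # one unanswered item -> dropped listwise
    [3] * N_ITEMS,        # complete record -> kept
]
complete = [r for r in raw if all(x is not None for x in r)]
ratio_ok = len(complete) >= 6 * N_ITEMS   # 238 >= 204 held in the study
print(len(complete), ratio_ok)
```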

Results
Based on the descriptive statistics of the responses, nearly 40% of the CoI items had skewness statistics outside the acceptable range of -1.00 to +1.00, and five items had kurtosis values above the acceptable upper limit of +1.50. The study was therefore justified in applying the robust WLS method instead of treating the categorical responses as if they were continuous (Byrne, 2010; Kline, 2016). The CFA results are found in Tables 2 through 4. The best-fitting model produced an RMSEA of .080, and the lower bound of .074 for its 95% confidence interval was lower than the threshold of .08 for an adequate fit (Byrne, 2010; MacCallum et al., 1996; West et al., 2012). Notably, the thresholds used here for deciding whether a model fit was adequate have traditionally been designed for normal-theory maximum likelihood estimation with continuous data. By contrast, this study implemented a WLS estimator with ordinal categorical data. Although there are known methodological issues related to the application of these traditional thresholds in a research context like this, the practice has been widely accepted in the literature and will continue until better alternatives are proposed and established (Xia & Yang, 2019).
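The skewness/kurtosis screen described above can be computed with ordinary moment formulas. A small sketch using an invented vector of five-point Likert responses (not data from this study):

```python
# Screening a single item's responses for skewness and excess kurtosis,
# using the thresholds stated in the text (+/-1.00 for skewness,
# +/-1.50 for kurtosis). The response vector is invented for illustration.
def moments(xs):
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3    # excess kurtosis (normal distribution -> 0)
    return skew, kurt

responses = [5, 5, 5, 4, 5, 4, 5, 3, 5, 4, 5, 5, 2, 5, 4]  # 5-point Likert item
skew, kurt = moments(responses)
flagged = abs(skew) > 1.00 or abs(kurt) > 1.50
print(flagged)  # a heavily skewed item -> favors a robust WLS estimator
```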
Evidently, out of the four models, the bifactor structure showed the best fit as assessed by the highest values of CFI, TLI, and AGFI as well as the lowest values of SRMR and RMSEA. Next, a Satorra-Bentler scaled χ² difference test was run to compare the bifactor structure with each of the other competing structures nested within the bifactor model. The results indicated the bifactor model was statistically significantly better in fit than each competing structure. Table 3 contains the standardized estimates of the bifactor structure. For most items (particularly the CP items), the common variance was explained more by the general factor than by the corresponding domain factor. It is important for a subscale to reflect a conceptually, relatively narrow psychological trait, but the construct should not be a mere artifact of asking the same question repeatedly in slightly different ways (Reise, 2012). Table 4 presents multiple statistics measuring primarily the reliability of the CoI instrument scores under the bifactor model (Reise, 2012; Rodriguez et al., 2016a, 2016b). Among them are model-based reliability measures, measures of construct reliability, and measures of dimensionality.

Coefficient ω
Coefficient ω for the scale represented the proportion of the variance in the scale total score that was attributable to all sources of common variance (i.e., variance from all common factors: the general factor and all domain factors). A high score on the scale ω statistic indicated a highly reliable multidimensional structure that reflected variation on the weighted combination of all common factors. Here, the ω statistic for the total score was .992, indicating as much as 99.2% of the scale total score variance was due to all common factors.
Coefficient ωS for a subscale measured the proportion of the subscale total score variance that was attributable to both the general factor and that domain factor. A high value on the subscale ωS statistic indicated a highly reliable multidimensional structure consisting of both the general factor and the domain factor. Here, the subscale ωS values were .984 for TP, .966 for SP, and .984 for CP, suggesting that 96.6% to 98.4% of the subscale total score variance was due to the general factor plus the domain factor.
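Under an orthogonal bifactor model with standardized loadings, coefficient ω for the total score is the squared sum of the general-factor loadings plus the squared sums of the domain loadings, divided by that quantity plus the summed item error variances. A sketch with illustrative loadings (not the estimates reported here):

```python
# Coefficient omega for the scale total score under an orthogonal bifactor
# model with standardized loadings. The six items and their loadings are
# illustrative only, NOT the study's estimates.
gen = [0.7, 0.6, 0.8, 0.5, 0.6, 0.7]       # general-factor loadings
dom = {"TP": [0.4, 0.3],                   # domain loadings, in the same
       "SP": [0.5, 0.2],                   # item order as `gen`
       "CP": [0.3, 0.4]}

dom_flat = [l for lams in dom.values() for l in lams]
# standardized error variance per item: 1 - lambda_g^2 - lambda_s^2
err = [1 - g * g - s * s for g, s in zip(gen, dom_flat)]

common = sum(gen) ** 2 + sum(sum(lams) ** 2 for lams in dom.values())
omega = common / (common + sum(err))
print(round(omega, 3))
```

A value near 1, like the .992 reported above, means nearly all total-score variance is common-factor variance.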

Coefficient ωH
Coefficient ωH for the scale represented the proportion of the scale total score variance that was due to the general factor only. A high score on the scale ωH statistic indicated the scale total score predominantly reflected the general construct and allowed users to interpret the scale total score as a sufficiently reliable measure of the general factor. Here, ωH was .914, indicating as much as 91.4% of the scale total score variance was attributed to the general factor after accounting for all domain factors.
Coefficient ωHS for a subscale measured the proportion of the subscale total score variance that was attributed to the domain factor only. A high value on the subscale ωHS statistic indicated the subscale total score predominantly reflected the domain construct and allowed users to interpret the subscale total score as a sufficiently reliable measure of the domain factor. Here, the subscale ωHS values were .234 (TP), .388 (SP), and .036 (CP), suggesting the subscale total scores mostly did a poor job of reflecting the domain factors; each score was mostly due to the general factor instead of the domain factor.
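Coefficients ωH and ωHS follow the same variance decomposition: ωH keeps only the general factor in the numerator of the scale-score ratio, and ωHS keeps only the domain factor for a subscale score. A sketch with illustrative loadings, not this study's estimates:

```python
# Omega-hierarchical (scale: general factor only) and
# omega-hierarchical-subscale (subscale: domain factor only) under an
# orthogonal bifactor model. Items and loadings are illustrative only.
items = {  # item: (general loading, domain, domain loading)
    "i1": (0.7, "TP", 0.4), "i2": (0.6, "TP", 0.3),
    "i3": (0.8, "SP", 0.5), "i4": (0.5, "SP", 0.2),
    "i5": (0.6, "CP", 0.3), "i6": (0.7, "CP", 0.4),
}
domains = ("TP", "SP", "CP")

gen_sum = sum(g for g, _, _ in items.values())
dom_sums = {d: sum(s for _, dd, s in items.values() if dd == d) for d in domains}
err_sum = sum(1 - g * g - s * s for g, _, s in items.values())
total_var = gen_sum ** 2 + sum(v ** 2 for v in dom_sums.values()) + err_sum

omega_h = gen_sum ** 2 / total_var        # general factor's share of the total score
print(round(omega_h, 3))

def omega_hs(d):
    sub = [(g, s) for g, dd, s in items.values() if dd == d]
    sub_var = (sum(g for g, _ in sub) ** 2 + sum(s for _, s in sub) ** 2
               + sum(1 - g * g - s * s for g, s in sub))
    return sum(s for _, s in sub) ** 2 / sub_var

print({d: round(omega_hs(d), 3) for d in domains})
```

Even with non-trivial domain loadings, the subscale ωHS values come out small because the general factor absorbs most of each subscale score's variance, mirroring the pattern reported above.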

Construct Reliability
Construct reliability H represented the proportion of variability in a latent construct explained by its indicator items. A high score on the H statistic indicated the construct was well represented by its indicators. Here, H = .778 for TP, .819 for SP, .478 for CP, and .988 for the general factor. An H statistic should be at least .70 for the corresponding latent construct to be adequately represented by its indicators. Therefore, with an H of .988, the general factor was nearly perfectly represented by its 34 indicators, and the TP and SP domains were also represented adequately by their respective indicators. Finally, with an H of only .478, the CP domain was not reliably measured by its indicators, and its results could be unstable and thus fail to replicate across studies.
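Construct reliability H can be computed from a factor's standardized loadings alone. A sketch with illustrative loadings (not the study's estimates), showing how a factor with only a few weak residual loadings, like the CP domain here, ends up below the .70 benchmark:

```python
# Construct reliability H (Hancock & Mueller): the maximal proportion of a
# factor's variance explained by its own indicators, from standardized
# loadings. Loadings are illustrative, not the study's estimates.
def construct_h(loadings):
    s = sum(l * l / (1 - l * l) for l in loadings)
    return s / (1 + s)

general = [0.7, 0.6, 0.8, 0.5, 0.6, 0.7]   # many solid loadings -> high H
cp_res = [0.3, 0.4]                        # few weak residual loadings -> low H

print(round(construct_h(general), 3))
print(round(construct_h(cp_res), 3))       # falls below the .70 benchmark
```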

Dimensionality of CoI
The explained common variance (ECV) statistic for the general factor measured the proportion of the common variance explained by both the general and all domain factors that was attributed to the general factor only, and assessed the relative strength of the general factor among all common factors (i.e., essentially measuring the degree of unidimensionality). Here, the general factor ECV statistic was .775, suggesting 77.5% of the common variance across items was explained by the general factor and the remaining 22.5% was spread across the three domain factors. Because the ECV statistic was lower than .85, the instrument was not sufficiently unidimensional to justify the use of a one-factor model. Finally, the ECV statistic for an individual item (i.e., IECV) assessed the proportion of item common variance that was explained by the general factor. The closer an IECV was to 1, the more strongly the item measured the general construct. If an IECV was greater than .50, the item reflected the general construct more than a domain construct. Here, the IECV statistics ranged from .417 (item 19) to .996 (item 29). The average IECV was .773 with a standard deviation of .161. Two of the 34 items had an IECV below .50: .417 for item 19 and .477 for item 20; these items measured the domain construct more than the general construct. The other 32 items all had an IECV greater than .50 and therefore measured the general construct more than the domain construct. Further, eight items had an IECV above .90, indicating they were very strong measures of the general construct.
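ECV and IECV reduce to ratios of squared standardized loadings: the general factor's squared loadings over all squared common-factor loadings, computed scale-wide (ECV) or item-by-item (IECV). A sketch with illustrative loadings:

```python
# ECV for the general factor and per-item IECV under a bifactor model,
# computed from squared standardized loadings. Loadings are illustrative
# only, not the study's estimates.
gen = [0.7, 0.6, 0.8, 0.5, 0.6, 0.7]   # general-factor loadings
dom = [0.4, 0.3, 0.5, 0.2, 0.3, 0.4]   # each item's single domain loading

gen_sq = sum(g * g for g in gen)
dom_sq = sum(s * s for s in dom)
ecv = gen_sq / (gen_sq + dom_sq)               # scale-level strength of the general factor
iecv = [g * g / (g * g + s * s) for g, s in zip(gen, dom)]

print(round(ecv, 3))
print(all(v > 0.50 for v in iecv))  # every illustrative item leans general
```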

Discussion
The study conducted a construct revalidation of the CoI instrument under CFA, followed by further evaluation of the optimal model using primarily model-based reliability measures. The bifactor model identified as optimal provided a fuller representation of the CoI framework, modeling the intersection of all three presences as well as the overlap of each pair of presences. Two research questions were proposed and addressed, and evidence of construct validity for the instrument was identified.
Regarding the first research question on the fit of each model to the data, the study examined the four competing structures based on commonly used model fit indices. The bifactor structure (M4) was unanimously the optimal one as measured by all five model fit statistics; the other three failed on either one (M2) or two (M1 and M3) of the five criteria. In addition, the CFA results identified items 17, 18, 19, and 20, which may be further examined for content redundancy.
Regarding the second research question on the psychometric properties of the optimal model, the study investigated M4 using primarily model-based reliability measures. The various ω and ωH statistics provided support for the general factor and demonstrated that the bifactor structure was highly reliable. This finding was echoed by the ECV/IECV statistics, which showed the general factor played a more important role than the domain factors in the bifactor model. Finally, the H statistics for the general factor and the TP and SP subscales indicated that each was adequately measured by its indicator items. By contrast, with an H statistic of only .478, the measurement of CP needs more scrutiny.
The finding about CP has both methodological and substantive implications. First, the CP items were probably not measuring cognitive presence effectively. After adjusting for the general factor, the CP factor could hardly continue to exist (Chen et al., 2012). Therefore, the CP items should probably be revised, with the support of subject matter experts, to cover cognitive presence more in-depth. Second, the CP factor scores measuring students' level of cognitive presence should be used carefully. Given high loadings on the general factor but low loadings on the CP factor, it is the general factor scores alone that should be reported (DeMars, 2013) and the domain factor scores could be misleading (Reise et al., 2010;Reise et al., 2007). If policy considerations mandate the reporting of the CP factor scores, users should be reminded that it is the general factor scores that are reliable and meaningful.
The study had limitations which can be grounds for future research. First, the study did not investigate the invariance of the bifactor structure across different groups as specified by common covariates of interest (e.g., gender, course discipline). A future extension could examine if the same bifactor structure continues to hold across those groups (e.g., Dempsey and Zhang, 2019). Second, the study did not test hypotheses on the structural relationships among the common factors of the bifactor model. Another future extension could examine these relationships (e.g., Kozan, 2016). Finally, the study did not assess the predictive validity of the common factors of the bifactor model. Still another future extension may evaluate their predictive validity measured by the associations between the common factors and one or more outside criterion variables such as students' satisfaction with online learning, their academic achievements, and so on (e.g., Rockinson-Szapkiw et al., 2016).

Conclusion
The study conducted a construct revalidation of the CoI instrument for a more refined understanding of its underlying factor structure. The study identified empirical evidence supporting the bifactor model as the optimal structure for providing a reliable and valid representation of the CoI instrument and a fuller representation of the CoI theoretical framework. Therefore, the study recommended the application of the bifactor model to CoI-related research and practice in online education.