An Item Response Theory Analysis of the Community of Inquiry Scale

The aim of this study is to examine the validity and reliability of the Community of Inquiry Scale, commonly used in online learning, by means of Item Response Theory (IRT). For this purpose, version 14 of the Community of Inquiry Scale was administered over the internet to 1,499 students in the online learning programs of a distance education center at a Turkish state university. The collected data were analyzed using a statistical software package in three stages: checking the model assumptions, checking model-data fit, and item analysis. The item and test features of the scale were examined by means of the Graded Response Model (GRM). Before applying this IRT model, the assumptions were tested on the data gathered from the 1,499 participants and model-data fit was examined. Following the affirmative results of these examinations, all data were analyzed using the GRM. As a result of the study, the Community of Inquiry Scale adapted to Turkish by Horzum (in press) was found to be reliable and valid in terms of both Classical Test Theory and Item Response Theory.

Introduction
Online learning has become one of the most common applications in distance education. Of the institutions serving the 7.1 million students who take at least one online course, 75.9% reported that online learning is critical as a long-term strategy (Allen & Seaman, 2013).
Because online learning is so widespread, it needs to be planned, implemented, and managed effectively (Moore & Kearsley, 2012). To this end, Garrison, Anderson, and Archer (2000) designed the Community of Inquiry (CoI) model as a guide both to the provision of effective teaching in online learning studies and applications and to the qualities of its learning outcomes.
While the CoI model helps to organize a theoretical frame for the learning process in online learning environments, it also describes the qualities of an essentially ideal educational experience. Multiple elements, such as cooperation between participants, interaction, and observable instructional indicators supporting inquiry, are described within this experience (Bangert, 2009).

CoI Model and Components
The framework of the CoI model was built on the idea that collaborative knowledge construction in online learning emerges through a community of inquiry (Shea, 2006). In this regard, it is a process-oriented model (Arbaugh, 2008; Arbaugh, Bangert, & Cleveland-Innes, 2010). The CoI model highlights that, in order to enable sustainable deep learning and learning outcomes such as critical inquiry in online learning environments, social interaction is not sufficient on its own unless it is supported by and integrated with cognitive and instructional elements (Garrison, Anderson, & Archer, 2000).
Within the framework of CoI, an environment is created that helps participants reach a common point, diagnose misunderstandings, and take responsibility for learning (Garrison & Anderson, 2003). The focus in the CoI model is on presence and belonging to a group (Joo, Lim, & Kim, 2011). The model includes three components, cognitive presence (CP), social presence (SP), and teaching presence (TP), which are emphasized as significant factors in the formation of a community.
One of the components of the model is TP, which is a key factor in terms of the online teaching skills of the model (Garrison & Arbaugh, 2007), the methods of the instructors (Bangert, 2009), and the behaviors necessary for creating a productive community (Shea, Li, Swan, & Pickett, 2006). TP aims at the collaboration of active participants in the teaching community through the creation of an appropriate teaching environment. Hence, the categories of design, facilitation, and direct instruction (Anderson, Rourke, Garrison, & Archer, 2001) are prioritized for online learning. The focus points are as follows: in the design category, direct teaching activities such as the teaching process, including the use of online learning tools, subject matter and outcomes, and the design and organization of learning activities and tasks; in the facilitation category, securing the participation of learners, focusing on new terms, and streamlining the discussions toward consensus; and in the direct instruction category, relating information from different sources, resolving misconceptions, and providing feedback (Shea, Fredericksen, Pickett, & Pelz, 2003). Not only the instructor's TP, but also online agents and automatic messages are prioritized in online learning environments (Joo, Lim, & Kim, 2011). TP can bridge the transactional distance between learner and instructor (Arbaugh & Hwang, 2006) by ensuring a deep learning process for the learner.
SP, another component of the model, arises from the learner's feeling of belonging to a course when taking part in online learning activities (Picciano, 2002). For SP, interaction is not enough on its own; it should be supported with various interactional activities, such as developing interpersonal relationships and communicating purposefully in a trusting environment (Garrison, 2007). Emotional expression, open communication, and group cohesion are required for SP in online learning environments. The focus points of these categories can be stated as follows: in the emotional expression category, emotions supporting interpersonal communication and relationships, and the use of affective responses such as humor and self-disclosure; in the open communication category, recognition, encouragement, and interaction; and in the group cohesion category, addressing others by name, salutations, and the use of inclusive pronouns (Garrison & Anderson, 2003). SP can be directly affected by the environment and the tools used. Learners' SP can decrease in text-based environments, in agent applications in which real people do not take part, or with automatic text messages that are not individualized. When learners perceive their presence in the environment, this has a positive effect and contributes to building a cooperative environment.
CP, another component of the CoI model, reflects the processes of learning and inquiry (Garrison, Cleveland-Innes, & Fung, 2010). The term CP, based on Dewey's notion of practical inquiry, indicates a process of critical and creative thinking (Shea, Hayes, Vickers, Gozza-Cohen, Uzuner, Mehta, Valtcheva, & Rangan, 2010); thus, it is a reflection of meta-cognitive skills (Garrison & Anderson, 2003). CP consists of four categories: triggering event, exploration, integration, and resolution. In the triggering event category, a chaotic situation is created with the help of a problem in order to start an inquiry process. The exploration category includes generating knowledge by brainstorming, clarifying the confusion, and defining the problem. Sharing the generated knowledge, exchanging ideas, and reaching consensus take place in the integration category. In the final phase, resolution, the generated knowledge is applied in order to solve the problem (Akyol, Garrison, & Ozden, 2009; Joo, Lim, & Kim, 2011; Shea & Bidjerano, 2010). The use of reflective questions is highly significant for CP (Bangert, 2008). CP enables learners to exercise meta-cognitive skills, which makes them more active and successful in a cooperative CoI.
When all three components of CoI are present, individual comprehension is constructed and knowledge is appropriated through a social process (Cleveland-Innes, Garrison, & Kinsel, 2007). In this regard, the three components of the model are interrelated elements that enhance each other (Anderson, Rourke, Garrison, & Archer, 2001; Arbaugh, 2008; Archibald, 2010; Conrad, 2009; Garrison & Cleveland-Innes, 2005; Kozan & Richardson, 2014b; Traver, Volchok, Bidjerano, & Shea, 2014). These studies also support the theoretical framework of the model. In an online learning environment, the presence of the three components affects learning outcomes positively (Akyol & Garrison, 2008; Horzum, in press; Ke, 2010; Swan & Shih, 2005). Consequently, measuring the formation of the components of the CoI model in online learning environments is important.

Measurement of CoI Model's Components
Studies on the measurement of the CoI model's components have used a variety of qualitative and quantitative measurement tools. Studies based on quantitative methods employ scales, while qualitative studies rely on interviews and transcripts. Studies using scales as measurement tools are the most numerous. In some of these, the components are measured separately, while in others all components are measured together. For instance, the literature includes not only studies on separate components such as SP (Kim, 2011), CP (Shea & Bidjerano, 2009), and TP (Ice, Curtis, Phillips, & Wells, 2007), but also studies on all components (Arbaugh, 2008; Arbaugh, Cleveland-Innes, Diaz, Garrison, Ice, Richardson, & Swan, 2008; Bangert, 2009). The use of exploratory and confirmatory factor analyses based on classical test theory is observed both in scale development studies and in re-examinations of the validity and reliability of the scale (Arbaugh, Bangert, & Cleveland-Innes, 2010; Bangert, 2009; Diaz, Swan, Ice, & Kupczynski, 2010; Swan, Shea, Richardson, Ice, Garrison, Cleveland-Innes, & Arbaugh, 2008). In another study, the structure was tested by asking learners about the significance of the items in the scale and the components of the model (Diaz, Swan, Ice, & Kupczynski, 2010). However, the literature lacks a study that examines the scale by means of item response theory, drawing substantive results from the group.

Item Response Theory
Item Response Theory (IRT) was developed to resolve the deficiencies of Classical Test Theory (CTT). The classical measurement models underlying CTT have some limitations, which can be summarized as follows:
• In CTT, the ability of the examinee and the characteristics of the test cannot be separated. The ability of the examinees depends on the test items, and whether an item is hard or easy depends on the ability of the examinees. In other words, in CTT, test items are group-dependent and the abilities of examinees are test-dependent (Hambleton, Swaminathan, & Rogers, 1991).
• In CTT, reliability is defined as "the correlation between test scores on parallel forms of a test." In practice, however, it is nearly impossible to construct truly parallel tests, so the reliability coefficients obtained under CTT are only lower bounds (Hambleton & van der Linden, 1982).
• In CTT, the standard error of measurement is assumed to be the same for all examinees. However, measurement precision cannot be equal for examinees of different ability levels, so a single standard error of measurement for all examinees is implausible (Lord, 1984).
• Classical test theory is test-oriented rather than item-oriented.
Because of these limitations, alternative theories and models have been sought. According to Hambleton, Swaminathan, and Rogers (1991), such an alternative theory would include: (a) item characteristics that are not group-dependent, (b) examinee ability scores that are not item-dependent, (c) reliability that does not require parallel tests, and (d) a measure of precision for each ability score.
According to IRT, there is a relationship, expressible mathematically, between individuals' unobservable abilities or traits in a certain area and their responses to test items related to that area. IRT, which has several advantages over CTT, can be used for test development, test equating, identification of item bias, computerized adaptive testing (CAT), and standard setting.
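The contrast between CTT's single standard error and IRT's ability-dependent precision, point (d) above, can be illustrated with a brief sketch. A two-parameter logistic (2PL) model and the item parameters below are assumptions chosen purely for illustration:

```python
import numpy as np

def ctt_sem(sd: float, reliability: float) -> float:
    """CTT standard error of measurement: one value for every examinee."""
    return sd * np.sqrt(1.0 - reliability)

def irt_se(theta: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """IRT standard error under a 2PL model: varies with ability theta.
    Test information is I(theta) = sum_i a_i^2 * p_i * (1 - p_i)."""
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    info = np.sum(a[None, :] ** 2 * p * (1.0 - p), axis=1)
    return 1.0 / np.sqrt(info)

# Hypothetical 5-item test
a = np.array([1.2, 0.9, 1.5, 1.1, 1.3])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
thetas = np.array([-2.0, 0.0, 2.0])
print(ctt_sem(sd=10.0, reliability=0.90))  # identical for all examinees
print(irt_se(thetas, a, b))                # differs across ability levels
```

Under CTT every examinee receives the same error estimate, whereas the IRT standard error is smallest where the test carries the most information, near the middle of the difficulty range in this sketch.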
IRT offers different models for binary and polytomous data. Many measurement instruments, especially in attitude assessment, include items with multiple ordered response categories; for this kind of data, polytomous item response models are needed to represent trait level. Likert-type scales can be analyzed with the Graded Response Model (GRM), a polytomous IRT model that can be used when item responses are ordered categorically, as in Likert scales. In this research, we use the GRM to obtain item and scale parameters.
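In the GRM, each item has a discrimination parameter a and ordered between-category thresholds; the probability of responding in a given category or higher follows a logistic curve, and category probabilities are obtained as differences between adjacent cumulative probabilities. A minimal sketch, with made-up parameter values:

```python
import numpy as np

def grm_category_probs(theta: float, a: float, thresholds) -> np.ndarray:
    """Graded Response Model probabilities for one item.
    thresholds: increasing between-category b parameters (m - 1 values
    for m ordered categories). Returns the m category probabilities."""
    b = np.asarray(thresholds, dtype=float)
    # Cumulative probability of responding in category k or higher
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    cum = np.concatenate(([1.0], p_star, [0.0]))
    # Adjacent differences of the cumulative curve give category probabilities
    return cum[:-1] - cum[1:]

# Hypothetical 5-category Likert item (a and thresholds are made up)
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
print(probs.round(3), probs.sum())
```

Because the thresholds are symmetric around zero in this sketch, a respondent at theta = 0 is most likely to choose the middle category.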
In online learning, the most widely used scale for measuring the components of CoI was developed by Arbaugh, Cleveland-Innes, Diaz, Garrison, Ice, Richardson, and Swan (2008) (Horzum, in press). The scale has been adapted to different languages, including Turkish. The problem of this research is to determine whether the Turkish version of the CoI scale is valid and reliable when examined with IRT.

Aim of the Research
By administering the CoI scale to students in online learning applications, it is possible to estimate their levels of teaching, cognitive, and social presence and whether they perceive themselves as part of a community. Such measurement reveals the areas of an application that need improvement; in this way, online learning programs will be better able to create a sense of community. Furthermore, for the categories of presence found to be lacking, the data obtained from the scale offer clues about content design, the components used in the learning management system, and the precautions that tutors and administrators should take. With the findings from the scale, online learning can be designed, planned, and implemented to produce more effective outcomes. It is therefore important to have valid and reliable evidence concerning the qualities that the scale measures.
Previous validity and reliability studies of the CoI model have focused on exploratory and confirmatory factor analyses, while another study examined learners' opinions on the significance of the items. The aim of this study is to examine the item and test characteristics of the scale used to measure the components of the CoI model by means of IRT.

Method

Participants
The study group of this research consists of 1,499 learners in online learning programs provided by the Sakarya University distance education center. Of these learners, 587 (39.2%) are female and 912 (60.8%) are male. The ages of the participants range from 18 to 57, with a mean age (M) of 27.48 and a standard deviation (s) of 6.70.

Process
The item and test features of the scale were examined by means of the Graded Response Model (GRM). Before applying this IRT model, the assumptions were tested on the data gathered from the 1,499 participants and model-data fit was examined. Following the affirmative results of these examinations, all data were analyzed using the GRM.

Instrument
The CoI scale, developed by Arbaugh, Cleveland-Innes, Diaz, Garrison, Ice, Richardson, and Swan (2008), was used as the measurement tool in this study. In the original study, the 34-item scale and the three-sub-factor structure of CoI were scrutinized and analyzed using exploratory factor analysis. The Turkish adaptation of the scale was developed by Horzum (in press). For the adaptation study, the construct validity of the scale was examined through exploratory and confirmatory factor analyses. The exploratory factor analysis found a three-factor structure explaining 67.63% of the total variance. Subsequently, the confirmatory factor analysis yielded fit indices for the 34-item, three-factor structure of χ2/df = 1.74, RMSEA = 0.071, CFI = 0.98, NFI = 0.96, and NNFI = 0.98. The first factor of the CoI scale, SP, consists of 9 items; the second factor, CP, contains 12 items; and the last, TP, includes 13 items. In total, the scale has 34 items in 3 sub-factors, and its administration takes 10 to 30 minutes.

The 5-point Likert scale includes items answered on a scale ranging from 'I Completely Disagree' (1) to 'I Strongly Agree' (5). Cronbach's alpha coefficients, indicating the internal consistency reliability of the scale, were 0.97 for the overall scale, 0.90 for SP, 0.94 for CP, and 0.94 for TP (Horzum, in press).
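Cronbach's alpha, the internal consistency coefficient reported above, can be computed directly from a respondents-by-items score matrix. The sketch below uses a small made-up response matrix for illustration only:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Made-up 5-point Likert responses: 6 respondents, 4 items
x = np.array([[4, 5, 4, 5],
              [2, 2, 3, 2],
              [5, 5, 5, 4],
              [3, 3, 2, 3],
              [1, 2, 1, 1],
              [4, 4, 5, 4]])
print(round(cronbach_alpha(x), 3))
```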

Data Collection and Analysis
To collect data, Sakarya University Distance Education Center was first consulted to obtain the necessary permissions for administering the scale. Following the permission process, the CoI scale was converted into an online form and published on the learning management system in which the learners were registered. The scale was completed voluntarily, and no identifying information was required.
The analysis of this research has three parts: first the model assumptions were checked, then model-data fit, and finally the items were analyzed. In the first step, unidimensionality and local independence were checked as model assumptions. Factor analysis was applied to test the unidimensionality assumption. To determine whether the first factor is dominant, the eigenvalues and the scree plot were considered (Önder, 2007); a dominant first factor is needed to satisfy the unidimensionality assumption (Hambleton et al., 1991). Confirmatory factor analysis (CFA) was conducted in the LISREL 8.80 program (Jöreskog & Sörbom, 1999) to see whether the scale confirms a unidimensional structure.
Based on the CFA results, model-data fit was evaluated using the CFI, RMSEA, and NNFI values. RMSEA takes a value between 0 and 1, with values approaching 0 indicating perfect fit (Bollen & Curran, 2006; Tabachnick & Fidell, 2007). Values above 0.90 for CFI and NNFI, which are among the other fit indexes, indicate good fit (Tabachnick & Fidell, 2007).
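These rule-of-thumb cutoffs can be expressed as a small helper. The RMSEA cutoff of 0.08 is our own assumption, since the text only says values near 0 are better; the example values are those reported for the Turkish adaptation:

```python
def check_fit(rmsea: float, cfi: float, nnfi: float) -> dict:
    """Evaluate common CFA fit indices against rule-of-thumb cutoffs."""
    return {
        "rmsea_ok": rmsea < 0.08,  # assumed cutoff; the text only says "approaches 0"
        "cfi_ok": cfi > 0.90,      # cutoff stated in the text
        "nnfi_ok": nnfi > 0.90,    # cutoff stated in the text
    }

# Fit indices reported for the Turkish adaptation of the scale
print(check_fit(rmsea=0.071, cfi=0.98, nnfi=0.98))
```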
Another assumption of IRT is "local independence," which refers to the statistical independence of the responses given to the items by examinees at a given ability level. This assumption is satisfied if an individual's performance on one item does not affect his or her performance on other items. Local independence means that, once ability is taken into account, no additional factors are needed to explain the relationships between the items (Hambleton & Swaminathan, 1985). Violation of this assumption also entails violation of unidimensionality. Therefore, if the 34-item scale is unidimensional, it can be acknowledged to satisfy local independence.
In the second step of the data analysis, model-data fit was examined for the GRM, which is used for attitude items. At this stage, the fit of the attitude items was analyzed by means of the differences between observed and expected frequencies, also referred to as "residuals." Embretson and Reise (2000) state that residuals approaching zero (<0.10) constitute a solid criterion for the goodness of model-data fit.
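The residual check described above amounts to comparing observed response-category proportions with the proportions the fitted model predicts. A minimal sketch with made-up proportions for a single 5-category item:

```python
import numpy as np

def max_residual(observed: np.ndarray, expected: np.ndarray) -> float:
    """Largest absolute difference between observed and model-expected
    response-category proportions for an item."""
    return float(np.max(np.abs(observed - expected)))

# Made-up proportions for one 5-category item
obs = np.array([0.05, 0.10, 0.25, 0.40, 0.20])
exp = np.array([0.06, 0.12, 0.23, 0.38, 0.21])
r = max_residual(obs, exp)
# Residuals below 0.10 suggest acceptable fit (Embretson & Reise, 2000)
print(r, "acceptable" if r < 0.10 else "misfit")
```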

Checking Model Assumptions
Polytomous models, like binary models, are required to meet the local independence and unidimensionality assumptions of IRT (Tang, 1996). Principal components factor analysis was conducted on the CoI scale to determine whether it satisfies the unidimensionality assumption. The eigenvalues and variance proportions from this factor analysis are displayed in Table 1. According to Figure 1, the scree plot forms a plateau after the second point, which means that the contributions of the subsequent factors are small and almost identical. For this reason, the number of factors is taken to be one, and it can be said that the scale is unidimensional.
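The eigenvalue criterion used here can be sketched as follows. The data are simulated, not the study's, and a single latent trait is assumed to drive all items, so a dominant first eigenvalue is expected:

```python
import numpy as np

def eigenvalue_check(responses: np.ndarray):
    """Eigenvalues of the inter-item correlation matrix, sorted descending.
    A dominant first eigenvalue supports the unidimensionality assumption."""
    corr = np.corrcoef(responses, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return eig, eig[0] / eig[1]  # ratio of first to second eigenvalue

# Simulated data: a single latent trait drives all 6 items
rng = np.random.default_rng(0)
theta = rng.normal(size=(500, 1))
items = theta + 0.8 * rng.normal(size=(500, 6))
eig, ratio = eigenvalue_check(items)
print(eig.round(2), round(ratio, 2))
```

A scree plot of these eigenvalues would flatten after the first point, mirroring the pattern the text describes for Figure 1.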
Another assumption to be checked is local independence. If the assumption of local independence is violated, then the assumption of unidimensionality is violated as well. Thus, since the 34 items of the CoI scale satisfy the unidimensionality assumption, they can be said to satisfy the local independence assumption too.

Model-Data Fit
The negative log-likelihood (-2*LL) value obtained from calibrating the data gathered with the Community of Inquiry Scale under the Graded Response Model is 89,129.2. In maximum likelihood estimation, the negative log-likelihood indicates the degree to which the data diverge from the model (Embretson & Reise, 2000). The marginal reliability coefficient is 0.9768. Marginal reliability represents the overall reliability obtained from the average of the expected conditional standard errors of examinees across all ability levels (DCAS 2010-2011 Technical Report).
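One common formulation of marginal reliability subtracts from one the ratio of the average error variance to the total (estimate plus error) variance of the ability estimates. The sketch below uses simulated ability estimates and a made-up constant standard error, so the resulting value is illustrative only:

```python
import numpy as np

def marginal_reliability(theta_hat: np.ndarray, se: np.ndarray) -> float:
    """Marginal reliability: one minus the ratio of average error variance
    to the total (estimate plus error) variance of the ability estimates."""
    error_var = np.mean(se ** 2)
    total_var = np.var(theta_hat, ddof=1) + error_var
    return 1.0 - error_var / total_var

# Simulated ability estimates with a made-up constant conditional SE
rng = np.random.default_rng(1)
theta_hat = rng.normal(size=1000)
se = np.full(1000, 0.15)
print(round(marginal_reliability(theta_hat, se), 4))
```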
The level of item-data fit can be scrutinized by means of the differences between observed and expected proportions, also referred to as "residuals." Embretson and Reise (2000) state that residuals approaching zero (<0.10) constitute a solid criterion for the goodness of model-data fit. The highest difference between the observed and expected frequencies in the data is 0.0453. When the differences are explored for each response category of the 34 items in the scale, all residuals are observed to be lower than 0.10. Based on these findings, it was concluded that the Graded Response Model (GRM) fits the data.

Item Analysis
Both the CTT and IRT parameter estimates for the 34 items and three subscales are shown in Table 2.
For each item, the mean, SD, and corrected item-total correlation (CITC) obtained from CTT, together with the item difficulty and the a parameter (item discrimination) obtained from IRT, are presented. Although there is no definite cutoff criterion for the a parameter, a value of 1 can be considered admissible (Zickar, Russel, Smith, Bohle, & Tilley, 2002). However, when a scale contains fewer items, a higher discrimination coefficient may be needed (Hafsteinsson, Donovan, & Breland, 2007). Since our subscales contain between 9 and 13 items, we classified a values between 1.0 and 2.0 as moderate quality and values above 2.0 as high quality.
Across the 34 items, the a parameter values range from 1.81 to 3.59, and only two items have a values below 2. In the teaching presence subscale, the discrimination value lies between 2 and 3 for eight of the items (1, 2, 3, 4, 5, 10, 12, 13) and above 3 for five of the items (6, 7, 8, 9, 11).
In the nine-item social presence subscale, two items (15, 16) have a values slightly below 2; four items (14, 17, 21, 22) are in the range of 2 to 3; and three items (18, 19, 20) have discrimination values above 3. In the 12-item cognitive presence subscale, seven items (23, 24, 26, 27, 32, 33, 34) have a values in the range of 2 to 3, while the other five items (25, 28, 29, 30, 31) have discrimination values above 3. The IRT a parameter and the corrected item-total correlation for each item of the subscales are given in Table 2. There is a positive linear relation between CITC and a. For example, in Table 2, item 16, "Online or web-based communication is an excellent medium for social interaction," has the lowest a (1.81) and CITC (0.621) values. Similarly, item 29, "Combining new information helped me answer questions raised in course activities," has the highest values, with an a of 3.59 and a CITC of 0.807.
Nevertheless, a values provide more detailed information than CITC values (Scherbaum et al., 2006). For example, items 6 and 32 both have the same CITC value (0.776), but the a for item 6 is 3.12 whereas for item 32 it is 2.99.
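The CITC reported alongside the a parameter correlates each item with the total of the remaining items, excluding the item itself to avoid inflating the correlation. A minimal sketch on a made-up Likert response matrix:

```python
import numpy as np

def corrected_item_total(scores: np.ndarray) -> np.ndarray:
    """Corrected item-total correlation: each item is correlated with the
    total of the remaining items (the item itself is excluded)."""
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, i], total - scores[:, i])[0, 1]
        for i in range(scores.shape[1])
    ])

# Made-up 5-point Likert responses: 6 respondents, 4 items
x = np.array([[4, 5, 4, 5],
              [2, 2, 3, 2],
              [5, 5, 5, 4],
              [3, 3, 2, 3],
              [1, 2, 1, 1],
              [4, 4, 5, 4]])
print(corrected_item_total(x).round(3))
```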
For each item, the between-category threshold parameters (b) are ordered, as the GRM requires. These parameters determine the location of the operating characteristic curves, that is, where each of the category response curves for the middle response options peaks. TP has the highest information at the level of -1.0; SP has the highest information at the level of -

Discussion
The main purpose of this study is to compare the results of the CoI scale, which has teaching, social, and cognitive presence subscales, obtained by means of CTT, and to define the item and test parameters with IRT.
First, a is the item discrimination parameter, which gives information about item quality. Zickar et al. (2002) suggest that acceptable discrimination for the a parameter should be higher than 1.0.
However, according to Hafsteinsson et al. (2007), a higher discrimination coefficient may be needed when a scale contains fewer items. Scale adaptation studies give information about how properly a study has been conducted in order to identify how the items perform across the ability range. In this study, the scale and item parameters were obtained and compared under both CTT and IRT. As a result of the comparison, the IRT analysis gave results similar to those of CTT, but more detailed information was obtained from IRT. It is therefore concluded that IRT analysis is more appropriate for this scale. Examination of the TP subscale revealed that four items, the 6th, 7th, 8th, and 9th, have higher values and form one of TP's indicators, facilitation.