International Review of Research in Open and Distributed Learning What If It’s All an Illusion? To What Extent Can We Rely on Self-Reported Data in Open, Online, and Distance Education Systems?



Introduction
Surveys are one of the most convenient ways to collect data in social science research. Self-reported learner reflections are considered essential for studying most psychological processes related to human learning, such as motivation, emotions, and metacognition (Pekrun, 2020). They are also used to evaluate the accountability efforts of educational institutions or to inform further policy decisions.
With the increase in Internet access worldwide, conducting online surveys has become one of the most preferred ways to collect data from large populations in a very short period of time. Several factors make online surveys a practical research tool, including the ease of data collection and entry (Evans & Mathur, 2005), the mitigation of low motivation and low response rates, especially for confidential questions (Gregori & Baltar, 2013), and the ability to expand the geographical scope of the target population and study hard-to-reach individuals (Baltar & Brunet, 2012). Due to the intensive use of technology in the delivery of educational content, open, online, and distance education processes are often studied through online surveys.
While concerns have often been raised about the decline in the amount of robust educational intervention research (Hsieh et al., 2005; Reeves & Lin, 2020; Ross & Morrison, 2008), systematic reviews of educational technology and distance learning show that researchers often adopt survey design and use questionnaires or scales as a data collection tool and then use the results for descriptive or correlational analyses (Bozkurt et al., 2015; Kara Aydemir & Can, 2019; Küçük et al., 2013; Zhu et al., 2020), with a heavy reliance on the positivist paradigm (Kara Aydemir & Can, 2019; Mishra et al., 2009).
It is certainly tempting to reach many participants with little effort; however, in some cases, the results of survey designs do not necessarily reflect actual situations. While constructing reliable and valid scales is considered central to robust measurement practices, respondents themselves can be a potential source of measurement error. That is, they may provide inconsistent responses (Castro, 2013), exert insufficient effort in responding (Huang et al., 2015), or alter their responses in socially desirable ways (Chesney & Penny, 2013), all of which result in low-quality data that can bias further hypothesis testing steps (DeSimone & Harms, 2018). In many cases, the proportion of inattentive participants or inconsistent responses within a dataset can be negligible, which does not change the inferences or conclusions of the study (Iaconelli & Wolters, 2020; Schneider et al., 2018). However, there are also cases where pronounced effects on reliability have been found (Chesney & Penny, 2013; Maniaci & Rogge, 2014).
According to Albert Bandura's social cognitive theory, the dynamic and reciprocal interaction of personal factors, environmental factors, and the nature of behavior can predict human learning and development (Bandura, 1977). For example, lack of motivation or effort on the part of participants may lead them to simply provide satisfactory answers rather than answering all survey questions optimally, as this may require considerable cognitive effort (Krosnick, 1991). The primacy and recency of self-report questions (Chen, 2010) or participants' anchoring and adjusting behaviors (Zhao & Linderholm, 2008) may further explain response inconsistencies. More specifically, participants' initial responses to self-report measures may serve as anchors for their subsequent responses, as their memory for the context may be flawed (Chen, 2010). Such an explanation related to poor learner reflections has been observed in the learning analytics literature as well (Zhou & Winne, 2012). Another explanation for inconsistency may be related to the issue of ideal self-presentation. That is, respondents may strategically alter their self-presentation during a psychological assessment in order to present themselves more favorably relative to social norms (Grieve & Elliott, 2013). Differences between the extent and impact of response inconsistency may arise depending on the context in which the study is conducted, the characteristics of the target audience, and the sensitivity of the questions asked. For example, almost half of the participants (46%) responded inconsistently to questions about personal information such as age, gender, and educational status in an online gaming setting (Akbulut, 2015), while the degree of insufficient effort responses varied between 12% and 16% in an educational setting (Iaconelli & Wolters, 2020). In this regard, formal data collection environments may be less prone to low-quality data than anonymous online environments. In terms of participants' personal characteristics, a recent empirical study suggested that respondents assigned to the careless responder class are more likely to be male, younger, unmarried, college-educated, and have higher incomes (Schneider et al., 2018). In other studies, personal interest in the research topic (Keusch, 2013) or higher academic and cognitive ability (Rosen et al., 2017) predicted better response quality. The sensitivity of the research topic has been highlighted in several papers. For example, although students gave candid responses about their course-taking patterns, their responses did not adequately reflect the truth about sensitive topics (Rosen et al., 2017). An interaction between gender and topic sensitivity was also observed in terms of the extent of inconsistent responses. Male participants, for instance, tended to underreport physical problems in order not to appear weak (Yörük Açıkel et al., 2018), whereas female participants tended to underreport their behavior when the topic was socially sensitive (Akbulut et al., 2017; Dönmez & Akbulut, 2016).
There are several methods to address low-quality data resulting from inconsistent or careless responses (DeSimone et al., 2015; DeSimone & Harms, 2018). For example, direct assessment of response quality can be achieved by including validation items in a survey. Self-reported effort questions (e.g., I read all items carefully), sham items (e.g., I was born in 1979), or instructed items (e.g., Please mark strongly disagree for this item) can be used to weed out inconsistent responders; however, these are easily detected by participants who read all items and intentionally provide false responses. On the other hand, unobtrusive methods that are less likely to be detected by participants can be used during survey administration. That is, instead of modifying the survey with validation questions before the study, the response time or the number of consecutive and identical responses can be checked. However, determining the cutoff response time or number of consecutive identical responses to eliminate the flawed data is a tedious process (DeSimone et al., 2015). Finally, statistical methods can be implemented to deal with low-quality data, such as checking for outliers or individual consistency across synonymous questions.
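As a concrete illustration, the two unobtrusive checks mentioned above (the longest run of consecutive identical responses, often called a longstring index, and total response time) can be sketched as follows. The cutoff values here are illustrative placeholders, not validated thresholds, and the function names are ours.

```python
# A minimal sketch of unobtrusive data-quality screening, assuming each
# row of `responses` holds one participant's Likert answers (1-5) and
# `seconds` holds their total completion times. Cutoffs are illustrative.

def longstring(row):
    """Length of the longest run of consecutive identical responses."""
    longest = run = 1
    for prev, cur in zip(row, row[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def flag_low_quality(responses, seconds, max_run=8, min_seconds=60):
    """Flag respondents whose longest run or completion time is suspect."""
    return [
        longstring(row) >= max_run or t < min_seconds
        for row, t in zip(responses, seconds)
    ]

flags = flag_low_quality(
    responses=[[3, 3, 3, 3, 3, 3, 3, 3, 3],   # straight-lining, too fast
               [4, 2, 5, 3, 4, 1, 2, 5, 3]],  # varied, plausible time
    seconds=[35, 240],
)
# flags → [True, False]
```

As the literature cited above notes, the hard part is not computing these indices but justifying the cutoffs, which is why such screening remains a tedious, dataset-specific process.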
Discrepancies or small associations between student self-reports and objective data derived from learning management systems have received recent attention (Gasevic et al., 2017). While the use of self-reports has been the dominant approach to addressing student engagement in instructional settings (Azevedo, 2015), students may be inaccurate in calibrating their self-reported and actual behaviors in an online learning environment, and may tend to overestimate their behaviors (Winne & Jamieson-Noel, 2002). In addition to the construct of careless responding discussed above, such a discrepancy may further result from poor learner reflection or poorly reconstructed memories, such that learners' behavioral indicators in a learning system may be less biased than self-reported reflections (Zhou & Winne, 2012). Such findings have further led scholars to triangulate multiple methods to capture authentic learning processes (Azevedo, 2015; Ellis et al., 2017).
There is a tendency to benefit from learning analytics approaches in higher education in general and in open, online, and distance education in particular (Pelletier et al., 2021). In response to the widespread use of learning analytics and multiple data sources, some scholars are still cautious (Selwyn, 2020) and have suggested asking further questions about the nature of what is really being measured, why it is really useful, and how such data relate to the learning experience (Wilson et al., 2017). Given the wide range of arguments about the reliability of self-reported data and the promise of learning analytics, we aimed to explore the alignment between self-reported and system-generated data by contextualizing the current study in an open, online, and distance education system where learning was available at scale and such data sources influence decision making in multiple dimensions.
In short, we used an unobtrusive method to identify inconsistencies between different sources of learner data in a formal open, online, and distance education system. That is, rather than adding validation items to the self-report measures, we examined response consistency by comparing different sources of self-report and learning management system (LMS) data. Based on the aforementioned literature, we hypothesized that learners' perceived intentions and actual behaviors may differ, such that their self-reported data may differ from the objective data, likely due to poor learner reflection or poorly reconstructed memories (Zhou & Winne, 2012). However, we expected that the current formal educational environment could be less prone to low-quality data than non-formal online environments such as online gaming sites (e.g., Akbulut, 2015). In line with social cognitive theory, we further hypothesized that personal and environmental factors may have played a role in the degree of response inconsistency. In this regard, we expected several variables such as participants' seniority, academic ability, gender, and satisfaction with the learning system to predict their response patterns. Finally, we hypothesized that participants' poor reflection of their actual behaviors combined with consistency-seeking needs may have led to a certain level of consistency across multiple self-report measures, in line with the concepts of anchoring and adjusting discussed above (Zhao & Linderholm, 2008). In accordance with the above literature and current hypotheses, the following research questions are investigated:

1. How similar are self-reported and LMS data?
2. What are the predictors of inconsistency between self-reported and actual use?
3. Do different sources of self-reported data (e.g., learner satisfaction, preference, and usage) support each other?

Method

Research Context
The research was conducted in an open, online, and distance education university with over two million students worldwide. The Open Education System (OES) consisted of three degree-granting colleges: The College of Open Education, The College of Economics, and The College of Business. These colleges offered a total of 60 associate or undergraduate degrees delivered entirely through open and distance learning. Students accessed courses and learning resources through an LMS. The pedagogy was primarily self-paced, while some courses included optional weekly synchronous videoconferencing sessions (i.e., live lectures). The OES allowed learners to study the learning resources online at their own time and pace, but required them to take proctored face-to-face exams to determine learner success. Applied courses within the OES also incorporated other assessment strategies such as project work. Following a multimedia approach to increase accessibility and flexibility in the learning process, a wide range of multimedia learning resources were provided online, including course books (PDF and MP3), chapter summaries (PDF and MP3), live lectures, and practice tests. The practice tests also came in a variety of forms, including open-ended questions with extended answers, multiple-choice tests with short and extended answers, practice exams, end-of-chapter exercises, and previous semesters' exam questions.

Data Collection and Cleaning
Ethics approval was granted by the institutional review board of the university. Data were then collected from different sources: the LMS database, satisfaction and preference questionnaires, and student information system (SIS) data for learner demographics. Learner access to resources was derived from the LMS learning analytics database. The data for each learning resource indicated whether an individual had accessed the resource and the frequency of their access over the course of the semester. Self-reported data were collected for two weeks toward the end of the semester. An announcement was made on the LMS homepage, and voluntary participants who responded to the surveys were included in the current dataset.
Satisfaction and preference data came from short questionnaires.The first was a 15-item satisfaction scale developed by Open Education faculty members and used for formal and institutional research.
Items were created to address student satisfaction with the open, online, and distance education system on a 5-point Likert scale ranging from 1 (very dissatisfied) to 5 (very satisfied). Exploratory factor analysis on the current dataset using maximum likelihood extraction revealed that the single-factor structure of the scale explained 77.73% of the total variance, with factor loadings ranging from .84 to .92 (Cronbach's alpha = .98).
In the second questionnaire, satisfaction with each of the 11 learning resources was measured with a single 5-point Likert-type question with the following options: This learning resource was not available in my courses (1), This learning resource was available but I did not use it (2), I used the resource but I am not satisfied (3), I used the resource and I am satisfied (4), and I used the resource and I am very satisfied (5). This question is regularly used in institutional reports to address student usage and satisfaction.
In the third questionnaire, students were asked to select three of the 11 learning resources that they preferred the most, so that the preference score for each learning material ranged between 0 and 3. This question was deliberately used by the current research team to examine the relationships between usage, satisfaction, and preference. Finally, the SIS database provided learner demographics such as gender, age, GPA, and current semester (i.e., 1st through 8th semesters).
Data from these sources were then combined based on unique user IDs. Duplicate responses from the same ID (if any) were removed, and the most recent responses were retained. At the end of the data cleaning process, data from 20,646 students were used in the current analyses. Participants ranged in age from 17 to 75, with a mean of 32.22 (SD = 10.6). The number of courses taken by participants ranged from 1 to 12, with a mean of 6.82 (SD = 2.11). Their semesters ranged from 1 to 8, but almost 40% of the volunteers were in their first year. The gender distribution of the participants was balanced (males, 50.6%; females, 49.4%).
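The deduplication step described above can be sketched in a few lines. The field names and timestamp values below are hypothetical placeholders, not the study's actual schema.

```python
# A minimal sketch of the cleaning step described above: duplicate survey
# submissions from the same user ID are dropped, keeping the most recent.
# Field names ('user_id', 'timestamp', 'satisfaction') are illustrative.

def deduplicate(rows):
    """Keep only the latest submission per user_id.

    `rows` is a list of dicts, each with 'user_id' and 'timestamp' keys.
    """
    latest = {}
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        latest[row["user_id"]] = row  # later timestamps overwrite earlier
    return list(latest.values())

cleaned = deduplicate([
    {"user_id": "u1", "timestamp": 10, "satisfaction": 2},
    {"user_id": "u1", "timestamp": 25, "satisfaction": 4},  # retained
    {"user_id": "u2", "timestamp": 12, "satisfaction": 5},
])
```

After this step, each user ID appears exactly once, so survey, LMS, and SIS records can be joined one-to-one on the ID.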
To identify inconsistencies between self-reported satisfaction and actual use, the following criteria were used to cross-reference the various data sources:

• IF the student response was This learning resource was not available in my courses BUT there was access to the learning resource, THEN an inconsistency was coded.
• IF the student response was This learning resource was not available in my courses AND there was no access to the learning resource, THEN consistency was coded.
• IF the student response was This resource was available but I did not use it BUT there was access to the resource, THEN an inconsistency was coded.
• IF the student response was This resource was available but I did not use it AND there was no access to the resource in question, THEN consistency was coded.
• IF the students reported usage and satisfaction/dissatisfaction (i.e., I used and I am satisfied/dissatisfied) BUT there was no access to the learning resource, THEN inconsistency was coded.
• IF the students reported usage and satisfaction/dissatisfaction AND there was access to the learning resource, THEN consistency was coded.
Accordingly, the inconsistencies between the self-reported satisfaction questionnaires and the learning analytics were determined for each of the 11 learning resources. It was also possible to calculate how many consistent (and inconsistent) answers each participant gave.
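The cross-referencing rules listed above can be expressed compactly. The sketch below assumes responses are coded 1 through 5 as in the second questionnaire (1 = not available, 2 = available but unused, 3 to 5 = used with varying satisfaction) and that `accessed` is the binary LMS access flag; the function names are ours.

```python
# A minimal sketch of the consistency-coding rules described above.
# response codes: 1 = not available, 2 = available but unused,
#                 3 = used/dissatisfied, 4 = used/satisfied, 5 = used/very satisfied

def is_consistent(response, accessed):
    """True when a self-reported answer agrees with the LMS access record."""
    if response in (1, 2):      # student claims no use (or no availability)
        return not accessed     # consistent only if LMS shows no access
    return accessed             # student claims use: consistent only if accessed

def consistency_count(responses, access_flags):
    """Number of consistent answers across a student's 11 resources."""
    return sum(is_consistent(r, a) for r, a in zip(responses, access_flags))
```

Applying `consistency_count` per participant yields the per-student consistency totals reported in the Results section.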

Data Analysis
Descriptive statistics were used to present self-reported and actual use, the proportion of consistent responses, and preference rates. Self-reported and actual use were compared using a paired t-test.
Correlations between preference rates and actual use frequencies were presented. Participants' consistency rates were presented using descriptive statistics, and predictors of consistency were examined using correlations and multiple regression. Satisfaction of actual users and non-users was compared using independent t-tests. Finally, different sources of self-reported satisfaction were investigated with further t-tests. Parametric test assumptions (e.g., normality) were checked before each analysis.
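For readers unfamiliar with the paired design, the core comparison can be sketched as follows. The usage percentages are illustrative numbers, not the study's data; eta-squared is computed from the t-statistic as t²/(t² + df), a standard effect-size conversion for a paired t-test.

```python
# A minimal sketch of a paired t-test with an eta-squared effect size,
# applied to illustrative (not the study's) per-resource usage percentages.
import math

def paired_t(x, y):
    """Paired t-statistic and eta-squared for two matched samples."""
    d = [a - b for a, b in zip(x, y)]          # per-resource differences
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    t = mean / math.sqrt(var / n)
    df = n - 1
    eta_sq = t ** 2 / (t ** 2 + df)            # effect size
    return t, eta_sq

self_reported = [80, 75, 90, 60, 70]           # % claiming to use each resource
actual = [55, 50, 70, 40, 45]                  # % with LMS access records
t, eta_sq = paired_t(self_reported, actual)
```

In the actual study, the same logic runs over the eleven learning resources, yielding the t(10) statistic reported in the Results.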

Results
Descriptive statistics of self-reported versus actual usage are summarized in Table 1. A comparison using a paired t-test indicated that actual usage for each of the eleven learning resources was significantly lower than self-reported usage, with a large effect size, t(10) = 4.650, p < .001, η² = .684.
That is, students seemed to overreport their use of the learning resources. Preference was calculated by asking students to select their three favorite materials (one point each) across eleven learning resources, and the correlation between their total preference scores and their actual usage is shown in Table 1. All correlations were significant at the .001 level; however, this was likely due to the large sample size, as the correlation coefficients were quite small.
As shown in Table 1, the percentage of consistent responses was also calculated for each learning resource and ranged from 43% to 69.5%. If one chose to eliminate all inconsistent responses across learning resources listwise, the remaining data would look quite limited. Specifically, the number of students whose self-reported data were consistent with actual access data across all learning resources was 394 (2.2%). Table 2 shows the number of consistent responses across 11 learning resources. Learning analytics data and self-reported satisfaction scores were also used to compare the average satisfaction scores of users who actually visited a particular learning resource with the average satisfaction scores of non-users (who never visited a particular learning resource). With the exception of PDF and audio coursebooks, the satisfaction scores of users were slightly higher than those of non-users, as summarized in Table 4. However, the means of both groups were already high, as indicated by a negatively skewed and leptokurtic distribution (skewness = -1.21; kurtosis = 1.05). In addition, the effect sizes associated with these comparisons were very small. Accordingly, the number of visits to each learning resource did not show substantial correlations with the average satisfaction scores. More specifically, the actual use of each learning resource could explain a trivial amount of the variance in satisfaction scores, R = .08; R² = .007; F(11, 18221) = 11.47; p < .001. Preference rates pertaining to each learning resource and satisfaction scores were not substantially related either, R = .14; R² = .02; F(11, 20634) = 38.67; p < .001. Through the aforementioned analyses, we suggested an inconsistency between the objective data derived from the open, online, and distance education system and the subjective data (i.e., self-reports).
In addition, it was not possible to establish a substantial relationship between satisfaction, preference, and actual use. However, the validation of the 15-item satisfaction scale with self-reported usage was somewhat successful. Specifically, students who reported use and satisfaction (i.e., I used the resource and I am satisfied/very satisfied) were compared with those who reported use but dissatisfaction (i.e., I used the resource but I am not satisfied). Almost all comparisons resulted in large effect sizes, as summarized in Table 5. That is, two separate self-report measures of satisfaction were somewhat consistent.

Discussion
The current research signaled a discrepancy between objective student behavior (i.e., tracking data through digital footprints) derived from the learning management system and subjective data (i.e., self-reports), which supports the findings of empirical studies in the literature (Gasevic et al., 2017; Zhou & Winne, 2012). More specifically, students overreported their use. This could be due to insufficient motivation to respond, intentional falsification (i.e., faking), or poor recall of learning experiences by students. While the source of such discrepancies should be explored through further research, scholars may choose to use a combination of multiple methods to better reflect the processes used during learning (Azevedo, 2015; Ellis et al., 2017). Learner metacognition may be specifically considered as a covariate when making decisions about inconsistency, as either poor learner reflection or poorly reconstructed memories may have resulted in low-quality data (Zhou & Winne, 2012).
Inconsistency was observed even though the content was not culturally sensitive and the setting was a formal learning environment. Furthermore, learners' gender, age (Schneider et al., 2018), and academic ability (Rosen et al., 2017) predicted consistency, as expected. While the degree of consistency varied across learning materials, both actual use and learner satisfaction were associated with the degree of consistency. In this regard, when learning materials are more satisfying and useful, there seems to be a greater match between what learners say and what the system data provide.
However, we do not know about the perceived quality and usefulness of the learning resources as rated by the learners.In this regard, further research could include the perceived usefulness and quality of learning materials as variables of interest.
Students' current semester was negatively correlated with consistency.We speculated that because students were asked to respond to multiple online surveys over the course of their undergraduate studies, survey fatigue may have led to an overdose of research participation and thus higher levels of careless responding.While there were slight differences between actual users and non-users in terms of satisfaction, the overall satisfaction average was very high.In addition, the number of visits to each learning resource was not strongly correlated with satisfaction scores.That is, even learners who did not use the system were satisfied with it.This was considered quite problematic, since it may not be right to make policy decisions based on students' judgments about a system they do not actually use.
Similarly, students' preferences and actual use were correlated due to the large sample size, but the coefficients were quite small.Thus, their self-reported preferences did not show a substantial relationship with their actual usage patterns.Several empirical studies have often used student satisfaction (e.g., Alqurashi, 2019;So & Brush, 2008;Wu et al., 2010), intention to use the online learning systems (e.g., Chao, 2019), or learner preferences (e.g., Rhode, 2009;Watson et al., 2017) to evaluate online learning environments.However, the current findings suggested that objective system or performance data should be considered in addition to self-reports in order to draw more robust implications regarding the accountability of online learning systems.In addition, current LMS data is primarily limited to the presence and frequency of access to specific learning resources.Additional objective data sources and variables related to online learning experiences need to be integrated to support or refute current hypotheses.
While we were able to identify some of the predictors of inconsistencies between self-report and LMS data, we were only able to explain a very small percentage of the variability. In this regard, alternative variables from the field of learning analytics can be integrated. On the other hand, the consistency between two sources of subjective data addressing the same construct (i.e., learner satisfaction) was strong. While the inclusion of such validation items and scales in the research design has been considered as a method to directly assess response quality (DeSimone & Harms, 2018), this was not the case between self-reported and LMS data. That is, our findings suggested that two self-reported data sources may sometimes be compatible with each other, but both may be at odds with the actual usage data. In this regard, unobtrusive methods may be more effective at eliminating low-quality data than integrating validation items. To test this speculation, future researchers could compare the effects of obtrusive and unobtrusive validation methods on multiple groups. In addition, we did not record participants' survey response times, which may be considered as a covariate in further studies.
A critical implication of the current study is to consider the unreliability of self-report data, which is commonly used in educational research to inform policy decisions. In addition to using alternative data collection tools, we need to look for more objective and direct measures. We have tended to focus a great deal on the reliability of measures in general, and the internal consistency of items in particular, to the detriment of validity (Steger et al., 2022). The survey itself was not the only source of measurement error observed in the current study. Participants can also be a critical source of erroneous data. In addition to attitudes and reflections, which may be over- or underreported depending on the sensitivity of the issue, we need to use actual performance data as well. For example, while years of self-report research have emphasized that men have an advantage in technical competence, systematic analyses using performance-based measures have found that the opposite may be true (Borgonovi et al., 2023; Siddiq & Scherer, 2019). These limitations, combined with the implications of the current study, support calls from eminent scholars for robust intervention research that should include sound measures and variables to address relevant instructional technology problems (Hsieh et al., 2005; Reeves & Lin, 2020; Ross & Morrison, 2008). These findings also suggested that strategic planning decisions that guide short-, medium-, and long-term goals can be based not only on self-reported data, but also on learning analytics data available in most LMSs. We recognize the potential of the current findings to unsettle the social science community at large, where thousands of self-report studies are conducted each year. On the other hand, if we do not integrate alternative and more objective data sources into more robust designs, it is likely that the replication crisis will continue.

Concluding Details
The following are details about specific aspects of how this research was conducted.First, this research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
The authors declare that there are no conflicts of interest related to this article. Data will be made available upon reasonable request. Finally, our research proposal was approved by the Institutional Review Board of Anadolu University (March 28, 2023, No: 33/63).

Table 1
Statistics on the Use of Learning Resources

Table 3
Predictors of Inconsistency

Table 4
Satisfaction of Users and Non-Users

Table 5
Consistency Between the Two Separate Measures of Satisfaction