Mobile Technology Acceptance Scale for Learning Mathematics: Development, Validity, and Reliability Studies

The purpose of this study is to develop a valid, reliable, and useful scale to measure high school students’ levels of acceptance of mobile technologies in learning mathematics based on the second version of the unified theory of acceptance and use of technology (UTAUT2) model. The study was designed based on a sequential exploratory mixed-method research design. To this end, both qualitative (interviews with students, review of literature, and expert panel evaluation) and quantitative procedures (Lawshe content validity technique, exploratory and confirmatory factor analysis, convergent validity, discriminant validity, nomological validity, criterion validity, internal consistency reliability, and temporal reliability) were used to develop and validate the Mobile Technology Acceptance Scale for Learning Mathematics (m-TASLM). As a result, a 5-point Likert scale with 36 items grouped under 8 factors was developed and confirmed. Both validity and reliability studies yielded favorable results.

Introduction
Al-Hujran et al. (2014, p. 14) defined m-learning as "integrating mobile technologies with learning and education processes." To Crompton and Burke (2018, p. 53), m-learning denoted "learning involving the use of a mobile device." Barati and Zolhavarieh (2012, p. 298) defined m-learning as "any form of learning and teaching process that occurs by mobile device or in [a] mobile environment."

Benefits and Barriers
A substantial body of literature documents the benefits of m-learning. M-learning removes the boundaries of the classroom (Reychav, Dunaway, & Kobayashi, 2015), enabling students to learn anytime and anywhere (Chung et al., 2019). In general, m-learning is best characterized by its ability to enrich learning experiences with enhanced mobility and connectivity (Cheon, Lee, Crooks, & Song, 2012). Thanks to this mobility and connectivity, students are able to learn more easily, practically, and productively (Choon-Keong, Ing, & Kean-Wah, 2013; Gikas & Grant, 2013). When accompanied by proper learning strategies, m-learning is also praised for its positive impact on students' learning achievement (Hwang & Chang, 2011; Hwang & Wu, 2014), attitudes towards the lesson (Hwang & Chang, 2011), interest and motivation (Hwang & Chang, 2011; Hwang & Wu, 2014), problem-solving skills (Al-Khateeb, 2018; Lai & Hwang, 2014), and creativity and communication skills (Lai & Hwang, 2014).
There are, however, some disadvantages of and barriers to mobile learning. Among these are issues of connectivity, small screen size, limited processing power, low input capacity, security and abuse, distracting factors, reluctance to adopt, and difficulty in using technology (Awadhiya & Miglani, 2016; Wang et al., 2009). Gikas and Grant (2013) mentioned that distracting factors such as social networks and small mobile keyboards are among the difficulties experienced by students using mobile devices. Gökdaş, Torun, and Bağrıaçık (2014) attributed pre-service teachers' negative attitudes towards mobile learning to low Internet speed and to the poor quality or lack of digital content. Şad, Özer, Yakar, and Öztürk (2020) argued that using smartphones for learning may prevent students from achieving the degree of cognitive depth necessary for long-term retention.
M-learning in teaching mathematics. The unprecedented capacities of mobile devices (e.g., portability and availability) and their wide acceptance among young people have also influenced learning and teaching mathematics (Attard & Northcote, 2012). Since learning mathematics through mobile means helps students gain new mathematical knowledge, skills, and experiences, mobile mathematics learning has become a new area with a growing interest among educational researchers and practitioners (Kyriakides, Meletiou-Mavrotheris, & Prodromou, 2016). Several mobile and online applications have been developed to support teaching algebra, geometry, analysis, statistics, probability, and other areas of mathematics (Cayton-Hodges, Feng, & Pan, 2015; Fabian, Topping, & Barron, 2016).
The pedagogical potential of mobile applications, especially in the fields of mathematics, science, and engineering, stems from their advantages in helping students grasp the abstract concepts in these disciplines (Subramanya & Farahani, 2012). Using mobile devices also allows learners to become aware of their mathematical skills, including measurement, prediction, and problem-solving (Fabian et al., 2016; Tangney & Bray, 2013). By allowing direct interaction with mathematical phenomena through visual and dynamic affordances on touchscreens, mobile mathematics learning provides students with opportunities for easy transfer between home and outdoor learning situations and with more flexible ways to work collaboratively. Mobile mathematics learning applications also provide opportunities to discover mathematics independently or cooperatively in real-life situations through visualization and contextualization (Baya'a & Daher, 2009).

Technology Acceptance and Unified Theory of Acceptance and Use of Technology
Despite the documented advantages of m-learning, its success is not guaranteed unless it is adopted and applied properly. Proper adoption and application depend on learners' acceptance of mobile technologies (Awadhiya & Miglani, 2016; Wang et al., 2009). Wang et al. (2009) stated that "the success of m-learning may depend on whether or not users are willing to adopt the new technology that is different from what they have used in the past" (p. 93). Likewise, Awadhiya and Miglani (2016) warned that learners' reluctance to adopt specific technologies is an important challenge to m-learning.
Many theoretical models have been developed in technology acceptance research, each with different acceptance determinants (Venkatesh, Morris, Davis, & Davis, 2003; Venkatesh, Thong, & Xu, 2012). Researchers had a multitude of models from which to choose, though picking one meant ignoring the contributions of alternative models (Venkatesh et al., 2003). This created a need to review and synthesize these models, and eventually Venkatesh et al. (2003) integrated the eight most prominent models and proposed a compact model called the unified theory of acceptance and use of technology (UTAUT; Venkatesh et al., 2012; Wang et al., 2009).
Since UTAUT was originally developed to evaluate the acceptance and use of information technologies by employees, especially in the organizational context, use of the model with different technologies, users, or cultures was limited. The need to expand and update the model was therefore recognized (Venkatesh et al., 2012; Wang et al., 2009; Yang, 2013). As a result, Venkatesh et al. (2012) extended the UTAUT model to include the consumer context and proposed UTAUT2. UTAUT2 included three additional determinant variables (hedonic motivation, price value, and habit) as important factors affecting consumers' acceptance and use of technology. Furthermore, UTAUT2 preserved three moderator variables (gender, age, and experience) but excluded the voluntariness of use variable. Consequently, the final UTAUT2 model included seven independent variables and three moderator variables representing the determinants of consumers' behavioral intention to use a technological device.
Behavioral intention and the determinants of behavioral intention in the UTAUT2 model can be defined as follows:
• Behavioral intention (BI): "The strength of one's intention to perform a specified behavior" (Fishbein & Ajzen, 1975, p. 288).
• Performance expectancy (PE): "The degree to which an individual believes that using the system will help him or her to attain gains in job performance" (Venkatesh et al., 2003, p. 447).
• Effort expectancy (EE): The degree of ease associated with consumers' use of the system/technology (Venkatesh et al., 2003; Venkatesh et al., 2012).
• Social influence (SI): The degree to which a consumer or individual perceives that important others (e.g., family and friends) believe he or she should use the new system or a particular technology (Venkatesh et al., 2003; Venkatesh et al., 2012).
• Price value (PV): "Consumers' cognitive trade-off between the perceived benefits of the applications and the monetary cost for using them" (Venkatesh et al., 2012, p. 161).
Although there are many studies using the UTAUT framework, only a limited number of studies (Bharati & Srikanth, 2018; Kumar & Bervell, 2019) have attempted to determine the factors affecting university students' behavioral intention to accept mobile learning using structured questionnaires based on components of the UTAUT2. Bharati and Srikanth (2018) modelled university students' acceptance of mobile learning using the UTAUT2 framework, introducing two additional components: quality of service and interactive visual information. Yang (2013) also tested the factors affecting the intention of university students to adopt mobile learning, using the dimensions of UTAUT2 plus self-management of learning as a new dimension. In a recent study, Kumar and Bervell (2019), using components of the UTAUT2 model, investigated factors affecting university students' behavioral intention to use Google Classroom as a mobile learning platform.
All these studies, whether they used the UTAUT or UTAUT2 framework, entailed the development of questionnaires to examine participants' levels of mobile learning acceptance. The factors affecting individuals' behavioral intentions and technology use regarding mobile learning were tested through structural equation modeling. However, most questionnaires were adapted from those developed by Venkatesh et al. (UTAUT; 2003) and Venkatesh et al. (UTAUT2; 2012), applying them to the relevant mobile technologies and contexts. Moreover, most of these studies failed to provide detailed information about the validity and reliability criteria of their questionnaires. Psychometric properties such as construct validity, convergent validity, divergent validity, and reliability were evaluated only on the basis of the statistics (e.g., factor loadings, composite reliability, and average variance extracted) obtained as a result of structural equation modeling. Indeed, Venkatesh et al. (2003) suggested that measures for the UTAUT should be viewed as preliminary, and that research should be done to fully develop and validate scales to obtain favorable psychometric properties.

Methodology
Design
This research aimed to develop the m-TASLM using a sequential exploratory mixed-method research design. Sequential exploratory mixed-methods research involves a two-stage, sequential process in which the researcher begins by exploring the subject matter qualitatively and then follows up with a quantitative strand (Creswell & Plano Clark, 2011). In the preliminary qualitative strand of the study, 25 high school students in grades 9 to 12 with experience using tablet computers in their math classes were interviewed to develop draft items for the m-TASLM. Content validity of the scale was tested by an expert panel of five people, including three measurement and evaluation experts and two mathematics education experts. In the quantitative strand, validity and reliability studies of the m-TASLM were conducted on data obtained from four independent study groups during the spring semester of the 2018-2019 school year.

Study Groups
First study group. The first independent group consisted of 523 students representing various grade levels and types of high schools in the central districts of Malatya province in Turkey (Table 1). The data obtained from this group were used to perform an initial exploratory factor analysis (EFA).
Note for Table 1. N = 523. * An Anatolian high school is a common type of state school that applies a general curriculum, whereas a Science high school is more competitive and offers a curriculum with more science and math courses.
Second study group. The second independent group consisted of 815 students representing various grade levels and types of high schools in the central districts of Malatya province in Turkey (Table 2). The data obtained from this group were used to perform confirmatory factor analysis and to estimate Cronbach's alpha internal consistency coefficients, composite reliability (CR), average variance extracted (AVE), maximum shared variance (MSV), and average shared variance (ASV).
Third study group. The third independent group consisted of 83 students representing various grade levels from a private high school in the central district of Malatya province in Turkey (Table 3). The data obtained from this group were used to estimate the temporal reliability of the m-TASLM. The re-test was administered to the students one month after the first test during the spring semester of 2018-2019.
Fourth study group. The fourth independent group consisted of 64 students representing various grade levels from an Anatolian high school in the central district of Malatya province in Turkey (Table 4). The data obtained from this group were used to test the criterion validity of the m-TASLM.

Scale Development Procedure: Item Development, Data Collection, and Analysis Processes
The m-TASLM was intended to include the original components of UTAUT2 developed by Venkatesh et al. (2012): PE, EE, SI, FC, HM, H, PV, and BI. In order to develop an item pool, similar scales were examined first (Cheon et al., 2012; Venkatesh et al., 2003; Venkatesh et al., 2012; Wang et al., 2009).
Next, 25 high school students were interviewed about their views and experiences using mobile technologies while learning mathematics. As a result, a collection of 69 items was developed and grouped as follows:
• 10 items about PE (e.g., Using mobile technologies while learning mathematics improves my mathematics performance).
• 14 items about EE (e.g., Learning to use mobile technologies for studying mathematics is easy for me).
• 10 items about SI (e.g., Educators in my immediate environment support me to use mobile technologies while learning mathematics).
• 6 items about FC (e.g., I have the necessary knowledge to use mobile technologies while learning mathematics).
• 7 items about H (e.g., It is a habit for me to use mobile technologies while learning mathematics).
• 5 items about PV (e.g., The mobile technologies I can use to learn mathematics are cost-effective).
• 11 items about BI (e.g., I intend to start using mobile technologies while learning mathematics).
Scale items were arranged in the form of a 5-point Likert questionnaire. Responses ranged from 1 (strongly disagree) to 5 (strongly agree). The items were submitted to an expert panel to check content validity. Expert opinions were evaluated using the Lawshe technique (Lawshe, 1975). The content validity ratio (CVR) was calculated for each item according to the following formula (Lawshe, 1975):
$\mathrm{CVR} = \dfrac{n_e - N/2}{N/2}$
where $n_e$ is the number of experts who rate the item as essential and $N$ is the total number of experts. The critical CVR value for the five experts was set to .99 at the α = .05 significance level. Based on the feedback from experts, necessary corrections were made to the scale. The content validity index (CVI) was estimated for the entire scale by calculating the mean of the CVRs for all remaining items (Lawshe, 1975). Next, the draft scale was examined by a Turkish linguist for issues of language and expression. Finally, five high school students were asked to check the scale for clarity and understanding, and four items with minor problems were rearranged.
Following the preliminary qualitative item development procedures, the m-TASLM was administered to the first (N = 523) and second (N = 815) study groups successively to conduct the exploratory and confirmatory factor analyses, respectively. Moreover, the convergent, discriminant, and nomological validity of the scale were examined using the results of the confirmatory factor analysis.
The temporal reliability of the m-TASLM was determined through test-retest analysis using two data sets obtained from 83 high school students over a one-month interval. To test the criterion validity of the scale, the correlation was estimated between the final form of the m-TASLM and the Tablet Computer Acceptance Scale, originally developed by Güngören, Bektaş, Öztürk, and Horzum (2014) to measure high school students' acceptance of tablet computers. Finally, Cronbach's alpha, CR, and AVE coefficients were calculated and reported.

Content Validity
Based on the ratings of five subject-matter experts, the content validity ratio (CVR) for each of the 69 items was calculated. The CVRs for 47 items were equal to 1, indicating perfect agreement. On the other hand, the CVRs for 5 items were equal to -0.2, for 10 items equal to 0.2, and for 7 items equal to 0.6. Thus, these 22 items (5+10+7) with CVRs less than .99 were excluded from the scale.
This left 47 items: 6 in the PE subscale; 8 in the EE subscale; 6 in the SI subscale; 6 in the FC subscale; 4 in the HM subscale; 6 in the H subscale; 5 in the PV subscale; and 6 in the BI subscale. Since all components of the UTAUT2 model were represented in these remaining 47 items, content validity was not impaired by the removal. In addition, the content validity index (CVI) value was equal to 1 for each subscale and for the overall scale. Thus, it can be said that the content validity of the m-TASLM was statistically established (Lawshe, 1975).
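The CVR values reported above follow directly from Lawshe's ratio for a five-person panel. The short Python sketch below is an illustration only, not the authors' procedure: the function and variable names are hypothetical, and the mapping from each CVR value to a number of "essential" ratings is derived from the formula rather than reported in the study.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR: (n_e - N/2) / (N/2)."""
    half = n_experts / 2
    return (n_essential - half) / half


def content_validity_index(cvrs: list[float]) -> float:
    """CVI taken as the mean CVR of the retained items."""
    return sum(cvrs) / len(cvrs)


N_EXPERTS = 5        # panel size reported in the study
CRITICAL_CVR = 0.99  # critical value reported for five experts

# CVR for each possible number of "essential" ratings (0 to 5 experts)
cvr_by_votes = {
    n_e: round(content_validity_ratio(n_e, N_EXPERTS), 2)
    for n_e in range(N_EXPERTS + 1)
}

# Only unanimous items (CVR = 1) reach the critical value
retained_votes = [n_e for n_e, cvr in cvr_by_votes.items() if cvr >= CRITICAL_CVR]

# 47 retained items, each with CVR = 1, give CVI = 1
cvi = content_validity_index([1.0] * 47)
```

Under this formula, CVRs of -0.2, 0.2, and 0.6 correspond to 2, 3, and 4 of the 5 experts rating an item as essential, which is why only unanimously endorsed items were kept.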

Construct Validity
Exploratory factor analysis (EFA). The construct validity of the m-TASLM was initially tested using EFA in the SPSS Statistics 20 program (Hair, Black, Babin, Anderson, & Tatham, 2014; Tabachnick & Fidell, 2013). Before the analysis, the data set was checked against the assumptions of EFA. To this end, both univariate and multivariate normality were first tested for the data set of 523 cases. To test univariate normality, cases with z scores exceeding ±3.29 (p < .001) were considered outliers (Tabachnick & Fidell, 2013). Also, skewness and kurtosis values for all items were calculated and found to be within ±1 (skewness = -.800 to .693; kurtosis = -.882 to .293). To test multivariate normality, Mahalanobis distances were calculated, and a total of 60 outliers were detected at the p < .001 significance level (Tabachnick & Fidell, 2013). After deleting these outliers, the data set was reduced to n = 463. Next, the missing values, not exceeding 1.1% for any item, were replaced using the series mean technique.
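The univariate part of this screening can be sketched in a few lines of Python. This is a simplified illustration on hypothetical scores, not the SPSS procedure used in the study: it uses population moments (SPSS applies small-sample corrections to skewness and kurtosis), and it does not cover the multivariate Mahalanobis step.

```python
from statistics import mean, pstdev

Z_CUTOFF = 3.29  # |z| > 3.29 corresponds to p < .001 (two-tailed)


def z_scores(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def univariate_outliers(values, cutoff=Z_CUTOFF):
    """Indices of cases whose |z| exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]


def skewness(values):
    """Population skewness: mean of cubed z scores."""
    zs = z_scores(values)
    return sum(z ** 3 for z in zs) / len(zs)


def excess_kurtosis(values):
    """Population excess kurtosis: mean of z^4 minus 3."""
    zs = z_scores(values)
    return sum(z ** 4 for z in zs) / len(zs) - 3


# Hypothetical scores: 20 ordinary cases plus one extreme case at index 20
scores = [48, 52, 50, 47, 53, 49, 51, 50, 46, 54,
          50, 49, 51, 48, 52, 50, 47, 53, 49, 51, 120]
flagged = univariate_outliers(scores)
cleaned = [v for i, v in enumerate(scores) if i not in flagged]

# After removing flagged cases, check that both statistics fall within +/-1
within_limits = abs(skewness(cleaned)) <= 1 and abs(excess_kurtosis(cleaned)) <= 1
```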
The correlation matrix for all items was examined and coefficients were found above .30 for all variable pairs. Also, all correlation coefficients were lower than .90, indicating no multicollinearity problem between variables. Results of the Bartlett Sphericity test (χ² = 15290.534; df = 1081; p < .001) and KMO statistics (KMO = .961) indicated the sampling adequacy of the whole data set, while anti-image correlation coefficients for each item (r = .657 to .983) indicated adequate sampling for individual items.
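As a quick consistency check, the degrees of freedom of Bartlett's test depend only on the number of items entering the analysis, df = p(p - 1)/2. The helper name below is hypothetical; the item count of 47 is the number of items retained after the content validity step.

```python
def bartlett_df(n_items: int) -> int:
    """Degrees of freedom for Bartlett's test of sphericity: p(p - 1) / 2."""
    return n_items * (n_items - 1) // 2


df_draft_scale = bartlett_df(47)  # 47 items entered the EFA
```

The result matches the df = 1081 reported for the test.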
As a result of the EFA, item 34 in the draft scale was removed because its loadings on two factors (H and FC) differed by less than .10, and items 10, 11, 12, 13, 14, 18, 26, 31, 37, and 41 were removed because their loadings were below .50. The analysis results are shown in Table 5.
Note for Table 5. Factor loadings below .500 are not shown in the table. The extraction method was principal axis factoring. PE = performance expectancy; H = habit; PV = price value; SI = social influence; HM = hedonic motivation; FC = facilitating conditions; BI = behavioral intention; EE = effort expectancy.
As seen in Table 5, the factor analysis yielded an 8-factor construct with 36 items explaining 66.068% of the total variance. The factor loadings, communalities, and corrected item-total correlations were also favorable, supporting the construct validity of the scale.

Confirmatory factor analysis (CFA).
In order to further test the 36-item, 8-factor construct obtained from the EFA, a CFA was conducted in the LISREL 8.8 program using the data collected from the second study group (n = 815). Table 6 shows the goodness of fit values produced in the first CFA.
Note for Table 6. *p < .01; RMSEA = root-mean-square error of approximation; RMR = root mean square residual; SRMR = standardized root mean square residual; GFI = goodness of fit index; AGFI = adjusted goodness of fit index; CFI = comparative fit index; NFI = normed fit index; NNFI = nonnormed fit index.
The first analysis yielded a significant p-value for the 8-factor model (χ² = 1749.15, df = 566, p < .01). Therefore, other goodness of fit values were examined to confirm the model. The goodness of fit values were found to be either excellent or acceptable in general. Benchmark values indicate acceptable model fit when: (a) χ²/df is less than 5; (b) RMSEA, RMR, and SRMR values are less than .08; and (c) CFI, GFI, AGFI, NFI, and NNFI values are greater than .90. They indicate excellent model fit when: (a) χ²/df is less than 2; (b) RMSEA, RMR, and SRMR values are less than .05; and (c) CFI, GFI, AGFI, NFI, and NNFI values are greater than .95 (Brown, 2006; Hair et al., 2014; Hu & Bentler, 1999; Tabachnick & Fidell, 2013) (see pre-modification model values in Table 6). However, the values for GFI (.88) and AGFI (.86) were slightly under what is normally deemed acceptable. At this stage, the modification suggestions offered by the LISREL program were applied by correlating the residuals of items 19 and 20 in the SI factor and of items 24 and 25 in the FC factor, which statistically significantly improved the model (p < .01). As a result, the goodness of fit values of the 8-factor model became acceptable or excellent except for the AGFI value (see post-modification values in Table 6). Since AGFI = .89 was also very close to the acceptable limit, it can be said that the 8-factor construct of the measurement model is confirmed adequately. The standardized factor loadings, squared standardized factor loadings (R²), and t values for the model after modification are presented in Table 7.
Note for Table 7. PE = performance expectancy; EE = effort expectancy; SI = social influence; FC = facilitating conditions; HM = hedonic motivation; H = habit; PV = price value; BI = behavioral intention.
As is seen in Table 7, the standardized factor loadings, R² estimates, and the t values suggest favorable results.
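The fit benchmarks quoted in this section can be expressed as a small lookup, sketched below in Python. The function name and labels are hypothetical, and only the values stated in the text (χ², df, and the pre-modification GFI and AGFI) are used; the remaining Table 6 values are not reproduced here.

```python
# (excellent, acceptable) cutoffs as quoted in the text
LOWER_IS_BETTER = {"chi2/df": (2.0, 5.0), "RMSEA": (0.05, 0.08),
                   "RMR": (0.05, 0.08), "SRMR": (0.05, 0.08)}
HIGHER_IS_BETTER = {name: (0.95, 0.90)
                    for name in ("CFI", "GFI", "AGFI", "NFI", "NNFI")}


def classify_fit(index: str, value: float) -> str:
    """Label a fit value as excellent, acceptable, or below benchmark."""
    if index in LOWER_IS_BETTER:
        excellent, acceptable = LOWER_IS_BETTER[index]
        if value < excellent:
            return "excellent"
        return "acceptable" if value < acceptable else "below benchmark"
    excellent, acceptable = HIGHER_IS_BETTER[index]
    if value > excellent:
        return "excellent"
    return "acceptable" if value > acceptable else "below benchmark"


chi2, df = 1749.15, 566   # first CFA, as reported
ratio = chi2 / df         # roughly 3.09
pre_modification = {"chi2/df": ratio, "GFI": 0.88, "AGFI": 0.86}
labels = {name: classify_fit(name, v) for name, v in pre_modification.items()}
```

With these inputs, χ²/df falls in the acceptable band, while GFI and AGFI fall just under the .90 cutoff, which is the situation that prompted the model modifications.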

Convergent, Discriminant, and Nomological Validity
In order to test the convergent, discriminant, and nomological validity of the m-TASLM, the CR, AVE, MSV, and ASV values, the inter-factor correlations, and the square roots of the AVE values were calculated. The results are presented in Table 8. The CR coefficients were over .70 and the AVE values were over .50 for all factors, which supports the convergent validity of the scale (Hair et al., 2014). Furthermore, the square root of the AVE value calculated for each factor is higher than that factor's correlations with the other factors. In addition, the criteria of AVE > MSV and AVE > ASV were satisfied. Thus, it can be said that the m-TASLM also has discriminant (divergent) validity, which suggests that, although they measure conceptually similar concepts, the measures are sufficiently different from one another (Hair et al., 2014). Finally, the values in Table 8 indicate that the correlations between factors are positive and statistically significant (p < .05), indicating the nomological validity of the m-TASLM, which suggests that each construct relates to the others in a theoretically consistent way (Hair et al., 2014).
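The quantities named in this section can be computed from standardized loadings and inter-factor correlations, as in the Python sketch below. All numbers are hypothetical, since Table 8 is not reproduced here, and the CR formula assumes uncorrelated error terms, which is a simplification relative to the modified model. The .70 and .50 cutoffs are the ones the paper cites for CR and AVE.

```python
from math import sqrt


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings) ** 2
    error = sum(1 - l ** 2 for l in loadings)
    return total / (total + error)


def average_variance_extracted(loadings):
    """AVE = mean of squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical standardized loadings for one factor, and its
# hypothetical correlations with three other factors
loadings = [0.70, 0.80, 0.75, 0.85]
correlations = [0.60, 0.50, 0.40]

cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
msv = max(r ** 2 for r in correlations)                       # maximum shared variance
asv = sum(r ** 2 for r in correlations) / len(correlations)   # average shared variance

convergent_ok = cr > 0.70 and ave > 0.50
discriminant_ok = ave > msv and ave > asv and sqrt(ave) > max(correlations)
```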

Criterion Validity
Since the normality assumptions for the data sets obtained from the Tablet Computer Acceptance Scale (skewness = -.015, kurtosis = -.607) and the m-TASLM (skewness = -.633, kurtosis = -.714) were adequately satisfied, the criterion validity of the m-TASLM was tested using the Pearson correlation coefficient. Test results revealed a positive, high, and statistically significant correlation between the two scales (r = .726, p < .05). Therefore, it can be said that the m-TASLM can adequately measure a construct similar to the one measured by the Tablet Computer Acceptance Scale.
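For readers who wish to reproduce this kind of check, the Pearson coefficient between two sets of total scores can be computed as sketched below. The scores and variable names are hypothetical and do not come from the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical total scores of six students on the two scales
mtaslm_totals = [120, 135, 98, 150, 110, 142]
tablet_scale_totals = [60, 66, 52, 71, 58, 64]
r = pearson_r(mtaslm_totals, tablet_scale_totals)
```

The same function applies to the test-retest comparison, with the two administrations of the scale taking the place of the two instruments.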

Reliability Analysis
To test the reliability of scores obtained from the m-TASLM, Cronbach's alpha internal consistency and test-retest temporal reliability coefficients were estimated. The results are shown in Table 9 and were interpreted against the benchmarks given by Kline (2011, p. 70). In order to test the consistency of individuals' responses at two points in time (with a one-month interval in between), a test-retest method was used (Hair et al., 2014). Since the normality assumptions for the test and retest results obtained from 83 students were adequately satisfied (skewness and kurtosis within ±1), scores were tested with the Pearson correlation analysis. The analysis yielded significant, positive, moderate-to-high correlation coefficients for the factors (r = .463 to .772). For the entire scale, the consistency was high (r = .932, p < .001). Accordingly, it can be said that the scale is sufficiently reliable against random errors over time. Also, the CR coefficients over .70 and AVE values over .50 for all factors (see Table 8) provide further evidence for the reliability of the scale (Hair et al., 2014).
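Cronbach's alpha, the internal consistency coefficient used here, can be computed from an item-response matrix as sketched below. The responses are hypothetical 5-point Likert data, not the study's data, and the function name is an assumption.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; responses has one row per respondent, one column per item."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert responses: 5 respondents x 4 items
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
```

Alpha approaches 1 when items vary together across respondents, which is the pattern built into this small example.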

Conclusion and Discussion
In this study, a scale, called m-TASLM, to measure high school students' level of mobile technology acceptance in learning mathematics was developed. The m-TASLM was designed to include the components of UTAUT2 developed by Venkatesh et al. (2012), specifically PE, EE, SI, FC, HM, H, PV, and BI.
Building on UTAUT, which was composed as a synthesis of eight technology acceptance models, UTAUT2 is an extended version of the model for consumers (Venkatesh et al., 2012). In addition, the UTAUT2 model has better predictive validity than other models, as it can explain higher percentages of variance in behavioral intention (74%) and technology use (52%) scores (Venkatesh et al., 2012). Nevertheless, the UTAUT2 model has been used in only a limited number of studies investigating mobile technology acceptance (Bharati & Srikanth, 2018; Kumar & Bervell, 2019; Ramírez-Correa et al., 2019; Venkatesh et al., 2012; Yang, 2013). Thus, the present study is believed to contribute to both the relevant literature and practice, especially for researchers who would like to investigate learners' tendency to use mobile technologies in learning mathematics or, after due adaptations are made, other subjects.
In this comprehensive and meticulous scale development study, designed according to a sequential exploratory mixed method approach, both qualitative and quantitative research methods were used, and a 5-point Likert scale with 36 items under 8 factors, explaining 66.068% of the total variance, was developed and confirmed. Results of the validity and reliability studies showed that the m-TASLM sufficiently meets the benchmark criteria regarding validity and reliability.
Based on these results, the m-TASLM is a valid and reliable instrument to measure high school students' acceptance levels of mobile technologies in learning mathematics. Researchers could examine the psychometric properties of the scale for different educational levels (e.g., grades 5 to 8) or for mobile technology users of any age. Furthermore, though the m-TASLM has been developed to measure mobile technology acceptance in learning mathematics, it could be adapted to different subjects (e.g., science, social sciences, etc.) and its psychometric properties re-examined.