Development and Evaluation of an Open-Source, Online Training for the Measurement of Adult-Child Responsivity at Home and in Early Childhood Education and Care Settings

Efforts to monitor and improve responsive caregiving for young children, because of its importance for child development, are part of the United Nations Sustainable Development Goals. Two brief observational measures of responsive caregiving have been developed and validated (Responsive Interactions for Learning, parent [RIFL-P] and educator [RIFL-Ed] versions), with the RIFL-P available in English, Portuguese, and Spanish. The aim of the current study was to present and evaluate two online training programs for the RIFL measures. These distance learning courses were designed as open-source and asynchronous to enable their use in low- and middle-income countries and remote areas. The following course components are used: readings, lectures, observation of interactions on video, coding practice with automated feedback on item coding, and quizzes. Of the 76 trainees who registered for one of the online courses, 58 (76%) completed all theoretical module components. Student performance was generally high. Marks on quizzes ranged from 83% to 100%. Approximately 91% of those who took the reliability tests passed (40/44). Student satisfaction during and after the course was high. The effective online training programs are available free of charge and the RIFL suite of measures is efficient to implement. Implications for research and practice are discussed.


Introduction
Responsive caregiving, defined as sensitivity and stimulation, is one of the cornerstones of nurturing care and a prerequisite for achieving positive developmental outcomes for young children (Black et al., 2017; Britto et al., 2017; Jeong et al., 2021). This specific type of caregiving has gained global attention, with directed efforts by international agencies and governments to implement programs to increase this aspect of caregiving (Santos et al., 2020; World Health Organization [WHO], United Nations Children's Fund [UNICEF], & World Bank Group, 2018). Responsive caregiving has been found to be important in home and educational contexts (Madigan et al., 2019; Vermeer et al., 2016). Despite international acceptance of the importance of responsive caregiver-child interactions, there is a clear need to refine and standardize measurements for this aspect of caregiving (Jeong et al., 2018, 2021). Proxy indicators (e.g., parental mental health, childcare availability, frequency of proxy activities with children; UNICEF & Countdown to 2030; Pierce, 2021) were initially used to assess the construct, but our group has developed efficient (8 minutes) and psychometrically strong instruments that can be used at the population level in home and educational contexts (Pauker et al., 2018; Prime et al., 2015; Schneider et al., 2021; Sokolovic et al., 2021a, 2021b).
In the current study, based on our measures, we examined whether we could develop asynchronous, online courses based on video-recorded examples of responsive interactions to teach professionals (with diverse cultural and linguistic backgrounds) how to reliably code responsive interactions. Evidence from the teacher education field shows that video examples can be an effective way to teach and improve students' coding reliability and content accuracy (Prusak et al., 2010). This study represents a novel contribution to researchers, policymakers, and program leaders in charge of implementing the Nurturing Care Framework (WHO, UNICEF, & World Bank Group, 2018) in national and global spheres, because it presents and evaluates open-source training for a reliable and valid measure of responsive parenting. This is particularly relevant for low- and middle-income countries (LMICs), where most of the world's children live. This aspect of caregiving is modifiable; however, there is an urgent need for efficient, psychometrically sound measurement of caregiving outcomes (including responsivity) that could be used at the population level across cultures (Jeong et al., 2018). Responsive interactions can only be reliably and validly measured through individual-level assessments that use observational methods (Lotzin et al., 2015).
Most coding schemes require extensive training and are complex, time-consuming, and expensive to administer and code (Bailey et al., 2017). This limits their usefulness for population-based studies.
An observational assessment of interaction quality, the Responsive Interactions for Learning (RIFL) measure (previously called Cognitive Sensitivity), was originally developed using a Canadian sample of parents (Prime et al., 2015) and early childhood educators (Pauker et al., 2018; Sokolovic et al., 2021b) interacting with young children. Since their development, the RIFL measures have been successfully adapted and tested in LMICs, including Brazil (Schneider et al., 2021) and Peru. This psychometrically sound measure assesses a person's ability to understand and respond appropriately (incorporating sensitivity and stimulation) to the thoughts and feelings of the person with whom they are interacting. This measure uses thin-slice methodology (popularized by Gladwell, 2005), which involves taking a highly complex psychological phenomenon that has been extensively researched, and operationalizing it in a rating that is brief and intuitive. Thin-slice ratings have been found to possess similar psychometric properties to much longer, labor-intensive coding schemes (Matias et al., 2014; Pederson et al., 1990; Prime et al., 2014a, 2015).

Description of the RIFL Measures
The RIFL is a unidimensional observational tool that assesses three interconnected caregiving skills, namely clear communication, mind reading, and mutuality building. Clear communication refers to communicating in a way that the interactional partner(s) can understand. It is operationalized as providing verbal and nonverbal directions that are meaningful to the activity, as well as promoting a mutual understanding about the goals and rules of the task. Mind reading denotes understanding partners' thoughts and feelings. It is operationalized through items related to an awareness of what the partner knows or understands, rephrasing to achieve understanding, and responsiveness to subtle requests for help.
Finally, mutuality building captures the back-and-forth quality of interactions and includes the caregiver's ability to provide positively valenced feedback and to foster turn-taking within the interaction. The version used to assess interactions between early childhood educators (ECEs) and multiple children includes additional items that capture an educator's ability to meet the needs of multiple children simultaneously.
For the parent (RIFL-P) and sibling (RIFL-S) versions of the measure, two people (e.g., a parent and a child ranging from 18 months to school age or two siblings) are asked to work together for 5 minutes to build a block structure, copying a design they are shown. The complexity of the design varies to ensure it is adequately challenging for different developmental levels. For 18-month-old children, a shape and color sorter is used. For children from 2.6 years of age and older, a Lego model is built, with each person only allowed to touch 2 colors. For the educator (RIFL-Ed) version, the educator is asked to lead either a structured or naturalistic activity with a group of children.
In both cases, interactions are video-recorded and trained coders later observe the 5-minute video. Coders view the video only once and then rate each of the 11 (parent, sibling versions) or 15 items (educator version) on a scale ranging from 1 (not at all true) to 5 (very true). A mean of the 11 or 15 items is calculated, yielding a composite score of responsive interactions that can range from 1 (very low responsivity) to 5 (very high responsivity). Most notably, viewing the 5-minute video and reliably carrying out the coding results in a psychometrically sound assessment of responsivity achieved in 8 (parents, siblings) or 10 minutes (RIFL-Ed). Other observational measures of responsivity, both in parents (e.g., PICCOLO; Roggman et al., 2013) and educators (e.g., CLASS; La Paro et al., 2009), take over an hour (Matias et al., 2014; Pederson et al., 1990).
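The scoring rule described above (a mean of 11 or 15 item ratings, each on a 1 to 5 scale) can be illustrated with a minimal Python sketch. The function name, the dictionary of item counts, and the example ratings are hypothetical and are not part of the RIFL materials; only the item counts and the rating range come from the text.

```python
from statistics import mean

# Item counts taken from the text: 11 items (parent, sibling versions),
# 15 items (educator version). The dictionary itself is an illustrative helper.
ITEMS_PER_VERSION = {"RIFL-P": 11, "RIFL-S": 11, "RIFL-Ed": 15}


def composite_score(ratings, version):
    """Return the mean of the item ratings for one coded video."""
    expected = ITEMS_PER_VERSION[version]
    if len(ratings) != expected:
        raise ValueError(
            f"{version} expects {expected} item ratings, got {len(ratings)}"
        )
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings for one parent-child video (not real data).
example = [4, 3, 5, 4, 4, 3, 4, 5, 4, 3, 4]
print(round(composite_score(example, "RIFL-P"), 2))
```

Validating the number of items before averaging guards against silently mixing the parent and educator versions, which have different item sets.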
The RIFL-P and RIFL-S have strong psychometric properties across languages. Specifically, in Canadian samples, scores on the RIFL-P have been found to correlate with other parental sensitivity measures, to be inversely associated with contextual risk, and to relate to child outcomes including receptive vocabulary, executive functioning, theory of mind, and academic achievement (Prime et al., 2014a, 2014b, 2015; Sokolovic et al., 2021a). The Brazilian-Portuguese version of the RIFL-P demonstrates high reliability (internal consistency α = .94; inter- and intra-rater r's between .83 and .94) and validity (correlations with the PICCOLO parenting measure r's between .32 and .47; correlations with children's cognition, language, and behavior r's between .17 and .29; Schneider et al., 2021). The Spanish version also shows good reliability (internal consistency α = .97; inter-rater r = .87) and validity (correlations with parenting measures of autonomy support [Whipple et al., 2011] r = .70, and parental control r = -.47). The RIFL-Ed has also shown good reliability and validity (Pauker et al., 2018; Sokolovic et al., 2021b); notably, scores are associated with popular, validated measures of classroom quality such as the CLASS. No studies linking RIFL-Ed scores to child outcomes have been completed to date.

Open-Source, Online Training of RIFL Coding
Our research team developed multiple password-protected, open-source online courses to train new coders on the different RIFL measures, with the goal of providing a tool that could expand our ability to assess responsivity efficiently at a population level, especially in LMICs. Training for the RIFL-P is currently available in English, Portuguese, and Spanish, and training for the RIFL-Ed is available in English.
The course was designed based on findings from pedagogical research over the last half decade. Hattie (2008) meta-analytically synthesized the instructional methods from over 50,000 empirical studies to identify the most effective methods for student learning. These included learning goals that are explicit, narrow, and well-articulated; success criteria for students; multiple teaching strategies that triangulate the learning goal; and provision of feedback. Quality feedback relies on teachers being continuously aware of their students' learning status and providing directed and brief feedback (González et al., 2017; Molin et al., 2020). These findings are based on face-to-face delivery models, although those from online delivery suggest similar processes of design (Davis et al., 2018).
Systematic reviews and meta-analyses have demonstrated the advantages and disadvantages of online learning (Davis et al., 2018; Hrastinski, 2008; Means et al., 2009; Watts, 2016). The issues relate to maintaining student engagement, prevention of dropout, the provision of interactive elements to the learning, and the type of content to be learned. An early meta-analysis (Means et al., 2009) found that students who took all or part of their class online performed better, on average, than those taking the same course through traditional face-to-face instruction. The effect was strongest when the online learners were able to engage with course materials for longer periods of time. Findings with respect to synchronous versus asynchronous delivery are similar. Asynchronous learners show more directed engagement with course content and deeper reflection on course issues. Synchronous learners experience less isolation and receive more problem-solving support, which may help them to persist with content (Hrastinski, 2008; Watts, 2016); however, this comes at the expense of achieving the narrow learning goal. Of course, the major advantages of asynchronous, online delivery include timing flexibility, geographical scope, and equalization of learning opportunities (Barteit et al., 2020; Chang et al., 2014).
The present asynchronous, online course was designed as a cost-effective, convenient way to provide training on responsive interactions, with the flexibility needed for uptake in a range of countries and time zones, in both urban and rural settings. It includes pre-recorded lectures, video clips of adult-child interactions, observational exercises with automated feedback, and reading materials. Videos in the English course are from North American samples, while videos in the Portuguese and Spanish versions of the course display Brazilian and Peruvian parents, respectively. Students are given explicit descriptions for each item on the scale, as well as criteria for how to score them along the entire range of the scale. A reliability test is given after the course has been completed, with the option of additional reliability testing if the coder does not pass the first round. There is also a module for rater drift that allows coders to recalibrate their coding every 10 videos.
In line with Hattie (2008), learning goals are explicit, narrow, and well-articulated; students are aware of and receive immediate feedback about whether they have been successful in achieving the learning goal.
Multimedia presentation is used to encourage learning through modalities of text, verbal presentation, and observation, following face-to-face and online empirical evidence of learning (Davis et al., 2018; Hattie, 2008). In the current study, primary and secondary outcomes were articulated for the different versions of the course. The primary outcome was the achievement of reliability, which captures the accuracy with which trainees are able to identify the quality of caregiving observed in different videos. This provides a strong measure of the learning outcomes intended for the course. The secondary outcome was related to trainee engagement with the materials and satisfaction with the courses. These data were collected from end-of-course surveys given to trainees.

Course Descriptions
Both the RIFL-P and RIFL-Ed courses are based on coders observing many video clips of caregiver-child interactions. The RIFL-P shows interactions between one parent and one child, while the RIFL-Ed shows one educator interacting with multiple children. Videos of parent-child interactions were obtained in Children's parents and educators consented to their interactions being available on a password protected site for educational purposes. The course completion times range from 6-8 hours for the RIFL-P course (Modules 1-4, one coding practice assignment, one reliability test) and 8-10 hours for the RIFL-Ed course (Modules 1-4, two coding practice assignments, two reliability tests). Learning goals and course components are outlined in Table 1. They involved lectures, observations of interactions on video, coding practice with automated feedback on item coding, and quizzes. Short video clips of caregiver-child interactions were presented with annotations highlighting the presence/absence of specific behaviors related to responsive caregiving. Practice coding assignments included automated feedback. That is, when the trainee rated an item, a pop-up window provided them with feedback on the accuracy of their coding as well as the expert coder's rationale for the item, which was determined by two or three independent coders.
Two reliability tests are offered after course completion, and the agreement between the expert coder and the student coder is examined through Pearson Correlation (automatically done within the online platform). If the first test is passed at r = .8 or higher (Stemler, 2004), the student is deemed reliable and receives a certificate of completion. If the participant is not successful on the first reliability test, they are required to review parts of the course, engage in an additional coding practice, and take a second reliability test. The two reliability tests reproduce the previous and successful structure of the face-to-face RIFL training.
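The pass criterion described above (a Pearson correlation of .8 or higher between expert and trainee ratings) can be sketched in Python as follows. This is an illustration only and not the platform's actual implementation: the function names are hypothetical, the example ratings are invented, and the sketch assumes that the expert and trainee scores are simply paired one-to-one across the ratings in the test.

```python
from math import sqrt

PASS_THRESHOLD = 0.8  # criterion stated in the text (r = .8 or higher)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return sxy / sqrt(sxx * syy)


def is_reliable(expert, trainee):
    """True if the trainee's agreement with the expert meets the threshold."""
    return pearson_r(expert, trainee) >= PASS_THRESHOLD


# Hypothetical paired scores (not real data).
expert_scores = [4, 2, 5, 3, 1, 4]
trainee_scores = [4, 3, 5, 3, 2, 4]
print(is_reliable(expert_scores, trainee_scores))
```

Because the Pearson coefficient reflects whether two raters order the videos similarly rather than whether they give identical scores, a trainee who is consistently one point higher than the expert could still pass under this criterion.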

Procedure
The commencement dates for the RIFL-P training courses in English, Portuguese, and Spanish were October 2018, June 2019, and June 2020, respectively. Although the course is now available in Spanish, no evaluations were carried out on the Spanish version of the course (because of the pandemic). The training course for the educator measure (RIFL-Ed) began in January 2020.
Evaluations were carried out during and after the courses. During the courses, at the end of each module, students provided feedback by answering four questions (rated on a 5-point scale) regarding their satisfaction with the module (overall satisfaction, usefulness of content, clarity, and mode of delivery). As the correlation between items within the modules was high (mean r = .6), we created a mean composite.
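The two steps described above (checking the average correlation among the four satisfaction items, then averaging them into a composite) can be sketched in Python. The helper names and the example responses are hypothetical; the sketch assumes one list of four ratings per trainee and that every item varies across trainees.

```python
from itertools import combinations
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def mean_inter_item_r(responses):
    """Average Pearson r over all pairs of items.

    responses: one list of item ratings per trainee.
    """
    items = list(zip(*responses))  # transpose: one tuple of ratings per item
    return mean(pearson_r(a, b) for a, b in combinations(items, 2))


def module_composites(responses):
    """Mean of the item ratings for each trainee."""
    return [mean(r) for r in responses]


# Hypothetical ratings (overall satisfaction, usefulness, clarity, delivery).
responses = [
    [5, 5, 4, 5],
    [4, 4, 4, 5],
    [5, 4, 5, 5],
    [3, 4, 3, 4],
    [4, 5, 4, 4],
]
print(round(mean_inter_item_r(responses), 2))
print(module_composites(responses))
```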
Assessing satisfaction at the end of each module led to the inclusion of everyone who had taken the module (see Figure 1), allowing for high representativeness of these ratings.
A post-course anonymous survey was designed to assess participants' satisfaction with different course components, ask participants to contrast their experience with other face-to-face coding trainings in which they may have previously participated, determine whether they had used the measure after completing the training, and obtain feedback for improvement. Closed-ended questions were used to assess satisfaction (on a 5-point scale, from 1 = strongly disagree to 5 = strongly agree), as well as previous experiences and use of the measure (yes/no questions). Open-ended questions were used to understand challenges and recommendations for improvement with the course experience; we used inductive coding to aggregate these comments. The survey took less than 10 minutes to complete. Participants received a $20 (in Canadian dollars) gift card as compensation for their time. All procedures were approved by the University of Toronto Research Ethics Board.

Sample
Requests for use of the RIFL-P and RIFL-Ed measures led to the development of the online courses.
Trainees included research assistants (undergraduate and graduate students in psychology and education), academic principal investigators, and professionals working in hospital and government settings. Trainees came from a range of countries: Canada, the United States, the United Kingdom, Israel, China, Peru, and Brazil.

Course Completion
The sign-up and completion rates for the three courses are presented in Figure 1. Access to the course was given to all professionals who expressed interest. While some requested it because they wanted to use the RIFL instrument in their research or professional practice (and thus achieve reliability), others were simply curious about online reliability training, learning about observational coding, etc. Unfortunately, we did not track these different motivations, but it is possible to see a substantial dropout (18/76 = 24%) from initial log-on to Coding Practice #1 completion.
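The completion figures can be checked with a few lines of arithmetic. The sketch below uses only counts reported in this paper (76 registered, 58 completed Coding Practice #1, 44 took a reliability test, 40 passed); the variable and function names are illustrative.

```python
# Counts reported in the paper; the stage labels are paraphrased.
funnel = [
    ("registered", 76),
    ("completed Coding Practice #1", 58),
    ("took a reliability test", 44),
    ("passed a reliability test", 40),
]


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


registered = funnel[0][1]
for stage, count in funnel[1:]:
    print(f"{stage}: {count}/{registered} = {pct(count, registered)}% of registrants")

# Dropout between initial log-on and Coding Practice #1.
dropped = registered - funnel[1][1]
print(f"dropout: {dropped}/{registered} = {pct(dropped, registered)}%")

# Pass rate among those who attempted the reliability test.
print(f"pass rate: 40/44 = {pct(40, 44)}%")
```

Expressing every stage relative to the 76 registrants keeps the denominators comparable, while the pass rate is computed only among the 44 trainees who attempted the reliability test.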

Primary Outcome: Student Performance
Performance on module quizzes was high, with accuracy ranging from 83% to 100% (see Table 2). These quizzes involved simple, factual, multiple-choice or true-false questions about the material that was covered in the preceding online lecture. The high accuracy indicates that participants were actively paying attention to and understanding the material presented in the online lectures.

Secondary Outcome: During Course Satisfaction
Satisfaction for all modules of the parent course was high, ranging from 4.60 to 4.94 out of 5, with little difference in ratings across modules. Satisfaction for the educator course was also high, ranging from 4.38 to 4.79 out of 5. See Table 3 for satisfaction ratings for all modules across the various courses. Values are the mean across four questions: overall satisfaction, usefulness of content, clarity of presentation, and mode of delivery. The consistent satisfaction across all modules suggests that all of the course content was similarly valuable to participants and that there was no evident repetition or fatigue over time.

Satisfaction Post Course
Twenty-one participants (of 29; 72% response rate) completed the survey about the RIFL-P and eight participants (100% response rate) completed the survey about the RIFL-Ed. Results can be seen in Figure 2. A single anonymous link was sent to all RIFL-P course participants and we were unable to disaggregate those who completed the English vs. Portuguese versions of the course. Overall, post-course satisfaction was high (4.80 for the RIFL-P, 5.00 for the RIFL-Ed, on five-point scales). Participants seemed to especially value the lecture videos (4.62 and 4.75 for the RIFL-P and RIFL-Ed, respectively), video examples for each item (4.52, 4.88), coding manual (4.48, 4.88), coding practice (4.52, 4.88), and automated individualized feedback (4.52, 4.75). The background reading (4.20, 4.33) and monitoring drift modules (4.00, 4.25) were rated as less helpful, on average, and individuals did not feel fully prepared for the first reliability trial (4.20, 3.88). The majority of participants in both courses thought all course components were necessary and would not recommend removing or shortening any section.

Figure 2
Survey Results: Overall Retrospective Satisfaction
Note: Error bars show standard errors.
Eight participants who completed the RIFL-P course had also previously been trained in a different coding measure that required them to achieve interrater reliability. More than half of these participants said they were able to grasp the theoretical construct and learn to code more quickly in this course compared to their other course, while most others said it was about the same in both courses. One participant said it was easier to learn when training was delivered face-to-face. For the RIFL-Ed, only two participants had previous interrater reliability training experience: one said they learned faster in the RIFL-Ed course, while the other said the ease and rate of learning were similar in both courses.
Eleven participants who completed the RIFL-P course used the measure to code dyadic interactions in their own research projects, which required coding of between 20 and 4,000 videos. Two participants who completed the RIFL-Ed began using the measure in the short time between completing the course and completing the satisfaction survey.
Themes from the open-ended comments were as follows: requests for more videos that illustrate the midpoint of the scales (RIFL-P), more practice videos before the reliability test (RIFL-P & RIFL-Ed), shortened introductory lectures (RIFL-P & RIFL-Ed), and an expert explaining their coding of all items in a 5-minute video (RIFL-Ed). Other challenges that were noted were the inability to ask questions to obtain clarification (RIFL-Ed) and the need for increased age and ethnicity variation in taped examples (RIFL-P).

Discussion
Although research has consistently shown the importance of responsive caregiving for children's cognitive and socioemotional development (Britto et al., 2017; Jeong et al., 2021; Scherer et al., 2019), there remains a gap in ways to assess this aspect of caregiving at the population level. Having psychometrically strong, quick to train in and administer, and widely accessible measures of responsive caregiving is essential for monitoring, evaluating, and improving programs and policies designed to improve child outcomes. The aim of the current study was to evaluate whether it is possible to train students, researchers, and practitioners to reliably assess responsivity in parent-child and educator-child interactions (using the RIFL measures) in an asynchronous online course.
The high pass rates for the RIFL-P (English and Portuguese versions) course reveal that people can effectively learn how to reliably code responsivity in parent-child interactions using an online training model. Indeed, it is notable that most participants passed the reliability test on their first attempt, learning how to code responsivity in less than 10 hours. Pass rates for the RIFL-Ed course were also high, but in contrast to the RIFL-P course, the majority of participants required two reliability tests before being deemed reliable. These results were not surprising given the increased complexity of learning how to code interactions in which one educator is displaying different behaviors towards multiple children with varying cognitive and socioemotional skills, compared to dyadic interactions between one parent and one child.
The high pass rates across the RIFL-P and RIFL-Ed may in part be attributable to our choice to design the online platforms based on findings from the literature on effective teaching via face-to-face and online delivery models (Davis et al., 2018; Hattie, 2008). For instance, learning goals for each course component were explicit and narrow, multiple teaching strategies were incorporated into each module, video clips with annotations illustrated the learning goals, and practice coding assignments provided immediate feedback on the learning goals. Importantly, an effort was made for all courses to be culturally appropriate and diverse, with videos obtained from Brazil, Peru, Canada, and the United States, and from different socioeconomic strata. Capturing illustrative parental and educator behavior across countries and social strata was our most significant challenge, and we continue to refine content as new videos become available.
In addition to the high success rates, participants reported being very satisfied with their overall training experience. Across both courses, participants were satisfied with the multimedia design of the course and found the various aspects such as lectures, videos, and feedback helpful to their learning. Participants provided meaningful feedback during the course surveys, such as displaying interactions that include children in the middle childhood period, illustrating the mid-points of the scale, and providing an additional set of optional videos to review prior to completing the reliability tests. These suggestions are currently being incorporated into the existing courses as we expand our library of available videos.
Given the effectiveness and feasibility of the current courses, online asynchronous training may be among the cheapest, most equitable, and most efficient approaches to global reliability training for observational assessments. Achieving reliability on an observational instrument appears to be an apt fit for asynchronous, online teaching, particularly when the course content is focused and detailed (Chang et al., 2014; Hrastinski, 2008; Watts, 2016).
For both the English and Portuguese versions of the RIFL-P course, the average completion time ranged from 6 to 8 hours. While participants took longer, on average, to complete the RIFL-Ed course (8 to 10 hours), these results were not surprising given the additional coding practice and reliability test required for participants to pass the course. In-person trainings can often be quite lengthy, with many responsivity measures requiring multiple days of training (e.g., PICCOLO, CLASS), resulting in large labor costs associated with compensating both trainers and trainees. Furthermore, in-person reliability trainings require trainers and coders to be in the same place, often resulting in large travel costs. The online courses presented in this paper reduce these costs and barriers by providing a quick and effective way to train coders remotely, giving coders the flexibility to do so in their own time, with the only expense being compensation for the trainee's time.
The RIFL-P and RIFL-Ed measures are psychometrically robust, quick to train in, and easy to administer, which allows them to be used at a population level. Indeed, with free, online training available in multiple languages, researchers and practitioners worldwide can learn to use and apply these measures. For instance, responsivity can be assessed and used as a marker to identify families with children at risk for developmental difficulties, for targeted prevention or intervention efforts. The RIFL measures can also provide an efficient manner to monitor, improve, and evaluate programs designed to increase responsive caregiving. Indeed, the RIFL-P is currently being used to evaluate a national home-visiting program and in large, longitudinal cohort studies in Brazil (Hallal et al., 2018). Finally, having parallel measures that capture the same construct across different caregivers in young children's lives is another advantage of the RIFL measures.
Completion rates for courses have been found to be lower in asynchronous online training than in face-to-face environments (Khalil & Ebner, 2014; Paton et al., 2018). It is notable that only 76% (58/76) of the trainees who signed up for the RIFL training courses completed Coding Practice #1, and only 58% (44/76) went on to the reliability test. While this is likely to be, in part, a reflection of trainee motivations to take the course, it also likely reflects the challenge of keeping students engaged during asynchronous learning (Davis et al., 2018). Students' satisfaction ratings both during and after the course, as well as their grades on quizzes, indicate that the course design suited student learning needs; however, there was still significant dropout. Given that this is a ubiquitous finding in online learning, remedial suggestions such as building in an interactive element with synchronous or asynchronous discussion boards and adding an element of competition as in a gaming framework (Burgos et al., 2018; Davis et al., 2018) may also help improve completion rates in the RIFL courses. Future research should evaluate whether adding such social components to reliability training on a measure reduces dropout.
Several limitations of this study should be noted. First, the sample size was small, particularly for the RIFL-Ed course (which only recently went live); it is therefore important to continue to monitor completion rates as well as participant satisfaction. This information will guide continuous improvement of the online courses. Second, the post-course survey results were not representative of the population that began the course, and because the survey was anonymous, data cannot be linked with the course results and satisfaction rates. Finally, the predictive validity of the RIFL-Ed measure, which was developed more recently, has yet to be tested.

Conclusion
The RIFL measures and online training are particularly timely due to the unprecedented global attention on the topic of responsive caregiving, as well as the current trend of exploring technology-based platforms for massive online training. The RIFL measures, because of their efficiency, advance the assessment of caregiver responsivity, while the development of an open-source, online training builds capacity in LMICs and remote settings. Helping children to survive and thrive relies on our ability to efficiently train a workforce to measure (and eventually improve) responsive caregiving.