The added value of classifying teachers into stages of assessment skills: explaining variation in student mathematics achievement
This section examines the extent to which the classification of teachers into the four stages explains variation in student achievement in mathematics. Because the data are hierarchical (students nested within classes within schools), multilevel analysis was carried out using MLwiN. The first step of the multilevel analysis of student achievement in mathematics at the end of the school year was to determine the variance at the student, class and school levels without explanatory variables (i.e., the empty or baseline model). The empty model revealed that 74.3 per cent of the total variance was situated at the student level, 16.7 per cent at the classroom level and 9.0 per cent at the school level. In subsequent steps, explanatory variables at different levels were added. Explanatory variables, apart from grouping variables, were standardised as Z-scores with a mean of 0 and a standard deviation of 1. This is a way of centring around the grand mean (Bryk & Raudenbush, 1992) and yields effects that are comparable across variables. Thus, each effect expresses how much the dependent variable increases (or decreases, in the case of a negative sign) for each additional standard deviation on the independent variable (Snijders & Bosker, 1999). Grouping variables were entered as dummies with one of the groups as the baseline (e.g., boys = 0). The models presented in Table 2 were estimated without the variables that had no statistically significant effect at the 0.05 level.
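As a minimal illustration of the two calculations described above, the Python sketch below standardises a variable around its grand mean (Z-scores with mean 0 and standard deviation 1) and converts the variance components of an empty model into percentage shares per level. The function names and the variance components in the example are hypothetical, not the study's MLwiN estimates, and the sketch assumes the population standard deviation.

```python
import statistics

def z_scores(values):
    """Centre values on the grand mean and scale them to unit standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption; sample SD is also common)
    return [(v - mean) / sd for v in values]

def variance_shares(components):
    """Percentage of the total variance situated at each level of an empty model."""
    total = sum(components.values())
    return {level: 100 * var / total for level, var in components.items()}

# Hypothetical variance components, not the study's estimates.
shares = variance_shares({"student": 6.0, "class": 3.0, "school": 1.0})
print(shares)  # {'student': 60.0, 'class': 30.0, 'school': 10.0}
print(z_scores([2.0, 4.0, 6.0]))
```

Because every standardised predictor has the same unit (one standard deviation), coefficients estimated on such variables can be compared directly, which is the property the paragraph above relies on.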
In model 1, the context variables at each level were added to the empty model. The figures in the third column of Table 2 show that model 1 explained 32.3 per cent of the variance, most of which was attributable to the student level. All student background variables were found to have statistically significant effects on student achievement. Prior knowledge had the strongest effect in predicting student achievement at the end of the school year. In addition, prior knowledge was the only contextual variable that had a consistent effect on achievement when aggregated at either the class or the school level. In model 2, the impact of teacher assessment upon student achievement was investigated. Since teachers were assigned to four stages according to their assessment skills, we investigated the extent to which the classification of teachers into these four stages could explain variation in student achievement. Thus, teachers at stage 3 were treated as the reference (or baseline) group, and three dummy variables (for stages 1, 2 and 4) were added to model 1. The developmental stage at which a teacher is situated was found to have a statistically significant effect on student achievement. Specifically, students of teachers at stage 1 had the lowest achievement, whereas students of teachers at stage 4 had higher achievement than students of teachers in the first three stages.
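The dummy coding used in model 2 can be sketched as follows, with stage 3 as the baseline so that each of the three dummies contrasts one stage with stage 3. The sketch also includes one common definition of explained variance, the proportional reduction in total variance relative to the empty model; this definition is an assumption, since the text does not state the exact formula behind the 32.3 per cent figure, and all names and numbers here are hypothetical.

```python
REFERENCE_STAGE = 3       # baseline group in model 2
DUMMY_STAGES = (1, 2, 4)  # one 0/1 dummy per non-reference stage

def stage_dummies(stage):
    """Return the three dummy variables for a teacher at the given stage (1-4)."""
    if stage not in (1, 2, 3, 4):
        raise ValueError(f"unknown stage: {stage}")
    return {f"stage{s}": int(stage == s) for s in DUMMY_STAGES}

def variance_explained(empty_total, model_total):
    """Proportional reduction in total variance relative to the empty model."""
    return 1 - model_total / empty_total

print(stage_dummies(REFERENCE_STAGE))  # {'stage1': 0, 'stage2': 0, 'stage4': 0}
print(stage_dummies(1))                # {'stage1': 1, 'stage2': 0, 'stage4': 0}
# Hypothetical totals: 10.0 in the empty model, 7.5 after adding predictors.
print(variance_explained(10.0, 7.5))   # 0.25
```

A teacher at stage 3 scores 0 on all three dummies, so each dummy coefficient is read as the difference in achievement between students of that stage's teachers and students of stage 3 teachers.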
Discussion
Classroom assessment research appears to be of high priority in the field of education. However, despite numerous attempts to establish a theoretical base for classroom assessment (Black & Wiliam, 2009; Brookhart, 2004; Gipps, 1994; Pryor & Crossouard, 2005; Sadler, 1989), a research gap still exists with regard to what constitutes effective assessment (Perrenoud, 1998; Yorke, 2004) and how it translates into action (Wiliam et al., 2004). In addition, there is little research investigating teachers’ assessment skills, for either formative or summative purposes (Mok, 2010; Wiliam et al., 2004). The intention of this study was to engage critically with the theory of formative assessment (Black & Wiliam, 2006) by investigating the emerging issues from a more practical and applied perspective.
In this study, a specific measurement framework was used to describe not only quantitative but also qualitative characteristics of classroom assessment, helping us to define and measure specific skills associated with assessment practice. The proposed measurement framework could help clarify the loosely defined area of classroom assessment by linking it directly to specific dimensions. These dimensions would permit the effectiveness of classroom assessment to be measured not only in terms of its formative purpose but also in terms of all aspects of the assessment process. Furthermore, by moving away from the commonly applied summative-formative distinction, the four stages of assessment behaviour could represent an integrated approach to assessment practice, encompassing the various functions and purposes of assessment. The description of the four stages shows that they move from relatively easy to more advanced types of teacher behaviour in assessing student knowledge and skills in mathematics: starting from skills associated with everyday classroom routines with a mainly summative orientation, there is a gradual movement towards skills associated with the use of assessment for formative purposes.
Important conclusions also arise when examining in more detail the content of the stages identified. Firstly, the stages appear to support arguments concerning the dynamic nature of the assessment process. The four phases of the assessment process that were used to measure teachers’ skills do not stand independently; on the contrary, they are found to coexist in all four stages. This implies that teachers in all four stages are involved in the cycle of assessment, with their skills differentiated in terms of their complexity in each phase. In particular, teachers in stages 1 and 2 differ in relation to the techniques used during student assessment. Whereas stage 1 teachers rely only on written tests, stage 2 teachers are able to use a variety of techniques during assessment. However, teachers in the first two stages appear to use assessment only for summative purposes and attempt to measure only basic skills in mathematics. Moving on, stage 2 and stage 3 teachers differ in terms of both the purpose and the content of assessment in mathematics. In particular, stage 3 teachers use assessment for formative purposes and, in addition, expand the content of their assessment to include more complex educational tasks. Finally, the dimension of differentiation is present only in the last stage. As the Rasch analysis of the data has shown, this implies that differentiating assessment across the different phases of the assessment process and in relation to different techniques is more difficult to achieve. This finding is in line with previous studies that found the differentiation of instruction to be situated at higher stages of teacher development (e.g., Kyriakides et al., 2009, 2013).
Moreover, using student achievement data, it was found that teachers situated at a higher stage of assessment are more effective than those situated at the lower stages. These findings are in line with recent literature supporting the view that effective teachers use formative-oriented assessment in everyday classroom practice (Antoniou & Kyriakides, 2011; Creemers & Kyriakides, 2008; Hattie & Timperley, 2007; Wiliam et al., 2004). Specifically, students of teachers in stage 1 had the lowest achievement, whereas students of teachers in stage 4 had higher achievement than students of teachers in the first three stages. In other words, teachers exercising more advanced types of assessment behaviour had better student outcomes. This finding confirms the impact that assessment practice has on student outcomes. Thus, assessment is not only necessary for evaluating learning but is also a means of achieving it, placing assessment at the heart of the learning process. Furthermore, because the content of each stage is distinctly defined, it is possible to identify the specific assessment skills that have a greater impact on student achievement. This suggests that teachers should be encouraged to make more extensive use of the assessment skills found to have a greater impact on student outcomes. These findings can be used to determine not only what constitutes effective assessment but also how it translates into action.
The significance of this research lies in the way its results can be used to improve classroom assessment and teacher education at both the policy and practice levels. Professional development in assessment appears to be a controversial issue in the literature. One line of research, recognising the inadequate assessment training in both pre- and in-service teacher education (Popham, 2004; Stiggins, 1991), shifts the attention to the need for teachers to understand the principles of sound assessment in order for effective practice to be achieved. Another line of research brings forward factors other than teacher competence that affect the effectiveness of assessment practice, such as the classroom assessment culture (Shepard, 2000), teachers’ perceptions and beliefs (Brown, 2004; Pajares, 1992) and the formative function of assessment (Black & Wiliam, 1998). This study stresses the need to identify those activities associated with the factor of classroom assessment that have a positive impact on student outcomes. The fact that teacher assessment skills were found to be related to student achievement implies that effective assessment practices can be defined and promoted through teacher professional development programmes. In particular, the results can be used to inform educational policy towards establishing assessment-targeted training and professional development opportunities for in-service teachers. Developing assessment skills through well-targeted pre- and in-service teacher education could contribute to a more effective use of assessment in the everyday classroom; however, further studies are needed to establish this.
While this study has provided evidence that teacher skills in assessment can be grouped into certain stages of assessment behaviour, more studies in this area are necessary. More specifically, given that the study was conducted in a single country and was concerned with primary teachers’ assessment skills in mathematics, further research is needed to test the generalisability of its findings. In particular, it should be investigated whether the developmental stages of classroom assessment skills can also be identified when measuring teachers’ skills in assessing student achievement in other subjects (not only mathematics) and in assessing students in other phases of schooling (not only primary school). Finally, studies investigating effective ways of using these results for teacher improvement are needed in order to contribute to the improvement of assessment practice.
Appendix A. The statistical rationale underlying the Rasch model
The Rasch model is based on the assumption that the difference between item difficulty and person ability should govern the probability of any person being successful on any particular item (Bond & Fox, 2001). For example, the simplest member of the Rasch family of models, the dichotomous model, predicts the conditional probability of a binary outcome (correct/incorrect), given the person’s ability and the item’s difficulty. Specifically, the probability of a correct response is a logistic function of the difference between the ability of the person and the difficulty of the item. This S-shaped function transforms any value on the real line into a value between 0 and 1. The Rasch model not only tests the unidimensionality of the scale but also determines whether the tasks can be ordered according to their degree of difficulty. At the same time, the people who carry out these tasks can be ordered according to their performance on the construct under investigation.
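The dichotomous model described above can be written as P(correct) = 1 / (1 + exp(-(θ - δ))), where θ is the person's ability and δ the item's difficulty, both in logits. A minimal Python sketch of this logistic function follows; the function name is illustrative.

```python
import math

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: P(correct) as a logistic function of ability - difficulty (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# An item of difficulty 0 logits, answered by persons of increasing ability.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(rasch_probability(theta, 0.0), 3))
# -2.0 0.119
# 0.0 0.5
# 2.0 0.881
```

When ability equals difficulty the probability is exactly 0.5, and only the difference between the two enters the model, which is what allows persons and items to share one scale.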
This procedure is theoretically justified and has been used in studies on teacher evaluation (e.g., Burry & Shaw, 1988; Wang & Cheng, 2001; Wright & Linacre, 1989). Specifically, the Rasch model puts people and tasks on the same scale and enables the researcher to examine the range of the assessment practice scale to see if the items/tasks within it form a continuum of assessment practice from ‘easy to perform’ to ‘difficult to perform’ that is devoid of gaps in construct coverage (Green & Frantom, 2002). Furthermore, the reliability of persons and items can be calculated, indicating how well the scale discriminates among people on the basis of their estimated assessment practice and how well items/tasks can be discriminated from one another on the basis of their difficulty (Andrich, 1988). Finally, Rasch analysis provides insight into the validity of a measurement tool and identifies factors that may limit the reliability and validity of measures made with the instrument (Sampson & Bradley, 2004). In the case of this study, specifying the position of an assessment skill on the scale indicates which individuals (teachers) can perform it sufficiently (i.e., those scoring higher than the position of this skill on the scale) and which cannot (those scoring lower than the position of this skill). This analysis also makes it possible to make statements about the relative difficulty of each assessment skill. Similarly, specifying an individual teacher’s position on this continuum provides information about the probability of this teacher showing assessment competence below or above this position (Bond & Fox, 2001).
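To make the interpretation above concrete, the sketch below places a hypothetical teacher and four hypothetical assessment skills on the same logit scale and lists the skills the teacher performs sufficiently, i.e. those located below the teacher's position, where the Rasch probability of success exceeds 0.5. The skill labels and logit values are invented for illustration and are not the study's estimates.

```python
import math

def p_success(ability, difficulty):
    """Rasch probability that a teacher at `ability` demonstrates a skill at `difficulty`."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

# Hypothetical skill difficulties in logits (illustrative only).
SKILLS = {
    "uses written tests": -1.5,
    "uses a variety of techniques": -0.4,
    "uses assessment formatively": 0.8,
    "differentiates assessment": 2.1,
}

def performed_sufficiently(ability, skills):
    """Skills located below the teacher's position, easiest first (P(success) > 0.5)."""
    ordered = sorted(skills.items(), key=lambda item: item[1])
    return [name for name, difficulty in ordered if ability > difficulty]

teacher_ability = 1.0  # hypothetical teacher position
print(performed_sufficiently(teacher_ability, SKILLS))
# ['uses written tests', 'uses a variety of techniques', 'uses assessment formatively']
for name, difficulty in SKILLS.items():
    print(f"{name}: {p_success(teacher_ability, difficulty):.2f}")
```

The per-skill probabilities show the same ordering as the list: they fall below 0.5 only for skills located above the teacher's position.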
The extended logistic model of Rasch (Andrich, 1988) extends the dichotomous Rasch model to items with more than two response categories and was therefore used to analyse teachers’ responses to each questionnaire item. Since each item has five response categories, it can be modelled as having four thresholds. Each threshold has its own difficulty estimate, which is modelled as the point at which a person has a 50 per cent chance of choosing one category over the adjacent one. These thresholds are calculated in log odds (also called logits) and should be ordered to represent a decreasing probability of each assessment behaviour occurring. Thresholds that do not increase monotonically are considered disordered. The distances between the threshold estimates are also important. Each step should define a distinct position on the variable, so thresholds should be neither too close together nor too far apart on the logit scale. Specifically, guidelines indicate that thresholds should increase by at least 1.4 logits (i.e., to show distinction between categories) but by no more than 5 logits (i.e., to avoid large gaps in the variable; Linacre, 1999).
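A sketch of the polytomous case, assuming threshold locations are given directly on the logit scale for a single five-category item (four thresholds): the probability of category k is proportional to the exponential of the summed differences between ability and the first k thresholds, so at each threshold the two adjacent categories are equally likely. A second function flags disordered thresholds and gaps outside the 1.4 to 5 logit range cited above. Function names and threshold values are hypothetical.

```python
import math

def category_probabilities(ability, thresholds):
    """Probabilities of categories 0..m for an item with m thresholds (logits).

    P(X = k) is proportional to exp(sum of (ability - threshold_j) for j = 1..k);
    category 0 has the empty sum, i.e. exp(0) = 1.
    """
    numerators = [1.0]
    cumulative = 0.0
    for tau in thresholds:
        cumulative += ability - tau
        numerators.append(math.exp(cumulative))
    total = sum(numerators)
    return [n / total for n in numerators]

def check_thresholds(thresholds, min_gap=1.4, max_gap=5.0):
    """List disordered thresholds and gaps outside the 1.4-5 logit guideline."""
    problems = []
    for k in range(1, len(thresholds)):
        gap = thresholds[k] - thresholds[k - 1]
        if gap <= 0:
            problems.append(f"thresholds {k} and {k + 1} are disordered")
        elif gap < min_gap:
            problems.append(f"gap {k}->{k + 1} is {gap:.2f} logits (< {min_gap})")
        elif gap > max_gap:
            problems.append(f"gap {k}->{k + 1} is {gap:.2f} logits (> {max_gap})")
    return problems

taus = [-3.0, -1.2, 0.6, 2.4]  # hypothetical thresholds for one five-category item
probs = category_probabilities(0.6, taus)
print([round(p, 3) for p in probs])  # categories 2 and 3 equally likely at threshold 3
print(check_thresholds(taus))  # []
print(check_thresholds([-1.0, -1.5, 0.5, 6.0]))
# ['thresholds 1 and 2 are disordered', 'gap 3->4 is 5.50 logits (> 5.0)']
```

Treating each threshold as an absolute location (rather than an item difficulty plus a shared step offset) is a simplification chosen for readability; both parameterisations give the same category probabilities for a single item.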
Appendix B. The statistical rationale underlying cluster analysis
Suppose that $V_1, V_2, V_3, \ldots, V_N$ represent the elements of the observed measurement vector $V$, which have to be clustered into groups. Initially, we find the minimum value of the observed measurements, $V_{\min} = \min_i \{V_i\}$, and their maximum value, $V_{\max} = \max_i \{V_i\}$. We then standardise the observed measurements using the formula $S_i = (V_i - V_{\min})/(V_{\max} - V_{\min})$, so that the vector $S$ lies between 0 and 1. Because the relative standing of the elements of $S$ is the same as that of the elements of $V$, we sort $S$ to obtain $S_{(i)}$ such that $S_{(i)} \le S_{(i+1)}$. It follows that $S_{(1)} = 0$ and $S_{(N)} = 1$. At the next stage, we calculate $D_i = S_{(i+1)} - S_{(i)}$ for $i = 1, 2, \ldots, N-1$. The values of $D_i$ represent the gaps between consecutive values in the sorted vector $S$. Finally, the gaps are sorted in decreasing order, $D_{(1)} \ge D_{(2)} \ge D_{(3)} \ge D_{(4)} \ge \ldots$. In this way, the largest gap $D_{(1)}$ divides the $N$ points into two clusters separated by the widest possible gap. When the first $k$ gaps are selected, $k + 1$ clusters are defined, maximising the smallest gap between any two clusters. Thus, the number of clusters (identified in terms of the number of gaps between clusters) can be determined by examining the percentage contribution of each $D_i$ to the total range, since $\sum_{i=1}^{N-1} D_i = S_{(N)} - S_{(1)} = 1$.
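The procedure in Appendix B can be sketched in Python as below. It sorts the raw values (which preserves their relative standing), standardises them to the 0 to 1 range, computes the consecutive gaps and cuts at the k widest ones. Because the standardised values run from 0 to 1, the gaps sum to 1, so 100 × D_i is each gap's percentage contribution. The function name and the example scores are hypothetical, and the choice of k is left to the analyst.

```python
def gap_clusters(values, k):
    """Split 1-D measurements into k + 1 clusters by cutting at the k widest gaps.

    Returns the clusters (original values, ascending) and the percentage
    contribution of every gap, largest first.
    """
    v = sorted(values)  # sorting V gives the same order as sorting S
    v_min, v_max = v[0], v[-1]
    if v_max == v_min:
        raise ValueError("all measurements are equal; there are no gaps")
    s = [(x - v_min) / (v_max - v_min) for x in v]  # S(1) = 0, S(N) = 1
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]  # D_i, i = 1..N-1
    widest = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)[:k]
    clusters, start = [], 0
    for i in sorted(widest):  # cut between the i-th and (i+1)-th sorted values
        clusters.append(v[start:i + 1])
        start = i + 1
    clusters.append(v[start:])
    contributions = sorted((100 * d for d in gaps), reverse=True)  # gaps sum to 1
    return clusters, contributions

# Hypothetical teacher scores (logits), split into four clusters with k = 3.
scores = [-1.9, -1.7, -1.6, -0.4, -0.2, 0.1, 1.2, 1.4, 2.6, 2.9]
clusters, contributions = gap_clusters(scores, k=3)
print(clusters)
# [[-1.9, -1.7, -1.6], [-0.4, -0.2, 0.1], [1.2, 1.4], [2.6, 2.9]]
print([round(c, 1) for c in contributions[:4]])
```

In this example the three largest gaps together account for roughly 73 per cent of the range while the fourth contributes only about 6 per cent, which is the kind of drop the percentage contributions are examined for when choosing the number of clusters.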