Studies and Evaluation part 2

Practice of observations by colleagues and management in the pilot study

The competence profile for senior teachers, together with the rubrics, was transformed into a questionnaire by the researchers. This questionnaire used a 4-point rating scale, ranging from (1) ‘my senior colleague does this almost never’ to (4) ‘very often’. The teacher colleagues and school management used this questionnaire to rate the competences of the senior teacher whom they observed. Four different colleagues and one manager observed each senior teacher. The colleagues completed the questionnaire twice, with an interval of six months.

Interviews: background and rationales

In addition to observations and questionnaires, interviews are a valid method for judging competence, especially in combination with other methods and information (Landy & Conte, 2013; Schmidt & Hunter, 1998). It is important that at least two experts, not attached to the school in any other way, conduct the interviews (van der Schaaf et al., 2005). The inclusion of two experts instead of one increases reliability (Murphy & Davidshofer, 1994) and is sufficient to produce acceptable levels of inter-rater agreement (Marzano, 2003).

Practice of interviews in the pilot study

In the pilot study, two experts from the teacher-training college were hired as external assessors. Based on an analysis of all available data, namely the questionnaires of pupils, colleagues and management and the senior teachers’ portfolios (as described below), these experts could hold interviews with each individual senior teacher in order to judge their level of competence. The individual interviews took about 1.5 h each. The interviews were explicitly directed at (1) the confirmation of evidence for ‘already proven competences’ from the materials analysed; and (2) the further exploration of competences which were ‘unclear’ or not yet sufficiently proven.

Portfolio assessment: background and rationales

By constructing a portfolio, the senior teachers themselves could be actively involved in the assessment. In this portfolio they described and reflected on their own strengths and weaknesses (e.g., van der Schaaf et al., 2005). The process of working on this portfolio asks for reflection and introspection. Research shows that the use of rubrics leads to learning (Boud, 1995; Jonsson & Svingby, 2007) and possibly even to professional growth (Hatton & Smith, 1995). For this to happen, the criteria, format and guidelines for the portfolio should be transparent and clear (Linn, Baker, & Dunbar, 1991; van der Schaaf & Stokking, 2008). From other studies it is known that asking colleagues for feedback on competences can lead to learning experiences as well (Hattie & Timperley, 2007).

Practice of portfolio assessment in the pilot study

For the pilot study, a portfolio manual was written by the external experts, containing guidelines, fill-in factsheets, and the competence profile for senior teachers with rubrics as the criteria on which the teachers should judge themselves. The teachers had to prove their competences in at least two different situations. These two situations should be additional to what ‘already could be known’ about the teachers from the other measurements. Next, these situations should be described in detail, so that the experts would be able to visualize the situation and ask specific (check) questions about it during the interview, and colleagues would be able to write feedback (or a specific addition) regarding the situations described. Written feedback from at least two colleagues on each situation described was added to the portfolio.

As mentioned earlier, the entire assessment program should offer learning opportunities for the people involved, because it is an expensive and time-consuming process altogether. Indeed, the complete assessment should not only result in a judgement, but should also offer developmental possibilities for the teacher during and after the assessment. For the portfolio, the teachers completed the ‘pupil questionnaire’ on interpersonal teacher behaviour at the start of the pilot study, assessing the way they viewed their own behaviour with pupils in the class. Next, the senior teachers proved the eight competences by completing an ‘evidence form’ (what is your evidence for this competence, why is this convincing). This evidence could be a video of a good practice, a series of lessons, a manual they developed and so on. Together with this evidence, written feedback from colleagues was added. For writing this feedback, the teacher’s colleague(s) used the rubrics from the competence profile.

Step 4: pilot study

As introduced above, a pilot study was organized to evaluate the working of the assessment program in practice and the acceptance of the program within the school team. Validity and reliability are the most widely used quality criteria for assessment, but these two criteria alone are not sufficient when it comes to assessing competence (see also Baartman et al., 2007). Several authors have proposed other or complementary quality criteria, focusing for example on the meaningfulness of the assessment for learning or the quality of the feedback it provides (e.g., Baartman et al., 2007; Linn et al., 1991). Quantitative measures of quality are often not available for these kinds of assessment programs, necessitating other operationalizations of validity and reliability (Baartman et al., 2007). The assessment program described in this paper was evaluated by using 12 quality criteria for determining the quality of competence assessment programs. Of course, the assessment parts should be valid, reliable and objective (traditional criteria). The rationale behind using the 12 new criteria is that competence assessment consists of both more traditional and new forms of assessment, and as a consequence, both traditional and new quality criteria are needed to evaluate the quality of the assessment. Table 4 presents a description of the quality criteria used in this study (Baartman et al., 2007, p. 261).

Method

The study presented in this paper describes a pilot study of the assessment program, including an evaluation of the program.

Instruments

The competence profile for senior teachers was transformed into a questionnaire, as described above, and was used by the peer teachers, the management and the teachers themselves. Next, a standardized questionnaire on teacher behaviour, the QTI (Questionnaire on Teacher Interaction), was completed by the pupils and the senior teachers themselves. This questionnaire is a validated and reliable instrument that has already been used in many other (international) studies (den Brok et al., 2010; Levy et al., 2003; Telli & den Brok, 2012). The scales and numbers of items of the QTI are presented in Table 3.

Evaluation instruments of the (perceived) quality of the assessment program

To evaluate the quality of the entire assessment program, the 12 quality criteria of Baartman et al. (2007) were used (Table 4 presents the categories). In a previous study, these quality criteria were specified into 4–6 indicators per quality criterion (Baartman et al., 2007), which were used as questions in a questionnaire in this study. The participating senior teachers and their colleagues judged the quality of the assessment program on a 10-point Likert scale. The pupils rated four of the twelve quality criteria: (1) fitness for purpose; (2) transparency; (3) fairness; and (4) (costs and) efficiency. These four were chosen because they are the most visible to the pupils, for example, whether the criteria were clear to them and whether they thought the criteria represented their opinion of a good teacher. Next, four questions were added to the pupils’ questionnaire in order to obtain information about the pupils’ perspective on the usefulness of the assessment program.

Participants

Eight senior teachers participated as ‘assessees’ in the pilot study of the assessment program: six men and two women. The teachers’ ages ranged from about 30 to 63 years. All teachers taught a subject like maths or languages, and one teacher taught physical exercise and had managerial tasks next to her teaching tasks. All teachers had gained at least five years of teaching experience. In the evaluative part of the pilot study, which was not obligatory, seven out of the eight senior teachers completed the evaluation questionnaire on the quality of the assessment program.

For each participating senior teacher, pupils of two classes rated their teacher. They carried out observations and filled out the QTI questionnaire. In total, 170 pupils participated. The pupils varied in age between 14 and 17 years. Participation of the pupil groups was obligatory, so the response rate was close to 100%. Seventy of the 170 pupils also completed the evaluation questionnaire on a voluntary basis.

Four different colleagues observed each senior teacher; in total, 32 teachers participated in the new assessment program as observers and 16 other teachers helped by providing written feedback. In total, 48 teachers were involved in the assessment program of their eight colleagues. Only 4 out of the 48 peer teachers completed the evaluation of the quality of the assessment program. This difference in participation between the pilot study and the evaluative part of the pilot study might be due to the period of the year (at the end of the second semester, just before the summer holidays) and the fact that peers and pupils were invited to participate on a voluntary basis.

Data analyses

For the assessment of the senior teachers’ competence, available data were (1) results of pupils on the QTI questionnaire, together with the scores of the senior teachers themselves on the QTI scales; (2) the scores of the questionnaires on the competence profile for senior teachers, completed by the colleagues; and (3) portfolios of the senior teachers, including the feedback by peer colleagues. The questionnaires from the colleagues were analysed by computing mean scores per competence (varying between 1 and 4) that subsequently were converted into percentages, indicating on a 100% scale how often the teacher showed a specific competence. In order to make a final judgement on the senior teachers’ competences, the two experts used the following criteria: (1) results on the pupils’ questionnaire should be positive, being in line with the (national) norms of the QTI and showing no large negative differences with the national average scores; (2) the results from the peer teachers’ questionnaires should show scores of at least 60% for each competence, being a positive result; (3) the evidence for each competence as included in the portfolio should be valid, reliable and convincing according to the two experts. For a positive judgement, teachers should score positively on all three parts.
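
The scoring rule described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the text does not say how the 1–4 means were converted into percentages, so the linear conversion mean / 4 × 100 used here is an assumption, and the function names, competence labels and ratings are all hypothetical.

```python
from statistics import mean

PASS_THRESHOLD = 60.0  # minimum percentage per competence, as stated in the text


def competence_percentages(ratings_by_competence, scale_max=4):
    """Average the colleagues' 1-4 ratings per competence and rescale to a percentage.

    ASSUMPTION: percentage = mean / scale_max * 100. The source does not
    give the exact conversion that was used.
    """
    return {
        competence: mean(ratings) / scale_max * 100
        for competence, ratings in ratings_by_competence.items()
    }


def final_judgement(qti_in_line_with_norms, percentages, portfolio_convincing):
    """Return True only if all three parts of the assessment are positive."""
    peers_positive = all(p >= PASS_THRESHOLD for p in percentages.values())
    return qti_in_line_with_norms and peers_positive and portfolio_convincing


# Hypothetical ratings from four colleagues for two of the eight competences.
ratings = {
    "competence_A": [3, 4, 3, 4],
    "competence_B": [2, 2, 3, 2],
}
percentages = competence_percentages(ratings)
print(percentages)
print(final_judgement(True, percentages, True))
```

In this made-up example the second competence stays below the 60% threshold, so the overall judgement is negative even though the other two parts are positive, which mirrors the requirement that teachers score positively on all three parts.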

For the assessment of the quality of the assessment program, available data were the results of the pupils’ evaluation questionnaires and results on the evaluation questionnaire filled in by the senior teachers themselves and their peers. Quantitative data from the evaluation questionnaire were available from 70 pupils, seven of the eight senior teachers and four colleagues. Means and standard deviations were calculated.
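
As a minimal sketch of these descriptive statistics, the following Python snippet computes the response rates implied by the respondent counts reported above, plus a mean and standard deviation for one criterion. The individual ratings are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not specify which variant was calculated.

```python
from statistics import mean, stdev

# Respondent counts reported in the text: (completed evaluation, took part in pilot).
respondents = {
    "pupils": (70, 170),
    "senior teachers": (7, 8),
    "peer teachers": (4, 48),
}


def response_rate(completed, invited):
    """Share of participants who completed the evaluation, as a percentage."""
    return completed / invited * 100


def describe(scores):
    """Return (mean, sample standard deviation) for a list of 1-10 ratings.

    ASSUMPTION: sample standard deviation; the source does not specify.
    """
    return mean(scores), stdev(scores)


for group, (completed, invited) in respondents.items():
    print(f"{group}: {response_rate(completed, invited):.1f}%")

# Hypothetical ratings by four peer teachers on a single quality criterion.
m, sd = describe([4, 6, 5, 5])
print(f"M = {m:.2f}, SD = {sd:.2f}")
```

The response rates make the difference between the groups explicit: most senior teachers completed the evaluation, while fewer than one in ten peer teachers did.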

Evaluation of the assessment: results

The first results concern the pilot study of the assessment program, describing the assessment of the competences as well as the provision of the opportunity for self-reflection. Second, the evaluation of the assessment program is presented.

Pilot study of the assessment program

Senior teachers’ competences

The separate parts of the assessment program (the observations, questionnaires, portfolios and interviews) all pointed in the same direction: a senior teacher does or does not show the competences as formulated in the competence profile for senior teachers. In none of the cases did the results of the different parts of the assessment program contradict each other. Table 5 presents the overall results of the teachers on the separate parts of the assessment.

Five teachers proved to be competent for the senior role as described in the competence profile for senior teachers. The two assessors judged this independently of each other, based on the portfolio assessment. This positive judgement was confirmed by the interview with these teachers. In contrast, two other teachers had not been able to prove their competences by means of their portfolios; the judgement by the experts was ‘doubtful’. However, these two teachers were able to add extra material during the assessment interview, by giving extra information on the evidence provided and by adding ‘critical incidents’. Eventually, the interview turned their judgements into positive ones. For the last teacher, the portfolio was insufficient, and even if there had been an interview with additional materials, it could not have led to a positive judgement. Therefore, the assessment interview was cancelled and this teacher was requested to construct a new portfolio. In an evaluation interview, the eight teachers, even the one without a positive judgement, stated that they recognized the advice and judgement.

Opportunity for reflection

Working with portfolios for professional development is considered valuable when there is a dialogical context. This dialogical context was created by having the senior teachers ask their peers for written feedback. Next, an interview was held with the senior teacher and two experts. All teachers stated that the entire process helped them reflect on their profession, their behaviour and the actions they had undertaken. Seven out of the eight senior teachers told the experts that working on the portfolio was a developmental process for them, because it involved gathering evidence to prove the competences, reflecting on the competences and writing this down, asking peers for feedback and discussing this with the experts. One senior teacher, who did not receive a positive judgement, did not agree with the other teachers on this. He stated that the assessment program also judged the way one could build up a portfolio and use one’s writing skills, and not only the senior teachers’ competences.

Evaluation of the assessment program

Mean scores of the evaluation of the quality of the assessment program, as judged by the teachers and peer teachers on a 1–10 scale, are presented in Table 6. The criterion ‘acceptability’ (i.e. “all stakeholders should approve of the assessment methods, criteria and standards”) showed the lowest score. The teachers who had been assessed, as well as the teachers who participated in the peer assessment, did not completely support the assessment program used (teachers M = 5.67, colleagues M = 4.75). Especially the teachers who participated in the peer assessment reported low scores on the acceptance of this method. The criterion ‘fairness’ also showed low scores within both groups (teachers M = 5.62, colleagues M = 5.51). This criterion comprises questions like “do you think the assessment is fair” and “are the assessors unprejudiced”. The assessed teachers also reported low scores on the criterion ‘educational consequences’. They stated that this assessment program did not really influence their professional behaviour (M = 5.82). The peer assessors, on the other hand, stated that participation in the assessment program did influence the teachers’ professional behaviour (M = 8.88). Next, the assessed teachers reported that the assessment program was suitable for self-reflection (M = 7.43), which was part of the aim of the assessment program. The portfolio in particular was designed to stimulate the teachers to reflect on their own competence. All eleven teachers (7 senior teachers and 4 peer teachers) reported that the assessment program led to reproducible judgements and decisions (M = 7.38 and M = 7.75, respectively), which is a measure of reliability. Another measure of reliability is ‘comparability’, which is the use of comparable methods, criteria and standards for all assessees. According to the assessed teachers and the peer teachers, the assessment program was indeed comparable (M = 6.92 and M = 6.88, respectively). The (peer) teachers also reported that the assessment program was suitable for the aim set (‘fitness for purpose’: teachers M = 7.30, peers M = 6.65), which was judging whether a teacher was a real senior teacher having all competences described.

Table 7 reports the scores of the evaluation of the assessment program by the pupils. In total, 70 pupils filled out the evaluation questionnaire, in which a 10-point scale was used. All four criteria measured showed high scores (see Table 7). The pupils reported that they understood the questionnaire about their teachers’ interpersonal behaviour and that they understood its goal, namely to judge the ‘best’ teachers in school for senior positions, while also giving the pupils a voice in this. According to the pupils, the assessment program (as far as the pupils participated in it) was fair and transparent.

The pupils reported that they appreciated the fact that they could participate in the assessment and give their judgement of the teacher (M = 8.21). They stated that this is a way of giving feedback to their teachers (M = 7.11). Pupils did not perceive changes in teachers’ behaviour (M = 3.34).

Conclusion and discussion

The study presented in this paper focused on the development of an assessment program for senior teachers, while providing the opportunity for self-reflection by the senior teachers. The development process contained four steps. The first step concerned determining the content of the competences to be assessed. In the second step, criteria and standards were specified, and in the third step, methods were chosen for carrying out the assessment program. In the fourth step, the assessment program was implemented in a pilot study, assessing eight senior teachers.

Theoretical frameworks on good teachers do not present one specific view on good teachers (Berliner, 2001; Fenstermacher & Richardson, 2005). Therefore, three theoretical perspectives on good teaching can be recognized in the final competence profile for senior teachers. The profile included aspects from (1) perception studies of ideal teaching, including learning environment research (Allen & Fraser, 2007); (2) effectiveness research (e.g. Seidel & Shavelson, 2007); and (3) studies on teachers’ professional knowledge (e.g. Berliner, 2004; Darling-Hammond & Snyder, 2000; Verloop, 2005). The literature on good teachers was presented to the development team, and the team also used their own (literature) resources from e.g. professional development programs. A specific aim was that the school team would recognize the new assessment program and that there would be a strong commitment towards using it. As a consequence, a school-specific competence profile was developed by the school’s development team. This is a rather eclectic approach, using competences fitting the specific school context, mostly chosen bottom-up. Berliner (2005) described the importance of taking specific demands of the school environment into account. The school management agreed to a rather eclectic approach in choosing the competences, in order to create a larger commitment of the team.

The assessment program had two goals: judgement of senior teachers’ competences and creating an opportunity for reflection on their competences by the participating senior teachers. The senior teachers stated that the assessment program did not really influence their professional behaviour, but they recognized the possible influence of the assessment program on their professional behaviour as teachers. They mentioned the possibility to reflect on their own competence development while working on their portfolio. This forced them to make their competences explicit. With this reflection function, the assessment program, although having a specific summative goal, had a formative purpose as well (Hickey, Zuiker, Taasoobshirazi, Schafer, & Michael, 2006).

Opportunity for reflection

A portfolio can play an important role in the professional development of teachers, and not only in the case of an (external) judgement. Working with portfolios for professional development is considered valuable when there is a dialogical context. If so, a portfolio can be considered an ‘assessment tool for learning’. However, when there are no reflective discussions regarding the portfolio, it is less valuable (Mittendorff, Jochems, Meijers, & den Brok, 2008). This might be the case in our assessment program: only one interview was conducted in which the teacher could talk about and explain his or her contribution in the portfolio. It is not unlikely that extrinsic motivation played a role in the portfolio assignment. The teachers completed their portfolios in order to gain another position/salary in the school. The participating teachers stated that the process of working on their portfolios is a very good way of self-reflection (see also Boud, 1995). However, the other colleagues were not convinced of this; they stated that working on a portfolio might contribute to self-reflection, but that this does not have to be the case for everyone. In order to achieve an actual and long-lasting effect on teacher behaviour and professionalization, teacher assessment should be integrated into a larger personnel evaluation system. This could be done by having all teachers work on a professional portfolio, describing their professional development and reflecting on their professional identity as a teacher (Beijaard et al., 2004). This portfolio can then be the basis of the annual dialogue between the teacher and the management.

Indeed, a pilot study of the assessment program showed that a good view of senior teachers’ competences, as described in this study, seemed to be gained in this way. Seven out of eight senior teachers who were subjected to the assessment indeed demonstrated the specified senior teachers’ competences. The results of the different methods within the assessment program all pointed in the same direction. Multiple different methods, assessors and pieces of evidence were used to demonstrate teacher competence, assuring triangulation of methods and assessors. The fact that observations, questionnaires, interviews and portfolios all pointed in the same direction (a positive or negative judgement of the senior competence) is a first starting point for the determination of construct validity (Murphy & Davidshofer, 1994). However, in the pilot study a rather small, selective group of teachers participated. Half of them participated in the development team of the competence profile as well, which might have influenced their views on the assessment program in a positive way. The fact that the eight ‘best’ teachers were selected by the school management for participating in the pilot study might have had some influence on the (positive) findings. In order to validate the assessment program further, it could be implemented in other, comparable schools that did not participate in the development process, but that also need a system for personnel management and high-stakes assessments as required by the Dutch government.

It remains an interesting question whether these kinds of assessment tools should be theoretically driven, practically driven, or a combination of both, as in our study. We assume that this combination works best in order to gain a valid competence profile as well as an increased commitment of the team. Working in a mainly theoretically driven way might lead to a ‘not invented here’ problem, ending in a rejection of the assessment program by the school team. By having the best teachers participate in the development team and by using input from the literature, a certain ‘gathering of competences’ should have been prevented, but this is a possible weakness of our study. This way of working might be interesting for others to try out and could be interesting as a subject for further research as well. It might also be interesting for future research to compare our approach to more theoretically driven approaches to assessing teachers, keeping the acceptance of and commitment to the assessment program by the school team in mind.

Evaluation of the assessment program

The judgements of the eight senior teachers were recognized and accepted by the teachers assessed as well as by their peer teachers. The development process was carried out in four steps by the school development team, in order to create a large commitment of the school team towards this new assessment program. As a consequence, the pilot study with the first implementation of the assessment program was evaluated. This evaluation needs to be interpreted with some caution because only four peer assessors (of the 48 peer assessors participating in the assessment program) and 70 pupils (of 170 in total) participated in the evaluative part of this study. This difference in participation might be due to the period of the year (at the end of the second semester, just before the summer holidays) and the fact that peers and pupils were invited to participate on a voluntary basis. School management reported that it was too busy at that time of year to obtain a higher response rate.

The participating teachers (teachers assessed and the peers) reported that the assessment program was suitable for providing a competence judgement, but that the acceptance of the assessment methods was rather low and that the commitment towards this way of competence assessment was not very high. This could be partly due to the fact that the assessment methods used were quite time-consuming. Teachers who judged their colleagues invested time and effort, but did not perceive their participation as useful for themselves. Indeed, peer assessment can be valuable for both parties (teacher and peer), but dialogue and exchange are important conditions for learning from each other in school teams (Doppenberg, den Brok, & Bakx, 2012). However, in order to reach this dual learning effect for teachers assessed as well as their peers, specific goals for this should be set, explained and guided. This was not done in the pilot study, but can be a valuable suggestion for others who would use a comparable assessment program. Even though the findings of the study do not support some of the outcomes hoped for (in particular the development of a summative assessment program for senior teachers), the study does seem to offer a sound methodology for the development of such a program within each school context. If all teachers in a school site utilized the method for developing the competencies, then the outcomes of the evaluation might be more readily accepted by the teachers.

For additional research, it would be interesting to gain more insight into the possible psychological reasons why the teachers resist supporting the assessment program. For this purpose, another questionnaire could be developed, investigating the possible psychological causes of resistance. Open-ended questions could be especially useful here. This may facilitate an understanding of why teachers did not quite accept this assessment method. If the underlying causes of their resistance are identified, this might help to refine the program for the future. From other studies on educational innovation, it is known that teachers are more positive when given ownership, agency and logical sense-making (Ketelaar et al., 2012). In order to create acceptance for the assessment program, a large group of teachers and the management were brought together to create both a valid profile and a commitment to the use of this competence profile for senior teachers as the underlying basis for the assessment program. However, not all teachers were involved in the development process and the teachers had not been involved in choosing the assessment methods. These methods in particular (portfolio, observation, questionnaires and interview) were time-consuming and involved many peers and pupils in order to establish one valid judgement of a senior teacher’s competences. The low acceptance could be due to the large amount of time, effort and money spent on the assessment of relatively few people (Wiliam, Lee, Harrison, & Black, 2004). The pupils were more positive in this respect. They understood and recognized their part of the assessment program and appreciated the possibility to give feedback to their teachers. They appreciated the fact that their opinion was asked for and perceived the assessment program as fair, transparent and suitable for the purpose of assessing senior teachers’ competence. The difference between teachers’ and pupils’ judgements may be due to the fact that pupils’ voices were heard and that they had a serious role in the assessment of their teachers, which is not a role that pupils commonly get. Because of the anonymity of the methods, the pupils could be honest about their opinions of their teachers, without fearing negative consequences.

A question that remains difficult to answer is whether colleagues are capable of judging each other objectively when it concerns a summative assessment. The participating peers were in no way dependent on each other in the teaching team, or connected in a hierarchical relation. However, it is possible that colleagues who like each other judge each other more positively. Observations done by peers have the problem of ‘sympathy of the observed person’. This is especially important when it concerns a summative assessment with salary consequences, as in this study. The same question can be asked about whether pupils are capable of judging their teachers. In order to reduce the possible bias by colleagues and pupils, three actions were undertaken: (1) judgements were carried out anonymously; (2) relatively large groups of pupils and peers were asked to participate in the assessment (Damon, 2007); and (3) external assessors were included in the assessment program. The assessed teachers were asked to add ‘evidence’ to their portfolios to prove all eight competences. This evidence was analysed by the external assessors and, together with the results from the pupil questionnaire, the questionnaires of the colleagues and the portfolio, formed the basis for the final individual interview. The expert judgement and the peer judgement produced comparable results, which is a first indication that peer teachers and pupils could play a role in the summative assessment of their colleagues. However, more research is needed in this respect, for example by comparing the judgements of colleagues who like or dislike each other. For formative purposes, these reliability issues are less of a concern. In the case of assessments for formative reasons, colleagues could judge one another as a starting point for professional development and intervision. The organization could create a culture in which collegial consultation and reflection on professional behaviour are generally accepted and appreciated and in which learning from each other is a central goal (Doppenberg et al., 2012).

Summarizing, the assessment program for senior teachers showed that the competences of working teachers can be assessed during their teaching career with this program. First indications of validity and reliability were positive, but the acceptance of the program by the school team was rather low and should be investigated further. Future steps to improve the assessment program should address the validation of the competence profile, the acceptance of the assessment program and the possibility of lowering costs and time investments. A further validation of the competence profile could be done by external experts and teachers, combining a theoretical approach and practical input. This could possibly improve and specify the competence profile further. Next, creating conditions in the school for learning from each other and using portfolios as a means for professional development could contribute to a decrease in time and effort, because a portfolio would then be a growing document that all teachers already have. This portfolio would then be the basis of the annual dialogue between teacher and school management. When a teacher is selected to participate in the assessment program, only an addition to the portfolio with peer feedback would be needed. Indeed, this requires a change in school culture, directed at a learning organization. Our study provides a first direction for the development of an assessment program that would fit in a learning organization.
