Special Education Research
How satisfaction and self-efficacy for inclusive education matter for Swedish special educators’ assessment practices of students with intellectual disability
1 Department of Education and Special Education, The University of Gothenburg, Sweden
2 Department of Education, Umeå University, Sweden
Online first: 26 June 2019
Assessment of learning outcomes (knowledge, skills, beliefs, and attitudes) is integral to both mainstream and special needs comprehensive schools for students with intellectual disabilities (ID). Assessment matters (a) to clarify whether students have met the curriculum goals, (b) to support students’ development, and (c) for teachers’ lesson planning and decisions concerning special needs instruction. Without assessment, special educators cannot evaluate the learning of students with ID. However, assessing students with ID poses a challenge both to special educators and to their cooperation with mainstream teachers when students with ID are fully included but follow an individualised curriculum.
For example, a student with ID might (a) have language and communication difficulties or (b) have limited reading and writing abilities. Because many standardised tests require reading and writing abilities, special educators often feel that they have insufficient training for assessing students with ID. Given these difficulties, special educators often avoid using standardised tests to assess students with ID (Turnbull, Turnbull, Wehmeyer, & Park, 2003).
This raises the question of whether special educators should continue to use standardised tests in written or multiple choice format, or switch to, or complement them with, other forms of assessment. Alternative forms of assessment, such as assessment of verbal participation and oral assessment, have gained prominence in education. Such forms offer a complement, or even an alternative, to traditional forms of assessment.
The difficulty of assessing students with ID also poses a problem of accountability (Looney, Cumming, van Der Kleij, & Harris, 2018). Without assessment, teachers and policy makers do not know whether students with ID meet the standards of the curriculum, which makes it difficult to hold schools accountable for the learning outcomes of these students.
Many teachers learn to assess students through practice at work. Several studies have examined how teachers in mainstream schools assess students; however, little is known about how special educators in special needs comprehensive schools assess students with ID. In the present study, we sought to gain knowledge about the assessment practices of Swedish special educators in special needs comprehensive schools. Swedish special needs comprehensive schools represent a case of schools that have been criticised for not meeting the curriculum standards and for poor assessment practices (Swedish Schools Inspectorate, 2010).
Our purpose was to describe and predict the type of assessment practices Swedish special educators in special needs comprehensive schools use for students with ID. We posed the following research questions:
- What type of assessment of students with ID do special educators find the most difficult to conduct?
- What predicts special educators’ assessment practice score of students with ID?
- What predicts the types of assessment special educators use for students with ID?
- What predicts special educators’ satisfaction with assessment of students with ID?
1.1 Teachers’ assessment practices
We define educational assessment as the process that establishes what students know and are capable of doing. Assessment can be divided into summative and formative assessment. Summative assessment refers to evaluating the sum of students’ learning outcomes through tests (e.g. written or multiple choice) for public reporting, certification, selection, and system accountability; it is typically administered at the end of a curriculum unit. By contrast, formative assessments are designed for teaching and learning in the classroom and provide diagnostic feedback and progress evaluation (Hattie & Timperley, 2007). Examples of formative assessment include continuous teacher feedback, peer assessment, self-assessment, and strategic questioning by the teacher (Black & Wiliam, 2005).
However, we agree with Looney, Cumming, van Der Kleij, and Harris (2018) that the distinction between the two types of assessment can be misleading. On some occasions, summative assessment and point-in-time judgements of student achievement can be designed to improve teaching and learning. On others, formative assessment, which gives students and teachers feedback on the next steps in teaching and learning, can also serve as assessment for reporting or certification.
Assessing students with ID is pedagogically difficult and requires competence. The difficulty lies in establishing the validity and reliability of assessments, and different types of assessment have their own strengths and weaknesses in this respect. Assessment types with low validity and reliability pose problems of accountability and equity for students with ID.
Assessments requiring written questions and written answers are inaccessible to many people with ID. However, there are alternatives to written tests (Black & Wiliam, 2005; Davies, Stock, King, Wehmeyer, & Shogren, 2017). For example, Bolt, Decker, Lloyd, and Morlock (2011) and Thurlow, Lazarus, and Hodgson (2012) used read-aloud accommodations, in which another person reads aloud the test instructions, questions, and answer options and then records the student’s responses. Nevertheless, alternative approaches to assessment tend to be underused (Davies et al., 2017; Fujiura et al., 2012; Tanis et al., 2012; Wehmeyer & Abery, 2013).
1.2 Special educators’ self-efficacy beliefs and assessment practices
As part of social cognitive theory, Bandura (1997) developed the concept of self-efficacy to explain how people’s beliefs influence their course of action. Special educators’ self-efficacy refers to their belief in their capacity to execute an action in teaching. Special educators develop self-efficacy from prior experience, from observing others, from self-persuasion, and from interactions with colleagues or students. Several studies have stressed the importance of self-efficacy as a predictor of educators’ attitudes and behaviour. However, the concept is multidimensional; for example, special educators’ self-efficacy tends to be domain specific. Special educators might vary in their self-efficacy concerning teaching a subject matter, using information and communication technology, classroom management, or inclusive education (teaching students with ID). Being confident in a subject matter is not necessarily the same as being confident in teaching students with ID. Therefore, the predictive accuracy of self-efficacy might vary depending on the dimension of self-efficacy.
Self-efficacy has been cited as one of the most important variables in special education research. It has predicted a number of teacher work outcomes (e.g. job satisfaction and burnout) (Viel-Ruma, Houchins, Jolivette, & Benson, 2010).
Self-efficacy also predicts attitudes towards, and willingness to implement, various types of special needs instruction. Consequently, we extend the concept of self-efficacy for inclusive education to predict special educators’ assessment practices for students with ID. We hypothesise that self-efficacy for inclusive education is associated with the assessment practices applied for students with ID: greater confidence should promote more complex forms of assessment practice.
1.3 Special educators’ emotions and assessment practices
Drawing on the psychology and sociology of emotions, Hargreaves (2000) developed a conceptual framework for understanding emotions in teaching. Hargreaves (2000) argued that teaching is emotionally embedded. Emotional embeddedness means that emotions (e.g. satisfaction, pride, shame, guilt) contribute to explaining teachers’ actions and interactions at the workplace. Special educators develop emotions in relationships with colleagues, students, parents, and principals. Therefore, emotions at work mix individual and professional expectations. As an example, dissatisfaction with curricula might keep special educators from implementing new educational policies.
Expanding on Hargreaves’ (2000) framework, we contend that satisfaction (or happiness) matters for teachers’ assessment practices. First, special educators vary in their satisfaction with tests for conducting assessments. Dissatisfaction with tests might reflect either professional criticism or lack of knowledge.
Second, special educators’ satisfaction depends on their working conditions and on their relationships with students, parents, principals, and colleagues. Satisfaction encourages action and also reflects special educators’ capacity for action: special educators who can get things done at school have a greater sense of satisfaction. By contrast, special educators with low satisfaction might experience frustration and alienation at work, which might reduce their sense of control over their assessment practices. Such emotions develop during the special educator’s career; therefore, emotions might interact with age and experience.
Third, feelings of satisfaction might be grounded in a sense of fairness: special educators might be more or less satisfied with tests depending on how fair they consider the tests to be for students with ID.
In this study, 148 special educators from northern and western Sweden participated in a non-random sample (response rate = 74%). Because the sample is non-random, we cannot generalise to the Swedish population of special educators. However, we can still make inferences about the process that generated the data; in other words, we can say something about what is going on in the sample, which might yield insights into what influences special educators’ assessment practices. All the special educators teach students with ID. According to the Swedish National Agency for Education (2017), roughly 85% of the teachers in special needs comprehensive schools are female; in our sample, almost all are female (94%). About 85% have a teaching degree, whereas only 29% have a special teacher education degree. However, educators with a special education degree are overrepresented in our sample (52%). The student-to-teacher ratio is close to four (i.e. very small classes), and the special educators have worked, on average, about 8 years at their present school.
For our study, we analysed several outcome variables (see Table 1). The first set of outcome variables concerned the type of assessment of students with ID: assessment of verbal participation, oral assessment, multiple choice, written assessment, other assessment, and no assessment. The participants answered six yes/no questions concerning type of assessment, and multiple answers were allowed. These types of assessment cannot be classified unambiguously as either summative or formative.
To study satisfaction with assessment of students with ID, we used three ordinal variables. Participants rated the fairness of tests of reading ability, writing ability, and mathematical ability for students with ID as fully agree (= 4), agree (= 3), disagree (= 2), or fully disagree (= 1).
Table 1. Outcome and predictor variables: factor analysis with principal axis factoring (PAF); loadings above .3 in bold.
As predictors, we measured participants’ age and total years of teaching (discretised into three categories). We also measured self-efficacy for inclusive education using five questions scored on a 1–4 scale, and job satisfaction using a set of 13 questions. To validate the measures, we conducted a factor analysis using principal axis factoring with the psych package in R (Revelle, 2018). Principal axis factoring provides a robust alternative to maximum likelihood estimation; it also works better with small data sets and is theoretically more plausible than principal component analysis.
As can be seen in Table 1, the variables with loadings above 0.3 are in bold. The variables load on the factors as expected, and there are no problems with cross-loadings, suggesting that the factors can be treated as orthogonal. Cronbach’s alpha, a lower bound of scale reliability, is good for job satisfaction, acceptable for self-efficacy for inclusion, and good for self-efficacy for assessment. Having validated the scales, we computed, for each factor, the average of the standardised (z-scored) items.
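The two scale computations described above, Cronbach’s alpha and an average of z-scored items, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ R code (they used the psych package): `cronbach_alpha` compares the summed item variances with the variance of the total score, and `mean_z_score` standardises each item with its sample standard deviation (as R’s `scale()` does) before averaging per respondent. The function names and the example ratings are hypothetical.

```python
from statistics import mean, pvariance, stdev


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list of scores per item)."""
    k = len(items)
    n = len(items[0])
    # Total score for each respondent across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    # Summed item variances relative to the variance of the total score.
    item_variance = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def mean_z_score(items):
    """Standardise each item column, then average the z-scores per respondent."""
    z_columns = []
    for col in items:
        m, s = mean(col), stdev(col)
        z_columns.append([(x - m) / s for x in col])
    n = len(items[0])
    return [mean(col[i] for col in z_columns) for i in range(n)]


# Hypothetical 1-4 ratings: three items (rows) answered by five respondents.
ratings = [
    [4, 3, 2, 4, 1],
    [3, 3, 2, 4, 2],
    [4, 2, 1, 3, 1],
]
print(round(cronbach_alpha(ratings), 3))
print([round(z, 2) for z in mean_z_score(ratings)])
```

Averaging z-scores rather than raw scores gives every item equal weight even when items have different spreads.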
2.3 Strategies for data analysis
To analyse the type of assessment of students with ID, we conducted a latent variable analysis (i.e. treating assessment practice as a disposition) based on item response theory (IRT), using the ltm package (Rizopoulos, 2006). We treated assessment practice as a continuous latent variable. A likelihood ratio test indicated that a one-parameter IRT model fit the data better than a Rasch model (LR = 19.36, df = 1, p < 0.001). A two-parameter IRT model did not improve the fit further (LR = 5.78, df = 3, p = 0.123). In practical terms, we estimated one difficulty parameter for each of the four variables (i.e. how hard each type of assessment is) but only one common, unconstrained discrimination parameter (i.e. how well the variables distinguish between high and low assessment disposition). Consequently, we considered type of assessment of students with ID as a measure of the special educators’ assessment practice disposition.
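The model comparison above can be illustrated with a small Python sketch; it is not the ltm implementation. `irf_1pl` is the item response function of a one-parameter logistic model, in which each item has its own difficulty and all items share one discrimination; fixing that discrimination at 1 gives the Rasch model. `chi2_sf` computes the upper-tail chi-squared probability for integer degrees of freedom from closed-form series, so the reported likelihood ratio statistics can be converted back into p-values. Both function names are hypothetical, and the likelihood ratio statistic is assumed to be twice the difference in log-likelihoods between the nested models.

```python
import math


def irf_1pl(theta, difficulty, discrimination=1.0):
    """P(an educator with latent disposition theta uses an assessment type).

    discrimination=1.0 gives the Rasch model; the one-parameter model in the
    text instead estimates one common discrimination shared by all items.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc term plus a finite sum with half-integer gamma functions.
    tail = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


# Reported comparisons: Rasch vs one-parameter, one-parameter vs two-parameter.
print(chi2_sf(19.36, 1))            # well below 0.001
print(round(chi2_sf(5.78, 3), 3))   # 0.123

# With a discrimination of 2, an educator one unit above an item's
# difficulty uses that type of assessment with probability of about 0.88.
print(round(irf_1pl(theta=1.0, difficulty=0.0, discrimination=2.0), 2))
```

The degrees of freedom follow from the number of freed parameters: one extra discrimination when moving from the Rasch to the one-parameter model, and three extra when giving each of the four items its own discrimination.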
To improve the fit of the one-parameter model, we retained four of the six variables, removing “no assessment” and “other assessment”, guided by a bootstrapped Pearson’s chi-squared goodness-of-fit test (p = 0.205, B = 1000). Removing these variables seemed logical because they differ substantively from the others. We checked that the measure was unidimensional (i.e. that it captured one latent dimension rather than several). The items were also highly correlated with the total score (correlations ranging from 0.58 to 0.63), and Cronbach’s alpha was 0.63.
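The item–total correlations mentioned above can be computed as in this minimal Python sketch. It assumes the uncorrected version, in which each item is correlated with a total score that includes the item itself, because the text does not say whether the correlations were corrected. The response matrix is invented for illustration, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_total_correlations(responses):
    """Correlate each 0/1 item with the (uncorrected) total score.

    responses: one row per educator, one 0/1 entry per assessment type.
    """
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals) for j in range(n_items)]


# Hypothetical answers: verbal participation, oral, written, multiple choice.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print([round(r, 2) for r in item_total_correlations(responses)])
```

A corrected variant would subtract each item from the total before correlating, which typically gives somewhat lower values with only four items.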
To analyse the binary outcomes, we conducted binary logistic regressions, and for the ordinal outcomes, ordinal logistic regressions (Agresti, 2015). We focused on reporting the probability and the marginal change in probability (i.e. the derivative). In addition, we used the latent variable score as the outcome in a linear regression. Finally, we generated tables and graphs with the aid of the following R packages: stargazer (Hlavac, 2015), effects (Fox, 2003), and margins (Leeper, 2018). All statistical analyses were conducted using R (R Core Team, 2018).
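For a logistic regression, the marginal change in probability has a closed form: with p = 1 / (1 + exp(-(b0 + b1·x))), the derivative is dp/dx = b1·p·(1 - p). The Python sketch below averages this derivative over observed predictor values (an average marginal effect) and checks it against a central finite difference. It is a single-predictor illustration with hypothetical coefficients, not the margins package; with quadratic or interaction terms, the derivative of the linear predictor changes accordingly.

```python
import math


def logistic(eta):
    """Inverse logit: converts a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def average_marginal_effect(b0, b1, x_values):
    """Average of dp/dx = b1 * p * (1 - p) over the observed x values."""
    effects = []
    for x in x_values:
        p = logistic(b0 + b1 * x)
        effects.append(b1 * p * (1 - p))
    return sum(effects) / len(effects)


def finite_difference_effect(b0, b1, x_values, h=1e-6):
    """Numerical check: central difference of the predicted probability."""
    diffs = [(logistic(b0 + b1 * (x + h)) - logistic(b0 + b1 * (x - h))) / (2 * h)
             for x in x_values]
    return sum(diffs) / len(diffs)


# Hypothetical coefficients and standardised predictor values.
b0, b1 = -0.5, 1.0
xs = [-1.5, -0.5, 0.0, 0.4, 1.2]
print(round(average_marginal_effect(b0, b1, xs), 4))
print(round(finite_difference_effect(b0, b1, xs), 4))
```

The derivative is largest where p is near 0.5, which is why the same coefficient can imply different marginal changes for different educators.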
We conducted residual diagnostics for the linear and logistic regressions, and for the ordinal regressions we ran a proportional odds test (Agresti, 2015). We found no violations of the model assumptions.
In Figure 1, we summarise the descriptive results. Multiple choice and written assessments were the least popular types of assessment, whereas oral assessment, assessment of verbal participation, and other types of assessment were the most popular among the special educators. This suggests that special educators find speech-based assessment the most useful. However, we do not know why more than half use other forms of assessment. To get some indication, we considered the responses on assessment satisfaction.
Figure 1. Proportions for assessment variables.
The pattern of responses on assessment satisfaction could explain why special educators are unlikely to use written or multiple choice assessments. Most special educators did not strongly agree that they have adequate tests for assessing students in mathematics and writing; indeed, a large share disagreed or strongly disagreed. Overall, special educators were fairly satisfied with reading tests. Next, we considered how difficult special educators find the various types of assessment.
Figure 2. One-parameter item response theory model.
Multiple choice is the most difficult type of assessment, closely followed by written assessment (Figure 2). Verbal participation and oral assessment are the easiest types of assessment; consequently, special educators with a low assessment practice score can still conduct these types of assessment. We plotted the coefficients of the one-parameter model with their confidence intervals; none of the confidence intervals included zero. Larger difficulty parameters indicate greater difficulty. The discrimination coefficient was about 2, which suggests that the items discriminate well between special educators with high and low assessment practice dispositions.
Next, we attempted to predict the assessment practice, using the latent variable score as the outcome (see Table 2). The score can be interpreted on a scale similar to the logit scale. The analysis suggests a curvilinear relationship (inverted U-shape) between satisfaction with assessment and the assessment practice score. In addition, we found an interaction between total years of teaching and job satisfaction for special educators with intermediate experience (compared to inexperienced special educators). We can interpret the interaction by taking the derivative with respect to job satisfaction: one additional standard deviation in job satisfaction for special educators with intermediate teaching experience is associated with a 0.122 increase in the latent score, on average, adjusting for other predictors. Overall, the model explains 22% of the variation in the outcome, and the residual standard error is 0.68.
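The derivative used to interpret the interaction can be written out explicitly. Assuming a model in which job satisfaction enters both as a main effect and interacted with dummy variables for teaching experience (inexperienced as the reference category), the slope of the latent score with respect to job satisfaction in a given group is the main-effect coefficient plus that group’s interaction coefficient. The Python sketch below encodes this rule; the coefficient names, the label of the third experience category, and all values are hypothetical and are not the estimates in Table 2.

```python
def job_satisfaction_slope(coefs, experience):
    """d(latent score) / d(job satisfaction) for one experience group.

    coefs: hypothetical dict with a main effect 'js' and interaction terms
    'js:intermediate' and 'js:experienced'. The reference group
    ('inexperienced') has no interaction term, so its slope is 'js' alone.
    """
    return coefs["js"] + coefs.get(f"js:{experience}", 0.0)


# Hypothetical coefficients, in latent-score units per standard deviation.
coefs = {"js": -0.05, "js:intermediate": 0.25, "js:experienced": 0.10}
for group in ("inexperienced", "intermediate", "experienced"):
    print(group, round(job_satisfaction_slope(coefs, group), 3))
```

This is why a non-significant main effect does not rule out a meaningful slope within a particular experience group.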
Table 2. Linear regression and binary logistic regression.
We graphed the results in Figure 3, with the latent score on the y-axis and satisfaction with assessment on the x-axis. The line indicates the predicted score with a shaded 95% confidence interval, and a “rug” on the x-axis indicates the sample observations. Those with moderate satisfaction with assessment had the highest expected assessment practice score after adjusting for other predictors. In other words, special educators with low or high satisfaction with assessment had a lower assessment practice score than those with moderate assessment satisfaction.
Figure 3. Predicted values for types of assessment and assessment practices.
In Table 2, we also report the individual types of assessment, for which we estimated four logistic regressions. Three of the four show the same pattern, namely a curvilinear relationship between assessment satisfaction and the propensity to use that type of assessment: written assessment, verbal participation, and oral assessment. By contrast, conducting multiple choice assessments is unrelated to assessment satisfaction; instead, self-efficacy for inclusion seems to be positively related to it. An additional standard deviation in self-efficacy for inclusion is associated with about 2.7 times the odds of conducting multiple choice assessment (exp(β) = 2.7), on average, after adjusting for other predictors.
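Because an odds ratio multiplies odds rather than probabilities, its effect on the probability of conducting multiple choice assessment depends on the baseline probability. The Python sketch below applies the reported exp(β) = 2.7 to a few baseline probabilities; the baselines are illustrative assumptions, not values from the study.

```python
import math


def apply_odds_ratio(baseline_p, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    odds = baseline_p / (1 - baseline_p)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


odds_ratio = 2.7                 # exp(beta) reported for self-efficacy for inclusion
beta = math.log(odds_ratio)      # corresponding logit coefficient, about 0.99
for p0 in (0.1, 0.2, 0.5):       # hypothetical baseline probabilities
    print(p0, round(apply_odds_ratio(p0, odds_ratio), 3))
# A baseline of 0.2 rises to about 0.403, not to 2.7 * 0.2 = 0.54.
```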
None of the other predictors is statistically significant, with the exception of the interaction between job satisfaction and total years of teaching for verbal participation and written assessment. When interpreting interactions, we focused on the interaction term.
As an indication of model fit, we computed the correlation between the fitted values and the outcome (Agresti, 2015), using a jackknife approach and averaging the resulting estimates. The model of verbal participation provides the best fit (r_jack = 0.53), corresponding to a 29% reduction in misclassification. The models of multiple choice, written assessment, and oral assessment provide comparably low fit to the data (r_jack = 0.36, 0.40, and 0.38, respectively).
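One way to read the jackknife procedure described above is to recompute the correlation between observed outcomes and fitted values with each observation left out in turn, and then average these leave-one-out estimates. The Python sketch below does exactly that for fixed fitted values; refitting the logistic regression inside each leave-one-out step, which a full jackknife of the model would require, is omitted for brevity. The data and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def jackknife_mean_correlation(observed, fitted):
    """Average of the leave-one-out correlations between outcome and fitted values."""
    n = len(observed)
    estimates = []
    for i in range(n):
        obs_i = observed[:i] + observed[i + 1:]
        fit_i = fitted[:i] + fitted[i + 1:]
        estimates.append(pearson(obs_i, fit_i))
    return sum(estimates) / n


# Hypothetical binary outcomes (e.g. uses verbal participation) and fitted probabilities.
observed = [1, 0, 1, 1, 0, 0, 1, 0]
fitted = [0.8, 0.3, 0.6, 0.7, 0.4, 0.2, 0.5, 0.5]
r_jack = jackknife_mean_correlation(observed, fitted)
print(round(r_jack, 2), round(r_jack ** 2, 2))  # correlation and its square
```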
To understand the patterns, we plotted the predicted probabilities of the outcomes against self-efficacy for inclusion, assessment satisfaction, and job satisfaction in Figure 3. The fitted probabilities are on the y-axis, and the predictor is on the x-axis. Again, the graphs include a shaded 95% confidence interval and a “rug” indicating sample observations. The panels for assessment satisfaction show that the propensity for each type of assessment follows an inverted U-curve; the panels for self-efficacy for inclusion do not. However, the curves for the two remaining predictors also appear nonlinear.
Figure 4. Marginal change in predicted values for types of assessment and assessment practices.
We also plotted the marginal change for each predictor in Figure 4; the marginal change is simply the derivative. For the linear regression with a quadratic term for assessment satisfaction, the marginal change is itself a linear function of assessment satisfaction, with a negative slope. We suggest that these plots provide a better understanding of the pattern than can be deduced from the estimates in Table 2 alone.
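With a quadratic term, the marginal change of the predicted latent score with respect to assessment satisfaction x is b1 + 2·b2·x, a straight line in x. When b2 is negative, this line slopes downwards and crosses zero at the peak of the inverted U, x* = -b1 / (2·b2). The Python sketch below computes both quantities for hypothetical coefficients, not the estimates in Table 2.

```python
def marginal_change(b1, b2, x):
    """Derivative of b0 + b1*x + b2*x**2 with respect to x."""
    return b1 + 2 * b2 * x


def turning_point(b1, b2):
    """Value of x where the marginal change is zero (a peak if b2 < 0)."""
    return -b1 / (2 * b2)


# Hypothetical coefficients for standardised assessment satisfaction.
b1, b2 = 0.30, -0.15
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(marginal_change(b1, b2, x), 2))
print("peak at x =", turning_point(b1, b2))
```

Below the turning point, more satisfaction predicts a higher score; above it, the marginal change is negative.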
Finally, we turn to the question of what predicts assessment satisfaction. In Table 3, we present ordinal logistic regression models for each type of assessment satisfaction: reading, writing, and mathematics. The results for writing and mathematics were similar. We also included an interaction between job satisfaction and total years of teaching. Job satisfaction for special educators with intermediate experience was associated with a higher likelihood of being satisfied with mathematics tests for assessment, compared to special educators with low teaching experience, on average, after adjusting for other predictors. The same pattern holds for writing tests but not for reading tests. To assess model fit, we computed the correlation between the fitted probabilities and the outcome using jackknife estimation. For the mathematics model, the correlation was low to moderate, and it was slightly lower for satisfaction with writing tests.
Table 3. Ordinal logistic regression.
Although the lower-order terms in the writing model are not statistically significant, this does not invalidate the interaction. Because coefficients in logistic regression models, and interactions in particular, are difficult to interpret (Agresti, 2015), we plotted the fitted probabilities for satisfaction with mathematics assessment in Figure 5.
Figure 5. Predicted values for satisfaction with mathematics assessment.
Figure 5 supports our interpretation. For inexperienced special educators, higher job satisfaction was associated with a higher probability of scoring 1 (fully disagree) and a lower probability of scoring 3 (agree). For special educators with intermediate experience, the pattern was reversed: higher job satisfaction was associated with a lower probability of scoring 1 and a higher probability of scoring 3. All of these are average associations after adjusting for other predictors.
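The category probabilities plotted in Figure 5 come from the cumulative logits of the ordinal model. Assuming the proportional odds parameterisation used by `MASS::polr` in R, logit P(Y ≤ j) = ζ_j − η, where ζ_j are the cutpoints and η is the linear predictor, so each category probability is a difference between adjacent cumulative probabilities. The Python sketch below uses hypothetical cutpoints and coefficients, with an interaction between job satisfaction and an intermediate-experience dummy (the main effect of experience and the other predictors are omitted), to show how the probability of the lowest category can rise with job satisfaction in one group and fall in the other.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(cutpoints, eta):
    """Category probabilities under logit P(Y <= j) = cutpoint_j - eta."""
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical estimates for a four-category satisfaction item (1 = fully disagree).
cutpoints = [-1.0, 0.5, 2.0]
b_js, b_js_intermediate = -0.4, 0.9


def linear_predictor(js, intermediate):
    """eta for job satisfaction js; intermediate is 1 for that group, else 0."""
    return (b_js + b_js_intermediate * intermediate) * js


for intermediate in (0, 1):
    for js in (-1.0, 0.0, 1.0):
        probs = category_probabilities(cutpoints, linear_predictor(js, intermediate))
        print(intermediate, js, [round(p, 2) for p in probs])
```

Under this sign convention, a larger η shifts probability mass towards the higher (more satisfied) categories.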
Assessment of learning outcomes (knowledge, skills, beliefs, and attitudes) is central in both mainstream and special needs comprehensive schools (Black & Wiliam, 2010; Bolt et al., 2011; Davies et al., 2017; Thurlow, Lazarus, & Hodgson, 2012). Assessment matters to clarify whether students have met the curriculum goals, to support students’ development, and for decisions concerning special needs instruction and teachers’ lesson planning. Although most studies have focused on teachers’ assessment in mainstream schools, less is known about special educators’ assessment in special needs comprehensive schools. Our purpose was to describe and predict the type of assessment practices Swedish special educators conduct in special needs comprehensive schools for students with ID.
Our findings contribute to a more refined understanding of how assessment satisfaction, self-efficacy, and job satisfaction matter for special educators’ assessment of students with ID. Because special educators need more practice in assessing students with ID, we believe these findings contribute to educational theory, special education programmes, and educational policy. Several studies have shown the importance of self-efficacy for special educators (Viel-Ruma, Houchins, Jolivette, & Benson, 2010). However, our study stresses the importance of the specific assessment satisfaction as opposed to self-efficacy for inclusion. Although self-efficacy for inclusion clearly matters, it was a less important predictor of assessment conduct than assessment satisfaction. We choose to interpret the predictive importance of satisfaction as lending support to Hargreaves’ (2000) view that teaching, and assessment in particular, is emotionally embedded.
In the context of education, our study relates to the discussion on assessment as a means of ensuring accountability to the curriculum. Although assessment has been much criticised, it still serves to safeguard the progression of students with ID. Our study also relates to the discussion on educators’ emotions and self-efficacy as predictors of special educators’ practices and attitudes. Here, assessment seems to be one specific example among many that can help explain why educators do or do not implement a policy.
Our study has several limitations. First, we used a non-random sample. Although this does not invalidate the use of inferential statistics, we cannot generalise our findings. At best, we can identify patterns in the data that might interest researchers, teachers, and special educators. Second, due to the small sample size, our study lacks statistical power; thus, we can at best detect large differences. Third, our study has measurement issues. Ideally, we should have fitted a measurement model for our predictors. Beyond reliability, our measure of satisfaction with assessment only included questions about testing and lacked other aspects of assessment. We are also aware that self-efficacy for inclusion can be measured in several ways. Fourth, we have not assessed the possibility of a reciprocal association: assessment practices may also predict greater job satisfaction.
We suggest that special teacher educators equip pre-service and in-service special educators with the competence needed to use different kinds of assessment. Specifically, special educators may need skills to both develop and administer written tests and multiple choice tests, which includes exposing pre-service and in-service special educators to a variety of test methods. Promoting inclusion is important but not sufficient. From a policy perspective, we suggest that special education programmes give more attention to the assessment of students with ID. Finally, we encourage test developers to design more adequate mathematics tests for students with ID.
We found that special educators had the greatest difficulty conducting multiple choice and written assessments. The special educators found it easier to conduct oral assessments and assess verbal participation.
We found that special educators’ assessment satisfaction had a curvilinear relationship with their assessment practice score. In practical terms, special educators with moderate assessment satisfaction had the highest assessment practice score. In addition, job satisfaction interacted with moderate educator experience in predicting the assessment practice score.
We found a curvilinear relationship between special educators’ assessment satisfaction and their use of verbal participation, oral assessment, and written assessment. By contrast, self-efficacy for inclusive education predicted conducting multiple choice assessments. In addition, job satisfaction interacted with moderate teaching experience in predicting assessment by verbal participation and written assessment.
We found an interaction between total teaching years and job satisfaction. Special educators with higher job satisfaction and moderate teaching experience are more likely to be satisfied with writing and mathematics tests compared to inexperienced special educators.
Conflict of interests:
The authors declare no conflict of interests.
Agresti, A. (2015). Foundations of linear and generalized linear models. Hoboken, NJ: John Wiley & Sons.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York, NY: W. H. Freeman/Times Books/Henry Holt & Co.
Black, P., & Wiliam, D. (2010). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 92(1), 81–90.
Bolt, S. E., Decker, D. M., Lloyd, M., & Morlock, L. (2011). Students’ perceptions of accommodations in high school and college. Career Development for Exceptional Individuals, 34(3), 165–175.
Davies, D. K., Stock, S. E., King, L., Wehmeyer, M. L., & Shogren, K. A. (2017). An accessible testing, learning and assessment system for people with intellectual disability. International Journal of Developmental Disabilities, 63(4), 204–210.
Fox, J. (2003). Effect displays in R for generalised linear models. Journal of Statistical Software, 8(15), 1–27.
Fujiura, G. T., & RRTC Expert Panel on Health Measurement. (2012). Self-reported health of people with intellectual disability. Intellectual and Developmental Disabilities, 50(4), 352–369.
Hargreaves, A. (2000). Mixed emotions: Teachers’ perceptions of their interactions with students. Teaching and Teacher Education, 16(8), 811–826.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.
Hlavac, M. (2015). Stargazer: Well-formatted regression and summary statistics tables. R package (Version 5.2).
Leeper, T. J. (2018). margins: Marginal Effects for Model Objects. R package (Version 0.3.23).
Looney, A., Cumming, J., van Der Kleij, F., & Harris, K. (2018). Reconceptualising the role of teachers as assessors: Teacher assessment identity. Assessment in Education: Principles, Policy & Practice, 25(5), 442–467.
R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Revelle, W. (2018). psych: Procedures for personality and psychological research. Evanston, IL: Northwestern University.
Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25.
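Swedish Schools Inspectorate. (2010). Undervisningen i svenska i grundsärskolan [The teaching of Swedish in special needs comprehensive schools]. Retrieved from https://www.skolinspektionen.se/sv/Beslut-och-rapporter/Publikationer/Granskningsrapport/Kvalitetsgranskning/Undervisningen-i-svenska-i-grundsarskolan/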
Tanis, E. S., Palmer, S., Wehmeyer, M. L., Davies, D., Stock, S., Lobb, K., & Bishop, B. (2012). Self-report computer-based survey of technology use by people with intellectual and developmental disabilities. Intellectual and Developmental Disabilities, 50(1), 53–68.
The Swedish National Agency for Education. (2017). Grundsärskolan Personalstatistik Läsåret 2016/17 [Special needs comprehensive school: Staff statistics, school year 2016/17].
Thurlow, M. L., Lazarus, S. S., & Hodgson, J. R. (2012). Leading the way to appropriate selection, implementation, and evaluation of the read-aloud accommodation. Journal of Special Education Leadership, 25(2), 72–80.
Turnbull III, H. R., Turnbull, A. P., Wehmeyer, M. L., & Park, J. (2003). A quality of life framework for special education outcomes. Remedial and Special Education, 24(2), 67–74.
Viel-Ruma, K., Houchins, D., Jolivette, K., & Benson, G. (2010). Efficacy beliefs of special educators: The relationships among collective efficacy, teacher self-efficacy, and job satisfaction. Teacher Education and Special Education, 33(3), 225–233.
Wehmeyer, M. L., & Abery, B. H. (2013). Self-determination and choice. Intellectual and Developmental Disabilities, 51(5), 399–411.
Copyright ©2019 Reichenberg, M., Lofgren, M. This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0)