Background Reliable interpretation of the Assessment of Physiotherapy Practice (APP) tool is necessary for consistent assessment of physiotherapy students in the clinical setting. However, since the APP was implemented, no study has reassessed how consistently a student's performance is evaluated against the threshold standards. Therefore, the primary aim of this study was to determine the consistency among physiotherapy educators when assessing a student's performance using the APP tool.

Methods Physiotherapists (n = 153) from Australia with a minimum of 3 years of clinical experience who had supervised a physiotherapy student within the past 12 months were recruited. Three levels of performance (not adequate, adequate, good/excellent) were scripted and filmed across four clinical areas: outpatient musculoskeletal, neurorehabilitation, cardiorespiratory and inpatient musculoskeletal. In the initial phase of the study, scripts were written by academic staff and reviewed by an expert panel (n = 8) to ensure face and content validity as well as clinical relevance prior to filming. In the second phase, pilot testing of the vignettes was performed by clinical academics (n = 16) from Australian universities to confirm the validity of each vignette. In the final phase, study participants reviewed one randomly allocated vignette in their nominated clinical area and rated the student's performance, providing a rationale for their decision. Participants were blinded to the performance level. Percentage agreement between participants was calculated for each vignette, with an a priori threshold of 75% agreement considered acceptable.

Results Consensus among educators across all areas was observed when assessing a performance at either the ‘not adequate’ (97%) or the ‘good/excellent’ level (89%). When assessing a student at the ‘adequate’ level, consensus fell to 43%.
Similarly, within each clinical area, consensus at the ‘not adequate’ and ‘good/excellent’ levels ranged from 83% to 100%, whereas agreement at the ‘adequate’ level ranged from 33% to 46%. Percent agreement between clinical educators was 89% when differentiating ‘not adequate’ from ‘adequate’ or better.

Conclusion Consistency is achievable for ‘not adequate’ and ‘good/excellent’ performances, although variability exists at the ‘adequate’ level. Consistency remained when differentiating an ‘adequate’ or better performance from a ‘not adequate’ performance.