Feasibility and Reproducibility of the NIH Consensus Criteria To Evaluate Response in Chronic Graft Versus Host Disease (cGvHD).
Mitchell, S; Jacobsohn, D; Thormann, K; Cowen, E; Fall-Dickson, J; Turner, M; Schubert, M; Baird, K; Bolanos-Meade, J; Boyd, K; Gerber, L ...
Published in: Blood
Background: The lack of standardized response criteria is a major obstacle to the development of therapeutic agents for cGvHD. Consensus criteria for evaluating response in cGvHD were recently published (BBMT, 2006;12:252). We report on 3 pilot trials evaluating the feasibility and reproducibility of these proposed response criteria.

Methods: Oncology clinicians (n=27) with limited experience with cGvHD participated in a 2.5-hour training session and received a syllabus and a photo atlas illustrating common manifestations of cGvHD. Feasibility and inter-rater agreement between experts in cGvHD (transplantation, dermatology, oral medicine, and rehabilitation medicine) and novice raters were evaluated using 15 pediatric and adult patients with varying manifestations of cGvHD. Data from each trial were used to strengthen the criteria and the teaching tools, and these materials were then re-tested. Intraclass correlation coefficients (ICCs) and percent agreement (#Agreements/[#Agreements + #Disagreements] × 100) were used in the analysis.

Results:

Response Criterion | Trial 1 (8 novices; 4 adult patients) | Trial 2 (10 novices; 6 pediatric patients) | Trial 3 (9 novices; 5 adult patients)
Oral (median ICC; range) | .50 (−.62 to .84) | .44 (.14 to .81) | .57 (−.06 to .74)
Erythema (median ICC; range) | .88 (.12 to .93) | .07 (−.01 to .97) | .47 (.08 to .82)
Movable Sclerosis (median ICC; range) | .33 (−.65 to .71) | .21 (−.38 to .81) | .60 (.11 to .91)
Non-movable Sclerosis (median ICC; range) | .23 (−.06 to .62) | .16 (−.45 to .77) | .62 (.37 to .96)
Ulcers (% agreement) | 97% | 95% | 67%
Gastrointestinal (% agreement) | 83% to 100% | 57% to 93% | 68% to 100%
Functional Performance (% novices within 95% confidence interval of expert) | 2 Minute Walk: 66%; Grip Strength: 75% | Not evaluated | 2 Minute Walk: 60%; Grip Strength: 66%
Performance Status (novices within +/−20% of expert) | 100% | 80% | 100%

The concordance between expert and novices for global ratings of cGvHD severity and disease course was also evaluated.
The median time to perform and document the evaluation, which includes a Schirmer test, ranged from 32 to 36 minutes. In the third trial, the agreement among experts for the dimensions of oral manifestations (ICC=0.7) and skin erythema (ICC=0.54) approached satisfactory values. Overall, interobserver agreement was modest, although in the third trial the median ICCs between novices and experts for movable sclerosis, non-movable sclerosis, and oral findings began to approach the 0.7 level.

Conclusions: These data provided critical information to guide the successive refinement of the response criteria and training materials, resulting in improved reproducibility in specific domains, such as oral findings and movable and non-movable sclerosis. While these data offer preliminary evidence of feasibility, it was challenging for novices to acquire in a single session the skills necessary to grade these patients reliably. Work is ongoing to refine the training materials, and future efforts will examine whether experienced transplant clinicians who receive this educational program can apply the response criteria with greater reproducibility.
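
For readers who want to see how the two agreement statistics named in the Methods behave, the following is a minimal Python sketch. The percent-agreement function follows the formula stated in the abstract. The abstract does not say which form of the ICC was used, so the one-way random-effects ICC(1,1) shown here is an assumption chosen for illustration, and the function names and example ratings are hypothetical rather than study data.

```python
from statistics import mean


def percent_agreement(agreements, disagreements):
    """Percent agreement = agreements / (agreements + disagreements) x 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one paired rating is required")
    return agreements / total * 100


def count_agreements(expert, novice):
    """Count matching and non-matching item ratings for one expert-novice pair."""
    if len(expert) != len(novice):
        raise ValueError("rating lists must have the same length")
    agree = sum(1 for e, n in zip(expert, novice) if e == n)
    return agree, len(expert) - agree


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) (assumed form, not confirmed by the source).

    `ratings` is a list of rows, one row per patient, one column per rater.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 patients and 2 raters, with no gaps")
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    # Between-patient and within-patient mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denom


# Hypothetical item-level ratings (1 = present, 0 = absent), not study data.
agree, disagree = count_agreements([1, 0, 1, 1], [1, 1, 1, 1])
print(percent_agreement(agree, disagree))

# Hypothetical scores: rows are patients, columns are (expert, novice).
print(icc_oneway([[1, 1], [2, 2], [3, 3]]))
print(icc_oneway([[1, 3], [3, 1]]))
```

In this sketch the ICC can be negative when raters disagree more within a patient than patients differ from one another, which is how negative lower bounds such as those in the reported ranges can arise.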