
FACE-Q craniofacial module: Part 2 Psychometric properties of newly developed scales for children and young adults with facial conditions

Open Access. Published: March 24, 2021. DOI: https://doi.org/10.1016/j.bjps.2021.03.009

      Summary

      Background

The FACE-Q Craniofacial Module is a patient-reported outcome measure designed for patients aged 8 to 29 years with conditions associated with a facial difference. In Part 1, we described the psychometric findings for the original CLEFT-Q scales tested in patients with cleft and noncleft facial conditions. The aim of this study was to examine the psychometric performance of the new FACE-Q Craniofacial Module scales.

      Methods

      Data were collected between December 2016 and December 2019 from patients aged 8 to 29 years with conditions associated with a visible or functional facial difference. Rasch measurement theory (RMT) analysis was used to examine psychometric properties of each scale. Scores were transformed from 0 (worst) to 100 (best) for tests of construct validity.

      Results

A total of 1495 participants with a broad range of conditions (e.g., birthmarks, facial paralysis, craniosynostosis, craniofacial microsomia) were recruited. RMT analysis resulted in the refinement of seven appearance scales (Birthmark, Cheeks, Chin, Eyes, Forehead, Head Shape, Smile), two function scales (Breathing, Facial Function) and an Appearance Distress scale. Person separation index and Cronbach alpha values met criteria. Three checklists were also formed (Eye Function, Eye Adverse Effects and Face Adverse Effects). Participants whose appearance or functional difference was rated as major, rather than minor or none, reported significantly lower scores on eight of nine scales. Higher appearance distress correlated with lower appearance scale scores.

      Conclusion

The FACE-Q Craniofacial Module scales can be used to collect and compare patient-reported outcome data in children and young adults with a facial condition.


      Introduction

Patient-reported outcome measures (PROMs) used in research with children and young adults with conditions associated with a facial difference lack content validity in terms of appearance and facial function [Wickert N.M., Riff K.W., Mansour M., et al. Content validity of patient-reported outcome instruments used with pediatric patients with facial differences: a systematic review; Tapia V.J., Epstein S., Tolmach O.S., Hassan A.S., Chung N.N., Gosman A.A. Health-related quality-of-life instruments for pediatric patients with diverse facial deformities: a systematic literature review].
A new PROM for such patients is needed to inform clinical care and to include the patient perspective in research. Our team previously created the CLEFT-Q for cleft lip and/or palate, the most common craniofacial anomaly [Wong Riff K.W., Tsangaris E., Goodacre T., et al. International multiphase mixed methods study protocol to develop a cross-cultural patient-reported outcome instrument for children and young adults with cleft lip and/or palate (CLEFT-Q)].
The CLEFT-Q was developed and refined using qualitative methods [Wong Riff K.W.Y., Tsangaris E., Goodacre T.E.E. What matters to patients with cleft lip and/or palate: an international qualitative study informing the development of the CLEFT-Q; Tsangaris E., Wong Riff K.W.Y., Goodacre T., et al. Establishing content validity of the CLEFT-Q: a new patient-reported outcome instrument for cleft lip/palate] and field-tested internationally with 2434 patients from 12 countries [Klassen A.F., Riff K.W.Y., Longmire N.M., et al. Psychometric findings and normative values for the CLEFT-Q based on 2434 children and young adult patients with cleft lip and/or palate from 12 countries].
      The CLEFT-Q includes an Eating/Drinking checklist and 12 scales designed to measure appearance (of the face, nose, nostrils, teeth, lips, jaws and cleft lip scar), health-related quality of life (psychological, school and social function and speech distress), and speech function.
After developing the CLEFT-Q, we interviewed 84 patients aged 8 to 29 years with 28 different congenital and acquired conditions (e.g., microtia, facial paralysis, craniosynostosis, craniofacial microsomia and birthmarks) to address noncleft craniofacial conditions [Longmire N.M., Wong Riff K.W.Y., O'Hara J.L., et al. Development of a new module of the FACE-Q for children and young adults with diverse conditions associated with visible and/or functional facial differences].
This qualitative study provided evidence to support the use of CLEFT-Q scales in patients with noncleft craniofacial conditions. It also identified the need for additional scales to measure constructs not covered by the CLEFT-Q. Our team used the qualitative data to design new scales measuring aspects of appearance, facial function and health-related quality of life not captured in the CLEFT-Q. The full set of scales that form the FACE-Q Craniofacial Module is shown in Table 1.
Table 1. FACE-Q Craniofacial Module for children and young adults.
Appearance: Birthmark*, Cheeks*, Chin*, Ears+, Eyes*, Faceǂ, Forehead*, Head Shape*, Jawsǂ, Lipsǂ, Noseǂ, Nostrilsǂ, Smile*, Teethǂ
Function: Breathing*, Eating/Drinkingǂ, Eye*, Facial*, Speechǂ
Health-Related Quality of Life: Appearance Distress*, Psychologicalǂ, Schoolǂ, Socialǂ, Speech Distressǂ
Adverse Effects: Ears+, Eye*, Face*
* FACE-Q scales described in this paper.
ǂ Scales originally part of CLEFT-Q.
+ Scales part of EAR-Q.
The psychometric findings for the scales that form the FACE-Q Craniofacial Module are published in separate papers. Elsewhere, we describe two scales developed for patients with a variety of ear conditions (i.e., the EAR-Q) [Klassen A.F., Rae C., Bulstrode N.W., et al. An international study to develop the EAR-Q patient-reported outcome measure for children and young adults with ear conditions. J Plast Reconstr Aesthet Surg. 2021 Feb 5]. In this journal, we have also published Part 1, which describes the validation of a set of CLEFT-Q scales and a checklist for use in patients with noncleft facial conditions [Klassen A.F., et al. FACE-Q craniofacial module: part 1 validation of CLEFT-Q scales for use in children and young adults with facial conditions. J Plast Reconstr Aesthet Surg, submitted].

      The aim of this paper (Part 2) is to describe the reliability and validity findings for 13 new FACE-Q scales tested in patients with a broad range of craniofacial conditions.

      Methods

We obtained ethics approval from the Hamilton Integrated Research Ethics Board for the study coordinating site, and from the ethics board at each participating site, before starting the study. Written informed assent and/or consent was obtained from study participants and their guardians.

      Data collection

The psychometric analysis included data from the following two studies.

      FACE-Q field-test study

Data were collected from patients aged 8 to 29 years with a wide range of craniofacial conditions as part of the FACE-Q Craniofacial Module field-test study. Participants included anyone with a congenital or acquired visible facial and/or facial function difference. Participants who could not complete the scales independently were excluded. For the Birthmark scale, recruitment also included patients aged 8 to 29 years with birthmarks anywhere on the face or body. Data from participants with a birthmark but no facial difference were used only in the validation of the Birthmark scale.
Patients were recruited from hospital clinics and social media sites. In hospital clinics, data were collected face-to-face during clinic visits, either electronically (tablets) or on paper (booklets), depending on each site's preference. Data collection took place between December 2016 and December 2019 at 24 sites in nine countries. We also recruited through social media sites (i.e., Microtia UK, US Moebius Syndrome Foundation, Bell's Palsy and Facial Paralysis Foundation, and Facial Palsy UK), whose members were sent study recruitment materials and invited to complete the survey online.
A clinical form was used by site recruiters. The form consisted of a matrix listing the facial areas (e.g., jaw, lips, nose) and functional concerns (e.g., eating, speaking) related to each FACE-Q scale, with the severity (no, yes-minor, yes-major) of each appearance or functional concern. Additional questions asked about the child's age and diagnoses, and whether the child had had facial surgery in the past six months. The form was used to ensure that participants completed only relevant scales. For example, the Cheeks scale was completed by participants with a minor or major difference in cheek appearance and/or with specific diagnoses (i.e., craniofacial microsomia and syndromic craniosynostosis). All data were collected and managed using the secure REDCap® electronic data capture tools [Harris P.A., Taylor R., Thielke R., et al. Research electronic data capture (REDCap): a metadata-driven methodology and workflow process for providing translational research informatics support; Harris P.A., Taylor R., Minor B.L. The REDCap consortium: building an international community of software partners] hosted at McMaster University (Canada).
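The routing role of the clinical form can be sketched in Python. This is a minimal illustration, not the study's REDCap logic: the `ClinicalForm` class, the `SCALE_RULES` table and the `scales_to_complete` function are hypothetical names. Only the Cheeks rule (cheek difference rated minor or major, or a diagnosis of craniofacial microsomia or syndromic craniosynostosis) comes from the text; the Chin and Breathing rules are assumed for illustration.

```python
from dataclasses import dataclass, field

# Severity options on the clinical form.
NO, MINOR, MAJOR = "no", "yes-minor", "yes-major"


@dataclass
class ClinicalForm:
    """Hypothetical record of one completed clinical form."""
    severity: dict  # facial area or functional concern -> NO / MINOR / MAJOR
    diagnoses: set = field(default_factory=set)  # lower-case diagnosis names


# Hypothetical routing table: scale -> (form row it depends on, qualifying diagnoses).
# Only the Cheeks rule is taken from the text; the other rows are assumptions.
SCALE_RULES = {
    "Cheeks": ("cheeks", {"craniofacial microsomia", "syndromic craniosynostosis"}),
    "Chin": ("chin", set()),
    "Breathing": ("breathing", set()),
}


def scales_to_complete(form: ClinicalForm) -> list:
    """Return the scales this participant should complete, in table order."""
    selected = []
    for scale, (row, qualifying_dx) in SCALE_RULES.items():
        # A missing row on the form is treated as "no" difference.
        rated_different = form.severity.get(row, NO) in (MINOR, MAJOR)
        has_qualifying_dx = bool(form.diagnoses & qualifying_dx)
        if rated_different or has_qualifying_dx:
            selected.append(scale)
    return selected


if __name__ == "__main__":
    form = ClinicalForm(severity={"chin": MINOR, "cheeks": NO},
                        diagnoses={"craniofacial microsomia"})
    print(scales_to_complete(form))  # ['Cheeks', 'Chin']
```

In the example, the Cheeks scale is selected through the diagnosis even though cheek appearance was rated "no", which mirrors the "and/or" wording of the form rule.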

      Pediatric head and neck cancer study

FACE-Q Craniofacial Module scales were included in an international follow-up study of pediatric head and neck cancer. Participants, now aged 8 to 29 years, had been treated between the ages of 0 and 18 years with chemotherapy and local therapy (surgery and/or radiotherapy) for a head and neck tumor. This study collected data with questionnaire booklets during outpatient clinics held in the Netherlands, France, the United Kingdom and the United States. Participants were invited to complete a range of FACE-Q scales. Data were entered into the REDCap® database hosted at McMaster University (Canada).

      Statistical analysis

Data were analyzed using SPSS® version 26.0 (IBM Corporation, Armonk NY, USA for Windows®/Apple Mac®) and RUMM2030 software (RUMM version 2030, RUMM Laboratory Pty Ltd., Duncraig, Western Australia, 1998–2014). To examine reliability and validity, Rasch measurement theory (RMT) analysis was performed [Rasch G. Probabilistic models for some intelligence and attainment tests; Andrich D. Rasch Models for Measurement. Sage University Papers Series Quantitative Applications in the Social Sciences, Vol. 07-068]. Specifically, a set of statistical and graphical tests was conducted to examine whether the observed data fit the Rasch model for each scale [Rasch G. Probabilistic models for some intelligence and attainment tests; Andrich D. Rasch Models for Measurement. Sage University Papers Series Quantitative Applications in the Social Sciences, Vol. 07-068; Hobart J., Cano S. Improving the evaluation of therapeutic intervention in MS: the role of new psychometric methods].
The following tests were performed:
Item fit: To determine whether the items of each scale worked together clinically and statistically, item fit was examined. We examined item response options to determine whether the item thresholds were properly ordered [Wright B.D., Masters G.N. Rating Scale Analysis]. We also examined graphical (item characteristic curves) and statistical (log residuals for item–person interaction, and chi-square values for item–trait interaction) indicators of item fit. Ideal fit residuals fall between −2.5 and +2.5, with chi-square values that are nonsignificant after Bonferroni adjustment [Andrich D. Rasch Models for Measurement. Sage University Papers Series Quantitative Applications in the Social Sciences, Vol. 07-068]. Because of its large sample size, fit statistics for the Appearance Distress scale were computed with the sample size adjusted to 500 [Hobart J., Cano S. Improving the evaluation of therapeutic intervention in MS: the role of new psychometric methods].
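To make the threshold check concrete, the sketch below computes category probabilities under the polytomous (partial credit) form of the Rasch model and tests whether an item's thresholds are ordered. It illustrates the model only and is not RUMM2030's estimation routine; the function names and example threshold values are hypothetical.

```python
import math


def category_probabilities(theta, delta, taus):
    """Category probabilities for one polytomous item (partial credit model).

    theta: person location in logits; delta: item location in logits;
    taus: threshold offsets tau_1..tau_m, so the item has categories 0..m.
    P(X = x) is proportional to exp(sum over k <= x of (theta - delta - tau_k)).
    """
    log_numerators = [0.0]  # category 0 has an empty sum
    running = 0.0
    for tau in taus:
        running += theta - delta - tau
        log_numerators.append(running)
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(taus):
    """True when every threshold lies strictly above the previous one."""
    return all(lower < upper for lower, upper in zip(taus, taus[1:]))


if __name__ == "__main__":
    probs = category_probabilities(theta=0.0, delta=0.0, taus=[-1.0, 0.0, 1.0])
    print([round(p, 3) for p in probs])
    print(thresholds_ordered([-1.0, 0.0, 1.0]))   # True
    print(thresholds_ordered([-0.5, -1.2, 1.0]))  # False: second threshold below first
```

When thresholds are disordered, at least one middle response category is never the most probable response at any person location, which is the pattern that leads to rescoring response options.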
Targeting: Scales should include a set of items that provide information for all levels of the concept as experienced by the sample [Hobart J., Cano S. Improving the evaluation of therapeutic intervention in MS: the role of new psychometric methods]. We examined the items in each scale to determine their spread and whether it matched the range of the construct reported by the sample. Targeting was examined graphically (person-item threshold distribution) and statistically (proportion of the sample scoring outside the range of each scale's measurement).
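One simple way to approximate the statistical targeting check is to count participants whose raw total sits at either extreme of a scale, since RMT cannot locate extreme scorers within the measurement range. The sketch below assumes complete responses scored 0 to `max_category` on every item; the function name and example data are hypothetical, and RUMM2030's own targeting output is more detailed than this.

```python
def targeting_summary(responses, max_category):
    """Share of participants at the floor, at the ceiling, and on the scale.

    responses: one list of item scores (0..max_category) per participant,
    with no missing values (an assumption of this sketch).
    """
    n_items = len(responses[0])
    ceiling_total = n_items * max_category
    totals = [sum(person) for person in responses]
    n = len(totals)
    at_floor = sum(1 for t in totals if t == 0)
    at_ceiling = sum(1 for t in totals if t == ceiling_total)
    return {
        "floor": at_floor / n,
        "ceiling": at_ceiling / n,
        "on_scale": (n - at_floor - at_ceiling) / n,
    }


if __name__ == "__main__":
    example = [[0, 0, 0], [1, 2, 2], [3, 3, 3], [1, 1, 2]]
    print(targeting_summary(example, max_category=3))
    # {'floor': 0.25, 'ceiling': 0.25, 'on_scale': 0.5}
```

In this study, the participants who fell off the scale were at the high end, that is, the equivalent of the "ceiling" share in this sketch.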
Differential item functioning (DIF): We examined DIF for age, gender and language (English versus other). DIF was computed for a scale when there were 150 or more participants per subgroup (to allow for 50 participants in each of three class intervals). Based on sample size, we were able to examine DIF by gender, language and age in four subgroups (8–10, 11–13, 14–17, 18–29 years) for the Appearance Distress scale. For the remaining scales, we were able to examine DIF by gender and by age in two subgroups (8–12, 13–17 years). The DIF analysis was repeated three times, each time selecting a random sample so that the subgroups were of equal size. Because the Appearance Distress analysis included a large sample, we computed DIF both with and without adjusting the sample size to 500. Items with significant chi-square p-values after Bonferroni adjustment were split on the sample characteristic that showed DIF, and the new and original person locations were correlated (Spearman correlation) to determine the impact of DIF on scoring [Andrich D. Rasch Models for Measurement. Sage University Papers Series Quantitative Applications in the Social Sciences, Vol. 07-068].
Reliability: Scale reliability was examined by computing the Person Separation Index (PSI) and Cronbach alpha [Cronbach L.J. Coefficient alpha and the internal structure of tests]. Reliability coefficients of 0.70 or greater were considered adequate [Nunnally J.C. Psychometric Theory]. To determine whether responses to items were influenced by responses to other items in a scale (which can artificially inflate reliability), we identified residual correlations between items above 0.20 and performed a subtest analysis to measure their impact on the PSI value [Wright B.D., Masters G.N. Rating Scale Analysis].
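Both reliability indices can be written in a few lines. Cronbach alpha uses the standard formula α = k/(k − 1) × (1 − Σ item variances / total-score variance). The PSI sketch follows the usual Rasch definition, the proportion of variance in estimated person locations not attributable to measurement error; it assumes the person locations and their standard errors have already been estimated (for example, exported from RUMM2030). The function names are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach alpha for complete data: one list of item scores per person."""
    k = len(responses[0])
    item_columns = list(zip(*responses))
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)


def person_separation_index(locations, standard_errors):
    """PSI = (observed variance of locations - mean error variance) / observed variance."""
    observed_var = variance(locations)
    error_var = mean([se ** 2 for se in standard_errors])
    return (observed_var - error_var) / observed_var


ADEQUATE = 0.70  # cut-off used in this study for both coefficients

if __name__ == "__main__":
    alpha = cronbach_alpha([[1, 1, 2], [2, 2, 2], [3, 2, 3], [3, 3, 3]])
    psi = person_separation_index([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5])
    print(round(alpha, 2), round(psi, 2), alpha >= ADEQUATE, psi >= ADEQUATE)
```

Because both indices use the same variance estimator in numerator and denominator, choosing sample rather than population variance does not change alpha.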
To examine construct validity, we transformed the Rasch logit scores to a 0 (worst) to 100 (best) scale to test specific hypotheses. P-values less than 0.05 were considered significant. Normality was assessed using skewness and kurtosis, with absolute values >2 taken to indicate non-normality [Kim H.Y. Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and kurtosis], and non-parametric statistics were applied if distributions were non-normal. First, we hypothesised that FACE-Q scale scores would be lower in patients with a major versus a minor or no difference in appearance and function. Second, based on published findings that most CLEFT-Q scale scores were lower for older patients and some scale scores were lower for female patients [Klassen A.F., Riff K.W.Y., Longmire N.M., et al. Psychometric findings and normative values for the CLEFT-Q based on 2434 children and young adult patients with cleft lip and/or palate from 12 countries], we hypothesised that FACE-Q scale scores would also be lower in older and female patients, and further that lower scores on the Appearance Distress scale would correlate moderately with lower scores on the appearance scales. Finally, we hypothesised that scale scores would correlate more strongly with scales in the same domain (e.g., appearance) than with scales in other domains. Correlation coefficients were interpreted as follows: <0.30 negligible, 0.30 to 0.49 low, 0.50 to 0.69 moderate, 0.70 to 0.89 high, and 0.90 to 1.00 very high [Mukaka M.M. A guide to appropriate use of correlation coefficient in medical research].
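The scoring and interpretation rules in this paragraph can be sketched as below, under three assumptions. First, the 0 to 100 conversion is shown as a linear rescaling between the lowest and highest measurable logits; the published FACE-Q conversion tables may differ in detail. Second, skewness and kurtosis use simple moment estimators, whereas SPSS reports sample-size-adjusted versions, so values will differ slightly in small samples. Third, `interpret_r` encodes the cut-offs quoted above and applies them to the absolute value of the coefficient. All function names are hypothetical.

```python
from statistics import mean, pstdev


def to_0_100(logit, min_logit, max_logit):
    """Linearly rescale a person location to 0 (worst) .. 100 (best)."""
    return 100 * (logit - min_logit) / (max_logit - min_logit)


def skewness(x):
    """Moment-based skewness (no small-sample adjustment)."""
    m, s = mean(x), pstdev(x)
    return mean([((v - m) / s) ** 3 for v in x])


def excess_kurtosis(x):
    """Moment-based kurtosis minus 3, so a normal distribution gives about 0."""
    m, s = mean(x), pstdev(x)
    return mean([((v - m) / s) ** 4 for v in x]) - 3


def looks_normal(x, cutoff=2.0):
    """Apply the |skewness| > 2 or |kurtosis| > 2 non-normality rule."""
    return abs(skewness(x)) <= cutoff and abs(excess_kurtosis(x)) <= cutoff


def interpret_r(r):
    """Label a correlation coefficient using the cut-offs quoted in the text."""
    size = abs(r)
    if size < 0.30:
        return "negligible"
    if size < 0.50:
        return "low"
    if size < 0.70:
        return "moderate"
    if size < 0.90:
        return "high"
    return "very high"


if __name__ == "__main__":
    print(to_0_100(0.0, -5.0, 5.0))  # 50.0
    print(looks_normal([1, 2, 3, 4, 5]))  # True
    print(interpret_r(0.588), interpret_r(0.734))  # moderate high
```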

      Results

      Table 2 shows characteristics for the 1495 participants who provided a total of 1509 assessments. Participants with a range of facial conditions were recruited. Of the 271 participants with a birthmark, 60 had the birthmark on their body and no facial condition. These participants were only included in the RMT analysis for the Birthmark scale.
Table 2. Characteristics (number, %) of the 1495 participants.
Characteristic | N | %
Country
Australia | 38 | 2.5
Brazil | 178 | 11.9
Canada | 828 | 55.4
Chile | 7 | 0.5
France | 6 | 0.4
Ireland | 113 | 7.6
Sweden | 13 | 0.9
United Kingdom | 185 | 12.4
United States | 126 | 8.4
Other | 1 | 0.1
Language
English | 1290 | 86.3
French | 6 | 0.4
Portuguese | 178 | 11.9
Spanish | 7 | 0.5
Swedish | 13 | 0.9
Age in years
8–10 | 335 | 22.4
11–13 | 355 | 23.7
14–17 | 429 | 28.7
18–29 | 376 | 25.2
Gender
Male | 655 | 43.8
Female | 835 | 55.9
Other | 4 | 0.3
Missing | 1 | 0.1
Main condition*
BIRTHMARK
Congenital melanocytic naevus | 44 | 2.9
Haemangioma | 73 | 4.9
Sebaceous naevus | 18 | 1.2
Vascular malformation | 142 | 9.5
Birthmark other | 4 | 0.3
EAR CONDITION
Microtia | 45 | 3.0
Prominent ears | 37 | 2.5
Ear other | 10 | 0.7
SKELETAL
Acquired skeletal | 55 | 3.7
Craniofacial microsomia | 78 | 5.2
Craniofrontonasal condition | 27 | 1.8
Craniosynostosis non-syndromic | 175 | 11.7
Craniosynostosis syndromic | 111 | 7.4
Fibrous dysplasia | 30 | 2.0
Mandibular condition | 39 | 2.6
Multiple bony anomalies | 19 | 1.3
Post-traumatic bony defect | 42 | 2.8
Other congenital skeletal | 21 | 1.4
SOFT TISSUE
Acquired soft tissue | 30 | 2.0
Congenital soft tissue | 15 | 1.0
Neurofibromatosis type 1 | 31 | 2.1
Parry-Romberg Syndrome | 44 | 2.9
Soft tissue other | 15 | 1.0
TRAUMA
Bite | 10 | 0.7
Fracture | 71 | 4.7
Laceration | 12 | 0.8
Burn | 20 | 1.3
Trauma other | 24 | 1.6
OTHER
Cancer | 18 | 1.1
Facial paralysis | 61 | 4.1
Other syndrome | 21 | 1.4
Orthodontic | 153 | 10.2
* Condition listed represents the main diagnosis; classifications may have varied by site. 14.7% of participants had multiple conditions.
RMT analysis provided evidence of reliability and validity for 10 of the 13 scales tested in this study. The three scales that did not work psychometrically were Eye Function, Eye Adverse Effects and Face Adverse Effects. Each had one or more items with disordered thresholds. After we rescored each scale's items across the two middle response options and deleted seven items deemed redundant, the item fit statistics for the three scales were acceptable, but scale reliability, in terms of PSI values, was low. Table 3 shows the three sets of items, which are used as problem checklists.
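Table 3. Number (%) of participants reporting each eye and facial problem.
Item | Very much, n (%) | Quite a bit, n (%) | A little bit, n (%) | Not at all, n (%) | Missing, n (%)
EYE FUNCTION
…eyelids close unexpectedly | 4 (2.3) | 2 (1.1) | 15 (8.6) | 153 (87.4) | 1 (0.6)
…opening eyelids | 9 (5.1) | 5 (2.9) | 19 (10.9) | 141 (80.6) | 1 (0.6)
…blinking eyes | 13 (7.4) | 10 (5.7) | 14 (8.0) | 136 (77.1) | 2 (1.1)
…seeing properly | 13 (7.4) | 7 (4.0) | 32 (18.3) | 122 (69.7) | 1 (0.6)
…closing eyelids | 22 (12.6) | 10 (5.7) | 14 (8.0) | 127 (72.6) | 2 (1.1)
…eyelids closed when asleep | 20 (11.4) | 13 (7.4) | 21 (12.0) | 117 (66.9) | 4 (2.3)
…one eye works better | 46 (26.3) | 12 (6.9) | 30 (17.1) | 86 (49.1) | 1 (0.6)
EYE ADVERSE EFFECTS
…eyelids twitch | 1 (0.6) | 6 (3.4) | 40 (22.9) | 127 (72.6) | 1 (0.6)
…eyes are sore (hurt) | 1 (0.6) | 10 (5.7) | 45 (25.7) | 119 (68.0) | 0 (0.0)
…eyes are itchy | 3 (1.7) | 6 (3.4) | 47 (26.9) | 119 (68.0) | 0 (0.0)
…whites of eyes are red | 5 (2.9) | 10 (5.7) | 32 (18.3) | 127 (72.6) | 1 (0.6)
…something in eye(s) | 6 (3.4) | 15 (8.6) | 33 (18.9) | 121 (69.1) | 0 (0.0)
…eyes water too much | 7 (4.0) | 17 (9.7) | 47 (26.9) | 103 (58.9) | 1 (0.6)
…eyes are dry | 14 (8.0) | 13 (7.4) | 31 (17.7) | 117 (66.9) | 0 (0.0)
FACE ADVERSE EFFECTS
…face is bruised | 2 (1.2) | 6 (3.6) | 25 (15.0) | 134 (80.2) | 0 (0.0)
…face feels sore | 2 (1.2) | 11 (6.6) | 33 (19.8) | 121 (72.5) | 0 (0.0)
…face feels tingly | 3 (1.8) | 10 (6.0) | 23 (13.8) | 130 (77.8) | 1 (0.6)
…face feels sensitive | 3 (1.8) | 13 (7.8) | 38 (22.8) | 112 (67.1) | 1 (0.6)
…face feels itchy | 3 (1.8) | 14 (8.4) | 33 (19.8) | 116 (69.5) | 1 (0.6)
…face feels numb | 8 (4.8) | 7 (4.2) | 24 (14.4) | 127 (76.0) | 1 (0.6)
…face is puffy or swollen | 8 (4.8) | 12 (7.2) | 27 (16.2) | 119 (71.3) | 1 (0.6)
…face feels uncomfortable | 9 (5.4) | 11 (6.6) | 34 (20.4) | 112 (67.1) | 1 (0.6)
…face feels tight | 8 (4.8) | 13 (7.8) | 29 (17.4) | 116 (69.5) | 1 (0.6)
…face feels firm | 7 (4.2) | 19 (11.4) | 31 (18.6) | 108 (64.7) | 2 (1.2)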
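Two steps behind Table 3 can be illustrated in code: tabulating each item's responses as counts and percentages of all participants (with missing responses included in the denominator, which reproduces the published percentages, e.g., 4 of 175 = 2.3%), and collapsing the two middle response options before re-running the Rasch analysis. The 0 to 3 item scoring and the function names are assumptions of this sketch.

```python
from collections import Counter

OPTIONS = ["Very much", "Quite a bit", "A little bit", "Not at all"]

# Collapse the two middle categories of an item scored 0..3 (scoring assumed).
COLLAPSE_MIDDLE = {0: 0, 1: 1, 2: 1, 3: 2}


def tabulate(item_responses):
    """Return {option: (n, %)} with None counted as 'Missing'."""
    counts = Counter(item_responses)
    n = len(item_responses)
    table = {option: counts.get(option, 0) for option in OPTIONS}
    table["Missing"] = counts.get(None, 0)
    return {option: (k, round(100 * k / n, 1)) for option, k in table.items()}


def rescore(scores, mapping=COLLAPSE_MIDDLE):
    """Apply a category mapping, leaving missing (None) responses untouched."""
    return [None if s is None else mapping[s] for s in scores]


if __name__ == "__main__":
    # Counts for "...eyelids close unexpectedly" from Table 3.
    responses = (["Very much"] * 4 + ["Quite a bit"] * 2 + ["A little bit"] * 15
                 + ["Not at all"] * 153 + [None])
    print(tabulate(responses))
    print(rescore([0, 1, 2, 3, None]))  # [0, 1, 1, 2, None]
```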
Across the remaining ten scales, the number of items was reduced by 32, from 117 to 85. Thresholds were disordered for 9 of the 10 items in the Facial Function scale; when these items were rescored across the two middle response options, all had ordered thresholds, and the RMT analysis proceeded using the rescored data. All 85 items had nonsignificant chi-square p-values after Bonferroni adjustment (see Appendix 1), and fit residuals were within ±2.5 for 74 items.
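The item-level screening rule (fit residuals within ±2.5 and chi-square p-values nonsignificant after Bonferroni adjustment) can be sketched as follows. This assumes the adjustment divides 0.05 by the number of items in the scale, so for a 10-item scale an item's p-value would need to fall below 0.005 to count as misfit. RUMM2030 reports its own adjusted values; the function name and input dictionary are hypothetical.

```python
def screen_item_fit(items, alpha=0.05, residual_limit=2.5):
    """Flag items whose fit residual or Bonferroni-adjusted chi-square suggests misfit.

    items: {item_name: (fit_residual, chi_square_p_value)} for one scale.
    """
    adjusted_alpha = alpha / len(items)  # Bonferroni: one test per item
    flags = {}
    for name, (residual, p_value) in items.items():
        flags[name] = {
            "residual_outside_range": abs(residual) > residual_limit,
            "chi_square_misfit": p_value < adjusted_alpha,
        }
    return flags


if __name__ == "__main__":
    # Hypothetical fit statistics for a three-item scale (adjusted alpha ~ 0.0167).
    example = {"item_a": (0.8, 0.40), "item_b": (-3.1, 0.30), "item_c": (1.2, 0.004)}
    for item, flag in screen_item_fit(example).items():
        print(item, flag)
```

The two flags are kept separate because, as in this study, an item can have a nonsignificant chi-square yet a fit residual outside ±2.5.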
Figures 1 to 3 show the distribution of person measurements and item locations for an appearance scale (Smile), a health-related quality of life scale (Appearance Distress) and a function scale (Breathing) to illustrate targeting. The proportion of the sample scoring within the range of each scale's measurement was 88.9% (Smile), 92.7% (Breathing) and 78.9% (Appearance Distress). Participants who scored outside the range (to the right in each figure) had high scores on each scale, indicating better outcomes.
Figure 1. Person-item threshold distribution illustrating targeting for the Smile scale.
Figure 2. Person-item threshold distribution illustrating targeting for the Appearance Distress scale.
Figure 3. Person-item threshold distribution illustrating targeting for the Breathing scale.
Based on subgroup sample sizes, we were able to examine DIF by age, gender and language for the Appearance Distress scale, by age for four appearance scales (Eyes, Forehead, Head Shape, Smile), and by gender for five appearance scales (Cheeks, Eyes, Forehead, Head Shape, Smile). For the Appearance Distress scale, the unadjusted analysis showed DIF for one item by gender (people stare), three items by age group (self-conscious, people stare, unhappy) and four items by language (self-conscious, people stare, mirror, going out). In the adjusted analysis, there was evidence of DIF for one item by age (self-conscious). Among the five appearance scales examined for DIF, one item (match) in the Head Shape scale showed DIF by age. All items that showed DIF in the unadjusted analysis had negligible impact on scoring: when these items were split, the original and new person locations correlated at ≥0.99.
Data fit the Rasch model, with nonsignificant p-values, for six scales (see Table 4). For the remaining four scales (Eyes, Head Shape, Smile, Facial Function), the p-values showed slight misfit to the Rasch model. For the health-related quality of life and appearance scales, reliability was high, with PSI values ≥0.83 and Cronbach alpha values ≥0.87 with and without extremes. Reliability for the two function scales was lower, with PSI values ≥0.71 with and ≥0.69 without extremes, and Cronbach alpha values ≥0.80 with and ≥0.74 without extremes. Residuals for one or more item pairs in seven scales were correlated above 0.20. For five of these scales (Appearance Distress, Chin, Cheeks, Eyes, Smile), the subtest analysis showed a drop in PSI of ≤0.01. For the remaining two scales, the drop in PSI was larger, at 0.05 (Birthmark) and 0.09 (Facial Function).
Table 4. Rasch measurement theory scale-level statistics.
Scale | Items tested | Items retained | Full sample | Sample in RMT analysis | % scored on scale | Chi-square | DF | p-value | PSI +ext | PSI -ext | Cronbach alpha +ext | Cronbach alpha -ext
Appearance Distress | 10 | 8 | 1402 | 1106 | 78.9 | 76.16 | 64 | 0.14 | 0.83 | 0.84 | 0.93 | 0.89
Birthmark | 14 | 8 | 271 | 204 | 75.3 | 16.71 | 16 | 0.40 | 0.87 | 0.85 | 0.95 | 0.89
Chin | 12 | 9 | 258 | 208 | 80.6 | 16.68 | 18 | 0.55 | 0.93 | 0.91 | 0.97 | 0.93
Cheeks | 10 | 9 | 396 | 305 | 77.0 | 44.45 | 36 | 0.16 | 0.93 | 0.91 | 0.97 | 0.93
Eyes | 14 | 9 | 468 | 351 | 75.0 | 53.78 | 36 | 0.03 | 0.89 | 0.89 | 0.96 | 0.91
Forehead | 15 | 10 | 554 | 465 | 83.9 | 70.35 | 60 | 0.17 | 0.89 | 0.88 | 0.94 | 0.91
Head Shape | 8 | 6 | 427 | 341 | 79.9 | 40.30 | 24 | 0.02 | 0.88 | 0.84 | 0.93 | 0.87
Smile | 15 | 9 | 497 | 442 | 88.9 | 70.62 | 45 | 0.01 | 0.91 | 0.89 | 0.94 | 0.91
Breathing | 7 | 7 | 191 | 177 | 92.7 | 17.90 | 14 | 0.21 | 0.74 | 0.69 | 0.80 | 0.74
Facial Function | 12 | 10 | 132 | 109 | 82.6 | 38.36 | 20 | 0.01 | 0.71 | 0.72 | 0.89 | 0.85
DF, degrees of freedom; PSI, Person Separation Index; ext, extremes.
Based on skewness and kurtosis values, all data were normally distributed, and parametric statistics were applied. Figure 4 shows the mean score on each FACE-Q scale by severity rating. The hypothesis that participants with a major difference in appearance or function would score lower on FACE-Q scales was supported. Differences between group means were significant for all scales (p ≤ 0.001 on ANOVA).
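The between-group comparison rests on a one-way ANOVA. A minimal standard-library sketch of the F statistic is shown below; the p-value additionally requires the F distribution, which `scipy.stats.f_oneway` provides if SciPy is available. The function name and example scores are hypothetical and are not study data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    group_means = [mean(g) for g in groups]
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical 0-100 scores for "none", "minor" and "major" severity groups.
    none_grp = [82, 90, 76, 88]
    minor_grp = [70, 74, 65, 79]
    major_grp = [48, 55, 60, 52]
    print(one_way_anova_f([none_grp, minor_grp, major_grp]))
```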
Figure 4. Mean score for each FACE-Q Craniofacial Module scale by severity of appearance or functional difference. Significant association between severity and scale score for 8 scales (p ≤ 0.001).
*Post hoc tests showed no significant differences between none and minor (p ≥ 0.184) for Chin, Breathing and Facial Function, or between none and major for Chin (p = 0.085); the sample size for the Chin 'none' category was n = 12.
On independent-samples t-tests, females reported lower scores on the Appearance Distress (mean difference = 4.3; SE = 1.1; p < 0.001), Eyes (mean difference = 6.2; SE = 2.5; p = 0.013) and Smile (mean difference = 4.3; SE = 2.1; p = 0.041) scales. Differences between the three age groups (8–13, 14–17 and 18–29 years) were observed for all scales (p < 0.001 on ANOVA) except Birthmark (p = 0.270), Breathing (p = 0.523) and Facial Function (p = 0.059) (see Figure 5). Appearance Distress scores correlated with scores on the appearance scales (Pearson's r = 0.431 to 0.588; see Table 5). Finally, as hypothesised, correlations between scales within a domain were stronger than correlations with scales in other domains.
Figure 5. Mean score for each FACE-Q Craniofacial Module scale by age group. Significant association between age group and scale score for 7 scales (p ≤ 0.001); no association between age group and scale score for Birthmark (p = 0.217), Breathing (p = 0.523) and Facial Function (p = 0.059).
Post hoc tests showed no significant difference between 8–13 and 14–17 years (p ≥ 0.223) for Forehead and Smile, or between 14–17 and 18–29 years (p ≥ 0.089) for Cheeks, Chin, Eyes, Head Shape and Smile.
Table 5. Correlations between scales.
Column domains: HRQOL (Distress); Appearance (Birthmark, Cheeks, Chin, Eyes, Forehead, Head Shape, Smile); Function (Breathing).
Scale | Distress | Birthmark | Cheeks | Chin | Eyes | Forehead | Head Shape | Smile | Breathing
Birthmark | 0.431**
Cheeks | 0.489** | 0.228
Chin | 0.588** | 0.392* | 0.677**
Eyes | 0.556** | 0.321 | 0.579** | 0.535**
Forehead | 0.476** | 0.442** | 0.571** | 0.588** | 0.713**
Head Shape | 0.565** | 0.519* | 0.672** | 0.591** | 0.677** | 0.734**
Smile | 0.500** | 0.278 | 0.674** | 0.503** | 0.514** | 0.491** | 0.735**
Breathing | 0.370** | 0.221 | 0.190* | 0.413** | 0.326** | 0.315** | 0.254** | 0.218*
Facial Function | 0.379** | 0.067 | 0.349** | 0.305* | 0.267* | 0.330** | 0.281 | 0.285** | 0.382**
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
HRQOL, health-related quality of life.

      Discussion

      Surgical treatments for conditions associated with a facial difference are often complex and burdensome for patients. Outcome measures used to evaluate operations that aim to change how someone looks and/or their facial function should capture the patient perspective, given the subjective nature of such outcomes. Our research here and elsewhere (Longmire et al., 2017; Klassen et al., 2021, EAR-Q; Klassen et al., Part 1, submitted; Tassi et al., 2021; Klassen et al., 2021, facial nerve paralysis) shows that the FACE-Q Craniofacial Module provides reliable and valid measurement of outcomes that matter to children and young adults with a broad range of facial conditions. The use of a modern psychometric approach (RMT analysis) made it possible to identify problems within each scale. We dropped some items and rescored some response options, after which the psychometric findings provided evidence of reliability and validity for ten scales. Each scale mapped out a clinical hierarchy for its concept and worked as hypothesised, with lower scores associated with older age, female gender and having a major facial difference.
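      The idea of a clinical hierarchy can be made concrete with the simplest Rasch model. In the dichotomous case, the probability that a person at location theta endorses an item at location b is exp(theta - b) / (1 + exp(theta - b)), so every person is expected to endorse items in the same order, from lowest to highest b. The sketch below assumes dichotomous items with invented names and locations purely for illustration; the FACE-Q scales use polytomous response options, which require a polytomous extension of the model, and this is not the analysis software used in the study.

```python
import math

def rasch_probability(theta, b):
    """P(endorse) for a person at theta on a dichotomous item at location b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def endorsement_order(item_locations, theta):
    """Item names ordered from most to least likely to be endorsed at theta."""
    return sorted(
        item_locations,
        key=lambda name: rasch_probability(theta, item_locations[name]),
        reverse=True,
    )

# Hypothetical item locations in logits, for illustration only.
ITEMS = {"item_low": -1.5, "item_mid": 0.0, "item_high": 1.2}

for theta in (-2.0, 0.0, 2.0):
    probs = {name: round(rasch_probability(theta, b), 3) for name, b in ITEMS.items()}
    print(theta, endorsement_order(ITEMS, theta), probs)
```

      The printed order is the same at every theta even though the probabilities change; that invariant ordering is what a scale's item hierarchy relies on, and item sets whose responses violate it show up as misfit.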
      The Eye Function, Eye Adverse Effects, and Face Adverse Effects item sets were the exceptions. Although the Rasch approach aims to develop scales that measure unidimensional constructs via a set of items that map out a clinical hierarchy, these three sets of items, contrary to our hypotheses, did not work together statistically. We reported a similar finding in the CLEFT-Q field-test, in which the Eating and Drinking items did not function as a scale (Klassen et al., 2018).
      Although Eye Function, Eye Adverse Effects and Face Adverse Effects had acceptable Cronbach alpha values, we recommend their use as problem checklists since the overall findings do not support the summing of items to form scale scores. Even though the three checklists do not have a Rasch-based scoring algorithm, they can provide clinically important information, such as monitoring for post-operative complications.
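      Cronbach's alpha for k items is alpha = k / (k - 1) * (1 - sum of item variances / variance of the summed score). Because it depends only on these variances, a set of items can reach an acceptable alpha without fitting a unidimensional Rasch model, which is consistent with treating the three item sets as checklists despite their alpha values. A minimal sketch, assuming complete responses stored as one list of item scores per respondent (the data layout and function name are hypothetical):

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for both the items and the total;
    the choice cancels out in the ratio as long as it is applied consistently.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)
```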
      Recent reviews have drawn attention to the challenge of assessing appearance and body image in patients with craniofacial conditions. Research has shown that patients generally have positive scores for satisfaction with appearance, and that dissatisfaction is generally limited to the affected facial area (Stock and Feragen, 2016; Stock and Feragen, 2019).
      The FACE-Q Craniofacial Module addresses this issue by having feature-specific appearance scales (e.g., eyes, nose, lips). These specific scales can be used in conjunction with the Face scale to capture overall appearance as well as the facial features that are of most concern to the patient.
      The uptake and use of PROMs are rapidly expanding around the world. PROMs provide a means to measure the burden of a condition and the impact of treatments provided to patients. Previously we reported findings about the impact of completing the CLEFT-Q from 2056 children and young adults (Klassen et al., 2020). Specifically, the majority of participants reported that they liked completing the CLEFT-Q, most liked the questions about how they look (82%), and most felt the same or better about how they look after completing the CLEFT-Q (67%).
      A small minority of participants reported that they felt worse about how they look after completing it. These findings suggest that patients who complete the FACE-Q Craniofacial Module may have different experiences, both positive and negative. Therefore, to minimize any negative impact of completing a PROM, it is important that researchers and clinicians thoughtfully select which outcome tools to use. While the FACE-Q Craniofacial Module may appear long, no patient needs to complete all the scales. Healthcare professionals and researchers can pick and choose, from the full set of independently functioning scales, the subset best suited to their specific research question or clinical need. To facilitate benchmarking, five of the FACE-Q scales are applicable to any patient with a facial condition, i.e., Face, Appearance Distress, Psychological, Social and School. The remaining scales are specific to a facial area or facial function and are more useful for evaluating the outcomes of specific treatments.
      Our study has several limitations. First, the sample accrued for the Facial Function scale was slightly smaller than 150. Rasch analysis uses chi-square tests of item fit, and a sample of 150 provides 50 participants in each of three class intervals for these tests. Second, we did not collect information about the number of patients that recruitment staff might have missed, nor about the characteristics of patients who refused to participate. Third, the ratings of major versus minor difference in appearance and facial function were based on the judgement of the recruiter. The sample also included a small number of participants with birthmarks who did not have a facial difference; however, data for these participants were excluded from the RMT analysis of all other scales to ensure that only patients with a facial difference were included. Finally, the COSMIN criteria (Prinsen et al., 2018) for psychometric properties of PROMs include tests that we did not perform in our study because of the length of the field-test questionnaire. These tests, which include test-retest reliability, responsiveness, and correlation with other PROMs, can be examined in future studies.
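      The sample-size rule in the limitations rests on simple arithmetic: with three class intervals, 150 participants give 150 / 3 = 50 per interval. The sketch below shows how an item-fit chi-square of this kind can be assembled for one item: persons are sorted by location and split into class intervals, and in each interval the observed item total is compared with its Rasch-expected value, scaled by its variance. It is a simplified illustration under stated assumptions (a dichotomous item, person and item locations already estimated, equal-sized intervals, degrees of freedom equal to intervals minus one); the helper names are hypothetical, and dedicated Rasch software may group persons and compute the statistic differently.

```python
import math

def rasch_probability(theta, b):
    """P(response = 1) for a person at theta on a dichotomous item at location b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit_chi_square(thetas, responses, b, n_intervals=3):
    """Chi-square item fit for one dichotomous item.

    thetas: person location estimates; responses: 0/1 scores on the item;
    b: item location. Returns (chi_square, degrees_of_freedom).
    """
    order = sorted(range(len(thetas)), key=lambda i: thetas[i])
    interval_size = len(order) / n_intervals
    chi_square = 0.0
    for g in range(n_intervals):
        # Persons in class interval g: roughly equal-sized groups by location.
        members = order[round(g * interval_size):round((g + 1) * interval_size)]
        probs = [rasch_probability(thetas[i], b) for i in members]
        observed = sum(responses[i] for i in members)
        expected = sum(probs)
        variance = sum(p * (1 - p) for p in probs)
        chi_square += (observed - expected) ** 2 / variance
    return chi_square, n_intervals - 1
```

      With 150 persons each interval holds 50; with a sample slightly under 150, as for the Facial Function scale, each interval's observed-versus-expected comparison rests on fewer observations.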

      Conclusion

      In order to improve care provided to patients with conditions associated with a facial difference, highly specific, carefully designed PROMs are needed. The FACE-Q Craniofacial Module provides healthcare professionals and researchers with a set of tools to measure the patient perspective of outcomes associated with craniofacial care for anyone aged 8 to 29 years.

      Declaration of Competing Interest

      Anne Klassen and Karen Wong are co-developers of the patient-reported outcome scales described in this publication and share in any license revenues as royalties based on their institutions’ inventor sharing policy for their use in for-profit study. The other authors have no conflict of interest to declare in relation to this work.

      Financial disclosure

      The research described in this paper was supported by a grant from the Canadian Institutes of Health Research (CIHR; FRN 148779). The authors have no financial interest to declare in relation to the content of this article. The Article Processing Charge was paid from the CIHR grant.

      Acknowledgment

      We are grateful for the operating grant we received from the Canadian Institutes of Health Research. We are also grateful to the many healthcare professionals and research staff in craniofacial sites around the world for their dedication and help with our research.

      Appendix. Supplementary materials

      References

      Wickert N.M., Riff K.W., Mansour M., et al. Content validity of patient-reported outcome instruments used with pediatric patients with facial differences: a systematic review. Cleft Palate Craniofac J. 2018; 55: 989-998.

      Tapia V.J., Epstein S., Tolmach O.S., Hassan A.S., Chung N.N., Gosman A.A. Health-related quality-of-life instruments for pediatric patients with diverse facial deformities: a systematic literature review. Plast Reconstr Surg. 2016; 138: 175-187.

      Wong Riff K.W., Tsangaris E., Goodacre T., et al. International multiphase mixed methods study protocol to develop a cross-cultural patient-reported outcome instrument for children and young adults with cleft lip and/or palate (CLEFT-Q). BMJ Open. 2017; 7: e015467.

      Wong Riff K.W.Y., Tsangaris E., Goodacre T.E.E. What matters to patients with cleft lip and/or palate: an international qualitative study informing the development of the CLEFT-Q. Cleft Palate Craniofac J. 2018; 55: 442-450.

      Tsangaris E., Wong Riff K.W.Y., Goodacre T., et al. Establishing content validity of the CLEFT-Q: a new patient-reported outcome instrument for cleft lip/palate. Plast Reconstr Surg Glob Open. 2017; 5: e1305.

      Klassen A.F., Riff K.W.Y., Longmire N.M., et al. Psychometric findings and normative values for the CLEFT-Q based on 2434 children and young adult patients with cleft lip and/or palate from 12 countries. CMAJ. 2018; 190: E455-E462.

      Longmire N.M., Wong Riff K.W.Y., O'Hara J.L., et al. Development of a new module of the FACE-Q for children and young adults with diverse conditions associated with visible and/or functional facial differences. Facial Plast Surg. 2017; 33: 499-508.

      Klassen A.F., Rae C., Bulstrode N.W., et al. An international study to develop the EAR-Q patient-reported outcome measure for children and young adults with ear conditions. J Plast Reconstr Aesthet Surg. 2021 Feb 5.

      Klassen A.F., et al. FACE-Q craniofacial module: part 1 validation of CLEFT-Q scales for use in children and young adults with facial conditions. J Plast Reconstr Aesthet Surg [Submitted].

      Harris P.A., Taylor R., Thielke R., et al. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009; 42: 377-381.

      Harris P.A., Taylor R., Minor B.L. The REDCap consortium: Building an international community of software partners. J Biomed Inform. 2019; 95: 103208.

      Rasch G. Probabilistic models for some intelligence and attainment tests. Vol. 1 of Studies in Mathematical Psychology. Danmarks Paedagogiske Institut, Copenhagen, 1960.

      Andrich D. Rasch Models for Measurement. Sage University Papers Series Quantitative Applications in the Social Sciences, Vol. 07-068. Sage, Thousand Oaks (CA), 1988.

      Hobart J., Cano S. Improving the evaluation of therapeutic intervention in MS: the role of new psychometric methods. Health Technol Assess. 2009; 13 (iii, ix-x): 1-177.

      Wright B.D., Masters G.N. Rating Scale Analysis. MESA Press, 1982.

      Cronbach L.J. Coefficient alpha and the internal structure of tests. Psychometrika. 1951; 16: 297-334.

      Nunnally J.C. Psychometric Theory. 3rd Ed. McGraw-Hill, New York, NY, 1994.

      Kim H.Y. Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and kurtosis. Restor Dent Endod. 2013; 38: 52-54.

      Mukaka M.M. A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012; 24: 69-71.

      Tassi A., Tan J., Piplani B., et al. Establishing content validity of an orthodontic subset of the FACE-Q Craniofacial Module in children and young adults with malocclusion. Orthod Craniofac Res. 2021 [Epub ahead of print].

      Klassen A.F., Rae C., Gallo L., et al. Psychometric Validation of the FACE-Q Craniofacial Module for Facial Nerve Paralysis. Facial Plast Surg Aesthet Med. 2021.

      Stock N.M., Feragen K.B. Psychological adjustment to cleft lip and/or palate: a narrative review of the literature. Psychol Health. 2016; 31: 777-813.

      Stock N.M., Feragen K.B. Comparing psychological adjustment across cleft and other craniofacial conditions: implications for outcome measurement and intervention. Cleft Palate Craniofac J. 2019; 56: 766-772.

      Klassen A.F., Dalton L., Goodacre T.E.E., et al. Impact of completing a patient-reported outcome measure that asks about appearance: an international study to develop the CLEFT-Q. Cleft Palate Craniofac J. 2020; 57: 840-848.

      Prinsen C.A., Mokkink L.B., Bouter L.M., et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018; 27: 1147-1157.

      Linked Article

      • FACE-Q Craniofacial Module: Part 1 validation of CLEFT-Q scales for use in children and young adults with facial conditions
        Journal of Plastic, Reconstructive & Aesthetic Surgery, Vol. 74, Issue 9
          The CLEFT-Q includes 12 independently functioning scales that measure appearance (face, nose, nostrils, teeth, lips, jaws), health-related quality of life (psychological, social, school, speech distress), and speech function, and an eating/drinking checklist. Previous qualitative research revealed that the CLEFT-Q has content validity in noncleft craniofacial conditions. This study aimed to examine the psychometric performance of the CLEFT-Q in an international sample of patients with a broad range of facial conditions.