
Updated Systematic Review and Meta-Analysis of the Performance of Risk Prediction Rules in Children and Young People with Febrile Neutropenia

Abstract

Introduction

Febrile neutropenia is a common and potentially life-threatening complication of treatment for childhood cancer, which has increasingly been subject to targeted treatment based on clinical risk stratification. Our previous meta-analysis demonstrated 16 rules had been described and 2 of them subject to validation in more than one study. We aimed to advance our knowledge of evidence on the discriminatory ability and predictive accuracy of such risk stratification clinical decision rules (CDR) for children and young people with cancer by updating our systematic review.

Methods

The review was conducted in accordance with Centre for Reviews and Dissemination methods, searching multiple electronic databases, using two independent reviewers, formal critical appraisal with QUADAS and meta-analysis with random effects models where appropriate. It was registered with PROSPERO: CRD42011001685.

Results

We found 9 new publications describing a further 7 new CDR, and validations of 7 rules. Six CDR have now been subject to testing across more than two data sets. Most validations demonstrated the rule to be less efficient than when initially proposed; geographical differences appeared to be one explanation for this.

Conclusion

The use of clinical decision rules will require local validation before widespread use. Considerable uncertainty remains over the most effective rule to use in each population, and an ongoing individual-patient-data meta-analysis should develop and test a more reliable CDR to improve stratification and optimise therapy. Despite current challenges, we believe it will be possible to define an internationally effective CDR to harmonise the treatment of children with febrile neutropenia.

Introduction

Febrile neutropenia (FNP) is a common and potentially life-threatening complication of therapy for childhood cancer, which has increasingly been subject to targeted treatment based on clinical risk stratification [1]. For children this move towards risk-directed care is based upon evidence of the low incidence of death [2], the majority of patients being without identified significant infection or sepsis [3], and small randomised trials demonstrating the feasibility of out-patient based treatment for patients at low risk of septic complications [4]. A large proportion of the evidence for risk stratification has originated from adult oncology [5]. It is acknowledged that children are not ‘little adults’ but are distinct in the biology of their malignancies, treatment regimens, infections and psychosocial setting, and therefore specific evidence for stratification of children with FNP is needed [6].

Since we undertook a systematic review and meta-analysis of risk stratification systems in 2008 [3], further studies have been published which address this issue [7]. Accordingly we have updated our review to summarise the most recent advances in our knowledge of evidence on the discriminatory ability and predictive accuracy of such risk stratification clinical decision rules (CDR) for children and young people with cancer.

Methods

This update review was conducted in accordance with “Systematic reviews: CRD's guidance for undertaking reviews in health care” [8] and registered on the PROSPERO Registry of systematic reviews: CRD42011001685. It sought studies which aimed to derive or validate a CDR in children or young people (aged 0–18 y) presenting with febrile neutropenia. Both prospective and retrospective cohorts were included, but those using a case-control (“two-gate”) approach were excluded as these have been previously shown to exaggerate diagnostic accuracy estimates [9].

Search strategy and selection criteria

The electronic search strategy [3] was reviewed and repeated on the following databases from February 2009 to September 2011:

  • MEDLINE
  • MEDLINE In-Process & Other Non-Indexed Citations
  • EMBASE
  • CINAHL
  • Literatura Latinoamericana y del Caribe en Ciencias de la Salud (LILACS)

Reference lists of relevant systematic reviews and included articles were reviewed for further relevant articles. Published and unpublished studies were sought and no language restrictions applied. Non-English language studies were translated. Two reviewers independently screened the title and abstract of studies for inclusion, and then the full text of retrieved articles. Disagreements were resolved by consensus.

Validity assessment and data extraction

The validity of each study was assessed, as in our previous review, using 11 of the 14 questions from the QUADAS assessment tool for diagnostic accuracy studies [10].

Data were extracted by one reviewer and checked by the other. The data extracted included age and sex distribution of the included participants, geographical location of the study, the participant inclusion/exclusion criteria, and the performance of the CDR as a 2*k table (where k refers to the number of strata described) or as sensitivity/specificity, as well as aspects of the methods used to derive the CDR (where applicable).
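As an illustration of how a dichotomous (2*2) table translates into the sensitivity and specificity figures reported below, the following is a minimal sketch. The counts are hypothetical and not taken from any included study, and the Wilson score interval is one common choice of confidence interval that is assumed here rather than stated in the review.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity of a rule from a 2*2 table.

    tp: episodes with the outcome that the rule classed as NOT low risk
    fn: episodes with the outcome that the rule classed as low risk
    tn: episodes without the outcome that the rule classed as low risk
    fp: episodes without the outcome that the rule classed as NOT low risk
    """
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts, for illustration only
result = accuracy_from_2x2(tp=44, fp=96, fn=6, tn=54)
print(result)
```

A table with more than two strata (k > 2) would need one such comparison per threshold, or a model that handles ordered categories directly.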

Methods of analysis/synthesis

Where possible, data from new publications were added to meta-analyses undertaken in the original review [3]. Quantitative synthesis was undertaken when more than 2 studies tested the same CDR and, where appropriate, sources of heterogeneity were investigated. For this update review, only dichotomous test data were found. For CDR with 3 datasets, a univariate approach was used (pooling sensitivity and specificity separately) [11]. For those with 4 or more, a bivariate model was fitted using ‘metandi’ in STATA10 [12]. The protocol specified that a random-effects meta-analysis would be undertaken using WinBUGS 1.4.3 [13] for tests with 3 or more risk strata, but no data were found eligible for this analysis.
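The univariate approach can be sketched as follows. This is a minimal illustration that assumes a DerSimonian-Laird random-effects estimator on the logit scale with a 0.5 continuity correction; the review does not state which estimator was used, and the study counts below are hypothetical. The same function would be called once for sensitivity (true positives out of episodes with the outcome) and once for specificity (true negatives out of episodes without it).

```python
from math import exp, log, sqrt


def inv_logit(x):
    return 1 / (1 + exp(-x))


def pool_proportions(events, totals, z=1.96):
    """Random-effects (DerSimonian-Laird) pooling of proportions on the logit scale.

    Returns (pooled estimate, lower bound, upper bound) as proportions.
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # 0.5 continuity correction keeps the logit finite if a cell is zero
        a, b = e + 0.5, n - e + 0.5
        y.append(log(a / b))
        v.append(1 / a + 1 / b)

    # Fixed-effect weights and Cochran's Q
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # Between-study variance, truncated at zero
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    return inv_logit(pooled), inv_logit(pooled - z * se), inv_logit(pooled + z * se)


# Hypothetical data: true positives / episodes with the outcome in three studies
sens, sens_lo, sens_hi = pool_proportions([45, 28, 60], [50, 30, 64])
print(f"pooled sensitivity {sens:.2f} ({sens_lo:.2f} to {sens_hi:.2f})")
```

Pooling sensitivity and specificity separately ignores the correlation between them across studies, which is what the bivariate model accounts for when enough datasets are available.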

Heterogeneity between study results was explored through consideration of study populations, study design, CDR and outcomes chosen, although the small number of studies in each category limited this approach. Sensitivity analysis was undertaken by comparing results when the original (derivation) data set was included and excluded.

For those areas where a quantitative synthesis was not possible, a narrative approach was used.

Results

Nine articles reporting on 8 studies were eligible for inclusion in the review (see Figure 1). The studies included patients from 2 months to 22 years old, with a wide range of malignancies, and a total of 2591 episodes of FNP describing four groups of outcomes: death, critical care requirement, serious medical complication, and bacteraemia. Six studies undertook prospective data collection and two retrospective. Details of the CDR included in this review are given in Table 1.

Quality assessment

The studies varied in quality. Potential biases due to threats to independent outcome assessment were present in two studies [2], [14], verification bias in two [2], [7], and two were presented only as abstracts [14], [15]. Five definitions of febrile neutropenia were used, with five definitions of fever and two of neutropenia. However, all definitions were clinically similar, with variation mainly in the duration of time for a lower temperature to be considered ‘prolonged’.

New CDR derivations

Five studies attempted to derive at least one CDR. Four studies examined rules to predict significant medical complications; a group of outcomes generally encompassing death, intensive care admission, significant bacterial or fungal infection, and need for organ support such as supplemental oxygen, inotropes or dialysis [7], [14], [16], [17]. Two examined rules to predict bacteraemia [16], [18], and one intensive care admission [15]. In one case a clear CDR could not be assessed [15]. The CDR used data from the initial/admission assessment, or from a later assessment after approximately 24 hours of observation. The new CDR generally had high sensitivity for the chosen outcome at the expense of poor specificity (see Table 2) and considered patient-disease, patient-episode and laboratory factors. Considerable imprecision in the estimates was seen, due mainly to the small numbers in individual studies (fewer than 350 patients).

Table 2. Diagnostic test accuracy of newly described CDR.

https://doi.org/10.1371/journal.pone.0038300.t002

The newly derived CDR were subject to validation by internal statistical means (cross-validation) or in one alternative data set (see Table 3). In all except one case [15], multivariable regression analysis was used to build the model. One rule was built with a classification and regression tree (CART) approach [15].

Validation of CDR

Four studies [2], [7], [19], [20] were explicit in undertaking validations of 9 previously described CDR. These universally demonstrated poorer discriminatory ability when tested in alternative data sets (see Table 3). The geographical settings for validations of the rules varied from those where the rule had been derived.

Synthesis of CDR accuracy

Meta-analysis was undertaken for two CDR: the “Klaassen” rule and the “Ammann” rule. Two further CDR, the PINDA rule and the “Alexander” rule, have not been subject to meta-analysis as the results are too heterogeneous; these results are presented graphically. Two further CDR, the Rondinelli rule and the SPOG2003 rule, have each been assessed in only two datasets, too few to perform meaningful meta-analysis. No data were available to update the three-stratum “Rackoff” rule meta-analysis of the previous study [3].

The “Klaassen” rule is based on a single feature: an absolute monocyte count of greater than 100/mm3 to predict patients less likely to have significant infection. Data were pooled from 4 studies from the previous review [21], [22], [23], [24] and two new sources [7], [20]. The results of this analysis give a pooled average sensitivity of 88% (95% CI 84 to 91%) and specificity of 36% (95% CI 27 to 45%), see Figure 2.

Figure 2. Individual and pooled diagnostic test accuracy of ‘Klaassen’ rule.

The ROC space plots show each study's estimates of sensitivity and specificity as a marker at the point estimate, with 95% confidence intervals shown by lines. In reading such graphs, tests with better discriminatory ability fall in the top left corner of the plot, and non-discriminatory tests fall on a 45° line between the bottom left and top right. The light lines and circles represent individual studies, with the darker dashed lines showing the study from which the rule was derived. The dark circle is the pooled estimate of sensitivity and specificity, and the dashed ellipse represents the bivariate 95% confidence region of this result.

https://doi.org/10.1371/journal.pone.0038300.g002

The “Ammann” rule identifies patients at low risk of significant bacterial infection from weighted factors including: bone marrow involvement, clinical signs of viral infection, serum C-reactive protein (CRP) level, leukocyte count, presence of a central venous catheter, high haemoglobin level, and diagnosis of pre-B-cell leukaemia (see Table 1 for details). Three studies provide data to test this rule [7], [18], [20]. The pooled average sensitivity was 98% (95% CI 91 to 99%) but the pooled average specificity was only 13% (95% CI 8 to 21%), see Figure 3.

Figure 3. Individual and pooled diagnostic test accuracy of ‘Ammann’ rule.

In this ROC space plot, the light lines and circles represent individual studies, the darker dashed lines show the study from which the rule was derived, and the heavy dark lines show the pooled estimate of sensitivity and specificity with the univariate 95% confidence intervals.

https://doi.org/10.1371/journal.pone.0038300.g003

The “Alexander” rule examined adverse clinical consequences, using a combination of clinical features which predict prolonged neutropenia, and significant co-morbidities at presentation. This rule was assessed by two further studies [2], [18]. There was marked heterogeneity in the results of these three studies (see Figure 4). When used at reassessment after 48 hrs of hospitalisation, there was marked improvement in the discriminatory ability of the rule [2] (sensitivity = 100%, specificity = 39%).

Figure 4. Individual diagnostic test accuracy of ‘Alexander’ rule.

The light lines and circles represent individual studies, with the darker dashed lines showing the study from which the rule was derived.

https://doi.org/10.1371/journal.pone.0038300.g004

The PINDA rule again identifies patients at low risk of significant bacterial infection from weighted factors, including laboratory and chemotherapy-related parameters. This has been examined in two studies from the Santolaya group [25], [26] and by two validations from European centres [7], [20]. There was marked heterogeneity (see Figure 5), potentially explained by geographical variation: the rule worked well when applied in the population in Chile, but failed to differentiate patients in French and Swiss/German studies.

Figure 5. Individual and pooled diagnostic test accuracy of ‘PINDA’ rule.

The light lines and circles represent individual studies, with the darker dashed lines showing the study from which the rule was derived.

https://doi.org/10.1371/journal.pone.0038300.g005

The rule of Rondinelli [27] is a weighted score of clinical and haematological parameters (see Table 1 for details) and was assessed in two validation datasets. These demonstrated a sensitivity of 84% [7] and 62% [20] and both estimated specificity at 43%.

The SPOG2003 rule is a weighted score of haematological parameters with intensity of chemotherapy. It is applied after 8–24 hours of hospitalisation. This model was shown to have a sensitivity of 92% and specificity of 45% [7]. A validation of this model demonstrated poorer sensitivity (82%) and slightly better specificity (57%) [19].

Discussion

This update systematic review builds on previous work to bring our knowledge of currently developed clinical decision rules for risk stratification in paediatric febrile neutropenia up to date. Now nine further models have been described, bringing the total to 25, and have included 10,000 episodes. It remains the case that no one rule is clearly better than any other, but we are now more clearly aware of the limitations of CDR which have not been subject to temporal and geographical validation.

The majority of CDR in this review focus upon defining a group at ‘low risk’ of complications. These rules once again have clinical and physiological similarities. The dominant themes are of a relationship between underlying diagnosis, chemotherapeutic regime, and clinical and laboratory parameters at the outset of the episode of fever. A further finding from this review is the demonstration that undertaking risk stratification at 24–48 hours after the onset of the episode leads to much greater discrimination, as many occult infections will have declared in this period.

Two rules have shown relative consistency of results. The first is the simplest: stratification of patients using an absolute monocyte count >100/mm3 to define a low risk group [24]. This has a pooled average sensitivity of 88% (95% CI 84 to 91%) and specificity of 36% (95% CI 27 to 45%). If we assume serious infectious events occur in 30% of the group, the low-risk group has a 9% risk of serious infection and accounts for approximately 29% of the total population, while the high risk group has a 37% risk of infectious complications.

The Ammann 2003 rule [32] has much better sensitivity (estimated at 98%), leading to a risk of serious infectious complications in around 5% of cases, but would class only 9% of patients as low risk, making it of little practical use.
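The way such stratum-level risks follow from sensitivity, specificity and an assumed event rate can be sketched as below. This is an illustration only: the function name is hypothetical, and because it takes rounded pooled point estimates as inputs, its outputs will not necessarily reproduce the percentages quoted in the text exactly.

```python
def stratum_risks(sensitivity, specificity, prevalence):
    """Size of the low-risk group and event risk in each group.

    'Low risk' means the rule is negative; 'high risk' means it is positive.
    All arguments and results are proportions between 0 and 1.
    """
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)
    low, high = fn + tn, tp + fp
    if low == 0 or high == 0:
        raise ValueError("rule assigns every episode to a single group")
    return {
        "low_risk_fraction": low,
        "risk_in_low_risk_group": fn / low,
        "risk_in_high_risk_group": tp / high,
    }


# Rounded pooled estimates for the monocyte-count rule, with the assumed 30% event rate
klaassen = stratum_risks(sensitivity=0.88, specificity=0.36, prevalence=0.30)
for name, value in klaassen.items():
    print(f"{name}: {value:.1%}")
```

The calculation makes the trade-off explicit: raising sensitivity lowers the risk within the low-risk group, but if specificity falls at the same time the low-risk group shrinks.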

Other rules have shown marked heterogeneity: the Alexander rule [28] and the PINDA rule. The data support the use of the PINDA rule in Chile, where it has been successfully validated [25], but do not support its use in Europe. A similar situation exists with the Brazilian rule [27], which again was not successfully validated in European data sets. The Alexander rule did not successfully differentiate patients at admission in the UK and Europe, but its use at a 48 hour reassessment was associated with successful reductions in hospital stay. A further, newer, rule from the SPOG group requires more validation before a decision can be made on its usefulness.

These findings, that validation of CDR may be poor in comparison to derivation, and that geographical variation may mean CDR fail to work universally, have important clinical implications. There is a wealth of examples in the statistical and methodological literature regarding the over-optimism of newly derived CDR [29], [30]. The core concept is that rules derived from one dataset fit the idiosyncrasies and anomalies of the data collected, rather than reflecting the predictive power in the whole population of children experiencing FNP. However, these frequently equation-laden papers are uncommonly read by clinicians, and the complex approaches suggested for ‘shrinking’ the CDR values to increase their reproducibility are tricky to understand and to implement. The geographical variation may arise through different interpretations of similar clinical findings; for example, how “unwell” should a child appear before they fall into this diagnostic category? There may also be subtle differences in the regimes used: for example, the use of steroid pulses in maintenance treatment for acute lymphoblastic leukaemia varies across Europe, and this may affect the discriminatory ability of a CDR.

This review has demonstrated there is an increasingly wide range of rules mainly for the prediction of an absence of adverse outcomes during episodes of febrile neutropenia in children, despite the existence of at least sixteen other applicable CDR [3]. Six rules have been subject to further verification, each demonstrating a variable degree of over-optimism in the original reports when the CDR is applied in different settings. The small size of these reports, with low ratios of events per variable examined may explain some of the variability in factors selected and poor reproducibility, as may undefined aspects of geographical differences between populations.

The practical application of any of these CDR requires the rule to be appropriate to the healthcare setting, and validated in the setting in which it is to be used. There remains a need for further research to reduce uncertainty around the efficiency of CDR, and potentially generate a very robust model on the basis of a much larger dataset, with well over 20 events per variable under examination. Importantly, rules should also identify a group at the highest risk of complications, to concentrate potentially lifesaving early sepsis interventions in this group [31]. This project is already underway, with the PICNICC collaboration having collected data on around 5000 episodes of febrile neutropenia from 18 collaborating groups across North & South America, Europe and Asia.
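The events-per-variable ratio referred to above is simple arithmetic, sketched here with hypothetical numbers. The function names and the example cohort are assumptions for illustration; the threshold of 20 is taken from the text.

```python
def events_per_variable(n_events, n_candidate_predictors):
    """Outcome events available per candidate predictor examined."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return n_events / n_candidate_predictors


def max_candidate_predictors(n_events, min_epv=20):
    """Largest number of candidate predictors that keeps EPV at or above min_epv."""
    return n_events // min_epv


# Hypothetical derivation cohort: 60 outcome events, 12 candidate predictors
epv = events_per_variable(60, 12)
allowed = max_candidate_predictors(60)
print(f"EPV = {epv:.1f}; predictors supportable at EPV >= 20: {allowed}")
```

Note that the count that matters is the number of outcome events, not the number of episodes, so a cohort with a low event rate supports far fewer candidate predictors than its size suggests.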

Author Contributions

Conceived and designed the experiments: RSP TL SA LS. Performed the experiments: RSP SA. Analyzed the data: RSP SA. Contributed reagents/materials/analysis tools: RSP. Wrote the paper: RSP LS TL SA.

References

  1. Phillips B, Selwood K, Lane SM, Skinner R, Gibson F, et al. (2007) Variation in policies for the management of febrile neutropenia in United Kingdom Children's Cancer Study Group centres. Arch Dis Child 92: 495–498.
  2. Dommett R, Geary J, Freeman S, Hartley J, Sharland M, et al. (2009) Successful introduction and audit of a step-down oral antibiotic strategy for low risk paediatric febrile neutropaenia in a UK, multicentre, shared care setting. EJC 45: 2843–2849.
  3. Phillips B, Wade R, Stewart LA, Sutton AJ (2010) Systematic review and meta-analysis of the discriminatory performance of risk prediction rules in febrile neutropaenic episodes in children and young people. EJC 46: 2950–2964.
  4. Teuffel O, Ethier MC, Alibhai SM, Beyene J, Sung L (2011) Outpatient management of cancer patients with febrile neutropenia: a systematic review and meta-analysis. Ann Oncol 22: 2358–2365.
  5. Freifeld AG, Sepkowitz KA (2011) No place like home? Outpatient management of patients with febrile neutropenia and low risk. J Clin Oncol 29: 3952–3954.
  6. Sung L, Phillips R, Lehrnbecher T (2011) Time for paediatric febrile neutropenia guidelines – Children are not little adults. EJC 47: 811–813.
  7. Ammann RA, Bodmer N, Hirt A, Niggli FK, Nadal D, et al. (2010) Predicting adverse events in children with fever and chemotherapy-induced neutropenia: the prospective multicenter SPOG 2003 FN study. J Clin Oncol 28: 2008–2014.
  8. Centre for Reviews and Dissemination (2009) Systematic reviews: CRD's guidance for undertaking reviews in health care. York: University of York.
  9. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, et al. (1999) Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 282: 1061–1066.
  10. Whiting P, Rutjes A, Reitsma J, Bossuyt P, Kleijnen J (2003) The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Medical Research Methodology 3: 25.
  11. Simel DL, Bossuyt PM (2009) Differences between univariate and bivariate models for summarizing diagnostic accuracy may not be large. J Clin Epidemiol 62: 1292–1300.
  12. Harbord R (2008) METANDI: Stata module to perform meta-analysis of diagnostic accuracy. S456932 ed: Department of Social Medicine, University of Bristol.
  13. Lunn D, Thomas A, Best N, Spiegelhalter D (2000) WinBUGS – A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing 10: 325–337.
  14. Delebarre M, Dubos F, Macher E, Garnier N, Mazingue F, et al. (2010) Identifying high-risk patients for severe infection in children with chemotherapy-induced febrile neutropenia: A new decision rule. Pediatr Blood Cancer 55: 823.
  15. Mian A, Prodhan P, Bhutta A, Watkins B (2009) Clinical variables and acute phase reactants at initial hospitalization as predictors of admission to intensive care unit (ICU) among febrile neutropenic children. Crit Care Med 37: A344.
  16. Hakim H, Flynn PM, Srivastava DK, Knapp KM, Li C, et al. (2010) Risk prediction in pediatric cancer patients with fever and neutropenia. Pediatr Infect Dis J 29: 53–59.
  17. Badiei Z, Khalesi M, Alami MH, Kianifar HR, Banihashem A, et al. (2011) Risk factors associated with life-threatening infections in children with febrile neutropenia: a data mining approach. J Pediatr Hematol Oncol 33: e9–e12.
  18. Agyeman P, Aebi C, Hirt A, Niggli FK, Nadal D, et al. (2011) Predicting bacteremia in children with cancer and fever in chemotherapy-induced neutropenia: results of the prospective multicenter SPOG 2003 FN study. Pediatr Infect Dis J 30: e114–119.
  19. Miedema KG, de Bont ES, Oude Nijhuis CS, van Vliet D, Kamps WA, et al. (2011) Validation of a new risk assessment model for predicting adverse events in children with fever and chemotherapy-induced neutropenia. J Clin Oncol 29: e182–184; author reply e185.
  20. Macher E, Dubos F, Garnier N, Delebarre M, De Berranger E, et al. (2010) Predicting the risk of severe bacterial infection in children with chemotherapy-induced febrile neutropenia. Pediatr Blood Cancer 55: 662–667.
  21. Rackoff WR, Gonin R, Robinson C, Kreissman SG, Breitfeld PB (1996) Predicting the risk of bacteremia in children with fever and neutropenia. J Clin Oncol 14: 919–924.
  22. Baorto EP, Aquino VM, Mullen CA, Buchanan GR, DeBaun MR (2001) Clinical parameters associated with low bacteremia risk in 1100 pediatric oncology patients with fever and neutropenia. Cancer 92: 909–913.
  23. Madsen K, Rosenman M, Hui S, Breitfeld PP (2002) Value of electronic data for model validation and refinement: bacteremia risk in children with fever and neutropenia. J Pediatr Hematol Oncol 24: 256–262.
  24. Klaassen RJ, Goodman TR, Pham B, Doyle JJ (2000) “Low-risk” prediction rule for pediatric oncology patients presenting with fever and neutropenia. J Clin Oncol 18: 1012–1019.
  25. Santolaya ME, Alvarez AM, Avilés CL, Becker A, Cofré J, et al. (2002) Prospective evaluation of a model of prediction of invasive bacterial infection risk among children with cancer, fever, and neutropenia. Clinical Infectious Diseases 35: 678–683.
  26. Santolaya ME, Alvarez AM, Becker A, Cofre J, Enriquez N, et al. (2001) Prospective, multicenter evaluation of risk factors associated with invasive bacterial infection in children with cancer, neutropenia, and fever. J Clin Oncol 19: 3415–3421.
  27. Rondinelli PIP, Ribeiro KdCB, de Camargo B (2006) A proposed score for predicting severe infection complications in children with chemotherapy-induced febrile neutropenia. J Pediatr Hematol Oncol 28: 665–670.
  28. Alexander SW, Wade KC, Hibberd PL, Parsons SK (2002) Evaluation of risk prediction criteria for episodes of febrile neutropenia in children with cancer. J Pediatr Hematol Oncol 24: 38–42.
  29. Janssen KJ, Moons KG, Kalkman CJ, Grobbee DE, Vergouwe Y (2008) Updating methods improved the performance of a clinical prediction model in new patients. J Clin Epidemiol 61: 76–86.
  30. Toll DB, Janssen KJM, Vergouwe Y, Moons KGM (2008) Validation, updating and impact of clinical prediction rules: A review. J Clin Epidemiol 61: 1085–1094.
  31. Dellinger RP, Carlet JM, Masur H, Gerlach H, Calandra T, et al. (2004) Surviving Sepsis Campaign guidelines for management of severe sepsis and septic shock. Crit Care Med 32: 858–873.
  32. Ammann RA, Hirt A, Luthy AR, Aebi C (2003) Identification of children presenting with fever in chemotherapy-induced neutropenia at low risk for severe bacterial infection. Medical & Pediatric Oncology 41: 436–443.