Ferraris VA, Ferraris SP. Risk Stratification and Comorbidity.
In: Cohn LH, Edmunds LH Jr, eds. Cardiac Surgery in the Adult. New York: McGraw-Hill, 2003:187-224.
Chapter 6
HISTORICAL PERSPECTIVES AND THE PURPOSE OF OUTCOME ASSESSMENT: NIGHTINGALE, CODMAN, AND COCHRANE
It may seem a strange principle to enunciate as the very first requirement in a Hospital that it should do the sick no harm. It is quite necessary, nevertheless, to lay down such a principle, because the actual mortality in hospitals ... is very much higher than ... the mortality of the same class of diseases among patients treated out of hospital....
Florence Nightingale, 1863
The formal assessment of patient care had its beginnings in the mid-1800s. One of the earliest advocates of analyzing outcome data was Florence Nightingale, who was troubled by observations that hospitalized patients died at rates higher than those of patients treated outside of the hospital.1,2 She also noted a vast difference in mortality rates among different hospitals, with London hospitals having a mortality rate as high as 92%, while smaller rural hospitals had a much lower mortality rate (12% to 15%). Although England had tracked hospital mortality rates since the 1600s, the analysis of these rates was in its infancy during Nightingale's era. Yearly mortality statistics were calculated by dividing the number of deaths in a year by the average number of hospitalized patients on a single day of that year. Nightingale made the important observation that raw mortality rates were not an accurate reflection of outcome, since some patients were sicker when they presented to the hospital, and therefore would be expected to have a higher mortality. This was the beginning of risk adjustment based on severity of disease. She was able to carry her observations to the next level by suggesting simple measures, such as improved sanitation, less crowding, and locating hospitals distant from crowded urban areas, that would ultimately result in dramatic improvement in patients' outcomes, an example of a quality improvement project (see below).
Ernest Amory Codman, a Boston surgeon, was one of the most outspoken early advocates of outcome analysis and scrutiny of results. Codman was a classmate of Harvey Cushing, and he became interested in the issues of outcome analysis after a friendly bet with Cushing about who had the lowest complication rate associated with the delivery of anesthesia. In the early 1900s as medical students, they were responsible for administering anesthesia. Since vomiting and aspiration were common upon induction of anesthesia, many operations were over before they started. Cushing and Codman compared their results and kept records concerning the administration of anesthesia while they were medical students. This effort not only represented the first intraoperative patient records, but also served as a foundation for Codman's later interest (almost passion) for the documentation of outcomes. Codman actually paid a publisher to disseminate the results obtained in his privately owned Boston hospital.3 Codman was perhaps the first advocate of searching for a cause of all complications. He linked specific outcomes to specific interventions (or errors). He believed that most bad outcomes were the result of errors or omissions by physicians, and completely ignored any contribution to outcome from hospital-based and process-related factors. His efforts were not well received by his peers, and eventually his private hospital closed because of lack of referrals.
Both Codman and Nightingale viewed outcome analysis as an intermediate step toward the improvement of patient care. It was not enough to know the rates of a given outcome. While it is axiomatic that any valid comparison of quality of care or patient outcome must account for severity of illness, this is only the initial step toward improving patient outcome.
Further definition of outcome assessment occurred in the mid-1900s. As more and more therapeutic options became available to treat the diseases that predominated in the early 20th century (e.g., tuberculosis), a need arose to determine the best alternative among multiple therapies, leading to the advent of the controlled randomized trial and tests of effectiveness of therapy. One of the earliest randomized trials was conducted to determine whether streptomycin was effective against tuberculosis.4 Although the trial proved streptomycin's effectiveness, it also stimulated a great deal of controversy. After World War II, several clinicians advocated the use of randomized, controlled trials to better identify the optimal treatment to provide the best outcome. Foremost among these was Archie Cochrane. Every physician should know about Archie Cochrane (Fig. 6-1). He is as close to a true hero as a physician can get, but there may be those who see him as the devil incarnate. As you can see from some of the highlights of his career (see Fig. 6-1), he lived during an exciting time. In the 1930s Professor Cochrane was branded as a "Trotskyite" because he advocated a national health system for Great Britain. His advocacy was tempered by 4 years as a prisoner of war in multiple German POW camps during World War II. He saw soldiers die from tuberculosis, and he was never sure what the best treatment was. He could choose among collapse therapy, bed rest, supplemental nutrition, or even high-dose vitamin therapy. A quote from his book sums up his frustration:
I had considerable freedom of clinical choice of therapy: my trouble was that I did not know which to use and when. I would gladly have sacrificed my freedom for a little knowledge.5
FIGURE 6-1 Portrait of Archie Cochrane with brief biography.
His experience with the uncertainty about the best treatment for tuberculosis and other chest diseases continued after the war, when he became a researcher in pulmonary disease for the Medical Research Council in Great Britain. His continued interest in tuberculosis was now heightened by the fact that he had contracted the disease. Archie wanted to know the best drug therapy for tuberculosis, since there were now drugs available that could treat this disease, with streptomycin being the first really effective drug against Mycobacterium tuberculosis.4 He was a patron of the randomized controlled trial (or RCT, as he liked to refer to it) as a means of testing important medical hypotheses. He used the evidence gained from these RCTs to make decisions about the best therapy based on available evidence, the beginning of "evidence-based" practice. He felt that RCTs were the best form of evidence to support medical decision making (so-called "class 1" evidence). Initially he was a voice in the wilderness, but this eventually changed. In 1979 he criticized the medical profession for not having a critical summary, organized by specialty and updated periodically, of relevant RCTs. In the 1980s, a database of important RCTs dealing with perinatal medicine was developed at Oxford. In 1987, the year before Cochrane died, he referred to a systematic review of RCTs of care during pregnancy and childbirth as "a real milestone in the history of randomized trials and in the evaluation of care," and suggested that other specialties should copy the methods used. This led to the opening of the first Cochrane center (in Oxford) in 1992 and the founding of the Cochrane Collaboration in 1993.
The Cochrane Web site (https://www.cochrane.org/) has summaries of all available RCTs on a wide range of medical subjects. Thus it is fair to call Archie Cochrane the "father of evidence-based medicine." Evidence-based medicine has, at its heart, the imperative to improve outcomes by comparing alternative therapies to determine which is the best. Evidence-based studies that involve randomized trials have the advantage of being able to infer cause and effect (i.e., that a new therapy or drug causes improved outcome). On the other hand, observational (or retrospective) studies can define only associations between therapies and outcomes; they cannot prove cause and effect.
DEFINITIONS
Risk stratification means arranging patients according to the severity of their illness. Implicit in this definition is the ability to predict outcomes from a given intervention based on preexisting illness or the severity of the intervention. The usefulness of any risk stratification system arises from how well the system links severity to a specific outcome.
There have been numerous attempts at describing severity of illness by means of a tangible score or number. Table 6-1 is a partial listing of some of the severity measures commonly used in risk assessment of cardiac surgical patients. This list is not meant to be comprehensive, but it does give an overview of the types of risk stratification schemes that have been used for cardiac patients. The risk stratification systems listed in Table 6-1 are in constant evolution, and the descriptions in the table may not reflect current or future versions of these systems. All of these severity measures share 2 common features. First, they are all linked to a specific outcome. Second, all measures view a period of hospitalization as the episode of illness. The severity indices listed in Table 6-1 define severity predominantly based on clinical measures (e.g., risk of death, clinical instability, treatment difficulty, etc.). Two of the severity measures shown in Table 6-1 (MedisGroups used in the Pennsylvania Cardiac Surgery Reporting System and the Canadian Provincial Adult Cardiac Care Network of Ontario) define severity based on resource use (e.g., hospital length-of-stay, cost, etc.) as well as on clinical measures.6,7 Of the 9 severity measures listed in Table 6-1, only one, the APACHE III system, computes a risk score independent of patient diagnosis.8 All of the others in the table are diagnosis-specific systems that use only patients with particular diagnoses in computing severity scores.
Outcomes and Risk Stratification
There are at least 4 outcomes of interest to surgeons dealing with cardiac surgical patients: mortality, serious nonfatal morbidity, resource utilization, and patient satisfaction. Which patient characteristics constitute important risk factors may depend largely on the outcome of interest. For example, Table 6-2 lists the multivariate factors and odds ratios associated with various outcomes of interest for our patients having cardiac operations.10-12 The clinical variables associated with increased resource utilization after operation are different from those associated with increased mortality risk. As a generalization, the risk factors associated with in-hospital death are likely to reflect concurrent, disease-specific variables, while factors associated with increased resource utilization reflect serious comorbid illness.10,11,13 For example, mortality risk after coronary artery bypass graft (CABG) is associated with disease-specific factors such as ventricular ejection fraction, recent myocardial infarction, and hemodynamic instability at the time of operation. Risk factors for increased resource utilization (as measured by length of stay and hospital cost) include comorbid illnesses such as peripheral vascular disease, renal dysfunction, hypertension, and chronic lung disease. It is not surprising that comorbid conditions are important predictors of hospital charges, since patients with multiple comorbidities often require prolonged hospitalization, not only for treatment of the primary surgical illness but also for treatment of the comorbid conditions.
Comorbidities are coexisting diagnoses that are indirectly related to the principal surgical diagnosis but may alter the outcome of an operation. Physicians or hospitals that care for patients with a higher prevalence of serious comorbid conditions are clearly at a significant disadvantage in unadjusted comparisons. The prevalence of comorbid illness in patients with cardiac disease has been well demonstrated. In one series of patients with myocardial infarction, 26% also had diabetes, 30% had arthritis, 6% had chronic lung problems, and 12% had gastrointestinal disorders.16
Several indices of comorbidity are available. Table 6-3 compares 5 commonly used comorbidity measures: the Charlson index, the RAND Corporation index, the Greenfield index, the Goldman index, and the APACHE III scoring system.16-23 There are many limitations of comorbidity indices, and they are not applied widely in studies of efficacy or medical effectiveness. Perhaps the most serious drawback of comorbidity scoring systems is the imprecision of the databases used to form the indices. Most of the data used to construct the indices come from two sources: (1) administrative databases in the form of computerized discharge abstract data, and (2) out-of-hospital follow-up reports. Discharge abstracts include clinical diagnoses that are often assigned by nonphysicians who were not involved in the care of the patient. Comprehensive entry of correct diagnoses is not a high priority for most clinicians, and problems with discharge coding have been identified by Iezzoni and others.24-26 These authors found that many conditions that are expected to increase the risk of death are actually associated with a lower mortality. The presumed explanation for this paradoxical finding is that less serious diagnoses are unlikely to be coded and entered in the records of the most seriously ill patients. Likewise, the accuracy of out-of-hospital follow-up studies is hard to validate, and they may contain significant inaccuracies. Because of these shortcomings, analyses that compare physician or hospital outcomes and that do not provide adequate adjustment for patient comorbidity are likely to discriminate against providers or hospitals that treat disproportionate numbers of elderly patients with multiple comorbid conditions.
Risk adjustment for severity of illness and comorbidity is equally important for patients about to undergo stressful interventions such as surgical operations or chemotherapy. For example, Goldman et al reported that preexisting heart conditions and other comorbid diseases were important predictors of postoperative cardiac complications for patients undergoing noncardiac procedures.23 The Goldman scoring system is commonly used by anesthesiologists in assessing patients preoperatively, especially prior to noncardiac procedures.
USES OF OUTCOMES ANALYSIS AND RISK STRATIFICATION
The ultimate goal of risk stratification and outcome assessment is to account for differences in patient risk factors so that patient outcomes can be used as an indicator of quality of care. A major problem arises in attaining this goal because uniform definitions of quality of care are not available. This is particularly true of cardiovascular disease. For example, there are substantial geographic differences in the rates at which patients with cardiovascular diseases undergo diagnostic procedures and, incidentally, there is little, if any, evidence that these variations are related to survival or improved outcome.28-31 In one study, coronary angiography was performed after acute myocardial infarction in 45% of patients in Texas compared to 30% of patients in New York State, a statistically significant difference.29 In these patient populations the differences in the rates of coronary revascularization were not as dramatic, and the survival in these patients was not related to the type of treatment or diagnostic procedures. Regional variations of this sort suggest that a rigorous definition of the "correct" treatment of acute myocardial infarction, and other cardiovascular diseases, is elusive, and the definition of quality of care for such patients is imperfect. Similar imperfections exist for nearly all outcomes in patients with cardiothoracic disorders.
Recognizing the difficulties in defining "best practices" for a given illness, professional organizations have opted to promote practice guidelines or "suggested therapy" for given diseases.32 These guidelines represent a compilation of available published evidence, including randomized trials and risk-adjusted observational studies, as well as consensus among panels of experts proficient at treating the given disease.33 For example, the practice guideline for coronary artery bypass grafting is available for both practitioners and the lay public on the Internet (https://www.acc.org/clinical/guidelines/bypass/execIndex.htm). Table 6-5 summarizes the 1999 AHA/ACC guidelines for coronary artery bypass grafting in patients with acute (Q-wave) myocardial infarction. These guidelines were developed using available randomized controlled trials, risk-adjusted observational studies, and expert consensus. They are meant to provide clinicians with accepted standards of care that most would agree upon, with an ultimate goal of limiting deviations from accepted standards.
Efficacy Studies versus Effectiveness Studies
There have been many efficacy studies relating to cardiothoracic surgery. These studies attempt to isolate one procedure or device and evaluate its effect on patient outcomes. The study population in efficacy studies is specifically chosen to contain as uniform a group as possible. Typical examples of efficacy studies include randomized, prospective, clinical trials (RCTs) comparing use of a procedure or device in a well-defined treatment population compared to an equally well-defined control population.
Efficacy studies are different from effectiveness studies.5 The latter deal with whole populations and attempt to determine the treatment option that provides optimal outcome in a population that would typically be treated by a practicing surgeon. An example of an effectiveness study is a retrospective study of outcome in a large population treated with a particular heart valve. Risk stratification is capable of isolating associations between outcome and risk factors. Methodological enhancements in risk adjustment are capable of reducing biases inherent in population-based, retrospective studies,35 but they can never eliminate all confounding biases in observational studies.
One reasonable strategy for using risk stratification to improve patient care is to isolate high-risk subsets from population-based, retrospective studies (i.e., effectiveness studies), and then to test interventions to improve outcome in high-risk subsets using RCTs. This is a strategy that should ultimately lead to the desired goal of improved patient care. For example, a population-based study on postoperative blood transfusion revealed that the following factors were significantly associated with excessive blood transfusion (defined as more than 4 units of blood products after CABG): (1) template bleeding time, (2) red blood cell volume, (3) cardiopulmonary bypass time, and (4) age.36 Cross-validation of these results was carried out on a similar population of patients undergoing CABG at another institution. Based on these retrospective studies, it was reasonable to hypothesize that interventions aimed at reducing blood transfusion after CABG were most likely to benefit patients with prolonged bleeding time and low red blood cell volume. A prospective clinical trial was then performed to test this hypothesis using two blood conservation techniques, platelet-rich plasma saving and whole blood sequestration, in patients undergoing CABG. The results of this stratified, prospective clinical trial showed that blood conservation interventions were beneficial in the high-risk subset of patients.37 The implications of these studies are that more costly interventions such as platelet-rich plasma saving are only justified in high-risk patients, with the high-risk subset being defined by risk stratification methodologies. Other strategies have been developed that use risk-adjustment methods to improve quality of care, and these methods will be discussed below.
Other Goals of Outcomes Analysis
Financial factors are a major force behind health care reform. America's health care costs amount to 15% to 20% of the gross national product, and this figure is rising at a rate of 6% annually. Institutions that pay for health care are demanding change, and these demands are fueled by studies that suggest that 20% to 30% of care is inappropriate.38 Charges of inappropriate care stem largely from the observation that there are wide regional variations in the use of expensive procedures.39,40 This has resulted in a shift in emphasis, with health care costs being weighed on equal footing with clinical outcomes of care. Relman suggested that clinical outcomes will be used by patients, payors, and providers as a basis for distribution of future funding of health care.41 While wide differences in use of cardiac interventions initially fueled charges of overuse in certain areas,42 recent evaluations suggest that underuse of indicated cardiac interventions (either PTCA or CABG) may be a cause of this variation.42-47 Whether caused by underuse or overuse of cardiovascular services, regional variations in resource utilization make it difficult to use outcomes as an indicator of quality of care.
If the causes of regional variations in the use of cardiac interventions seem puzzling, then physician practice behavior might seem bizarre. One study showed that there were unbelievably large variations in care delivered to patients having cardiac surgery.48 Among 6 institutions that treated very similar patients (Veterans Administration medical centers), there were large differences in the percentage of elective, urgent, and emergent cases at each institution, ranging from 58% to 96% elective, 3% to 31% urgent, and 1% to 8% emergent.48 There was also a 10-fold difference in the preoperative use of intra-aortic balloon counterpulsation for control of unstable angina, varying from 0.8% to 10.6%.48 Similar variations in physician-specific transfusion practices,49 ordering of blood chemistry tests,50 anesthetic practices,51 treatment of chronic renal failure,52 and use of antibiotics53-55 have been observed. This variation in clinical practice may reflect uncertainty about the efficacy of available interventions, or differences in practitioners' clinical judgment. Some therapies with proven benefit are underused.51,52 Whatever the causes of variations in physician practice, they distort the allocation of health care funds in an inappropriate way. Solutions to this problem involve altering physician practice patterns, something that has been extremely difficult to do.56 How can physician practice patterns be changed in order to improve outcome? Evidence suggests that the principal process of outcome assessment, the case-by-case review (traditionally done in the morbidity and mortality conference format), may not be cost-effective and may not improve quality,57 and should be replaced by profiles of practice patterns at institutional, regional, or national levels.
One proposed model for quality improvement involves oversight that emphasizes the appropriate balance between internal mechanisms of quality improvement (risk-adjusted outcome analysis) and external accountability.57
TOOLS FOR RISK STRATIFICATION AND OUTCOMES ANALYSIS
Perhaps the most important tool of any outcome assessment endeavor is a database that is made up of a representative sample of the study group of interest. The accuracy of the data elements in any such database cannot be overemphasized.58-60 Factors such as the source of data, the outcome of interest, the methods used for data collection, standardized definitions of the data elements, data reliability checking, and the time frame of data collection are essential features that must be considered when either constructing a new database or deciding how to use an existing database.59,60 The quality of the database of interest must be evaluated.
Data obtained from claims databases have been criticized. Because these data are generated for the collection of bills, their clinical accuracy is inadequate, and it is likely that these databases overestimate complications for billing purposes.61,62 Furthermore, claims data were found to underestimate the effects of comorbid illness and to have major deficiencies in important prognostic variables for CABG, namely left ventricular function and number of diseased vessels.63 The Duke Databank for Cardiovascular Disease found major discrepancies between clinical and claims databases, with claims data failing to identify more than half of the patients with important comorbid conditions such as congestive heart failure, cerebrovascular disease, and angina.64 The Health Care Financing Administration (HCFA) uses claims data to evaluate variations in the mortality rates in hospitals treating Medicare patients. After an initially disastrous effort at risk adjustment from claims data,65,66 new algorithms were developed. Despite these advances, HCFA halted release of the 1993 Medicare hospital mortality report because of concerns about the database and fears that the figures would unfairly punish inner-city public facilities.67 The importance of the quality of databases used to generate comparisons cannot be overemphasized.
Analytic Tools of Risk Stratification
Implicit in risk adjustment is the use of some analytic technique to determine the significant risk factors that are predictive of the outcome of interest. Some physicians take the "ostrich" approach when it comes to any statistical concept more sophisticated than the t test. This approach is both unscientific and potentially harmful to patient care. The current shift to outcomes analysis carries with it a more intensive reliance on statistical techniques that are capable of evaluating large populations with multiple variables of interest in an interdependent manner, i.e., multivariate analyses. A modicum of statistical knowledge helps unravel the intricacies of risk adjustment and provides confidence in the results of risk-adjustment methodologies. The following sections are not intended to provide the reader with exhaustive knowledge of the statistics of outcome analysis, but rather to provide a resource for critical assessment of these methods and to stimulate the interest of readers to learn more about this important field. Perhaps the biggest single benefit of risk adjustment for outcome analysis will come from physicians increasing their knowledge base about these analytic techniques and gaining confidence in the methodology.
The starting point for understanding multivariate statistical methods is a firm grasp of elementary statistics. Several basic texts on statistics are available that are enjoyable reading for the interested health care professional.68-70 These texts are a painless way to become familiar with the basic terminology regarding variable description, simple parametric (normally distributed) univariate statistics, linear regression, analysis of variance, nonparametric (not normally distributed) statistical techniques, and ultimately multivariate statistical methods.
A statistical technique that is commonly used to describe how one variable (the dependent or outcome variable) depends on or varies with a set of independent (or predictor) variables is regression analysis. The dependent or outcome variable of interest can be either continuous (e.g., hospital cost or length of stay) or discrete (e.g., mortality). Discrete outcome variables can be either dichotomous (two discrete values, such as alive or dead) or nominal (multiple discrete values, such as improved, unimproved, or worse). The relationship between the outcome variable and the set of descriptor variables can be any type of mathematical relationship. The books by Glantz and Slinker and by Harrell provide an enjoyable primer on regression analysis and are geared to the biomedical sciences.70,71
Regression analysis means determining the relationship that describes how an outcome variable depends on (or is associated with) a set of independent predictor variables. Put in simple terms, multivariate regression analysis is "model building." The resultant model is useful only if it accurately predicts outcomes for patients by determining significant risk factors associated with the outcome of interest, i.e., risk adjustment of outcome. When the outcome variable of interest is a continuous variable such as hospital cost, linear multivariate regression is often used to construct a model to predict outcome. A multivariate linear regression model contains a set of independent variables that are linearly related to, and can be used to predict, an outcome variable. These significant independent variables are termed risk factors, and knowledge of these risk factors allows separation of patients according to their degree of risk, i.e., risk stratification. The linear regression model has two important features. First, the model allows one to estimate the expected risk of a patient based on his/her risk characteristics. Second, various health care providers can be compared by comparing their observed outcomes to the outcomes that would be predicted from consideration of the risk factors of the patients that they treat (so-called observed to expected ratio or "O/E" ratio).
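The O/E comparison can be sketched in a few lines of code; the patient risks and death count below are invented for illustration, not drawn from any real risk model:

```python
# Minimal sketch of an observed-to-expected (O/E) mortality ratio.
# All patient numbers here are hypothetical, for illustration only.

def oe_ratio(observed_deaths, predicted_risks):
    """Observed deaths divided by the model's expected number of deaths."""
    expected = sum(predicted_risks)  # each risk is that patient's predicted mortality
    return observed_deaths / expected

# A provider's 3 patients, with model-predicted mortality risks summing to 0.5:
risks = [0.125, 0.125, 0.25]
print(oe_ratio(1, risks))  # one observed death against 0.5 expected -> 2.0
```

An O/E ratio above 1 suggests more deaths than the risk model predicts for that provider's case mix; a ratio below 1 suggests fewer.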
Statistical terminology used to describe variables and variable distribution patterns is particularly important in understanding linear regression statistical modeling. An important concept is the coefficient of determination (R2). R2 is a summary measure of performance of the statistical model. R2 is often described by saying that it is the fraction of the total variability of the dependent variable explained by the statistical model. Most investigators routinely report R2 as a measure of the performance of linear regression risk-adjustment models.72 For example, the APACHE III risk-adjustment scoring system described in Table 6-1 can be used to predict ICU length of stay. When this is done, the model is associated with an R2 value of 0.15.19,72 This implies that 15% of the variability in ICU length of stay can be explained by the variables encompassed in the APACHE III score. Another way of saying this is that 85% of the variability in ICU length of stay is not explained by the APACHE III scoring system. This R2 value does not rate the APACHE III scoring system very highly for predicting ICU length of stay. The APACHE III scoring system uses patient data obtained within 24 hours of admission to the ICU to predict outcome and was developed to predict in-hospital mortality, not hospital costs or length of stay. We have found that the outcome of patients admitted to the ICU depends on events that happen after admission (especially iatrogenic events occurring in the ICU) more so than patient characteristics present on admission to the ICU.73 Hence, it is not surprising that the APACHE III score does not account for all of the variability in ICU length of stay. In addition, it is not clear what level of R2 can be expected when the APACHE III system is used in a very different context than the one for which it was designed.
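The R2 computation itself is straightforward: one minus the ratio of unexplained to total variability. A minimal sketch, with invented outcome values (say, length of stay in days) and invented model predictions:

```python
# Sketch of R^2: the fraction of outcome variability explained by a model.
# The actual and predicted values below are hypothetical illustrations.

def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_total = sum((y - mean) ** 2 for y in actual)                    # total variability
    ss_resid = sum((y - p) ** 2 for y, p in zip(actual, predicted))    # unexplained part
    return 1 - ss_resid / ss_total

actual    = [2.0, 4.0, 6.0, 8.0]   # observed lengths of stay (days)
predicted = [2.5, 3.5, 6.5, 7.5]   # model predictions for the same patients
print(r_squared(actual, predicted))  # 0.95 -> the model explains 95% of the variance
```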
Shwartz and Ash give an excellent review of evaluating the performance of risk-adjustment methods using R2 as a measure of model performance, and their work provides an enlightening insight into the tools of risk adjustment.74 Hartz et al point out that it is unlikely that any large multivariate regression model will completely account for all of the variability of any complex outcome.75 When the outcome variable of interest is a discrete variable (e.g., mortality), then nonlinear regression analysis is used. Logistic regression is the nonlinear method most widely used to model dichotomous outcomes in the health sciences. Logistic regression makes use of the mathematical fact that the expression e^x / (1 + e^x) assumes values between 0 and 1 for all values of x. The value x in the expression can be a linear sum of predictor variables (either continuous or discrete), and the value of e^x / (1 + e^x) is the probability of the outcome, between 0 (e.g., survival) and 1 (e.g., death), for any value of the predictor variables. Computer iteration techniques can be used to produce a model consisting of a set of independent variables that best predict the occurrence of a dichotomous outcome variable. The significant independent variables identified by the logistic regression model are risk factors that allow risk stratification of patients according to their risk of experiencing the dichotomous outcome (e.g., survival versus death).
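The logistic transform e^x / (1 + e^x) can be sketched directly; the intercept and coefficients below are hypothetical, not taken from any fitted model:

```python
import math

# Sketch of the logistic transform: it maps any linear sum of weighted
# risk factors onto a probability between 0 and 1. The coefficients and
# patient values below are invented for illustration.

def logistic(x):
    return math.exp(x) / (1 + math.exp(x))

def mortality_risk(intercept, coefs, patient):
    """Linear sum of weighted risk factors, passed through the logistic."""
    x = intercept + sum(b * v for b, v in zip(coefs, patient))
    return logistic(x)

# Hypothetical coefficients for age, emergency status, and prior MI:
coefs = [0.04, 1.2, 0.8]
p = mortality_risk(-6.0, coefs, [70, 1, 0])  # a 70-year-old emergent case
print(round(p, 3))
```

In a real logistic regression, the intercept and coefficients are estimated by maximum likelihood from patient data rather than chosen by hand.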
The performance of logistic regression models can be assessed in several ways. However, there is less agreement about how best to measure performance for models that predict binary outcomes than there is about the use of R2 to evaluate linear regression models. One commonly used parameter to evaluate the performance of logistic regression models is the c-statistic.76 The c-statistic is equal to the area under the receiver operating characteristic (ROC) curve and can be generated from the sensitivity and specificity of a model's predictions of any dichotomous outcome. Table 6-6 describes the formulas for the statistical terms commonly used to describe dichotomous outcomes.
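The c-statistic also has a simple rank interpretation: it is the probability that a randomly chosen patient who experienced the outcome was assigned a higher predicted risk than a randomly chosen patient who did not. A minimal sketch with hypothetical predicted risks:

```python
def c_statistic(scores_events: list, scores_nonevents: list) -> float:
    """Area under the ROC curve by pairwise comparison of predicted risks;
    ties count as half a concordant pair."""
    concordant = 0.0
    for e in scores_events:
        for n in scores_nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(scores_events) * len(scores_nonevents))

# Hypothetical predicted risks for patients who died vs. those who survived
auc = c_statistic([0.9, 0.7, 0.6], [0.2, 0.4, 0.7])
```

A value of 0.5 indicates discrimination no better than chance, and 1.0 indicates perfect separation of events from nonevents.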
Logistic regression models have been used to develop risk profiles for providers (both hospitals and individual surgeons), or so-called report cards.79–86 This has caused anguish on the part of providers87 and concern on the part of statisticians and epidemiologists.79,80 The typical approach that report cards take is to grade surgeons by their operative mortality for CABG. In order to grade a provider, the expected number of deaths (E, or expected rate) calculated from deaths observed in the entire provider group is compared to the observed number of risk-adjusted deaths for the provider (O, or observed rate). This gives an O/E ratio, which is the ratio of the risk-adjusted observed mortality rate to the expected mortality rate based on the group logistic model. In order to make comparisons between providers, a confidence interval (usually a 95% confidence interval) is assigned to the observed mortality rates, and the between-provider mortality rates are presented as a range of values for each provider (Fig. 6-3A). The expected mortality rates are assumed to be independent of the observed mortality rates, an incorrect assumption. Furthermore, no sampling error is attached to the expected values, another incorrect assumption. The effect of making these two assumptions is to identify too many outliers (in either direction).
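The O/E comparison can be sketched as follows. The normal-approximation confidence interval is attached only to the observed rate, mirroring the flawed assumption criticized above that the expected rate carries no sampling error; all numbers are hypothetical:

```python
import math

def oe_ratio(observed_deaths: int, expected_deaths: float) -> float:
    """Ratio of observed to expected deaths for one provider."""
    return observed_deaths / expected_deaths

def observed_rate_ci(observed_deaths: int, n: int, z: float = 1.96) -> tuple:
    """95% normal-approximation CI for the observed mortality rate.
    Note: no error is attached to the expected rate, the assumption
    that tends to flag too many outliers."""
    p = observed_deaths / n
    half = z * math.sqrt(p * (1 - p) / n)
    return (max(0.0, p - half), min(1.0, p + half))

ratio = oe_ratio(12, 9.6)           # O/E = 1.25: more deaths than expected
lo, hi = observed_rate_ci(12, 400)  # observed rate 3% with its 95% CI
```

A provider is typically labeled an outlier when the expected rate falls outside this interval, which is exactly where the two incorrect assumptions exert their effect.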
Traditional logistic regression modeling to rank surgeons according to their risk-adjusted mortality rates results in exaggerated (incorrect) provider profiles.28,79,80,88–90 Goldstein and Spiegelhalter reexamined the 1994 New York State publicly disseminated mortality data using hierarchical logistic regression.88 Figure 6-3 summarizes their findings. The hierarchical analysis dampens the surgeon mortality rates towards the mean of all providers. More importantly, the New York State report identified three outliers (2 low and 1 high) based on simple logistic regression, while hierarchical analysis identified only one outlier (1 high). Similar results have been obtained by Grunkemeier et al when they applied the principles of hierarchical regression analysis to the Providence Health System logistic regression model for CABG operative mortality.80
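The "dampening towards the mean" of hierarchical modeling can be illustrated with a simple empirical-Bayes style sketch: each provider's raw rate is pulled toward the group mean by an amount that grows as the provider's caseload shrinks. This is a toy illustration of the shrinkage idea, not the actual New York State analysis; the prior weight is an arbitrary assumed value.

```python
def shrunken_rate(provider_deaths: int, provider_n: int,
                  group_rate: float, prior_weight: float = 50.0) -> float:
    """Weighted average of the provider's raw rate and the group mean.
    prior_weight acts like a pseudo-caseload at the group rate: small
    providers are pulled strongly toward the mean, large ones barely move."""
    raw = provider_deaths / provider_n
    w = provider_n / (provider_n + prior_weight)
    return w * raw + (1 - w) * group_rate

small = shrunken_rate(3, 40, 0.025)     # 7.5% raw rate, heavily shrunk
large = shrunken_rate(30, 1200, 0.025)  # 2.5% raw rate, essentially unchanged
```

This is why hierarchical analysis flags fewer outliers: extreme rates from small caseloads are largely noise, and the model discounts them accordingly.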
The use of simplistic models, as in the case of the New York State or the Providence Health System logistic regression models, is incorrect, and probably unethical, given the potential impact that outlier status might have on a surgeon's practice. Two impediments to widespread use of hierarchical models are the absence of the necessary large data sets and the lack of readily available easy-to-use software packages. There are statistical "workarounds" that adjust data sets to obtain similar results to those obtained from hierarchical models,28,91 but it is unclear if such adjustment produces a result that is qualitatively different from that produced by standard hierarchical analysis. At present, hierarchical regression is the "gold standard" for risk adjustment of dichotomous outcomes and producing provider report cards. Unfortunately, this gold standard is rarely used.
When the outcome of interest is a time-dependent variable (e.g., hospital length of stay or survival after valve implantation), then regression modeling may be a more complex but still manageable process. Regression models for time-dependent outcome variables can be developed using computer iteration methods. Several excellent texts are available that cover the gamut of technical information from the relatively simple92,93 to the complex.94–96 One model that has been used extensively in the biomedical sciences is the Cox proportional hazards regression model.94 In some regression models, such as logistic regression, the dependent or outcome variable is known with precision. With time-dependent outcome variables, the possibility exists that only a portion of the survival time is observed for some patients. Thus the data available for analysis will consist of some outcomes that are incomplete or "censored." Some regression models, such as logistic regression, do not easily adapt to censored data. Cox's model overcomes these technical problems by assuming that the independent variables are related to survival time by a multiplicative effect on the hazard function; thus it is a "proportional hazards" model. The hazard function is the instantaneous event rate at each point in time, reflected in the slope of the survival curve (or time decay curve) for a series of time-dependent observations. In the Cox model, one assumes that the hazard functions of the groups being compared are proportional, a reasonable assumption when comparing survival in two or more similar groups. Hence, it is not necessary to know the underlying survival function in order to determine the relative importance of independent variables that contribute to the overall survival curve. Table 6-7 shows an example of the use of Cox regression to evaluate the independent variables that are predictive of hospital length of stay (a time-dependent outcome variable) for patients undergoing CABG.11 For the purposes of this analysis, hospital deaths were considered censored observations.
The independent variables shown in Table 6-7 are considered risk factors for increased length-of-stay, and can be used to stratify patients into groups with varying risk of prolonged hospitalization.
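The proportional-hazards assumption can be seen in the simplest case, exponential survival, where the hazard is the constant negative slope of log S(t) and the ratio of two groups' hazards is the same at every time point. A toy sketch with made-up monthly hazard rates (not data from this chapter):

```python
import math

def exponential_survival(hazard: float, t: float) -> float:
    """S(t) = exp(-hazard * t): survival under a constant hazard."""
    return math.exp(-hazard * t)

def hazard_from_log_survival(s_t1: float, s_t2: float, t1: float, t2: float) -> float:
    """Recover the hazard as the negative slope of log S(t) between two times."""
    return -(math.log(s_t2) - math.log(s_t1)) / (t2 - t1)

h_treated, h_control = 0.02, 0.03  # hypothetical monthly hazards in two groups
hr = (hazard_from_log_survival(exponential_survival(h_treated, 1),
                               exponential_survival(h_treated, 2), 1, 2)
      / hazard_from_log_survival(exponential_survival(h_control, 1),
                                 exponential_survival(h_control, 2), 1, 2))
# The hazard ratio is 2/3 at every time interval: "proportional hazards."
```

Because the ratio is constant over time, the relative effect of a covariate can be estimated without knowing the underlying survival function, which is the practical appeal of Cox's model.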
Thomas Bayes was a nonconformist minister and mathematician who is given credit for describing the probability of an event based on knowledge of prior probabilities that the same event has already occurred.97 Using the Bayesian approach, three sets of probabilities are defined: (1) the probability of an event before the presence of a new finding is revealed (prior probability); (2) the probability that an event is observed given that an independent variable is positive (conditional probability); and (3) the probability of an event occurring after the presence of a new finding is revealed (posterior probability). The mathematical relationship between the three probabilities is Bayes' theorem. The prior and posterior probabilities are defined with respect to a given set of independent variables. In the sequential process common to all Bayesian analyses, the posterior probabilities for one finding become the prior probabilities for the next, and a mathematical combination of prior and conditional probabilities produces posterior probabilities. Bayes' theorem can be expressed in terms of the nomenclature of Table 6-6 as:
PBayes = (sensitivity × prior probability) / [(sensitivity × prior probability) + (1 − specificity) × (1 − prior probability)]
where PBayes is defined as the probability of a given outcome if prior probabilities are known. The principles of Bayesian statistics have been used widely in decision analysis98 and can also be used to generate multivariate regression models based on historical data about independent variables.99–102 Bayesian multivariate regression models are generated using computer-based iterative techniques99,102–104 and have been used in the past, but not at present, to develop the risk stratification analysis for the Society of Thoracic Surgeons National Cardiac Database.103,105 Evaluation of the performance of the Bayesian statistical regression models is usually done by cross-validation studies similar to those used to validate the Cox survival regression model in Figure 6-4. Performance of Bayesian models can also be expressed in terms of ROC curves and the c-statistic, as for logistic regression models. Marshall et al have shown that Bayesian models of risk adjustment give comparable results and produce ROC curves similar to those generated from logistic regression analysis using conventional models.106
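The sequential use of Bayes' theorem described above can be sketched directly: a positive finding updates the prior via sensitivity and specificity, and the resulting posterior becomes the prior for the next finding. The test characteristics below are hypothetical:

```python
def posterior_positive(prior: float, sensitivity: float, specificity: float) -> float:
    """Bayes' theorem for a positive finding:
    P(outcome | +) = sens*prior / (sens*prior + (1-spec)*(1-prior))."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

p = 0.10                               # prior probability of the outcome
p = posterior_positive(p, 0.90, 0.80)  # first positive finding
p = posterior_positive(p, 0.85, 0.75)  # posterior becomes the next prior
```

Each new finding raises (or lowers) the running probability, which is the mechanism behind Bayesian risk models built from historical conditional probabilities.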
An implicit part of assessing outcome is the development of a best standard of care for a given illness or disease process. Once the most efficacious treatment is known, then comparisons with, or deviations from, the standard can be assessed, a process called "benchmarking." As mentioned above, the "best standard" is not always known. Meta-analysis is a quantitative approach for systematically assessing the results of multiple previous studies to determine the best or preferred outcome. The overall goal of meta-analysis is to combine the results of previous studies to arrive at a consensus conclusion about the best outcome. Stated in a different way, meta-analysis is a tool used to summarize efficacy studies (preferably RCTs) of an intervention in a defined population with a disease in order to determine which intervention is likely to be effective in a large population with a similar disorder. Meta-analysis is a tool that can relate efficacy studies to effectiveness of an intervention by summarizing available medical evidence.
In summarizing available medical evidence on a given subject, information retrieval is king. Nowhere is this more evident than in the Cochrane Collection of available randomized trials on various medical subjects. For example, a recent Cochrane review found 17 trials that evaluated postoperative neurological deficit in patients having hypothermic cardiopulmonary bypass (CPB) compared to normothermic CPB.107 This compares to a recently published meta-analysis on a similar topic that found only 11 trials with which to perform a similar analysis.108 The Cochrane reviewers perform an exhaustive search of all available literature, using not only MEDLINE but also unpublished trials and so-called "fugitive literature" (government reports, proceedings of conferences, published Ph.D. theses, etc.). The average thoracic surgeon has not heard of "publication bias," but the Cochrane reviewers are acutely aware of it. They realize that RCTs that have a negative result are less likely to pass the peer-review editorial process into publication than RCTs with a significant treatment effect, so-called "publication bias" in favor of positive clinical trials.109 So for each of the Cochrane reviews, attempts are made to find unpublished and/or negative trials to add to the body of evidence about a given subject.
All trials or observational studies that address the same outcomes of a given intervention are not the same. There are almost always subtle differences in study design, sample size, analysis of results, and inclusion/exclusion criteria. The object of comparing multiple observational studies and RCTs on the same treatment outcome is to come up with a single summary estimate of the effect of the intervention. Calculating a single estimate in the face of such diversity may give a misleading picture of the truth. There are no statistical tricks that account for bias and confounding in the original studies. Heterogeneity of the various RCTs and observational studies on the same or similar treatment outcome is the issue. This heterogeneity makes comparison of RCTs a daunting task, about which volumes have been written.15
There are at least two types of heterogeneity that confound summary estimates of multiple RCTs: clinical heterogeneity and statistical heterogeneity. Statistical heterogeneity is present when the between-study variance is large; i.e., similar treatments result in widely varying outcomes in different trials. This form of heterogeneity is easiest to measure. For example, Berlin et al evaluated 22 separate meta-analyses and found that only 14 of 22 had no evidence of statistical heterogeneity.110 Three of the remaining 8 comparative studies gave different results depending on the type of statistical methods used for the analysis; the more statistical heterogeneity, the less certain the statistical inferences from the analysis.
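One common way to quantify statistical heterogeneity (not necessarily the method Berlin et al used) is Cochran's Q: an inverse-variance-weighted sum of squared deviations of each study's effect from the pooled effect. A compact sketch with hypothetical log odds ratios and variances:

```python
def pooled_effect(effects: list, variances: list) -> float:
    """Fixed-effect (inverse-variance) pooled estimate across studies."""
    weights = [1 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def cochran_q(effects: list, variances: list) -> float:
    """Between-study heterogeneity statistic; a large Q relative to
    (number of studies - 1) degrees of freedom signals heterogeneity."""
    pooled = pooled_effect(effects, variances)
    return sum((e - pooled) ** 2 / v for e, v in zip(effects, variances))

# Hypothetical log odds ratios from three trials, with their variances
q = cochran_q([0.10, -0.05, 0.60], [0.04, 0.05, 0.06])
```

When Q is small, the studies' results are consistent with a single common effect; when Q is large, a single summary estimate may be misleading, which is precisely the concern raised above.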
Clinical heterogeneity of groups of RCTs that assess similar outcomes is much more difficult to assess. Measurement of treatment outcomes has plagued reviewers who try to summarize RCTs. Many RCTs address similar treatment options (e.g., hypothermic CPB versus normothermic CPB) but measure slightly different outcomes (e.g., stroke or neuropsychological dysfunction). For example, the Cochrane Heart Group found 17 RCTs that addressed the effect of CPB temperature on postoperative stroke.107 Only 4 of these 17 RCTs measured neuropsychological function, while all 17 measured neurological deficit associated with CPB. In summarizing the results of multiple RCTs comparing a given treatment, it is necessary to match "apples with apples" when looking at outcomes. In this analysis by the Cochrane Heart Group, there was a trend towards a reduction in the incidence of nonfatal strokes in the hypothermic group (OR = 0.68; 95% CI, 0.43–1.05). Conversely, the number of nonstroke-related perioperative deaths tended to be higher in the hypothermic group (OR = 1.46; 95% CI, 0.9–2.37). When all "bad" outcomes (stroke, perioperative death, myocardial infarction, low-output syndrome, intra-aortic balloon pump use) were pooled, neither hypothermia nor normothermia had a significant advantage (OR = 1.07; 95% CI, 0.92–1.24). This suggests that there is clinical heterogeneity among the various RCTs evaluated. There are statistical "tricks," such as stratification or regression, that can investigate and explore the differences among studies, but it is unlikely that clinical heterogeneity can be completely removed from the meta-analysis. Importantly, the Cochrane Group concludes from these data that there is no definite advantage of hypothermia over normothermia in the incidence of clinical events following CPB. This constitutes good evidence (multiple well-done RCTs) to support the notion that normothermic and hypothermic CPB have equal efficacy for most outcomes.
An expert panel reviewing the Cochrane evidence might suggest that there is class I evidence (according to the ACC/AHA guideline nomenclature) that neither normothermic nor hypothermic CPB results in an increased incidence of perioperative complications. This is an entirely different conclusion from the one Bartels et al reached in their meta-analysis of the same interventions. These authors suggest that there is little evidence to support the usefulness/efficacy of hypothermia in CPB.108 No one can say which meta-analysis is closer to the truth. Much depends on the details of the meta-analyses, but logic suggests that the higher-quality study, including more of the available RCTs and a statistically rigorous analysis, comes closer to the scientific truth.
There is some concern about the findings of meta-analysis.111–113 LeLorier et al found significant discrepancies between the conclusions of meta-analyses and subsequent large RCTs.112,113 On review of selected meta-analyses, Bailar found that "problems were so frequent and so serious, including bias on the part of the meta-analyst, that it was difficult to trust the overall best estimates that the method often produces."114,115 Great caution must be used in the interpretation of meta-analyses, but the technique has gained a strong following among clinicians since it may be applied even when the summarized studies are small and there is substantial variation in many of the factors that may have an important bearing on the findings.
The use of complex statistics is becoming more common in assessing medical data. Arguably, the understanding of this complex material on the part of clinicians has not advanced at a similar rate. In an attempt to address this knowledge gap, Blackstone coined the term "breakthrough statistics" to denote newer methods that are available to handle complex, but clinically important, research questions.35 His goal was to acquaint clinicians with the methods in a nontechnical fashion "so that you may read reports more knowledgeably, interact with your statistical collaborators more closely, or encourage your statistician to consider these methods if they are applicable to your clinical research."35 These worthy goals have direct relevance to outcomes assessment and risk stratification.
One of Blackstone's breakthrough statistical methods deals with a very common problem: the assessment of nonrandomized comparisons. Observational studies, or nonrandomized comparisons, can detect associations between risk and outcome but cannot, strictly speaking, determine which risks cause the particular outcome. Traditionally only RCTs have been able to determine cause and effect. A so-called breakthrough technique to allow nonrandomized comparisons to come closer to inferring cause and effect is the use of balancing scores.
Simple comparison of two nonrandomized treatments is confounded by selection factors. That means that a clinician decided to treat a particular patient with a given treatment for some reason that was not obvious and was not necessarily evidence-based. The selection factors used in this nonrandomized situation are difficult to control, and RCTs eliminate this type of bias. However, RCTs are often not applicable to the general population of interest since they are very narrowly defined.116
Use of nonrandomized comparisons is more versatile and less costly. One of the earliest methods used to account for selection bias was patient matching. Two groups who received different treatment were matched as closely as possible for all factors except the variable of interest. Balancing scores were developed as an extension of patient matching. In the early 1980s, Rosenbaum and Rubin introduced the idea of balancing scores to analyze observational studies.117 They called the simplest form of a balancing score a propensity score. Their techniques were aimed at drawing causal inference from nonrandomized comparisons. The propensity score is a probability of group membership. For example, in a large group of patients having CABG, some receive aspirin before operation and others do not. One might ask whether preoperative aspirin causes increased postoperative blood transfusion. The propensity score is a probability between 0 and 1 that can be calculated for each patient, and this score represents their probability of getting an aspirin before operation. If the aspirin and nonaspirin patients are matched by their propensity scores, the patients will be as nearly matched as possible for every preoperative characteristic excluding the outcome of interest. Not all the patients may be included in the analysis because some aspirin users may have a propensity score that is not closely matched to a nonaspirin user. But those aspirin users who have a matching propensity score to a nonaspirin user will be very closely matched for every variable except for the outcome variable of interest. This is as close to a randomized trial comparison as you can get without actually doing a randomized trial.
How is the propensity score calculated? The relevant question asked to construct the propensity score is which factors predict group membership (e.g., who will receive aspirin and who will not). The probability of receiving an aspirin is a dichotomous variable that can be modeled like any other binary variable. For example, logistic regression can be used to identify factors associated with aspirin use. In the logistic regression analysis to develop the propensity score, as many risk factors as possible are included in the model, and the logistic equation is solved (or modeled) for the probability of being in the aspirin group. This probability is the propensity score. An example of the results obtained from this type of analysis is shown in Table 6-8.77 In this analysis, 2606 patients (1900 preoperative aspirin users and 606 nonusers) were "balanced" by being divided into 5 equal quintiles according to their propensity scores. Quintile 1 had the least chance of receiving aspirin while quintile 5 had the greatest chance of receiving aspirin before operation. Within each quintile, the patients were matched as closely as possible for all variables except for the outcome variable of interest, i.e., receiving any blood transfusion after CABG, almost like a randomized trial. Notice that within each quintile, aspirin users and nonusers were closely matched for other variables, such as preoperative renal function, gender, and cardiopulmonary bypass time. This indicates that the propensity score matching did what it was intended to do: match the patients for all variables except for the outcome variable of interest (i.e., postoperative transfusion). The results show that the propensity scored quintiles are asymmetric; i.e., there is not a consistent association between aspirin and blood transfusion across all quintiles.
In the strata that are least likely to receive preoperative aspirin, there are patients who are more likely to receive postoperative transfusion (i.e., patients in quintile 1 have the longest cardiopulmonary bypass time, the greatest number of women, and the largest number of patients with preoperative renal dysfunction). This implies that some patients may have been recognized as high risk preoperatively and were not given aspirin; i.e., selection bias exists in the data set. There is some evidence that well-done observational studies give results comparable to those of RCTs dealing with similar outcomes,118,119 and balancing scores provide an optimal means of analyzing nonrandomized studies.
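The quintile "balancing" described above can be sketched in code: patients are ranked by a previously fitted propensity score and divided into five equal strata, within which treated and untreated patients are then compared. The scores below are hypothetical, standing in for the output of a fitted logistic model:

```python
def assign_quintiles(propensity_scores: list) -> list:
    """Return a quintile label (1..5) for each patient,
    1 = least likely to be treated, 5 = most likely."""
    order = sorted(range(len(propensity_scores)),
                   key=lambda i: propensity_scores[i])
    quintiles = [0] * len(propensity_scores)
    per_stratum = len(propensity_scores) / 5
    for rank, i in enumerate(order):
        quintiles[i] = min(5, int(rank / per_stratum) + 1)
    return quintiles

# Ten patients with hypothetical propensity scores for preoperative aspirin
scores = [0.05, 0.12, 0.22, 0.31, 0.40, 0.55, 0.63, 0.71, 0.84, 0.93]
strata = assign_quintiles(scores)  # two patients per quintile here
```

Within each stratum, covariate balance between treated and untreated patients can then be checked directly, which is the step that reveals residual selection bias of the kind noted above.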
The importance of risk factor identification for comparing outcomes has already been stressed. Risk factor identification for a given outcome has become commonplace in medicine. A problem arises from this dependence on risk factor analysis, especially logistic regression. Different observers analyzing the same risk factors to predict outcome get different results. Table 6-9 is an example of the variability in risk factor identification that can result. In this table, Grunkemeier et al compared 13 published multivariate risk models for mortality following CABG.80 The number of independent risk factors cited by any one model varied from 5 to 29! Naftel described 9 different factors that contribute to different investigators obtaining different models to predict outcome (i.e., different sets of risk factors associated with the same outcome).120 Some or all of these factors may affect the risk models listed in Table 6-9. One of Naftel's factors that is important in differentiating various models of CABG mortality is variable selection. In Table 6-9, 13 different groups found 13 different variable patterns that apparently adequately predicted operative mortality. How can this be? Recent breakthrough statistical methods have surfaced that address variable selection in statistical modeling.
TOTAL QUALITY MANAGEMENT AND RISK ANALYSIS
Deming's and Juran's principles126–129 have been given the acronym of "total quality management," or TQM. The amazing turnaround in Japanese industry has led many organizations to embrace the principles of TQM, including organizations involved in the delivery and assessment of health care.130 Using this approach, health care is viewed as a process requiring raw materials (e.g., sick patients), manufacturing steps (e.g., delivery of care to the sick), and finished products (e.g., outcomes of care). Managerial interventions are important at each step of the process to ensure a high-quality product. Table 6-10 outlines the key features of TQM.
Table 6-11 provides an outline of the sequential steps involved in solving a problem using TQM. Risk stratification plays an important role in the TQM process. One of the most important applications of risk stratification in TQM is in the early stages of the project, when the definition of the problems that affect quality is being considered. Usually a problem is identified from critical observations; e.g., excessive blood transfusion after operation may result in increased morbidity, including disease transmission, increased infection risk, and increased cost. Tools such as flow diagrams that document all of the steps in the process (e.g., the steps involved in the blood transfusion process after CABG) are helpful in this phase of the analysis. A logical starting point for efforts to improve the quality of the blood transfusion process would be to focus on a high-risk subset of patients who consume a disproportionate amount of resources. The Italian economist Pareto made the observation that a few factors account for the majority of the outcomes of a complex process, an observation that has been termed the "Pareto principle."
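The Pareto principle can be checked directly against resource data: rank patients by cost and ask what share of the total the top fraction consumes. A toy sketch with made-up per-patient costs:

```python
def share_of_total(costs: list, top_fraction: float) -> float:
    """Fraction of total cost consumed by the top `top_fraction` of patients."""
    ranked = sorted(costs, reverse=True)
    k = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical per-patient costs: a few complicated courses dominate the total
costs = [90.0, 60.0] + [10.0] * 8
top_20_share = share_of_total(costs, 0.20)  # share consumed by the top 2 of 10
```

When a minority of high-risk patients accounts for the majority of cost, quality improvement efforts aimed at that subset promise the largest return, which is the practical use of the principle in TQM.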
Several tools that are typically used in industrial quality improvement and process control134 have been applied to medicine and, in particular, cardiothoracic surgery.135–137 Shahian et al used control charts to evaluate special (nonrandom) and usual (random) variability of outcomes in patients having average-risk CABG.135 Again, for this type of analysis, CABG is viewed as a process with raw materials (patients with coronary artery disease), manufacturing steps (CABG), and output (operative outcomes). The outcomes are tracked over time using control charts, a well-known quality improvement tool. Control charts are plots of data over time. The data points are usually plotted in conjunction with overlying lines that represent upper and lower control limits. The control limits are established by considering historical data (e.g., the rate of blood transfusion or the operative mortality rate). When a data point falls outside of the control limits, it is said to be "out of control." A process may be out of control because of either usual (random) causes or special (nonrandom) causes. Shahian et al found that certain postoperative complications (e.g., postoperative bleeding, leg wound infections, and total major complications) were out of control in the early part of their study. After implementing quality improvement measures, these complication rates showed progressive improvement, with the net effect being improvement in the length of hospital stay.
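A control chart for a complication rate reduces to plotting period-by-period rates against limits derived from historical data; one common construction (not necessarily the exact limits Shahian et al used) places the limits three standard errors around the historical rate. The rates below are hypothetical:

```python
import math

def control_limits(historical_rate: float, n_per_period: int, k: float = 3.0) -> tuple:
    """Upper and lower control limits for a proportion (p-chart style),
    k standard errors around the historical rate."""
    se = math.sqrt(historical_rate * (1 - historical_rate) / n_per_period)
    return (max(0.0, historical_rate - k * se), min(1.0, historical_rate + k * se))

def out_of_control(rates: list, limits: tuple) -> list:
    """Indices of periods whose rate falls outside the control limits."""
    lo, hi = limits
    return [i for i, r in enumerate(rates) if r < lo or r > hi]

limits = control_limits(0.30, 100)  # historical 30% transfusion rate, 100 cases/period
flagged = out_of_control([0.28, 0.33, 0.52, 0.31], limits)
```

Points inside the limits are attributed to usual (random) variation; points outside them prompt a search for a special (nonrandom) cause.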
A variation on the control chart methodology, called the cumulative sum analysis, or CUSUM, was used by Novick et al to analyze the effect of changing from on-pump CABG to off-pump CABG as a primary means of operative coronary revascularization.136 These authors found that the CUSUM methodology was more sensitive than standard statistical techniques in detecting a cluster of surgical failures or successes.
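In its simplest risk-unadjusted form, a CUSUM accumulates (observed outcome minus expected risk) case by case, so a run of failures drives the sum steadily upward and becomes visible earlier than in period-by-period rates. This is a minimal sketch with hypothetical data; Novick et al's implementation details may differ:

```python
def cusum(outcomes: list, expected_risk: float) -> list:
    """Cumulative sum of observed-minus-expected after each consecutive case.
    outcomes: 1 for failure (e.g., death), 0 for success."""
    running, path = 0.0, []
    for y in outcomes:
        running += y - expected_risk
        path.append(running)
    return path

# Expected 3% mortality; a cluster of two deaths at cases 4-5 bends the curve up
path = cusum([0, 0, 0, 1, 1, 0, 0], 0.03)
```

A flat or gently falling curve indicates performance at or better than expectation; a sustained upward slope flags a cluster of failures worth investigating.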
Another example of a TQM project that has been carried out in clinical cardiothoracic surgery is the study reported by de Leval et al.137 In this study, surgeons identified an increase in the mortality rate of infants undergoing total repair of D-transposition of the great vessels; the authors applied the principles of TQM to the process of care of these infants. They were able to identify risk factors for poor outcome and to separate the sources of variation in mortality rate into either random (common cause) variation or nonrandom (special cause) variation. By identifying and altering nonrandom causes of increased mortality, which were presumably related to the surgeon or to the process of care, they were able to make a positive impact on patient outcome. Examples like this go far beyond simple risk analysis and begin to get at the true value of these techniques, i.e., improving patient outcomes.
Another set of innovative TQM studies has been carried out by the Northern New England Cardiovascular Disease Study Group.138–141 These investigators used a risk-adjustment scheme (Tables 6-1 and 6-9) to predict mortality in patients undergoing CABG at 5 different institutions. After risk stratification, significant variability was found among the different institutions and providers. Statistical methods suggested that the variation in mortality rate was nonrandom ("special variability" in the TQM vernacular). A peer-based, confidential TQM project was initiated to address this variability and to improve outcomes in the region. In order to study this nonrandom variability, representatives from each institution visited all institutions and reviewed the processes involved in performing CABG. Surgical technique, communication among providers, leadership, decision making, training levels, and environment were assessed at each institution. Significant variation among many of the processes was observed, and attempts to correct deficiencies were undertaken at each institution. Subsequent publications from these authors suggest that this approach improves outcomes for all providers at all institutions.141
RISKS OF OPERATION FOR ISCHEMIC HEART DISEASE
By far, the bulk of available experience with risk stratification and outcome analysis in cardiothoracic surgery deals with risk factors associated with operative mortality, particularly in patients undergoing coronary revascularization. Most of the risk stratification analyses shown in Table 6-1 and Table 6-9 have been used to evaluate life or death outcomes in surgical patients with ischemic heart disease, in part because mortality is such an easy end point to measure and track. As previously mentioned, each of the risk stratification systems shown in Table 6-1 and Table 6-9, with the exception of the APACHE III system, computes a risk score based on risk factors that are dependent on patient diagnosis. The definition of operative mortality varies among the different systems (either 30-day mortality or in-hospital mortality), but the risk factors identified by each of the stratification schemes in Table 6-9 show many similarities. Some variables are risk factors in almost all stratification systems; some variables are never significant risk factors. Each of the models has been validated using separate data sets; hence, there is some justification in using any of the risk stratification methods both in preoperative assessment of patients undergoing coronary artery bypass grafting (CABG) and in making comparisons among providers (either physicians or hospitals), but certain caveats exist about the validity and reliability of these models (see below). At present it is not possible to recommend one risk stratification method over another. In general, the larger the sample size, the more risk factors can be found.
A large number of patient variables other than those shown in Table 6-9 have been proposed as risk factors for operative mortality following coronary revascularization. Such variables as serum BUN,142 cachexia,143 oxygen delivery,144 HIV,145 case volume,146–149 low hematocrit on bypass,150 use of the internal mammary artery,151 the diameter of the coronary artery,152 and resident involvement in the operation153,154 fit this description. On the surface, the clinical relevance of these variables may seem undeniable in published reports, but very few of these putative risk factors have been tested with the rigor of the variables shown in Table 6-9. The regression diagnostics (e.g., ROC curves and cross-validation studies) performed on the models included in Tables 6-1 and 6-9 suggest that the models are good, but not perfect, at predicting outcomes. In statistical terms this means that all of the variability in operative mortality is not explained by the set of risk factors included in the regression models. Hence, it is possible that inclusion of new putative risk factors in the regression equations may improve the validity and precision of the models. New regression models, and new risk factors, must be scrutinized and tested using cross-validation methods and other regression diagnostics before acceptance. It is uncertain whether inclusion of many more risk factors will significantly improve the quality and predictive ability of regression models. For example, the STS risk stratification model described in Tables 6-1 and 6-9 includes many predictor variables, while the Toronto risk-adjustment scheme includes only 5 predictor variables. Yet the regression diagnostics for these two models are similar, suggesting that both models have equal precision and predictive capabilities. This suggests that the models are effective at predicting population behavior but not necessarily suited for predicting individual outcomes.
Further work needs to be done, both to explain the differences in risk factors seen between the various risk stratification models and to determine which models are best suited for studies of quality improvement.
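The regression diagnostics mentioned above center on discrimination, usually summarized as the c-statistic (the area under the ROC curve): the probability that a model assigns a higher predicted risk to a patient who died than to a survivor. A minimal sketch of the computation, using invented risk scores rather than any published model:

```python
from itertools import product

def c_statistic(risks, outcomes):
    """C-statistic (area under the ROC curve): the probability that a
    randomly chosen patient who died was assigned a higher predicted
    risk than a randomly chosen survivor; ties count as one half."""
    died = [r for r, y in zip(risks, outcomes) if y == 1]
    lived = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum(1.0 if d > s else 0.5 if d == s else 0.0
               for d, s in product(died, lived))
    return wins / (len(died) * len(lived))

# Hypothetical predicted mortality risks and observed outcomes (1 = death)
risks = [0.02, 0.05, 0.10, 0.30, 0.45]
outcomes = [0, 0, 0, 1, 1]
print(c_statistic(risks, outcomes))  # perfect discrimination in this toy data
```

A c-statistic of 0.5 indicates discrimination no better than chance; values near 1.0 indicate that the model ranks patients by risk almost perfectly.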
Many critical features of any risk-adjustment outcome program must be considered when determining quality of the risk stratification method or when comparing one to another (see below). Daley provides a summary of the key features that are necessary to validate any risk-adjustment model.59,155 She makes the point that no clear-cut evidence exists that differences in risk-adjusted mortalities across providers reflect differences in the process and structure of care.156 This issue needs further study.
Risk Factors for Postoperative Morbidity and Resource Utilization
Patients with nonfatal outcomes following operations for ischemic heart disease make up more than 95% of the pool of patients undergoing operation. Of approximately 500,000 patients having CABG yearly, between 50% and 75% have what is characterized by both the patient and provider as an uncomplicated course following operation. The complications occurring in surviving patients range from serious organ system dysfunction to minor limitation or dissatisfaction with lifestyle, and account for a significant fraction of the cost of the procedures. We estimate that as much as 40% of the yearly hospital costs for CABG are consumed by 10% to 15% of the patients who have serious complications after operation.11 This is an example of the Pareto principle described above, and also suggests that reducing morbidity in CABG patients would have significant impact on cost reduction.
A great deal of information has been accumulated on nonfatal complications after operation for ischemic heart disease. Several large databases have been used to identify risk factors for both nonfatal morbidity and increased resource utilization. Table 6-12 is a summary of some of the risk factors that have been identified by available risk stratification models using either serious postoperative morbidity or increased resource utilization as measures of undesirable outcomes.
Patient Satisfaction as an Outcome
Other post-CABG outcomes, such as patient satisfaction and sense of well-being, have been less well studied. The increasing importance of patient-reported outcomes reflects the increasing prevalence of chronic disease in our aging population. The goals of therapeutic interventions are often to relieve symptoms and improve quality of life, rather than cure a disease and prolong survival. This is especially important in selecting elderly patients for operation. One report from the United Kingdom suggests that as many as one third of patients over the age of 70 did not have improvement in their disability and overall sense of well-being after cardiac operation.161 Risk stratification methodology may prove to be important in identifying elderly patients who are optimal candidates for revascularization based on quality of life considerations.
Surprisingly little published information is available regarding long-term functional status or patient satisfaction following CABG. One comparative study found no difference between patients older than 65 years and those 65 or younger with regard to quality of life outcomes (symptoms, cardiac functional class, activities of daily living, and emotional and social functioning).162 This study also found a direct relationship between clinical severity and quality of life indicators, since patients with fewer comorbid conditions and better preoperative functional status had better quality of life indicators 6 months after operation. Rumsfeld et al found that improvement in self-reported quality of life (measured by the SF-36) was more likely in patients who had relatively poor health status before CABG compared to those who had relatively good preoperative health status.163 Interestingly, these same authors found that poor self-reported quality of life, as measured by the SF-36 questionnaire, was an independent predictor of operative mortality following CABG.164 These findings suggest that the risks of patient dissatisfaction after CABG are dependent on preoperative comorbid factors as well as on the indications for, and technical complexities of, the operation itself. At present no risk stratification scheme has been devised to identify patients who are likely to report dissatisfaction with operative intervention following CABG.
There are several difficulties with measurement of patient-reported outcomes, and consequently cardiothoracic surgeons have not been deeply involved with systematic measurements of patient satisfaction after operation. One problem is that patient-reported outcomes may be dependent on the type of patient who is reporting them and not on the type of care received. For example, younger Caucasian patients with better education and higher income are more likely to give less favorable ratings of physician care.165 However, considerable research has been done on instruments to measure patient satisfaction. At least two of these measures, the Short-Form Health Survey (SF-36)166 and the San Jose Medical Group's Patient Satisfaction Measure,167 have been used to monitor patient satisfaction over time. The current status of these and other measures of patient satisfaction does not allow comparisons among providers, because the quality of the data generated by these measures is poor. These instruments are characterized by low response rates, inadequate sampling, infrequent use, and unavailability of satisfactory benchmarks. Nonetheless, available evidence indicates that patient-reported outcomes can be measured reliably168,169 and that feedback of patient satisfaction data to physicians can significantly improve physician practices.170 It is likely that managed care organizations and hospitals will use patient-reported outcome measures to make comparisons between institutions and between individual providers. Risk-adjustment methods for patient-reported outcomes will be required to provide valid comparisons of this type.
VALIDITY AND RELIABILITY OF RISK STRATIFICATION METHODS
Of these components, face and content validity are arguably the most important. Clinicians can readily accept the results of risk stratification efforts if the model uses variables that are familiar and includes risk factors that the clinician recognizes as important in determining outcome. All of the risk models shown in Table 6-9 satisfy some or all of these criteria of validity. There is no objective measure that defines validity, but most clinicians would agree that the risk models have relevance to clinical practice and contain many of the features that one would expect to be predictive of morbidity and mortality for CABG.
The reliability of a risk stratification model is more easily measured than validity. Reliability of a risk-adjustment method refers to the statistical term of "precision," or the ability to repeat the observations using similar input variables and similar statistical techniques with resultant similar outcome findings. There are hundreds of sources of variability in any risk stratification model. These include errors in data input, inconsistencies in coding or physician diagnosis, variations in the use of therapeutic interventions, data fragility (the final model may depend heavily on a few influential outliers), and the type of rater (physician, nurse, or coding technician).171 The most common measure of reliability is Cohen's kappa coefficient, which measures the level of agreement between two or more observations compared to agreement due to chance alone.172 The kappa coefficient is defined as:
kappa = (Po − Pc) / (1 − Pc)
where Pc is the fraction representing the agreement that would have occurred by chance, and Po is the observed agreement between two observers. If two observations agree 70% of the time for an observation where agreement by chance alone would occur 54% of the time (i.e., Po = 0.7 and Pc = 0.54), then kappa = (0.70 − 0.54)/(1 − 0.54) = 0.35. Landis and Koch have offered a performance estimate of kappa173 as follows: values below 0.00 indicate poor agreement; 0.00 to 0.20, slight; 0.21 to 0.40, fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; and 0.81 to 1.00, almost perfect agreement.
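The worked example above can be expressed in a few lines of Python (the function name is ours, for illustration):

```python
def cohens_kappa(p_observed, p_chance):
    """Cohen's kappa: agreement beyond that expected by chance,
    scaled by the maximum possible chance-corrected agreement."""
    return (p_observed - p_chance) / (1.0 - p_chance)

# Po = 0.70, Pc = 0.54, as in the example above
print(round(cohens_kappa(0.70, 0.54), 2))  # → 0.35
```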
Other methods of measuring the agreement between two models include the weighted kappa, the intraclass correlation coefficient, the tau statistic, and the gamma statistic. These methods are discussed in the work of Hughes and Ash,171 and each offers an objective means of assessing the reliability of a risk-adjustment model.
Surprisingly little work has been done in assessing the reliability and validity of the risk-adjustment methods used with large cardiac surgical databases. It is absolutely essential that validity and reliability be tested in these models, both in order for clinicians to feel comfortable with the comparisons generated by the risk stratification models, and for policy makers (either government or managed care organizations) to feel confident in making decisions based on risk-adjusted outcomes.
Risk Stratification to Measure the Effectiveness of Care
The biggest single shortcoming of risk-adjustment methodology is its lack of proven effectiveness in delineating quality of care. Even though it may seem obvious that differences in risk-adjusted outcomes reflect differences in quality of care, this is far from proven. What little information is available on this subject is inconsistent. Hartz et al compared hospital mortality rates for patients undergoing CABG.174 They found that differences in hospital mortality rates were correlated with differences in quality of care between hospitals. Hannan et al attempted to evaluate the quality of care in outlier hospitals in the New York State risk-adjusted mortality cohort.175 They concluded, as did Hartz et al, that risk-adjusted mortality rates for CABG were a reflection of quality of care. The measures of quality used in these studies were somewhat arbitrary and did not reflect a complete array of factors that might be expected to influence outcomes after CABG. In TQM jargon, the entire clinical process of surgical intervention for coronary revascularization was not assessed. Other studies have not found a correlation between global hospital mortality rates and quality of care indicators.83,176–178 Indeed, one study suggested that nearly all of the variation in mortality among hospitals reflects variation in patient characteristics rather than in hospital characteristics,178 while another study found that identifying poor-quality hospitals on the basis of mortality rate performance, even with perfect risk adjustment, resulted in less than 20% sensitivity and greater than 50% predictive error.83 These studies suggest that reports that measure quality using risk-adjusted mortality rates misinform the public about hospital (or physician) performance.
Two ongoing quality improvement studies are using risk-adjusted outcome measurements to assess and influence the clinical process of coronary artery bypass grafting: the Northern New England Cardiovascular Disease Study138–141 and the Veterans Administration Cardiac Surgery Risk Assessment Program.179–181 Results from these studies suggest that using risk-adjusted outcomes (e.g., mortality and cost) as internal reference levels tracked over time (similar to the control charts of TQM described above) can produce meaningful improvements in outcomes. Whether these risk-adjusted outcomes can be used to indicate quality of care or cost-effectiveness of providers across all institutions is another question that remains unanswered. At present, equating risk-adjusted outcome measurements with effective care is not justified.176–178,182–188
Controversy exists as to whether changes in a physician's report card over time reflect changes in care or are due to other factors unrelated to the individual provider's care delivery. Hannan et al suggested that the public release of surgeon-specific, risk-adjusted mortality rates led to a decline in overall mortality in the state of New York from 4.17% in 1989 to 2.45% in 1992 and hence to an improvement in the quality of care.189 The cause of this decline in operative mortality is uncertain, but probably represents a combination of improvement in the process of care (especially in the outlier hospitals), the retirement of low-volume surgeons, and an overall national trend toward decreased CABG mortality. Ghali et al found that states adjacent to New York that did not have report cards had comparable decreases in CABG operative mortality during the same time period.190 Without any formal quality improvement initiative or report card, the operative mortality rate in Massachusetts decreased from 4.7% in 1990 to 3.3% in 1994. This occurred while the expected operative mortality of patients increased from 4.7% to 5.7%.
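Comparisons like the Massachusetts figures above rest on the observed-to-expected (O/E) mortality ratio, the simplest form of risk-adjusted comparison; a sketch using the rates quoted above:

```python
# Observed-to-expected (O/E) operative mortality ratio; an O/E below 1
# means fewer deaths occurred than the risk model predicted.
# Rates are the Massachusetts 1994 figures quoted above.
observed_rate = 0.033   # observed operative mortality
expected_rate = 0.057   # expected (risk-model-predicted) mortality
oe_ratio = observed_rate / expected_rate
print(round(oe_ratio, 2))  # → 0.58
```

An O/E ratio of roughly 0.58 means observed mortality was a little over half of what the risk model predicted, even as the case mix grew sicker.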
The decline in New York State operative mortality over time associated with the publication of surgeon-specific mortality rates was greater than the overall national decrease in CABG death rates. Peterson et al found that the reduction in observed CABG mortality was 22% in New York versus 9% in the rest of the nation, a highly significant difference.191 An interesting finding in this study is that the only other area in the United States with a comparable decline in CABG mortality to that of New York was northern New England. The Northern New England Cooperative Group established a confidential TQM approach to improve CABG outcomes139,140,192–194 at about the same time that New York State report cards were published in the lay press.
Ethical Implications of Risk Stratification: The Dilemma of Managed Care
An important component of new proposals for health care reform is mandated reports on quality of care.195 While reporting on quality of care sounds appealing, there are many problems associated with this effort, some of which present the clinician with an ethical dilemma.187,196198 There is general agreement that quality indicators should be risk-adjusted to allow fair comparisons among providers. Risk adjustment in this setting is extremely difficult and may be misleading, and, what is worse, may not reflect quality of care at all. The release of risk-adjusted data may alienate providers and result in the sickest patients having less accessibility to care. This may have already happened in New York State87,188,199 and in other regions where risk-adjusted mortality and cost data have been released to the public. Of even more concern is the selection bias that seems to exist in managed care HMO enrollment. Morgan et al suggest that Medicare HMOs benefit from the selective enrollment of healthier Medicare recipients and the disenrollment or outright rejection of sicker beneficiaries.200 This form of separation of patients into unfavorable or favorable risk categories undermines the effectiveness of the Medicare managed care system and highlights the subtle selection bias that can result when financial incentives overcome medical standards. Careful population-based studies that employ risk adjustment are needed to study this phenomenon.
A major concern of the current move toward market-oriented health care delivery is that health plans will only select the best health risk participants, a practice termed "cream skimming" by van de Ven et al.201 The result of cream skimming may be to widen the gap between impoverished, underserved patients and affluent patients. In an effort to address these concerns, plans have been proposed that would reward health plans for serving people with disabilities and residents of low-income areas.202–205 At the heart of these plans is some form of risk adjustment to allocate payments to health care organizations based on overall health risk and expected need for health care expenditures. The use of risk stratification in this setting is new and unproved but offers great promise.
A related problem is that physicians are being rewarded by hospitals and managed care organizations for limiting costs. Incentives are evolving that threaten our professionalism.197 On the surface, this may seem like a strong statement, but one only has to read some of the "compromise" positions that have been advocated in order to deal with the changing health care climate. "Advice," such as hiring lawyers to optimize care (managed or capitated contracts), recruiting younger patients into and discouraging Medicare-age patients away from a managed care practice, forbidding physicians from disclosing the existence of more costly services not covered by their managed care plan, and using accounting services to track and limit frequency of office visits, has been offered to physicians.197,206–208 A particularly telling indictment is the finding by Himmelstein et al that investor-owned HMOs deliver lower quality of care than not-for-profit plans.209 Physicians who own managed care organizations live by two standards: the professional standard of providing high-quality patient care and the financial standard of making a profit from this care delivery. Strategies must be devised that allow physicians both to maintain a professional approach to patients and to participate in the marketplace without compromising patient care.
Collecting risk-adjusted data adds to the administrative costs of the health care system. It is estimated that 20% of health care costs ($150 to $180 billion per year) are spent on the administration of health care.210 The logistical costs of implementing a risk-adjustment system are substantial. Additional costs are incurred in implementing quality measures that are suggested by risk stratification methodology. A disturbing notion is that the costs of quality care may outweigh the payers' willingness to pay for these benefits. For example, Iowa hospitals estimated that they spent $2.5 million annually to gather MedisGroups severity data that were mandated by the state. Because of the cost, the state abandoned this mandate and concluded that neither consumers nor purchasers used the data anyway.211
It is possible that quality improvement may cost rather than save money, although one of the principles of TQM (often quoted by Deming) is that the least expensive means to accomplish a task (e.g., deliver health care) is the means that employs the highest quality in the process. Ultimately, improved quality will be cost-efficient, but start-up costs may be daunting, and several organizations have already expressed concerns about the logistical costs of data gathering and risk adjustment.211,212 It is imperative that part of any cost savings realized by improved quality be factored into the total costs of gathering risk-adjusted data.
THE FUTURE OF OUTCOMES ANALYSIS
A great deal of effort has gone into the development of risk models to predict outcomes from cardiac surgical interventions (e.g., Tables 6-1 and 6-9). For populations of patients undergoing operations, the models are fairly effective at predicting outcomes (with certain caveats as mentioned above). The biggest drawback of these risk models is that they exhibit dismal performance at predicting outcomes for an individual patient. Consequently, risk adjustment to predict individual outcomes is extremely difficult to apply at the bedside. For patient-specific needs, risk stratification and outcomes assessment are in their infancy.
What are needed are patient-specific predictors for clinical decision making. On the surface, it would seem that a decision about whether to operate upon a patient with coronary artery disease is straightforward. But a decision of this sort is an extremely complex synthesis of diverse pieces of evidence, similar to the decisions airline pilots make about complicated flight problems. There are enormous variations in the way that surgeons practice. These variations can increase cost, cause harm, and confuse patients. The tools of decision analysis have been applied to the area of physician decision making in an effort to eliminate these variations and to provide accurate and effective decisions at the patient bedside.213,214
In the jargon of decision analysis, the decision to operate on a given patient for the treatment of coronary artery disease is extremely complex because there are more than two alternative treatments, more than two outcomes, and many intervening events that may occur to alter outcomes. Decision models generated to address surgical outcomes typically employ the familiar decision tree. They are commonly perceived as complex and difficult to understand by those unfamiliar with the methods, especially in the context of clinical decision making. Attempts have been made to simplify the methods and to apply them to the medical decision-making process.214 An important part of creating the decision tree is to estimate the probabilities of each of the various outcomes for a given set of interventions. This part of the decision analysis tree relies heavily on the results of risk stratification and regression modeling, especially computer-intensive methods such as Bayesian models,215,216 to arrive at a probability of risk for a particular outcome. All of the diverse pieces of evidence used to form consensus guidelines, including meta-analyses, expert opinions, unpublished sources, randomized trials, and observational studies, are employed to arrive at probabilities of outcome and intervening events. For example, Gage et al performed a cost-effectiveness analysis of aspirin and warfarin for stroke prophylaxis in patients with nonvalvular atrial fibrillation.217 They used data from published meta-analyses of individual-level data from 5 randomized trials of antithrombotic therapy in atrial fibrillation to estimate the rates of stroke without therapy, the percentage reduction in the risk of stroke in users of warfarin, and the percentage of intracranial hemorrhages following anticoagulation that were mild, moderate, or severe. 
Their decision tree suggests that in 65-year-old patients with nonvalvular atrial fibrillation (NVAF) but no other risk factors for stroke, prescribing warfarin instead of aspirin would affect quality-adjusted survival minimally but increase costs significantly. Application of decision analysis methods to clinical decision making can standardize care and decrease risks of therapy, but these methods are in the early developmental phase and much more work needs to be done before ready acceptance by surgeons.
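Folding back a decision tree of the kind described here amounts to computing a probability-weighted expected utility for each strategy. A toy sketch follows; all probabilities and QALY values are invented for illustration and are not taken from the Gage analysis:

```python
# Toy decision tree for two antithrombotic strategies; each branch is
# a (probability, quality-adjusted life years) pair. All numbers are
# invented for illustration, not clinical data.
strategies = {
    "warfarin": [(0.02, 0.0), (0.03, 5.0), (0.95, 10.0)],
    "aspirin":  [(0.04, 0.0), (0.02, 5.0), (0.94, 10.0)],
}

def expected_utility(branches):
    """Fold back one strategy: probability-weighted average of outcomes."""
    return sum(p * qaly for p, qaly in branches)

best = max(strategies, key=lambda s: expected_utility(strategies[s]))
for name, branches in strategies.items():
    print(name, round(expected_utility(branches), 2))
print("preferred:", best)
```

A real analysis would draw the branch probabilities from risk stratification models and meta-analyses, as described above, and would weigh costs alongside quality-adjusted survival.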
Volume/Outcome Relationship and Targeted Regionalization
At least 10 large studies have addressed the notion that hospitals performing small numbers of CABG operations have higher operative mortality. Seven of these 10 studies found increased operative mortality in low-volume providers.148,218–223 In three other large studies there was no such association.147,224,225 Interestingly, in the three studies done more recently (since 1996) there was no clear relationship between outcome and volume. In two separate studies done on some of the same patients in the New York State cardiac surgery database, completely opposite results were obtained.223,225 The Institute of Medicine summarized the relationship between higher volume and better outcome (https://www.nap.edu/catalog/10005.html) and concluded that procedure or patient volume is an imprecise indicator of quality, even though a majority of the studies reviewed showed some association between higher volumes and better outcomes.226
The dilemma is that some low-volume providers have excellent outcomes while some high-volume providers have poor outcomes. These observations on operator volume and outcome prompted some authorities to suggest "regionalization" to refer nonemergent CABG patients to large-volume centers.222,227,228 A role for "selective regionalization" was advocated by Nallamothu et al, since they found that low-risk patients did equally well in high-volume or low-volume hospitals.147 They suggest regional referral to high-volume institutions for elective high-risk patients. Crawford et al pointed out that a policy of regionalized referrals for CABG might have several adverse effects on health care, including increased cost, decreased patient satisfaction, and reduced availability of surgical services in remote or rural locations.229 It is simplistic to suggest that hospital volume is a principal surrogate of outcome, and much more sophistication is required to sort out this relationship. Nonetheless, decisions about utilization of health care resources will undoubtedly be made based on the presumed association between high volume and good outcome.
The Institute of Medicine (IOM) released a startling report on medical errors that occur in the U.S. health system (https://books.nap.edu/catalog/9728.html). Based mainly on two large studies, one using 1984 data in New York State and the other using 1992 data in Colorado and Utah,230,231 this report suggested that at least 44,000 Americans die each year from preventable hospital errors.232 This estimate was not much different from those obtained from similar analyses of patients in Australia,233 England,234 and Israel.235 This IOM report caused a storm of controversy, with many experts fearing that the report could harm quality improvement initiatives.236,237 Some had doubts about the methodology used to derive estimates of medical errors,238,239 while others made an emotional plea for drastic measures to reduce errors.240
The airline industry has had dramatic success in limiting errors, and its experience is used as a model of successful implementation of error avoidance behavior and process improvement.237 These principles have been applied to pediatric cardiac surgery by de Leval et al with some success.241 These workers evaluated patient and procedural variables that resulted in adverse outcomes. In addition they employed self-assessment questionnaires and human factors researchers who observed behavior in the operating room, an approach similar to the quality improvement steps used in the airline industry. Their study highlighted the important role of human factors in adverse surgical outcomes, but, more importantly, they found that appropriate behavioral responses can overcome potentially harmful events in the operating room. Studies of this sort that emphasize behavior modification and process improvement hold great promise for future error reduction in cardiac surgery.
Computer applications have been applied to the electronic medical record in hopes of minimizing physician errors in ordering. Computerized physician order entry (CPOE) is one of these applications; it monitors physicians' orders and offers suggestions when they do not meet a predesigned computer algorithm. CPOE is viewed as a quality indicator, and private employer-based organizations have used the presence of CPOE to judge whether hospitals should be part of their preferred network (https://www.leapfroggroup.org/). One of these private groups is the Leapfrog Group; their initial survey in 2001 found that only 3.3% of responding hospitals currently had CPOE systems in place (https://www.ctsnet.org/reuters/reutersarticle.cfm?article=19325). In New York State, several large corporations and health care insurers have agreed to pay hospitals that meet the CPOE standards a discount bonus on all health care billings submitted. Other computer-based safety initiatives that involve the electronic medical record are likely to surface in the future. The impact of these innovations on the quality of health care is untested and any benefit remains to be proven.
Information technology used to reduce medical errors has met with mixed success. Innovations that employ monitoring of electronic medical records may reduce errors.242,243 However, one study that tried to program guidelines for treating congestive heart failure into a network of physicians' interactive microcomputer workstations found the task difficult because the guidelines often lack the explicit definitions (e.g., for symptom severity and adverse events) that are necessary to navigate the computer algorithm.244 Another study attempted to implement prophylactic care measures (e.g., update of tetanus immunization in trauma patients) using reminders in the electronic inpatient medical record.245,246 These investigators were unable to increase the use of prophylactic measures in hospitalized patients using this computer-based approach. Much work needs to be done before computer-aided methods lead to medical error reduction, but the future will see more efforts of this type.
Public Access and Provider Accountability
Multiple factors bring surgical results to public attention. Publication of individual surgeons' risk-adjusted mortality rates, limitation of referrals to high-mortality surgeons by insurers, legislative initiatives to reduce medical errors, and the proliferation of the Internet as an information resource all lead to increased public awareness of surgical outcomes. Like it or not, thoracic surgeons must be prepared to accept this scrutiny, and perhaps even to benefit from it, since public scrutiny of surgical results will only increase.
The World Wide Web provides ready access to medical facts of all sorts, including information about thoracic surgery. Everything from access to the thoracic literature, to outcomes of randomized trials, to surgeon-specific risk-adjusted mortality rates, to comparison of hospital outcomes can be obtained by the lay public with rather simple searches on the Internet. This ready public access will undoubtedly increase. Examples of available information sources for the public are listed in Table 6-13. There is very little external scrutiny attached to most of the information sources listed in Table 6-13. The public accepts almost all information available on these sites at face value, and quality control of the information sources is limited to self-imposed efforts on the part of the authors of the various information sources. The Agency for Healthcare Research and Quality (AHRQ) attempted to empower the public to critically evaluate the various Web-based sources of health care information in order to limit the spread of misinformation that may creep into various Web sites. The success of these efforts is uncertain but is becoming extremely critical as the amount of health care information available on the Web skyrockets.
The goal of risk adjustment is to account for the contribution of patient-related risk factors to the outcome of interest. This allows patient outcomes to be used as an indicator of the care rendered by physicians or administered by hospitals. This chapter has outlined some of the risk-adjustment methods commonly used for this purpose, including use of multivariate analyses to predict patient outcomes based on patient risk factors. Inevitably, refinements and use of newer techniques will be brought to bear on the problem of risk adjustment. One of the most promising newer risk-adjustment methods is the use of neural networks (also termed artificial intelligence) to develop prediction models based on patient risk factors.247–250 A weakness of multivariate regression techniques is that some variables have too low an incidence to be used in multivariate regression models but still contribute significantly to outcome. This weakness is overcome by use of neural networks and cluster analysis. Both of these techniques use computer iteration to look for patterns of variables associated with outcome and are far less affected by low frequency of a particular variable. These methods may find more than one solution to the best prediction of outcome and may produce a combination of variables that reflect a unique clinical situation.
Neural network modeling has been used to predict length of stay in the ICU after cardiac surgery249 and to predict valve-related complications following valve implantation.248 Preliminary evidence suggests that the performance of neural network models may be superior to multivariate regression models, with the c-statistic for neural network models approaching 95% in ideal situations,250 but one study did not find any added benefit from modeling in-hospital death following PTCA using neural networks compared to logistic regression.251 Computer-intensive methods including use of neural networks, bootstrapping, and balancing scores (see earlier discussion) will become more important as refinements in risk-adjustment methodology progress.
Information Management: Electronic Medical Records
As already mentioned, accurate patient data are essential in order to apply the principles of risk stratification and quality improvement outlined in this chapter. The quality and accuracy of administrative (or claims) databases have been questioned.24,61,63–66 Therefore, risk-adjustment methodology has placed greater reliance on data extracted from the medical record. The American College of Surgeons was among the earliest advocates of the utility of medical records for quality review.252 In the 1960s, Weed advocated standardization and computerization of medical records.253–255 Little substantive progress had been made as far as the computerization of medical records until the need arose for management of large amounts of data of the sort required for risk adjustment and outcomes assessment. Medical records are an invaluable source of information about patient risk factors and outcomes. With these facts in mind, more and more pilot studies are being undertaken to computerize and standardize the medical record in a variety of clinical situations.245,256–264 Iezzoni has pointed out the difficulties with computerized medical records and suggests that they may not adequately reflect the importance of chronic disability and decreased functional status.265 Nevertheless, it is apparent that the need for data about large groups of patients exists, especially for managed care and capitation initiatives. It is reasonable to expect that efforts to computerize medical records will expand. Applications of electronic medical records that may be available in the future for cardiothoracic surgeons include monitoring of patient outcomes,256 supporting clinical decision making with real-time analysis of the electronic medical record,257,259,262 and real-time tracking of resource utilization using computerized hospital records.260
CONCLUSIONS
REFERENCES