A retrospective cohort study of stroke onset: implications for characterizing short term effects from ambient air pollution

Background Case-crossover studies used to investigate associations between an environmental exposure and an acute health response, such as stroke, will often use the day an individual presents to an emergency department (ED) or is admitted to hospital to infer when the stroke occurred. Similarly, they will use patient's place of residence to assign exposure. The validity of using these two data elements, typically extracted from administrative databases or patient charts, to define the time of stroke onset and to assign exposure are critical in this field of research as air pollutant concentrations are temporally and spatially variable. Our a priori hypotheses were that date of presentation differs from the date of stroke onset for a substantial number of patients, and that assigning exposure to ambient pollution using place of residence introduces an important source of exposure measurement error. The objective of this study was to improve our understanding on how these sources of errors influence risk estimates derived using a case-crossover study design. Methods We sought to collect survey data from stroke patients presenting to hospital EDs in Edmonton, Canada on the date, time, location and nature of activities at onset of stroke symptoms. The daily mean ambient concentrations of NO2 and PM2.5 on the self-reported day of stroke onset was estimated from continuous fixed-site monitoring stations. Results Of the 336 participating patients, 241 were able to recall when their stroke started and 72.6% (95% confidence interval [CI]: 66.9 - 78.3%) experienced stroke onset the same day they presented to the ED. For subjects whose day of stroke onset differed from the day of presentation to the ED, this difference ranged from 1 to 12 days (mean = 1.8; median = 1). In these subjects, there were no systematic differences in assigned pollution levels for either NO2 or PM2.5 when day of presentation rather than day of stroke onset was used. At the time of stroke onset, 89.9% (95% CI: 86.6 - 93.1%) reported that they were inside, while 84.5% (95% CI: 80.6 - 88.4%) reported that for most of the day they were within a 15 minute drive from home. We estimated that due to the mis-specification of the day of stroke onset, the risk of hospitalization for stroke would be understated by 15% and 20%, for NO2 and PM2.5, respectively. Conclusions Our data suggest that day of presentation and residential location data obtained from administrative records reasonably captures the time and location of stroke onset for most patients. Under these conditions, any associated errors are unlikely to be an important source of bias when estimating air pollution risks in this population.


Background
Over the past decade, several epidemiological studies have demonstrated that short-term (i.e., within days) fluctuations in ambient air pollution can increase in the risk of stroke [1][2][3][4]. These studies have typically employed either a time-series or case-crossover design to describe associations between the daily number of stroke events and air pollution levels. A variety of measures have been used to identify stroke events including: presentation to emergency departments (EDs) [2,4], admittance to hospital [1,[5][6][7], and death [8,9]. Regardless of the manner in which the stroke outcome is defined, the calendar dates associated with these events, typically extracted from hospital administrative data, has been used to define the time (i.e., day) of stroke onset. However, there may be considerable delays from stroke onset to when an individual presents to hospital or dies that may distort the risk estimates from these studies. Lokken et al [10] found that the onset of stroke symptoms frequently occurred more than one full calendar day before hospital admission, and that the impact from this misclassification of the timing of stroke onset produces air pollution risk estimates that may be understated by as much as 40% [10]. Apart from that study, we know of no other research that has evaluated potential biases that may arise from inaccuracies related to the timing of the onset of stroke.
Another fundamental assumption of the time-series or case-crossover study design is that exposure can be adequately characterized using ambient concentrations from fixed-site monitoring networks. Some stroke studies have regressed the daily number of stroke events against a city-wide measure of ambient pollution [4,8], while in others, exposure has been assigned at a finer spatial scale by incorporating information on place of residence data [6,10]. However, such an approach does not necessarily reduce exposure measurement misclassification because the individual may have been away from home or outside the city at the time of stroke onset. Furthermore, for some pollutants, ambient concentrations at the patient's place of residence may not accurately reflect the salient exposure because they were inside when there stroke occurred. Recently, a study of models of traffic-related pollution estimates that included time-activity patterns suggested that the difference between the mobility-based model and residence-only models may lead to risk estimates biased toward the null [11]. Investigations into the effect of misclassification of ambient air pollution exposure on time-series models of acute health outcomes have shown that the Berkson type error due to differences between individual exposures and city-wide average exposures has a relatively modest influence on observed risk estimates [12,13].
The objective of this study was to examine the extent of misclassification of exposure based on data on patient residence and hospital admission time and date to increase our understanding of the possible biases in ambient air pollution risk estimates arising from the use of administrative datasets. We previously reported positive associations between ambient measures of air pollution and ED stroke visits among residents in the city of Edmonton, Canada [4]. Here, we compared responses from a survey of stroke and transient ischemic attack (TIA) patients presenting to two Edmonton area hospitals, with individual data on patient location at stroke and patient-reported time of stroke onset, and compared their responses to similar characteristics collected from administrative records.

Stroke patient data
The setting for this study was Edmonton, Alberta, Canada. The two study sites were the University of Alberta Hospital (UAH), an academic teaching hospital with designated stroke and neuroscience units, and the Royal Alexandra Hospital, a high-volume clinical teaching hospital with neurosurgical and neurological consultants available. Both sites are regional trauma centers, are staffed by full-time certified emergency physicians, and are involved in learner training. Given the presence of the stroke service at the UAH, more acute stroke patients present to this setting than other hospitals within the region.
Eligible patients were those who presented to the ED of the study sites between June 10, 2009 and May 19, 2010, and who had a discharge diagnosis of hemorrhagic or ischemic stroke or TIA: International Classification of Diseases, 10 th revision (ICD-10) codes I60 through I64, I67, and G45. Data were collected on excluded patients: reason for exclusion, triage date, and stroke classification.
At least one half-hour after examination and no more than 24 hours after presenting to hospital, and following the confirmation of a diagnosis of stroke or TIA by the attending physician, patients or their families were approached by a trained research assistant to participate in the study. Following informed written consent, the research assistant administered a questionnaire that collected information on the patient's place of residence, cigarette smoking behavior, and medical history. Patients were also asked to recall the date and time of stroke onset (i.e., when they first experienced stroke symptoms), where they were at onset (inside, outside, in vehicle), and what activity they were performing (resting, routine daily activity, exercise). They were also asked to indicate where they had spent most of their time on the day of stroke onset with one of the following responses: a) "Most of the time I was at home or within a 15 minute drive from my home", b) "Most of the time I was more than a 15 minute drive from home, but still in the Edmonton area", or c) "Most of the time I was outside of the Edmonton area".
Following the discharge, admission/transfer, or death, additional data for each patient were obtained from medical charts. Data extracted from the medical charts included: sex, age, postal code, ICD-10 code, outcome (e.g., admission, discharge, death), triage time and date, time and date of stroke onset, presence of heart disease, history of stroke, use of anti-hypertensive drugs and insulin or oral hypoglycemic medication (used as indicators of the presence of hypertension and type 2 diabetes, respectively), and the Canadian Triage and Acuity Scale (CTAS) score. The CTAS is an assessment tool for prioritizing ED patients upon arrival based on presenting complaints and classifies patients into five categories: resuscitation, emergent, urgent, less urgent and non-serious complaint [14].

Ambient Air Pollution Exposure
Hourly concentrations of ambient NO 2 (ppb) and PM 2.5 (particulate matter < 2.5 microns in diameter) (μg/m 3 ) were obtained from 3 continuous monitoring stations within the city of Edmonton during the study period. The monitors are part of the National Air Pollution Surveillance Network maintained by Environment Canada [15]. Daily mean values were calculated by taking the average of measures obtained from the three monitoring stations for the period January 01, 2009 to May 19, 2010.

Statistical and Data Analyses
We identified those patients whose self-reported date of stroke onset from the survey differed from the date of presentation to the ED captured by the medical chart. Chi-squared tests were applied to evaluate differences in patient characteristics between these patients and those whose reported day of stroke onset occurred on the day of presentation. Similar analyses were also undertaken to evaluate differences in patient characteristics among those who were close to home on the day of their stroke to those who were further away. Characteristics that were evaluated in these analyses included: age-group, sex, previous stroke, CTAS, season, smoking status, and history of hypertension, diabetes, and heart disease.
We also evaluated the effect of misclassification of the timing of stroke onset on assigned ambient concentrations of PM 2.5 and NO 2 . This was performed for those subjects whose self-reported day of stroke onset differed from the day of presentation to the ED. For each of these subjects, we calculated the difference in daily city-wide mean NO 2 and PM 2.5 levels between these two days. Histograms were then created to describe the frequency distribution of these differences, and a normal distribution was then fit to this distribution. In addition, we estimated the bias resulting from the mis-specification of stroke onset utilizing a previously developed formula [10]. Specifically, attenuation factor for the risk estimate was estimated by c = var(x)/{var(x) + var(x*x)}; where × represents the exposure at the time of stroke onset, and x* represents the exposure on the misspecified date of onset.

Ethics
Ethics approval for this study was provided by the Health Research Ethics Board (Biomedical Panel) at the University of Alberta and by Health Canada's Research Ethics Board.

Recruitment
From 760 stroke patients presenting during the study period, 239 were ineligible to participate because they resided outside of the metropolitan area of Edmonton (n = 164) or had discharge diagnoses that were incompatible with the study criteria (n = 75). Eight of the 521 eligible participants were unable to participate due to language barrier and 4 died in ED or were terminal. Ninety-five were missed (left the hospital or were transferred before a research assistant was able to administer the questionnaire), 64 were unable to respond, and 14 refused to participate (2.7% of 521).
Non-participants differed somewhat from study participants. Compared to the distribution of stroke subtypes among study participants, the distribution among those unable to respond at time of interview was shifted away from the TIA stroke classification and towards ischemic and hemorrhagic stroke (χ 2 = 14.0, p = 0.001). The distribution of sub-types among those who were missed by research assistants at the time of interview was not different compared to participants (χ 2 = 1.94, p = 0.38).

Sample
A total of 336 patients participated in the survey (173 men, 163 women) ( Table 1). Ischemic stroke was the most common type, accounting for more than twothirds of all cases. Almost two thirds of all patients who participated in the survey experienced a stroke for the first time. The prevalence of hypertension and diabetes in our survey population was 71.9% (95% CI: 67.1 -76.8%) and 21.5% (95% CI: 17.1 -25.9%), respectively. Most patients reported being within a 15 minute drive from home on the day when stroke onset symptoms occurred (84.5%; 95% CI: 80.6 -88.4%) and being inside

Presenting Times
Of the 336 participants, 241 (71.7%; 95% CI: 66.9 -76.6%) were able to recall the day on which symptoms of their stroke started. For 27.4% (95% CI: 21.7 -33.1%) of these 241 patients the date of self-reported stroke onset was at least one day before the date of presentation to ED. The number of days between the onset of stroke and presentation to ED ranged from 1 to 12 days (median = 1.0, mean = 1.8, 95% CI: 1.4 -2.3 days). Those presenting with higher acuity conditions were more likely to present to ED on the same day they first experienced stroke symptoms (χ 2 = 14.7, p < 0.001); however, stroke sub-type was not associated with delay (χ 2 = 2.35, p = 0.31) ( Table 2). The difference in hours between time of onset and time of presentation for 231 patients who could recall the time of stroke onset ranged from 0 to 191 hr (median = 3 hr, mean = 11.9 hr, Table 2 Selected characteristics for stroke patients whose self-reported day of stroke onset was the same as the day they presented to an ED compared to those that differed

Associations with Ambient Air Pollution
For patients who did not present to the ED until at least one day after the onset of stroke symptoms, the mean difference in city-wide mean NO 2 between the day of stroke onset and the day of presentation to ED was only 0.51 ppb (95% CI: -1.55 -2.56 ppb, Figure 1). We repeated these calculations for PM 2.5 , for which the mean difference was 0.14 μg/m 3 (95% CI: -2.11 -2.40 μg/m 3 , Figure 2). The differences of NO 2 or PM 2.5 were not normally distributed (Figures 1 and 2). Based on the variances of exposure at onset and at ED presentation, the approximated reduction in stroke risk using the exposure data for patients with delays would be 16% for a linear regression model of NO 2 effect, and 20% for a model of PM 2.5 effect.
Comparisons of patient characteristics for those who were most often within a 15 minute drive from home versus those who were farther away on the day of their stroke are presented in Table 3. Those who were younger (χ 2 = 25.4, p < 0.001) and hypertensive (χ 2 = 12.1, p = 0.001) were more likely to be further away from home at the time of stroke onset. No statistically significant differences were found between those within a 15 minute drive and those who were further away for cigarette smoking status, season, or sub-type of stroke.

Discussion
Recording stroke onset time and minimizing delay between onset and presentation to ED are critical for patients with cerebral infarction because effective treatment with recombinant tissue-type plasminogen activator has a post-onset treatment window of 4.5 hours [16]. Documentation of onset time in the stroke patient population is often missing or inaccurate. For example, examinations of the standards of stroke care in the Paul Coverdell National Stroke Surveillance database revealed that time of stroke onset was missing from 58% of 56,969 stroke cases in the US from 2005 to 2007 [17]. Possible patient factors contributing to missing such data include lack of patient recognition of stroke symptoms, symptom onset during sleep, cognitive impairment upon presentation, and lack of preliminary evidence of cerebral infarction. Possible physician factors contributing to missing or inaccurate data include lack of physician recognition of the importance of documenting stroke onset, busy/chaotic ED environment, patients' inability to recall events, presentation within/outside window driving clinical action, and lack of clear evidence of diagnosis at the time of charting.
While missing onset data can have serious clinical implications in stroke management, our results suggest that using date and time of presentation to ED from an administrative database as a surrogate for date of stroke onset should not lead to a strong exposure misclassification effect on risk estimates obtained from case-crossover studies. Time series and case-crossover analyses are Figure 1 Frequency distribution of differences* in assigned ambient NO 2 concentrations among 66 stroke patients whose self-reported day of stroke onset differed from the day of presentation. * The difference was calculated by subtracting the daily mean ambient NO 2 concentration on the self-reported day of stroke onset from the daily mean concentration on the day the patient presented to the ED.

Figure 2
Frequency distribution of differences* in assigned ambient PM 2.5 concentrations among 66 stroke patients whose self-reported day of stroke onset differed from the day of presentation. * The difference was calculated by subtracting the daily mean ambient PM 2.5 concentration on the self-reported day of stroke onset from the daily mean concentration on the day the patient presented to the ED.
effective study designs to describe associations between short-term fluctuations in ambient air pollution measures and adverse health outcomes. They may be, nonetheless, vulnerable to important sources of bias. We evaluated the potential bias from misclassifying the time of disease onset using administrative hospitalization data, and the potential for exposure misclassification by exploring activity patterns among stroke patients at the time of stroke onset. A systematic difference in ambient NO 2 and PM 2.5 levels could not be detected between day of stroke onset and day of presentation to hospital. Specifically, the mean difference was close to zero.
Our study follows up on a previous investigation that reported misclassification effects due to incomplete or inaccurate data on day of the outcome when dependent on hospital admission databases [10]. Comparing PM 2.5 concentrations at admission and at onset observed among 1,101 stroke patients in Boston, Lokken et al [10] reported a mean difference of -0.1 μg/m 3 . In our data, the difference in concentration between time of presentation and onset was similar (0.14 μg/m 3 ). The Boston study approximated a 50% reduction in ischemic stroke risk based on variance of PM 2.5 concentrations at onset and admission times under a linear regression model. The same approximation method applied to our PM 2.5 data showed a reduction of 20%, suggesting less variance in exposure misclassification in our Edmonton data. Employing the 16% attenuation factor for misclassification of NO 2 calculated from the present data, to the regression coefficient for an interquartile increase in the 3 day average NO 2 we reported in our earlier casecrossover study of ambient air pollution effects on admissions to Edmonton EDs for stroke [4], the observed odds ratio of 1.26 would have been reduced from 1.32.
Several systematic differences between the Boston study [10] and ours should be noted. Delays to ED presentation may be greater among a US population than a Canadian one, for whom accessibility and universality of hospital ED care reduces the potential for individuals of lower socioeconomic levels to postpone seeking medical treatment, as they do in the US [18]. Secondly, in the Boston study an onset time estimation method used a 6-hour time window when the onset time could not be given within a 15-minute time window; thus, onset date data was known or estimated for nearly 100% of patients. Imputing stroke onset back in time within a 6hour window can increase the likelihood of being classified as delayed. In contrast, we did not use an onset estimation method and had stroke onset data for roughly 72% of patients (similar to findings by others [19]). Finally, the Boston study did not include hemorrhagic strokes or TIAs. Since we observed no association between delays and subtype, this likely did not contribute to the differences in results between the studies. Nevertheless, there was an increased likelihood of a greater effect of misclassification in the Boston study data because the proportion of patients classified as delayed at least one day was higher (53%) and was well in excess of the range reported here and by others (18 - [20,21]. In the Boston study, Lokken et al also ran case-crossover models on simulated datasets reflecting the distribution of delays [10]. Logistic regression β parameters from models using stroke onset time data were 60 to 66% less than those using hospital admission data. It is unclear if we also would have found a systematic reduction in the short term effect of ambient NO 2 or PM 2.5 on stroke if onset date was used instead of ED presentation date in a case-crossover model. We found that higher CTAS scores were strongly associated with less delay in presentation to hospital after stroke symptoms first appeared. Our results echo those from an examination of delay risk factors including the modified National Institute of Health Stroke Scale (nMIHSS), a measure neurological impairment [19]. Those with high nMIHSS scores had less delay to arrival and, as in our study, previous stroke and stroke sub-type were not associated with delay. Similar to our findings with respect to presentation dates, others found that delay to hospital admission was not associated with age, smoking, hypertension, diabetes, or heart disease, and that those with previous stroke are marginally less likely to be delayed [21].
A high response rate was a strength of this study (2.7% refused to participate); however, there were many stroke patients who were missed or unable to respond. The data available for non-participants showed that the proportion of patients who suffered more clinically severe stroke events (ischemic and hemorrhagic stroke) was higher among those unable to respond than it was among participants. Nevertheless, given that sub-type was not associated with delay to ED, there is no indication that the sub-type distribution among non-participants would have been a source of bias of our results.
Our data revealed that a very high proportion of stroke patients were indoors during stroke onset, which is not surprising, as adult Canadians spend more than ¾ of their time indoors, even in summer months [22]. Personal exposure to NO 2 and PM 2.5 can be different than background exposure measurements, and in Canada the difference can be dependent on season, the use of gas heating, gas stoves, and air conditioning [23][24][25]. For these reasons, data on, or models of indoor air pollution exposure could substantially improve our understanding of the effect of personal exposure to indoor air pollution on acute health outcomes in Canada.
Spatial variability is an additional source of exposure measurement error that may have affected our results. These errors are caused by the inability of centralized monitoring station data to precisely reflect ambient levels for individuals. In our study we used city-wide daily averages; our measurements were, thus, affected by Berkson-type error [12]. Arguing that measurement error in air pollution data is often distributed lognormally, Goldman et al concluded that spatial error can result in inflation of a risk ratio on a per unit concentration basis, depending on the degree of spatial variability present in the study area [26]. This is a critical source of error, then, that could have a stronger effect on risk estimates than would the error with respect to timing of stroke onset inherent in hospital administrative databases we observed.

Conclusions
Our results demonstrated that a non-negligible proportion of stroke patients presented to ED after a day or more following onset of symptoms. Despite this observation, for those who were delayed, the differences in ambient NO 2 and PM 2.5 between onset and triage dates were not significantly different from zero; thus, the potential impact of exposure misclassification due to the use of ED presentation time on case-crossover risk estimates of pollution effects on stroke events is likely to be small.