The analysis utilized an ecologic study design in which the unit of observation was a province, and used an extensive secondary dataset compiled from governmental surveillance data, censuses and surveys, and meteorologic estimates.
Geographic scope
Peru is comprised of 25 departments, which encompass 195 provinces, which are further sub-divided into districts. Each district belongs to a single province, allowing for aggregation of district data to the provincial level. Analyses were conducted for the 195 provinces of Peru: data from districts assigned to a new (196th) province in 2014 were analyzed with their former province. Province populations range from < 4000 residents (Purus province, Ucayali department) to > 7 million residents (Lima province); areas range from ~ 150 km2 (province of Callao) to > 67,000 km2 (province of Loreto) [29]. The choice of conducting the analysis at a province level, rather than at a district level, was based on small child populations in many districts; this meant a high proportion of weeks had no diarrhea cases in many districts. There were also difficulties in obtaining estimates of the child population and piped water access at a district level.
Data sources and definitions
Diarrhea cases and rotavirus vaccine data
The Peruvian Ministry of Health (MINSA) collects obligatory weekly surveillance data on diarrhea visits from all public inpatient and outpatient clinics in Peru. Private clinics also send regular weekly reports if they opt into the reporting system. Diarrhea cases refer to patients presenting to a clinic with an increase in frequency of bowel movements (three or more bowel movements in 24 h), or in fluidity or volume of stool compared to usual, with onset within the past 2 weeks. Cases are aggregated by age group (< 1 year old, 1–4 years old, ≥5 years old), and assigned to the patient’s district of residence. We use the term “childhood diarrhea rate” to describe the rate of clinic visits for diarrhea in children under 5 years old. We analyzed cases reporting to clinics between January 5, 2005 and December 16, 2015. Weekly data on diarrhea were summed for each province.
MINSA also collects rotavirus immunization data. Rotarix®, which is administered to infants in two doses between approximately 6 weeks and 6 months of age, was added to the national immunization schedule in 2009. By 2010, most Peruvian infants were receiving both doses of the rotavirus vaccine [8]. Data were analyzed for the “pre-(rotavirus) vaccine era” (2005–2009) and “post-vaccine era” or “rotavirus vaccine era” (2010–2015).
Piped water and population
Data from the Peruvian national census (2007 and 2017) and SISFOH (El Sistema de Focalización de Hogares, the Household Focalization System conducted 2012–2013) were used to estimate the percentage of households in each province with access to piped water (households for which the primary drinking water source was water piped inside or outside of the home, but within the building area). The national censuses in 2007 and 2017 were used to estimate the annual population of children under 5 years old by province. Detailed methods of obtaining annual piped water and population estimates were described previously [8].
In order to classify provinces by piped water access across the 11-year study period, provinces were divided into three groups: (1) provinces that had consistently lower access than the rest of the country throughout the study period, (2) provinces that had consistently higher access, and (3) provinces that fit into neither category, namely that transitioned from lower to higher access throughout the study period. Low piped drinking water access provinces were defined as those in which < 60% of households had access to a piped water connection in all study years (2005–2015), or all but 1 year. High piped drinking water access provinces were those in which ≥60% of households had access to a piped water connection in all study years, or all but 1 year. Transitional provinces were those in which < 60% had access to piped water for at least two study years, but transitioned to higher access (≥60% for at least 2 years). The choice of cut-off for the categories was based on the median access level across all years and all provinces, to maximize the number of provinces in the high and low categories. In our previous research utilizing these data, we observed that provinces with higher levels of access to piped drinking water also tended to have higher levels of access to a toilet connected to the sewerage system [8]; therefore, in our primary analysis we considered only piped water access as a proxy for improved drinking water and sanitation.
Temperature data source
The Peruvian National Meteorology and Hydrology Service (SENAMHI) provided temperature data from the Peruvian Interpolation of SENAMHI’s Climatological and hydrological Observations temperature product (PISCOt). Temperature estimates were constructed utilizing data from 178 quality-controlled, gap-filled, and homogenized air temperature stations, remote-sensed data, and a set of topographic predictors [30]. PISCOt includes daily minimum temperatures (Tmin) and maximum temperatures (Tmax) at 0.1° gridded spatial resolution (each grid approximately 10 km × 10 km) for the country of Peru. The Tmin and Tmax values were averaged to create gridded daily mean temperature (Tmean) values.
Construction of meteorological variables
Control for dry/rainy season
Provinces were classified into one of Peru’s three major geographic regions: the coast (departments in western Peru bordering the Pacific Ocean), the Peruvian Amazon (departments in eastern Peru), and the mountains (running north to south between the coast and Peruvian Amazon, along the Andes Mountains) [31]. June–August was considered the dry season in the mountain provinces and the Peruvian Amazon provinces. Coastal provinces receive very little rainfall, nevertheless have a drier season from June to November, which was classified as the dry season in analyses [32].
We did not consider precipitation measurements in this analysis. Rainfall patterns tend to be highly localized and more spatially variable than temperature estimates; use of typical rainfall datasets can lead to large bias in analyses of waterborne disease [33]. The association between rainfall and diarrhea is complex: though heavy rainfall can be associated with increased diarrhea, this relationship may depend on antecedent conditions [34, 35].
El Niño
The El Niño Southern Oscillation (“El Niño”), is a global pattern of climate variability associated with unusual warming of the Pacific Ocean near the equator, resulting in increased temperature and changes in precipitation in Peru [36]. The U.S. National Oceanic and Atmospheric Administration (NOAA) reports data on the Oceanic Niño Index (ONI), calculated using a standard 3-month mean of sea surface temperature anomalies in the Niño 3.4 region of the Pacific Ocean [37]. For each month in our study, we assigned the corresponding ONI value of the three-month running average in which the study month was the midpoint. For example, the ONI for February 2005 was assigned the ONI running average for January–March 2005. El Niño periods were defined using the ONI, with values in the ranges 0.5–0.9, 1.0–1.4, and ≥ 1.5 corresponding to weak, moderate, and strong El Niño events, respectively [38]. We compared months with a moderate or strong El Niño to months with a weak El Niño or no El Niño. Moderate/strong El Niño periods in the study were from October 2009 to February 2010 and May–December 2015.
Province-level temperature estimates
The gridded Tmax temperature values were averaged within each district boundary, giving a daily high temperature for each district. Province-level daily temperature estimates were estimated using population-weighted averages of the district-level temperatures, giving more weight to temperatures in more populous districts of the province. The same process was used to estimate province-level daily Tmean values.
After comparing three temperature variables for model fit (the weekly maximum of the daily high temperatures, the weekly mean of the daily high temperatures, and the weekly mean of the daily mean temperatures; Additional File 1), the weekly mean of daily high temperatures fit the data best and was selected as the main temperature exposure variable for the multivariable models described in this manuscript. An example of a time series displaying the weekly mean of daily high temperatures, including indications of dry seasons and El Niño events, is provided in Additional File 2.
Statistical analysis
Data were compiled and cleaned in R 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria) and analyzed using SAS 9.4 (SAS Institute, Inc., Cary, NC). We analyzed province-level weekly counts of clinic visits for diarrhea in children < 5 years using negative binomial generalized estimating equations (GEEs) with autoregressive correlation structures. We first examined whether temperature was associated with weekly childhood clinic visits for diarrhea in a model that included the following variables: the weekly mean of daily high temperatures, with distributed lags of 1, 2 and 3 weeks; an indicator for the rotavirus vaccine era; an indicator for dry/wet season (which varied by month and region of Peru); an indicator for moderate/strong El Niño events; a continuous variable for the study year (to account for secular trend); and a variable to control for province, to focus on week-to-week temperature changes within each province, while controlling for other unmeasured province-level factors. We also included an offset for provincial population (child population < 5 years).
To examine whether the association between temperature and childhood clinic visits for diarrhea differed between the pre- and post-vaccine eras, we ran the model above with additional terms for the interaction between each lagged temperature value and the rotavirus vaccine era term. All analyses were performed on all provinces combined, and also stratified by the three groups of access to piped water, in three separate models. As the definition of piped water access was fixed for each province for the study period, only a stratified analysis was performed to assess effect modification; piped water access was not considered as a main effect.
The exponentiated coefficients for the included terms were reported as estimates of incidence rate ratios (IRRs). In addition to the separate IRRs for each of the three lagged terms for temperature, the combined effect of temperature across all included lags (the exponentiated sum of the three coefficients) was also reported as an IRR estimate for a single exposure metric, referred to here as the “temperature-diarrhea association”.
Sensitivity analysis
Many provinces of Peru have low temperature variability, i.e., temperature is relatively constant throughout the year. Hypothesizing that these provinces might contribute little to the temperature-diarrhea association, we conducted a sensitivity analysis limited to provinces with higher variability in daily maximum temperature.
To consider the variability in temperature throughout the year in each province, we constructed monthly average temperatures, which were the mean of the daily high temperatures in each month of the year, excluding data from El Niño periods. As an example, the January mean daily high for a province was an average of all daily high temperatures from any January in the study period (2005–2015), excluding any January during El Niño periods. Of the 194 provinces included in the analysis, 133 (68.6%) had less than a 3 °C difference between the mean temperature in the warmest month and coolest month (Additional File 3). These provinces were classified as having low temperature variability and were excluded in the sensitivity analysis models.