Statistical problems with analysis of Dropkin
14 March 2007
I read with interest the recent analysis by Dropkin [1] of the Japanese atomic bomb survivors Life Span Study (LSS) Report 12 mortality data [2]. There are a number of statistical problems with the paper.
Dropkin chooses to model six cancer sites, namely stomach, liver, lung, colon, uterus and all solid cancer mortality; these sites were apparently chosen as the leading malignant causes of death. This choice of sites is curious, including one site (uterus) for which there is little evidence of association with radiation exposure, whether in the LSS mortality [2, 3] or incidence data [4] or in any other group [5, 6]. Dropkin does not include such major radiogenic sites as female breast, bladder, ovarian and oesophageal cancer [2-6]. There may be issues of selection bias associated with this choice of cancer sites, and also with the 20 mSv cutoff used to truncate the data, which may affect interpretation of the p-values.
Dropkin fits a latency parameter, phi, the effect of which is to set to zero all excess relative risk less than phi years after exposure. A fundamental statistical problem with models incorporating such a latency adjustment, which Dropkin [1] ignores, is that the asymptotic chi-square distribution of the deviance difference statistic employed for significance tests is not guaranteed, because of a lack of sufficient (C2) smoothness in the likelihood as a function of this parameter [7]; in particular the likelihood is not even continuous in this parameter, as is to some extent obvious from his figures. Another problem with the fitting of this latency parameter is that in testing against the null hypothesis (of no effect, where the parameters beta=sigma=0), the auxiliary parameters (phi, tau) have no effect on the likelihood, so the distribution of deviance differences can no longer be assumed to be chi-square with the appropriate number of degrees of freedom [8]. That these problems are not merely of theoretical concern is shown by an analysis I have performed, simulating from the null (control) distribution (no radiation effect): the deviance difference statistic (two-phase model vs control), under simulations based on the expected numbers of deaths predicted by the control model, for all solid cancers in the LSS13 data [3] has a distribution further from zero than the theoretical chi-square distributions. This also demonstrates that the observed deviance difference, 6.32, is not statistically significant (p=0.37), in contrast to Tables 1 and 3 of Dropkin [1]. It should be noted that the model I fitted is slightly different from that employed by Dropkin, in that a semi-parametric stratified model is used for the background (zero dose) cancer rates, in contrast to Dropkin's parametric model; however, this would not be expected to affect the statistical inferences made.
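The point about a nuisance parameter present only under the alternative (Davies [8]) can be illustrated with a small simulation. The sketch below is a toy, not the LSS models: a constant-rate Poisson null against an alternative with a changepoint-style "latency" parameter, with all rates and grid choices hypothetical. Maximising the likelihood over the changepoint inflates the deviance difference beyond a naive chi-square(1) reference.

```python
import numpy as np

rng = np.random.default_rng(42)

def poisson_deviance(obs, mu):
    # 2 * sum[obs*log(obs/mu) - (obs - mu)], with the 0*log(0) = 0 convention
    mu = np.clip(mu, 1e-12, None)
    pos = obs > 0
    term = np.zeros(obs.size)
    term[pos] = obs[pos] * np.log(obs[pos] / mu[pos])
    return 2.0 * float(np.sum(term) - np.sum(obs - mu))

def max_lrt_over_changepoints(counts, times):
    """Deviance difference between the constant-rate null fit and the best
    two-segment fit, maximised over a changepoint phi that exists only
    under the alternative (a crude stand-in for a latency cutoff)."""
    d_null = poisson_deviance(counts, np.full(counts.size, counts.mean()))
    d_best = d_null
    for phi in times[1:-1]:                     # candidate changepoints
        late = times >= phi
        mu = np.where(late, counts[late].mean(), counts[~late].mean())
        d_best = min(d_best, poisson_deviance(counts, mu))
    return d_null - d_best                      # always >= 0

times = np.arange(40.0)
devdiffs = np.array([
    max_lrt_over_changepoints(rng.poisson(5.0, times.size).astype(float), times)
    for _ in range(500)                         # simulate under the null
])
# Against a chi-square(1) reference (mean 1), the maximised statistic is
# inflated, so nominal p-values would be anti-conservative.
```

With one extra parameter and a fixed changepoint the deviance difference would average about 1; maximising over the changepoint pushes the average well above that, which is the mechanism behind both objections above.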
In summary, there are major statistical problems in the analysis of Dropkin [1], associated with certain technical problems in the statistical tests employed. These problems are sufficient to invalidate his conclusion that “the standard linear model … greatly underestimates the risks at low doses”.
References
1. Dropkin G: Low dose radiation and cancer in A-bomb survivors: latency and non-linear dose-response in the 1950-90 mortality cohort. Environ Health 2007, 6: 1.
2. Pierce DA, Shimizu Y, Preston DL, Vaeth M, Mabuchi K: Studies of the mortality of atomic bomb survivors. Report 12, Part I. Cancer: 1950-1990. Radiat Res 1996, 146: 1-27.
3. Preston DL, Shimizu Y, Pierce DA, Suyama A, Mabuchi K: Studies of mortality of atomic bomb survivors. Report 13: solid cancer and noncancer disease mortality: 1950-1997. Radiat Res 2003, 160: 381-407.
4. Thompson DE, Mabuchi K, Ron E, Soda M, Tokunaga M, Ochikubo S, Sugimoto S, Ikeda T, Terasaki M, Izumi S, Preston DL: Cancer incidence in atomic bomb survivors. Part II: solid tumors, 1958-1987. Radiat Res 1994, 137: S17-S67.
5. United Nations Scientific Committee on the Effects of Atomic Radiation (UNSCEAR): Sources and effects of ionizing radiation. UNSCEAR 2000 Report to the General Assembly, with scientific annexes. Volume II: Effects. United Nations, New York; 2000.
6. Little MP: Comparison of the risks of cancer incidence and mortality following radiation therapy for benign and malignant disease with the cancer risks observed in the Japanese A-bomb survivors. Int J Radiat Biol 2001, 77: 431-464.
7. Schervish MJ: Theory of statistics. Springer-Verlag, New York; 1995.
8. Davies RB: Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 1977, 64: 247-254.
Competing interests
None declared
Response to Mark Little
Greg Dropkin, none
14 March 2007
Dr. Mark Little questions my choice of cancers for analysis, and aspects of the statistical method.
I focused on the 0 - 20 mSv dose range because the ICRP recommended annual occupational dose limit is 20 mSv. I then chose the 5 most common cancers in the 0 - 20 mSv subcohort.
Other cancer sites mentioned by Mark Little have fewer cases (deaths) in this subcohort. I agree that it would be of interest to extend the analysis to consider radiogenic sites including the female breast and oesophagus. For this spare time project with limited computing power, I had to draw the line somewhere. But if a more extensive analysis were to find that the stomach, liver, and lung are the only specific sites to show significant results with these models in this dataset and dose range, even that limited result would still be interesting.
On the method, Mark Little points out that the sample likelihood is not a continuous function of the latency parameter phi. As mentioned in the paper, it would make more sense to allow latency to modify dose by some smooth function rather than an abrupt switch from no effect to full effect when Time-Since-Exposure passes phi. I did not have the computing power to attempt this.
However, the discontinuous variation of likelihood with latency could only cause problems with the method if latency varied during the modelling. In fact the modelling, including LRT comparisons and estimated confidence intervals for ERR, is carried out after phi is chosen and fixed (see Methods and Figures 2 and 3). Once phi is fixed, the sample likelihood depends smoothly on all model parameters allowed to vary (i.e. every other parameter in the model), and the LRT for comparing hypotheses at that fixed choice of phi is asymptotically chi-square distributed. When another phi value is chosen, modelling begins again. Results were shown for phi = 5, 6, ... 44, and modelling was carried out separately at each of these values. Latency phi was then chosen to maximise the LRT for comparing the model with the control model, and subsequent modelling was carried out with phi fixed at this "optimal" value.
Mark Little's second statistical point concerns the distribution of LRT, and is relevant. He argues that phi and tau do not affect the control model and that from simulations carried out using a different model on different data (LSS13) he does not believe LRT2p-con for comparing the two-phase and control models is chi-square distributed on 2 d.f.
A simulation (200 trials) for the stomach with latency fixed at 10 years does bear out his point. I compute the deviance difference LRT2p-con when the two-phase and control models are fitted to simulated data, generated by sampling a Poisson distribution in each cell based on the expected values when the control model is fitted to the actual data. The cumulative distribution of this deviance difference is extremely close to a chi-square distribution on 3 d.f., not 2 d.f. as I had thought. Note that latency phi is fixed throughout the simulation.
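The check described here, a parametric bootstrap from the control fit followed by comparison of the empirical distribution of the deviance difference against a chi-square reference, can be sketched as follows. This is a toy stand-in, not the actual two-phase model: the "control" is a single common Poisson rate and the richer model adds three free parameters (four group-specific rates, hence 3 extra d.f.); all rates and the group structure are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def chi2_cdf_3df(x):
    # closed form for the chi-square CDF with 3 degrees of freedom
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def deviance(obs, mu):
    mu = np.clip(mu, 1e-12, None)
    pos = obs > 0
    term = np.zeros(obs.size)
    term[pos] = obs[pos] * np.log(obs[pos] / mu[pos])
    return 2.0 * float(np.sum(term) - np.sum(obs - mu))

groups = np.repeat(np.arange(4), 25)        # 100 cells in 4 groups (hypothetical)
control_rate = 6.0                          # fitted control expectation (hypothetical)

lrts = []
for _ in range(200):                        # 200 trials, as in the text
    y = rng.poisson(control_rate, groups.size).astype(float)
    mu0 = np.full(y.size, y.mean())                          # refitted control
    mu1 = np.array([y[groups == g].mean() for g in range(4)])[groups]
    lrts.append(deviance(y, mu0) - deviance(y, mu1))         # LRT, 3 extra d.f.
lrts = np.sort(np.array(lrts))

# maximum gap between the empirical CDF and the chi-square(3) reference
ecdf = np.arange(1, lrts.size + 1) / lrts.size
ref = np.array([chi2_cdf_3df(max(v, 1e-12)) for v in lrts])
max_gap = float(np.abs(ecdf - ref).max())   # small if chi-square(3) fits well
```

A gap comparable to sampling noise (roughly 1/sqrt(200) here) is consistent with the chi-square(3) reference; a systematically larger gap would indicate the kind of failure described in the preceding comment.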
With the real data for the stomach at latency 10 years the observed LRT2p-con is 18.66. With 3 d.f. the p value would be 0.0003 rather than 0.0001. Thus the two-phase model is still highly significant against control. Likewise if LRT2p-lin for comparing the two-phase and linear models at fixed latency has 2 d.f. rather than 1, the evidence of non-linearity is still extremely strong for the stomach, liver, lung and all-solid cancers in the LSS12 dataset 0 - 20 mSv subcohort.
The comparison with the ICRP predictions is based on ERR, not LRT. In fact the entire discussion of LRT2p-con is irrelevant to the computation of ERR and its 95% confidence interval, defined in Methods. The CI is based on the deviance difference between the two-phase model with and without constraints on ERR, again with phi fixed throughout the computation. The CI is defined to exclude those values of ERR for which all possible choices of the model parameters yield a deviance which differs from the absolute minimum (at fixed latency phi) by more than the appropriate critical value. In the two-phase model ERR depends on beta, sigma and tau, and the relevant chi-square distribution used in computing the confidence interval has 3 d.f. Neither the value of LRT2p-con nor its distribution enters this computation, nor does the behaviour of the sample likelihood at any other value of phi.
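The profile-deviance construction of the CI can be sketched with a deliberately simplified one-parameter Poisson excess-risk model. The description above profiles over beta, sigma and tau and so uses the chi-square(3) critical value (7.815); the toy below has a single free ERR parameter, hence the chi-square(1) value 3.84. The baseline rates and the true ERR are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def deviance(obs, mu):
    mu = np.clip(mu, 1e-12, None)
    pos = obs > 0
    term = np.zeros(obs.size)
    term[pos] = obs[pos] * np.log(obs[pos] / mu[pos])
    return 2.0 * float(np.sum(term) - np.sum(obs - mu))

baseline = np.full(50, 4.0)                    # hypothetical background expectations
y = rng.poisson(baseline * 1.5).astype(float)  # simulate with a true ERR of 0.5

# profile the deviance over a grid of candidate ERR values
err_grid = np.linspace(-0.4, 2.0, 601)
dev = np.array([deviance(y, baseline * (1.0 + e)) for e in err_grid])

# keep every ERR whose deviance lies within the critical value of the minimum
crit = 3.84        # chi-square(1), 95%; the 3 d.f. version above uses 7.815
inside = err_grid[dev <= dev.min() + crit]
ci = (float(inside.min()), float(inside.max()))
```

The interval is exactly the set of ERR values not rejected by a deviance-difference test against the best fit, which is the construction the paragraph above describes; the critical value, and hence the d.f., is the only place a chi-square distribution enters.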
However, estimating the optimal latency by maximising LRT2p-con is still a sensible strategy, whatever its distribution under the null hypothesis.
Significant effects occur over a whole range of latencies. For example, with the stomach let phi vary from 8.0 to 15.0 in increments of 0.1 years. For these 71 latencies, LRT2p-con has mean 15.42, s.d. 3.01, min 10.76, and max 20.14. ERR for 10 mSv has mean 0.368, s.d. 0.047, min 0.289, and max 0.437. 95% CIs for ERR computed at latencies 8, 9, 10, ... 15, and also at 13.4 where ERR attains its minimum value in this sample, are all strictly positive. The ERR values do not differ greatly from the results at the optimal latency (11.89 yrs), at which ERR = 0.46 with 95% CI (0.11, 0.94).
Linear extrapolation of LSS12, as per the ICRP approach, predicts ERR at 10 mSv to be 0.0024. Values found with the two-phase model for latencies between 8 and 15 years are all at least 100 times higher. The standard linear model greatly underestimates the risks at low doses.
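The "at least 100 times higher" claim is simple arithmetic on the figures quoted in the two paragraphs above (0.0024 from linear extrapolation; 0.289 as the smallest two-phase ERR over latencies 8 to 15 years), and can be checked directly:

```python
# sanity check of the ratio, using only numbers stated in the text
linear_err_10mSv = 0.0024     # ICRP-style linear extrapolation of LSS12
min_two_phase_err = 0.289     # smallest two-phase ERR over latencies 8-15 y
ratio = min_two_phase_err / linear_err_10mSv   # about 120
```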
For all these reasons I do not see Mark Little's statistical criticisms as invalidating the paper. The first assumes an approach (free variation of phi) which I did not use and the second, which I appreciate and accept, concerns the distribution of LRT2p-con but has no bearing on ERR and its confidence interval which are central to the comparison with the ICRP estimates.
Standard analyses of A-bomb survivors involve a hidden assumption of latency because survivors entered the cohort 5 years after the bombing. The dose variable in those analyses is thus already lagged by 5 years. Why 5? Why not 10, 15, or 11.89?
Greg Dropkin
Competing interests
none
Biphasic model for chromosome aberrations in barley seeds
Alfred Koerblein, Munich Environmental Institute (Umweltinstitut Muenchen)
17 March 2009
Dear Dr Dropkin
I used your biphasic model ERR~f(dose,beta,sigma,tau) for chromosome aberrations in barley seeds (see Geras'kin SA, Oudalova AA, Kim JK, Dikarev VG, Dikareva NS: Cytogenetic effect of low dose gamma-radiation in Hordeum vulgare seedlings: non-linear dose-effect relationship. Radiat Environ Biophys 2007, 46: 31-41). Your model fits these data perfectly well. I determine a value of R of 8.4. If you are interested in my analysis just contact me (alfred.koerblein@gmx.de).
Best regards,
Alfred Koerblein
Competing interests
I have no competing interests