Reanalysis of cancer mortality in Japanese A-bomb survivors exposed to low doses of radiation: bootstrap and simulation methods

Background: The International Commission on Radiological Protection (ICRP) recommended annual occupational dose limit is 20 mSv. Cancer mortality in Japanese A-bomb survivors exposed to less than 20 mSv external radiation in 1945 was analysed previously, using a latency model with non-linear dose response. Questions were raised regarding statistical inference with this model.

Methods: Cancers with over 100 deaths in the 0 - 20 mSv subcohort of the 1950-1990 Life Span Study are analysed with Poisson regression models incorporating latency, allowing linear and non-linear dose response. Bootstrap percentile and Bias-corrected accelerated (BCa) methods and simulation of the Likelihood Ratio Test lead to Confidence Intervals for Excess Relative Risk (ERR) and tests against the linear model.

Results: The linear model shows significant, large, positive values of ERR for liver and urinary cancers at latencies from 37 - 43 years. Dose response below 20 mSv is strongly non-linear at the optimal latencies for the stomach (11.89 years), liver (36.9), lung (13.6), leukaemia (23.66), and pancreas (11.86) and across broad latency ranges. Confidence Intervals for ERR are comparable using bootstrap and Likelihood Ratio Test methods, and BCa 95% Confidence Intervals are strictly positive across latency ranges for all 5 cancers. Similar risk estimates for 10 mSv (lagged dose) are obtained from the 0 - 20 mSv and 5 - 500 mSv data for the stomach, liver, lung and leukaemia. Dose response for the latter 3 cancers is significantly non-linear in the 5 - 500 mSv range.

Conclusion: Liver and urinary cancer mortality risk is significantly raised using a latency model with linear dose response. A non-linear model is strongly superior for the stomach, liver, lung, pancreas and leukaemia. Bootstrap and likelihood-based confidence intervals are broadly comparable, and ERR is strictly positive by bootstrap methods for all 5 cancers. Except for the pancreas, similar estimates of latency and risk from 10 mSv are obtained from the 0 - 20 mSv and 5 - 500 mSv subcohorts. Large and significant cancer risks for Japanese survivors exposed to less than 20 mSv external radiation from the atomic bombs in 1945 cast doubt on the ICRP recommended annual occupational dose limit.


Background
Analyses of cancer mortality in survivors of the atomic bombing of Japan in 1945 have played a central role in establishing risk estimates and radiation protection limits for workers and the wider public. While many people exposed at Hiroshima and Nagasaki received fairly high doses, the recommended occupational and environmental dose limits aim to control the risks from much lower doses. Typically, analyses of the full cohort of Japanese survivors, whose doses ranged from 0 to 4000 mSv or more, are extrapolated downwards to predict the risks from doses below 20 mSv [1]. However, even if a particular model gives a good overall account of the full range of data, fitting that model and extrapolating the results may misrepresent the risks at low doses, where the dose response may be quite distinct.
My previous paper [2] considered the cancer mortality data for those survivors whose external (flash) dose from the A-bombs was below 20 mSv, the annual occupational dose limit recommended by the International Commission on Radiological Protection (ICRP). I applied several different models, each of which used, as its exposure variable, the radiation dose lagged by an unknown latency. One model adopted the standard assumption that Excess Relative Risk (ERR) is proportional to dose, in this case the lagged dose. Other models allowed that ERR might be a non-linear function of dose, as suggested by radiation cell biology studies of the bystander effect. All the models detected ERR values for a dose of 10 mSv which were far higher than predicted by extrapolating the standard analyses of the full cohort. The linear model found large, significant results for the liver, while the preferred non-linear model found large, significant results for the stomach, liver, and lung. These results occurred over a range of latencies.
The statistical method I used to assess the significance of the findings with non-linear models was criticised by Mark Little [3]. In this paper the data is re-analysed with other methods of statistical inference for the preferred non-linear model.
The models use Poisson regression to analyse the grouped data in the 0 - 20 mSv subcohort of the Life Span Study LSS12 [4]. The models take the form λ = λ0(1 + ERR), where λ is the cancer risk, λ0 depends only on control variables, and ERR depends only on the lagged radiation dose. Profile Likelihood Confidence Intervals for ERR were based in [2] on the Likelihood Ratio Test (LRT) for comparisons between nested models. Once latency is fixed, the models depend smoothly on all other parameters. If Wilks theorem [5] holds, the asymptotic null distribution of LRT is χ² on d degrees of freedom when the null hypothesis is specified by fixing the values of d parameters in the wider model. For example, the LRT comparison of the linear model ERR = βD with the control model ERR = 0 is asymptotically χ² on 1 d.f. if the control model holds. Here the null hypothesis is β = 0 and Wilks theorem does apply to this nested pair. As Little [3] pointed out, the non-linear models are less well behaved. The transient model ERR = σD exp(-τD) and the two-phase model ERR = βD + σD exp(-τD) are indeterminate in τ when σ = 0, so one of the regularity conditions for Wilks theorem fails. In any case, even if the asymptotic null distribution of LRT were known, the distribution for a given finite set of records may differ from the asymptotic case.
However, the actual distribution of LRT can be estimated by simulation before comparing two competing models or constructing confidence intervals. Bootstrap methods [6,7] can also be used to construct confidence intervals more directly.
In response to questions raised during peer review I also consider whether the data justifies fitting the two-phase model, investigate uncertainties in the "optimal latency" as estimated from the data, extend the analysis to include the covariate "city", and compare results from the 0 - 20 mSv and 5 - 500 mSv dose ranges.

Methods
An initial analysis of latency is carried out for all cancer sites with at least 100 cases (deaths) in the 0 - 20 mSv subcohort. Sites which appear to have a raised dose response are then analysed by the bootstrap, and the distribution of LRT is estimated by simulation. A fuller simulation is carried out for the stomach, the site with the largest number of cases in this subcohort. The analysis focuses on the behaviour of the two-phase model (defined below) and its relation to the linear model. It aims to compare confidence intervals obtained by bootstrap and LRT simulation methods and to discover, without relying on Wilks theorem, whether the apparent elevation and non-linearity of the dose response in the 0 - 20 mSv subcohort are statistically significant. Further analysis considers variation in latency, the impact of "city", male and female subcohorts, and the 5 - 500 mSv dose range. Computations use the statistical freeware R [8], giving a further check on previous results. Possible sources of error in the Japanese data itself, and the potential for confounding by other covariates as discussed in [2], are not reconsidered here.
As previously, source data for the 1950-1990 mortality cohort LSS12 was obtained from RERF [9] via the CEDR [10], and the 0 - 20 mSv subcohort defined by restricting the weighted adjusted colon dose. The subcohort comprises 3011 data cells with 1690391.75 person-years. The control model, linear model, and two-phase model are defined as in [2]. Briefly, the control model is log-linear in 14 indicator variables for 5 year age-at-exposure categories, an indicator for gender, and the numerical variable log mean attained age, each of which is well-defined for the data cells. The control model is specified as λ = λ0 = exp(α + Σj βj xj), where α and the βj are unknown parameters, the xj are the covariates, and exp denotes the exponential; the Poisson parameter in cell i with Ti person-years is λi Ti, where λi is evaluated using the covariates in cell i. At this stage, as in [2], "city" (Hiroshima or Nagasaki) is not included as a covariate.
Other models are defined by λ = λ0(1 + ERR), where ERR depends only on Dℓ, the radiation dose lagged by a latency parameter ℓ. Dℓ is defined as the weighted adjusted colon dose in 10 mSv units when time-since-exposure ≥ ℓ, and 0 otherwise.
The models require 1 + ERR > 0 in all cells so that the Poisson distributions are defined. For the two-phase model τ > 0, slightly modifying the previous approach where τ ≥ 0. However, τ = 0 reduces to the linear model, as does σ = 0. In fitting a model at latency ℓ by Poisson regression, the unknown parameters α, βj and β, σ, τ (as appropriate to the model) are chosen to maximise the likelihood of the observed data assuming independent Poisson distributions in each cell. Equivalently, fitting minimises the sum over the 3011 data cells of Ei - Oi ln(Ei), where Oi and Ei = λi Ti are the observed and expected numbers of cases in cell i.
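As a concrete illustration of this fitting step, the sketch below minimises the Poisson deviance kernel Σi [Ei - Oi ln(Ei)] for a toy two-phase model on synthetic cells. All data, parameter values and the fixed baseline are invented stand-ins; the paper's actual analysis uses R and the full control model, whereas this is sketched in Python.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Synthetic cells (all values invented): lagged dose D in 10 mSv units,
# person-years T, and a fixed baseline rate standing in for the control model.
n = 200
D = rng.uniform(0.0, 2.0, n)
T = rng.uniform(500.0, 1500.0, n)
lam0 = 0.001
b_true, s_true, t_true = 0.1, 0.8, 1.5  # (beta, sigma, tau), invented

def err(D, b, s, t):
    # Two-phase dose response: ERR = beta*D + sigma*D*exp(-tau*D)
    return b * D + s * D * np.exp(-t * D)

O = rng.poisson(lam0 * (1.0 + err(D, b_true, s_true, t_true)) * T)  # observed cases

def deviance_kernel(p):
    # Sum over cells of E_i - O_i * ln(E_i); minimising this maximises the
    # Poisson likelihood. The constraint 1 + ERR > 0 keeps every E_i > 0.
    b, s, t = p
    E = lam0 * (1.0 + err(D, b, s, t)) * T
    if np.any(E <= 0.0):
        return np.inf
    return float(np.sum(E - O * np.log(E)))

start = np.array([0.0, 0.5, 1.0])
fit = minimize(deviance_kernel, start, method="Nelder-Mead")
print(fit.x)  # fitted (beta, sigma, tau)
```

In the paper, latency ℓ is held fixed during each such minimisation and the control parameters are fitted jointly; here the baseline is frozen purely to keep the sketch short.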
Latency is fixed during the minimisation. The models depend smoothly on all other parameters. ERR D,ℓ denotes the Excess Relative Risk evaluated at dose D for the fitted model with latency ℓ. Note that ERR 1,ℓ is the estimated risk at 10 mSv lagged by ℓ years. For model I nested within model J, LRT J-I,ℓ is the Likelihood Ratio Test computed at latency ℓ.
Code written in R emulates the previous model fitting with Excel-Solver, using the minimisation routine "optim" [11]. Results for the stomach at latency ℓ = 5, 6, ..., 44 are checked against the values obtained previously.
At a given cancer site, the control, linear, and two-phase models are fitted at latency ℓ = 5, 6, ..., 44 to give LRT J-I,ℓ for each pair of nested models: linear against control, two-phase against control, and two-phase against linear. Sites are selected for further analysis if LRT lin-con,ℓ > 4 for some ℓ, or if LRT 2p-lin,ℓ > 6 for some ℓ (not assuming that this criterion establishes statistical significance). For sites meeting the criterion for the linear model, Profile Likelihood Confidence Intervals for β = ERR 1,ℓ are computed using Wilks theorem at integer values of ℓ. For sites meeting the criterion for the two-phase model, ℓ = ℓmabs is chosen to maximise LRT 2p-con,ℓ (not LRT 2p-lin,ℓ), so that fitting at ℓmabs gives the absolute Maximum Likelihood Estimate (MLE) for the model. The "optimal latency" ℓm is chosen to maximise LRT 2p-con,ℓ subject to the constraint ERR 1,ℓ ≥ 0. Bootstrap confidence intervals and LRT simulations are then computed at ℓm. Details are given in the next two more technical subsections, and examples with commentary are supplied with Additional File 1. The method for taking account of uncertainty in the "optimal latency" is outlined in a final subsection.

Bootstrap
Fitting the two-phase model at ℓ = ℓm gives a maximum likelihood estimate ψ̂ for the vector of all 20 model parameters and determines fitted values of λi in each data cell. The parametric bootstrap assumes the observed data arose from sampling this fitted model, and considers the variation arising if other data had been sampled from that same model.
Sampling the independent Poisson distributions with parameters λi Ti gives simulated observations in each data cell. Fitting the two-phase model to this simulated data gives simulated parameter values including β*, σ* and τ*, and therefore a simulated value ERR* = ERR* D,ℓ = β*D + σ*D exp(-τ*D). The entire process is repeated B times to give the bootstrap replications ERR*.
There are many well-established methods for obtaining confidence intervals from bootstrap replications [7,12]. The two options chosen here are the original "percentile" method, which is easy to describe, and the "Bias-corrected accelerated" (BCa) method, which is much faster to compute and in general more accurate.
The 1-2α "percentile" confidence interval has endpoints ERR*(Bα) and ERR*(B(1-α)), where ERR*(k) denotes the kth ordered value of ERR*. With B = 1000 replications and α = 0.025, the endpoints of the 0.95 percentile interval are the 25th and 975th order statistics of ERR*.
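The percentile endpoints can be read directly off the sorted replications. A minimal sketch (in Python rather than the paper's R, with an invented stand-in sample in place of real ERR* replications):

```python
import numpy as np

rng = np.random.default_rng(0)
B, alpha = 1000, 0.025

# Stand-in for B bootstrap replications of ERR (values are invented).
err_star = np.sort(rng.normal(0.5, 0.1, B))

# Endpoints of the 1 - 2*alpha percentile interval: the (B*alpha)-th and
# (B*(1-alpha))-th order statistics, i.e. the 25th and 975th of 1000.
lo = err_star[round(B * alpha) - 1]        # 25th ordered value (index 24)
hi = err_star[round(B * (1 - alpha)) - 1]  # 975th ordered value (index 974)
print(lo, hi)
```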
For the Bias-corrected accelerated (BCa) confidence interval (see [7], Chapter 5, pp. 203-207 and p. 249), simulation is again carried out using λi Ti. However, the simulated observations are now used to fit the model along the "Least Favourable (LF) family" of distributions, obtained by restricting the parameter vector ψ to a line parametrised by ζ, passing through the MLE ψ̂ and defined by ψ = ψ̂ + ζ δ̂, where δ̂ = i⁻¹(ψ̂) ∇h(ψ̂). Here i⁻¹(ψ̂) is the inverse of the Fisher information evaluated at ψ̂, and ∇h(ψ̂) is the gradient of the function h which defines the variable of interest (in this case ERR) in terms of the model parameters ψ.
For each bootstrap replication, fitting the simulated data along the LF family determines ζ* and thus the simulated value ERR* = h(ψ̂ + ζ* δ̂). If ERR*LF denotes the resulting set of B replications, the BCa confidence interval has endpoints ERR*LF(Bα₁) and ERR*LF(Bα₂), where α₁ = Φ(w + (w + z(α))/(1 - a(w + z(α)))) and α₂ = Φ(w + (w + z(1-α))/(1 - a(w + z(1-α)))). Here Φ is the standard normal cumulative distribution and z(α) is the 100αth percentile point of a standard normal distribution. The parameters w and a are estimated from the bootstrap replicates. The bias-correction w is computed as w = Φ⁻¹(#{ERR*LF < ERR̂}/B), i.e. the fraction of bootstrap replications along the LF family which are below the original fitted value of ERR, in normal units.
The acceleration a is computed as a = m₃ / (6 m₂^(3/2)), where m₃ and m₂ are the 3rd and 2nd moments of the partial derivative of the log likelihood of the simulated data with respect to ζ, evaluated at ζ = 0.
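Putting the pieces together, the endpoint adjustment can be sketched as follows. This is a hedged illustration using the standard BCa adjustment formulas (Φ, w, a): the replications and the score sample are invented stand-ins, and the LF-family refitting step is omitted.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
B, alpha = 5000, 0.025
err_hat = 0.5                              # original fitted ERR (invented)
err_lf = rng.normal(0.52, 0.1, B)          # stand-in for the ERR*_LF replications

# Bias-correction: fraction of replications below the fitted value, in normal units.
w = norm.ppf(np.mean(err_lf < err_hat))

# Acceleration from the 2nd and 3rd moments of the score along the LF family
# (a stand-in sample is used here in place of the actual derivatives).
score = rng.normal(0.0, 1.0, B)
m2, m3 = np.mean(score**2), np.mean(score**3)
a = m3 / (6.0 * m2**1.5)

def adjusted(p):
    # alpha_adj = Phi(w + (w + z_p) / (1 - a * (w + z_p)))
    z = norm.ppf(p)
    return norm.cdf(w + (w + z) / (1.0 - a * (w + z)))

a1, a2 = adjusted(alpha), adjusted(1.0 - alpha)
s = np.sort(err_lf)
lo = s[max(int(np.floor(B * a1)) - 1, 0)]
hi = s[min(int(np.ceil(B * a2)) - 1, B - 1)]
print(a1, a2, lo, hi)
```

When w = a = 0 the adjusted levels reduce to α and 1-α, recovering the plain percentile interval.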
Code for this calculation is shown in Additional File 1, Code File AC1.txt, which calls Additional File 1, Data Files stomdat1.txt, lambda2.txt and theta2.txt. Additional File 1, Commentary File ACom1.doc discusses this code.
Despite the intricate formulas, computation is much faster for BCa than for the simpler percentile method, because optimisation is now restricted to a line within the full 20-dimensional parameter space. The percentile CIs are computed at B = 1000, the recommended minimum number, while BCa results are obtained at B = 5000.
Coverage for the percentile and BCa methods is tested using the stomach data. For each of 400 sets of simulated observations, generated by sampling the Poisson distributions determined by the original fitted parameters, the model is refitted to the simulated data. Confidence intervals for the simulated ERR are obtained by the bootstrap methods (B = 1000) using the refitted parameters, and counted as 1 or 0 if they contain or exclude the "true" value of ERR from the original fitted parameters. To streamline these computations of coverage for the percentile method, data is restricted to the Hiroshima subcohort (N = 1536 cells) and the model simplified to use 10-year rather than 5-year age-at-exposure categories. Estimating coverage for the BCa method uses the full 0 - 20 mSv dataset and model. The application of both methods to compute CIs from the observed data uses the full dataset and model.
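The coverage loop has the same shape in any setting. A minimal sketch, estimating a Poisson mean rather than ERR and using invented values and smaller B and fewer datasets than the paper's:

```python
import numpy as np

rng = np.random.default_rng(5)

# Coverage check: simulate data from known "true" parameters, build a
# percentile CI each time, and count how often the truth is covered.
true_mean, n = 10.0, 50          # invented toy model: iid Poisson(true_mean)
B, reps = 200, 200               # paper: B = 1000 and 400 simulated datasets
covered = 0
for _ in range(reps):
    data = rng.poisson(true_mean, n)                 # one simulated dataset
    fit = data.mean()                                # refitted parameter
    boots = np.array([rng.poisson(fit, n).mean() for _ in range(B)])
    lo, hi = np.quantile(boots, [0.025, 0.975])      # percentile CI
    covered += (lo <= true_mean <= hi)
coverage = covered / reps
print(coverage)   # should land near the nominal 0.95
```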

Simulation of LRT
The Likelihood Ratio Test (LRT) for comparing nested models H0 ≤ H1 (H0 contained within H1) given the observed data D is LRT = 2[L(H1, D) - L(H0, D)], where L(Hi, D) is the log likelihood of the observed data after fitting the model Hi. In well-behaved situations where Wilks theorem [5] applies, the asymptotic distribution of LRT is known, provided the data was drawn from a distribution which satisfies the null hypothesis H0. In particular, an appropriate critical value t0.95 can be chosen for rejecting H0 if LRT > t0.95, and this will result in a 5% error rate of rejecting H0 when it is true. Typically H0 fixes the values of d parameters in the full model H1, which allows those parameters to vary freely in an open neighbourhood. In this case Wilks theorem says that LRT is asymptotically χ² distributed on d degrees of freedom if the observed data was generated by the model with those d parameters fixed as specified. On that basis, the null hypothesis can be rejected if LRT > t0.95 = F⁻¹(0.95), where F is the cumulative distribution function of a χ² distribution on d degrees of freedom.
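Under Wilks theorem these critical values are simply χ² quantiles, for example (sketched in Python rather than the paper's R):

```python
from scipy.stats import chi2

# 95% critical values F^-1(0.95) for a chi-squared distribution on d d.f.
t95_1df = chi2.ppf(0.95, df=1)   # d = 1, e.g. linear vs control (beta = 0)
t95_3df = chi2.ppf(0.95, df=3)   # d = 3, e.g. fixing all of (beta, sigma, tau)
print(round(t95_1df, 2), round(t95_3df, 2))   # 3.84 7.81
```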
Wilks theorem depends on many regularity conditions for the models H0 and H1. If these fail, LRT may not be χ² distributed; its distribution may depend on H0 rather than just on the number of specified parameters, and there may be no simple universal 95% critical value for rejecting H0. Simulation aims to estimate the distribution of LRT and then to obtain conservative critical values which are sufficient to reject H0. Confidence intervals for functions of the parameters are constructed using these critical values.
One) The model is fitted subject to H0. The resulting parameter estimates give fitted values λ̂0i, and therefore λ̂0i Ti, for each cell, subject to H0. Two) Simulated observations are sampled from the independent Poisson distributions with parameters λ̂0i Ti. Three) The model constrained by H0 is fitted to the simulated data. Four) The full model H1 is fitted to the simulated data, and LRT is computed from the two fitted log likelihoods. Five) Steps Two through Four are repeated 500 (or 1000) times to give the sample LRT*.
An example of the required code is given in Additional File 1, Code File AC2.txt which calls Additional File 1, Data Files stomdat1.txt, lambda1.txt, theta1.txt, and Additional File 1, Code File AC2s.txt. Additional File 1, Commentary File ACom2.doc discusses this code.
While step Four is time-consuming, steps One and Three are quick as the model is log-linear in the remaining parameters.
Step Two is trivial in R. Note that in step Four the possible values of (β, σ, τ) are still constrained by the requirement that 1+ERR > 0 in all cells. For computability, the constraint is set at 1+ERR ≥ 0.001.
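The simulation loop (fit under the null, resample Poisson observations, refit both models, repeat) can be sketched minimally as follows. This toy uses a deliberately simple nested pair, a single free Poisson rate against a fixed one, rather than the paper's two-phase model, and all values are invented; it is written in Python rather than R.

```python
import numpy as np

rng = np.random.default_rng(3)

T = np.full(50, 100.0)      # person-years per cell (invented)
rate0 = 0.01                # stand-in for step One: rate fitted under H0

def one_lrt():
    O = rng.poisson(rate0 * T)                            # step Two: simulate
    ll0 = np.sum(O * np.log(rate0 * T) - rate0 * T)       # step Three: H0 fit
    rate1 = max(O.sum() / T.sum(), 1e-12)                 # step Four: MLE under H1
    ll1 = np.sum(O * np.log(rate1 * T) - rate1 * T)
    return 2.0 * (ll1 - ll0)    # log-factorial terms cancel in the difference

lrt_star = np.array([one_lrt() for _ in range(500)])      # step Five
crit95 = np.quantile(lrt_star, 0.95)   # simulated 95% critical value
print(crit95)   # near chi2(1)'s 3.84 in this regular one-parameter case
```

In this regular toy the simulated critical value should sit near the χ² value; for the two-phase model, where Wilks theorem fails, the simulated quantiles are exactly what replaces it.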
To check whether Wilks theorem holds, LRT* is first tested against a χ² distribution on 3 d.f. by the Kolmogorov-Smirnov (one-sample) test. LRT* is then tested against gamma distributions with unknown shape and rate parameters. A gamma distribution is fitted to LRT* using the "fitdistr" subroutine of R (MASS library) [14]. When testing LRT* against the fitted distribution with shape parameter ŝ and rate parameter r̂, the Kolmogorov statistic D is obtained from the (one-sample) K-S test, but its p-value is determined by simulation (as in the Lilliefors test). When LRT* has 500 elements, 500 random deviates are taken from the fitted gamma distribution with parameters ŝ and r̂ to form a sample S, a new gamma distribution Γ is fitted to S, the K-S statistic D* for testing S against Γ (one-sample) is computed, and the process is repeated 5000 times to estimate the probability that D* > D when samples are drawn from γ(ŝ, r̂). R code was adapted from [15].
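The same Lilliefors-style calibration can be sketched in Python with SciPy rather than R's "fitdistr"; the sample standing in for LRT* is invented, and far fewer repetitions are used than the paper's 5000.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.gamma(shape=2.0, scale=1.0, size=500)   # stand-in for the 500 LRT* values

def fit_gamma(v):
    # Maximum-likelihood gamma fit with location fixed at 0; returns (shape, scale).
    shape, _, scale = stats.gamma.fit(v, floc=0)
    return shape, scale

s_hat, sc_hat = fit_gamma(x)
D = stats.kstest(x, "gamma", args=(s_hat, 0, sc_hat)).statistic

# Because the gamma was fitted to the same data, the usual K-S p-value is
# invalid; calibrate D by resampling from the fitted gamma and refitting.
reps, exceed = 200, 0
for _ in range(reps):
    y = rng.gamma(s_hat, sc_hat, size=len(x))
    s2, sc2 = fit_gamma(y)
    D_star = stats.kstest(y, "gamma", args=(s2, 0, sc2)).statistic
    exceed += D_star > D
p_value = exceed / reps
print(D, p_value)
```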
Estimates of a 95% critical value for the null distribution of LRT are then obtained in three ways. For each cancer site, the fitted gamma distributions and the order statistics of the simulated LRTi* are used to estimate a global 95% critical value as t′.95 = max(sup_i(t.95,i), sup_i(s.95,i)), where t.95,i is the 95th percentile of γi and s.95,i is the 95th percentile of LRTi*. That is, at each value of (β0, σ0, τ0) tested for a particular site, 95% of the simulated LRT values and the 95th percentile of the fitted gamma distribution are both below t′.95. An overall conservative estimate t.95 is chosen to exceed the corresponding maximum for all simulations and all cancer sites investigated. At each site, a refined estimate t″.95 = max(sup_j(t.95,j)) is obtained by considering only the fitted gamma distributions and restricting to simulations at points (β0, σ0, τ0) within N10.
The neighbourhood N10 of (β̂, σ̂, τ̂) is termed "appropriate" if the overall estimate t.95 ≤ 10, so that for (β0, σ0, τ0) within N10 the null hypothesis can be rejected with 95% confidence whenever LRT > 10. This is not a circular definition; t.95 is computed from the behaviour of simulated LRT at points outside as well as inside N10 for each site, while for (β0, σ0, τ0) within N10, and even at (β̂, σ̂, τ̂), simulation may, if N10 is inappropriate, give LRT* with 95th percentile > 10.
Profile Likelihood Confidence Intervals for ERR 1,ℓ and ERR 0.025,ℓ are then constructed using all three estimates for the critical value. With ℓ fixed, let U be the set of those (β0, σ0, τ0) which cannot be ruled out with 95% confidence, i.e. for which the LRT comparing the null hypothesis (β, σ, τ) = (β0, σ0, τ0) with the full two-phase model does not exceed the chosen critical value; the confidence interval is then the range of values of ERR over U.
Since the observed data is regarded as a sample from an underlying distribution, the optimal latency ℓm as inferred from the data is an estimate, whose distribution is again obtained by simulation. The two-phase model is fitted at ℓm and the fitted model is resampled to provide simulated data, from which a new estimate φm is determined. To simplify, φm is restricted to integer values 5, 6, ..., 44. The process is repeated 200 times for the stomach, liver, lung, pancreas and leukaemia. The resulting set of φm is compared with the range of latencies for which BCa 95% CIs are strictly positive, and likewise with the range for which LRT 2p-lin exceeds the estimated critical values.
City, dose range, and gender
The models are refitted with "city" as an added control covariate. For each of the 5 cancers, the optimal latency Φm (restricted to 5, 6, ..., 44) is determined, along with latency ranges over which BCa 95% and 90% CIs for ERR 1 are strictly positive.
The models including "city" are fitted to the 5 - 500 mSv subcohort. A limited analysis of the null distributions of LRT 2p-con and LRT 2p-lin including "city" is carried out on the 0 - 20 mSv and 5 - 500 mSv dose ranges. For each cancer at its Φm, H0: (β0, σ0, τ0) = (0, 0, arbitrary) is used for simulation of LRT 2p-con, and H0: (β0, σ0, τ0) = (β̂lin, 0, arbitrary), where β̂lin is the fitted parameter for the linear model, is used for simulation of LRT 2p-lin. Gamma distributions fitted to the simulated LRT* are used to estimate 95% critical values and to assign p-values to the LRT 2p-con and LRT 2p-lin arising from the observed data.
The models including "city" are fitted separately to the male and female subcohorts in the 0 - 20 mSv dose range for the stomach, liver, and lung.
The very extensive simulations of LRT were shared across a number of PCs.

Results
Table 1 shows the individual cancer sites with over 100 cases (deaths) in the subcohort.
The stomach, liver, lung, pancreas, and leukaemia were selected for further analysis as LRT 2p-lin,ℓ > 6 for some ℓ, an indication that the dose response may be non-linear. For the pancreas and leukaemia, Figure 3 shows LRT 2p-lin,ℓ and LRT 2p-con,ℓ, while Figure 4 shows ERR 1,ℓ = β̂ + σ̂ exp(-τ̂) as estimated from fitting the two-phase model at latency ℓ. Note that β̂ may differ greatly from the value obtained by fitting the linear model at ℓ. Related graphs for the stomach, liver, and lung were shown in [2], though their statistical significance is reconsidered below. Table 2 shows the latencies ℓmabs and ℓm which maximise LRT 2p-con,ℓ without and with the constraint ERR 1,ℓ ≥ 0, the values of LRT 2p-con,ℓ and LRT 2p-lin,ℓ at ℓm, along with the fitted parameters and corresponding point estimates of ERR 1 and ERR 0.025 and their bootstrap confidence intervals by the BCa (5000 replications) and percentile (1000 replications) methods. For the stomach, liver, and lung, these confirm the point estimates in [2].

[Figure 3 caption: LRT for Two-phase vs Control and Linear models: pancreas and leukaemia. For each cancer and for latency ℓ = 5, 6, ..., 44 years, LRT 2p-con,ℓ is the Likelihood Ratio Test for comparing the two-phase and control models at latency ℓ, and LRT 2p-lin,ℓ is the Likelihood Ratio Test for comparing the two-phase and linear models at latency ℓ.]
Coverage for the BCa method is estimated for the stomach at 94% for ERR 1 (376 of 400) and 93% for ERR 0.025 (372 of 400). Coverage for the percentile method is estimated for the stomach using the 0 - 20 mSv Hiroshima subcohort with the simplified two-phase model (10-year age-at-exposure categories), giving 94.3% for ERR 1 (377 of 400) and 93% for ERR 0.025 (372 of 400). All CIs reported in Table 2 use the full model (5-year categories, both cities).
For the stomach with latency ℓ = 5, 6, ..., 21, Figure 5 shows the BCa 95% confidence intervals (B = 5000) for ERR 1 and ERR 0.025. Each of these CIs is strictly positive. Since the minimum time-since-exposure in the subcohort is 6.08 years, for ℓ = 5 or 6 the model involves no correction for latency and ERR is a function of the unmodified colon dose.
More generally, for each site (stomach, liver, lung, pancreas, leukaemia) there is a neighbourhood surrounding ℓm over which BCa 95% CIs for ERR 1 and ERR 0.025 are strictly positive; this was evaluated restricting to integer latencies and using B = 5000 bootstrap replications.

[Figure 4 caption: For cancers of the pancreas (red) and leukaemia (blue), and for latency ℓ = 5, 6, ..., 44 years, ERR 1,ℓ is the Excess Relative Risk from 10 mSv (lagged dose) as estimated by fitting the two-phase model at latency ℓ.]

Additional File 1, Tables S1, S2, and S3 give the simulated null distributions of LRT for the two-phase model with the null hypothesis H0: (β, σ, τ) = (β0, σ0, τ0) vs the alternative (β, σ, τ) ≠ (β0, σ0, τ0), for the stomach, liver, lung, pancreas, and leukaemia. Additional File 1, Table S1 shows simulation for each cancer at the fitted values (β0, σ0, τ0) = (β̂, σ̂, τ̂) and for randomly selected points in the neighbourhood N10 of the fitted values (see Methods). Additional File 1, Table S2 shows simulation for each cancer with σ0 = 0, where τ0 is arbitrary. Additional File 1, Table S3 shows additional simulation for the stomach when σ0 ≠ 0, but (β0, σ0, τ0) is outside the neighbourhood N10. For the simulations in S2, Additional File 1, Table S4 shows the results for the linear model and the comparison with the two-phase model. These simulations are carried out at the optimal latency for the specific cancer.
As the control parameters are optimised subject to H0 before simulating LRT, perhaps other choices of control parameters, while retaining H0, might yield different estimates of the null distribution of LRT subject to H0. However, four sets of simulations (500 each) for the stomach with (β0, σ0, τ0) = (0, 0, arbitrary), using very different choices of control parameters and total cases expected, yielded fairly similar LRT* and fitted gamma distributions, as shown in Table 3. Optimised control parameters for the observed data are shown in the last row. This is evidence that the null distribution of LRT is approximately independent of the parameters used to simulate the data, provided that H0 holds; in the terminology of [7] (Chapter 4, p. 139), LRT is approximately pivotal.

Table 4 shows Profile Likelihood 95% confidence intervals for ERR 1 and ERR 0.025 at optimal latency ℓm as computed with the various estimated critical values.
Likewise, from Additional File 1, Table S4, and using the notation in that file, all simulations of the null distribution of LRT 2p-lin showed qlrt2(0.95) < 8 and qgamma2(0.95) < 8, so T.95 = 8 is a conservative estimate throughout the parameter space, for all cancer sites considered here. For any given site, T′.95 is the supremum of qlrt2(0.95) and qgamma2(0.95) over simulations for that site, and T″.95 is the supremum of qgamma2(0.95) over simulations for that site. Likewise T″.99 and T″.999 are the suprema of qgamma2(0.99) and qgamma2(0.999) over simulations for that site. Table 5 shows these critical values and the observed value of LRT 2p-lin (see Table 2) for each site. (See also Table 6.)
Simulation of the optimal latency φm as described earlier also has implications for non-linearity. Again restricting to integral latencies, for the stomach LRT 2p-lin,ℓ > 9 across the range of simulated φm. Each of these results shows non-linearity (see Table 5), and for the stomach and liver the evidence is very strong. For the pancreas, LRT 2p-lin,ℓ > 9.5 when 7 ≤ ℓ ≤ 11, but only 156 of 200 simulations give φm in this range.
A limited analysis by simulation of LRT for the model with "city" was carried out over the 0 - 20 mSv and 5 - 500 mSv dose ranges, focused at the optimal (integral) latencies on the null distributions of LRT 2p-con and LRT 2p-lin at (β0, σ0, τ0) = (0, 0, arbitrary) and (β0, σ0, τ0) = (β̂lin, 0, arbitrary), where β̂lin is obtained by fitting the linear model. Results are shown in Table 8, and the fitted gamma distributions are used to assign p-values to the observed LRT in Table 7.
From Table 8 (with "city") and Additional File 1, Table S4 (without "city"), the criterion LRT 2p-lin > 6 is a test of non-linearity at the 90% level. Accordingly, Table 7 also shows the latency ranges surrounding Φm for which LRT 2p-lin > 6.
Other aspects of the analysis (simulated null distributions of LRT 2p-con and LRT 2p-lin across the parameter space, and simulated variation in the optimal latency ℓm) were not repeated with "city".
For the stomach, liver, and lung, Table 9 shows the results when the two-phase model with "city" is fitted separately to the male and female 0 - 20 mSv subcohorts. For the pancreas and leukaemia, there were too few cases to split the data by gender. Table 10 shows the linear model with "city" fitted to the 0 - 20 mSv, 0 - 5 mSv, 5 - 20 mSv, 0 - 0.5 mSv, and 0.5 - 20 mSv subcohorts for the liver and lung.

Discussion
This paper develops and partially corrects the approach in [2], and the two should be read in conjunction. I focus initially on the 0 - 20 mSv dose range, which provides over 60% of the person-years of observation in the Life Span Study 12 cohort. Over the full data range 0 - 8000 mSv and the truncated range 0 - 4000 mSv, the dose response for cancer mortality is known to be approximately linear, and risk estimates for 1000 mSv have been obtained through the ongoing Life Span Study project. The ICRP recommendations for radiation protection are then based on linear extrapolation, as the Excess Relative Risk from 10 mSv is presumed to be 0.01 times the ERR from 1000 mSv.
If we were confident that the approximate linearity of the dose response extends to the low dose region, then linear extrapolation would be justified. But this is not known, and the response may be approximately linear over a broad range of doses while behaving quite differently at low doses. There are many examples of non-linear dose response in radiation cell biology [16,17].
Thus, if the Japanese data is used to derive risk estimates for doses such as 10 mSv or 1 mSv, which are directly relevant to occupational and public exposure, it makes sense to consider the 0 - 20 mSv data in its own right and in relation to wider dose ranges, rather than to pool the data and apply linear extrapolation.
Secondly, a latency parameter is used to assess the delay between exposure and cancer mortality. Although it would be better to allow latency to modify dose by some smooth function, rather than an abrupt switch from no effect to full effect when time-since-exposure passes ℓ, this is much harder to compute and was not attempted here.
As pointed out during peer review, we should consider at the outset whether the data is sufficient for fitting a non-linear model based on mean dose. Figure 6 shows scatterplots of p-y observation against mean dose, for the ranges 0 - 500 mSv, 5 - 20 mSv, 0 - 5 mSv, and 0 - 0.5 mSv. Clearly the 0 - 500 mSv and 5 - 500 mSv data contains a wide variety of doses. The 5 - 20 mSv data shows a spread of mean doses. Whilst the 0 - 5 mSv data is concentrated below 1 mSv, there is a spread of mean doses in that region. Each dose range determines a very large dataset.

[Table 7 legend: Dose range: data on which the two-phase model is fitted, including "city" as a control covariate. Φm: latency, restricted to integers 5, 6, ..., 44, which maximises LRT 2p-con subject to ERR 1 ≥ 0. LRT 2p-con: Likelihood Ratio Test comparison of two-phase and control models at Φm; p values derived from Table 8. LRT 2p-lin: Likelihood Ratio Test comparison of two-phase and linear models at Φm; p values derived from Table 8.]
The fact that, for the stomach, liver, lung, and leukaemia, rather similar results for ERR 1 and latency are obtained from fitting the two-phase model across the 5 - 500 mSv dose range as from fitting across the 0 - 20 mSv dose range suggests that there is also sufficient information in the 0 - 20 mSv data to justify fitting this model. The stomach, with the largest number of cases, shows fairly tight confidence intervals for Excess Relative Risk over a broad range of latencies, which supports the validity of the model.
If the dose response were linear, similar estimates of its slope (β) should be obtained over different dose ranges. As the liver and lung data illustrate (Table 10), fitting the linear model separately to subcohorts of the 0-20 mSv dose range yields very different estimates of the dose response. For the liver, there are significant results at comparable latencies in the 0-20 mSv dose range and all the subranges, but the estimates of β from doses below 5 mSv (or below 0.5 mSv) are completely different from those obtained from doses above 5 mSv (or 0.5 mSv), in fact around 20 times higher when latency = 36 years. For the lung, the linear model detects some significant activity in restricted dose ranges, but the results are very different, and combining the two dose ranges yields insignificant results. By contrast (Table 7), the two-phase model shows significant results for the lung with the 0-20 mSv data for latencies from 10 to 21 years.
[Table legend: gamma distributions are fitted to 500 simulations of LRT_2p-con for the null hypothesis (β0, σ0, τ0) = (0, 0, arbitrary) and of LRT_2p-lin for the null hypothesis (β0, σ0, τ0) = (β_lin, 0, arbitrary), where β_lin is the fitted parameter for the linear model. Models include "city" as a control covariate and are fitted at the optimal (integral) latency Φ_m for the site and dose range. shape: shape parameter of the fitted gamma distribution. rate: rate parameter of the fitted gamma distribution. t*0.95: critical value of the fitted gamma distribution for LRT_2p-con. T*0.95: critical value of the fitted gamma distribution for LRT_2p-lin.]
It is not unusual for a model to achieve significance on portions of the data but fail to do so when they are combined. Even univariate Ordinary Least Squares regression on a bimodal independent variable may illustrate this, as in Figure 7 with synthetic data.
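The phenomenon behind Figure 7 is easy to reproduce. A sketch with synthetic data (all names and parameter values are invented for illustration): two clusters of x values each carry a clearly positive within-cluster slope, yet the pooled regression is far from significant, because the between-cluster variation in x swamps the within-cluster signal.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)

# Two clusters of the independent variable: around 0.5 and around 10.5.
x_a = rng.uniform(0.0, 1.0, 40)
x_b = rng.uniform(10.0, 11.0, 40)

# Within each cluster y rises with slope 2, but the cluster means of y
# are equal, so pooling the clusters destroys the relationship.
y_a = 2.0 * (x_a - 0.5) + rng.normal(0.0, 0.1, 40)
y_b = 2.0 * (x_b - 10.5) + rng.normal(0.0, 0.1, 40)

fit_a = linregress(x_a, y_a)     # significantly positive slope
fit_b = linregress(x_b, y_b)     # significantly positive slope
fit_all = linregress(np.concatenate([x_a, x_b]),
                     np.concatenate([y_a, y_b]))  # not significant
```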
As discussed in [2], the graphs of ERR against latency show a clear separation into positive and negative regions, as is also apparent with the linear model over restricted dose ranges for the liver and lung. For any given model, the "optimal latency" Φ_m identifies the time lag which maximises LRT within the region of non-negative ERR_1. The existence of a significant raised risk is of interest whether or not other latencies show a significant decrease in risk, with possibly higher LRT. Frequently Φ_m coincides with the latency which maximises LRT without constraint, but the distinction is relevant for the lung (Table 2) and more generally when simulating the variation in latency. The point estimates of optimal latency Φ_m and ERR obtained previously are confirmed by Table 2. The next stage of analysis concerns confidence intervals for ERR at the optimal latency. Statistical inference in [2] was based on the Likelihood Ratio Test, but it was wrongly assumed, as Little [3] pointed out, that Wilks' Theorem held for the non-linear models, so that LRT would be asymptotically χ²-distributed if the null hypothesis (specifying parameters in the model) were true.
One of the regularity conditions required for Wilks' Theorem is that distinct values of the model parameters produce distinct probability distributions. In the two-phase model, when σ = 0, ERR = βD irrespective of the value of τ. The asymptotic properties of models with an indeterminate parameter (one which cannot be identified from the distribution) have been analysed [18-21], but I was unable to use these methods and therefore took an intensive computational approach, which applies to the given set of records rather than asymptotically.
The parametric bootstrap is an established technique for finding confidence intervals in the absence of any information on the distribution of LRT. The percentile method refits the model to simulated data generated by samples from the model whose parameter values were obtained by fitting to the actual data. The BCa method adjusts for the fact that simulation may produce biased estimates of the statistic of interest (in this case ERR) and of its variance. Both methods gave good coverage, with almost identical rates, when tested on the stomach data.
Figure 6: p-y observation against mean dose. For the subcohorts 0-500 mSv, 5-20 mSv, 0-5 mSv, and 0-0.5 mSv, each data cell in the subcohort is represented by a vertical line at its mean dose, with height the p-y observation in the data cell. The total p-y observation for each subcohort is also shown.
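The logic of the parametric bootstrap can be sketched on a deliberately simplified problem (two Poisson cells rather than the full latency model; all counts and names are invented): fit the model, simulate new data from the fitted parameters, refit to each simulated dataset, and take percentiles of the refitted ERR.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: deaths and person-years in a baseline and an exposed cell.
deaths_0, py_0 = 100, 1000.0   # baseline cell
deaths_1, py_1 = 150, 1000.0   # exposed cell

def fit_err(d0, d1):
    """MLE of Excess Relative Risk = rate ratio - 1 for two Poisson cells."""
    return (d1 / py_1) / (d0 / py_0) - 1.0

err_hat = fit_err(deaths_0, deaths_1)   # 0.5 for these counts

# Parametric bootstrap: simulate counts from the *fitted* rates, refit.
boot = np.array([fit_err(rng.poisson(deaths_0), rng.poisson(deaths_1))
                 for _ in range(2000)])

# Percentile method: 95% CI from the 2.5th and 97.5th bootstrap percentiles.
ci_lo, ci_hi = np.percentile(boot, [2.5, 97.5])
```

The BCa method would additionally correct these percentiles for the bias and skewness of the bootstrap distribution; in the full analysis the refitted statistic is the ERR from the Poisson regression model at the optimal latency.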

Figure 7: Ordinary Least Squares fitting to (synthetic) univariate data with a bimodal independent variable. Regression of y on x is significantly positive over portions of the data (black and blue lines) but lacks significance over the full range (red line).

Table S4 shows that the null distribution of LRT_2p-lin departs from χ² on 2 d.f. everywhere. Table 3 indicates that the simulated null distribution of LRT_2p-H0 is fairly independent of the parameters for the control covariates, and depends only on H0: (β, σ, τ) = (β0, σ0, τ0). The null distributions when σ0 = 0 are quite different from those when σ0 >> 0, as expected, since it is σ0 = 0 which causes the indeterminacy of the model and the failure of Wilks' Theorem on this hyperplane. Gamma distributions give a reasonable fit to the simulated null distribution of LRT_2p-H0 in general (see Additional File 1, Tables S1, S2 and S3). When H0 has σ0 = 0, typical shape and rate parameters are s ~ 2.24 and r ~ 0.63, while for σ0 >> 0, s ~ 1.5 and r ~ 0.5, as predicted by Wilks' Theorem (Figure 8). The Kolmogorov-Smirnov test against the fitted distribution is extremely sensitive, and even when it distinguishes LRT from the fitted gamma, a probability plot may show close agreement. For example, for the liver, with (β0, σ0, τ0) = (7.297, 0, arbitrary), the K-S test has D = 0.058 with simulated p = 0.001, but the plot (Figure 9) is fairly close.
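The fitting and testing step can be sketched as follows (using a χ²-distributed stand-in for the 500 simulated LRT values, since the full model simulation is not reproduced here): fit a gamma by maximum likelihood with the location fixed at zero, then apply the Kolmogorov-Smirnov test against the fitted distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Stand-in for 500 simulated null LRT values (the real values come from
# refitting the model to data simulated under H0).
lrt_sim = rng.chisquare(df=3, size=500)

# Fit a gamma distribution with location pinned at 0 (shape and rate free).
shape, loc, scale = stats.gamma.fit(lrt_sim, floc=0.0)
rate = 1.0 / scale

# K-S test of the simulated values against the fitted gamma.
ks = stats.kstest(lrt_sim, stats.gamma(shape, loc=0.0, scale=scale).cdf)

# Critical value t_0.95 from the fitted distribution.
t95 = stats.gamma.ppf(0.95, shape, scale=scale)
```

Since χ²(3 d.f.) is itself a gamma distribution (shape 1.5, rate 0.5), the fitted parameters here should land near those values, matching the Wilks-regime case quoted above.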
Likewise, fitted gamma distributions generally give a reasonable fit to the null distribution of LRT_2p-lin, with typical parameters s ~ 1.79, r ~ 0.68. For the linear model, Additional File 1, Table S4 shows that simulation of the null distribution of LRT_lin-H0 generally conforms to a χ² distribution on 1 d.f., as expected from Wilks' Theorem. However, as β0 → -0.5, the null distribution approaches a γ distribution with shape = 0.5, rate = 1. Since β0 = -0.5 specifies a boundary of the parameter space, the theory of mixture models [22] predicts that, at the boundary, the null distribution of LRT_lin-H0 ~ 1/2 χ²(1 d.f.) = 1/2 γ(0.5, 0.5) = γ(0.5, 1). The confidence intervals shown in Figure 2 for liver and urinary cancers involve β values far from the boundary, and Wilks' Theorem is valid in this region.
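The identity in the chain above is a scaling property of the gamma family: χ²(1 d.f.) is γ(shape 0.5, rate 0.5), and halving a gamma variable doubles its rate, giving γ(shape 0.5, rate 1). A quick numerical check of the quoted equality:

```python
from scipy.stats import chi2, gamma

# chi2(1 d.f.) == gamma(shape=0.5, scale=2); halving the variable halves
# the scale, giving gamma(shape=0.5, scale=1), i.e. rate = 1.
for q in (0.5, 0.9, 0.95, 0.99):
    half_chi2_q = chi2.ppf(q, df=1) / 2.0
    gamma_q = gamma.ppf(q, 0.5, scale=1.0)
    assert abs(half_chi2_q - gamma_q) < 1e-9  # quantiles agree exactly
```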
For all simulations and all cancer sites (Additional File 1, Tables S1, S2 and S3), the 95th percentile of the fitted gamma distributions and the 95th percentile of the simulated LRT_2p-H0 itself are all below 10. Additional File 1, Tables S1, S2, and S3 summarise nearly 50,000 simulations of LRT at a wide variety of randomly chosen values of H0. On that basis, t.95 = 10 is chosen as a conservative estimate applying throughout the parameter space for all cancer sites considered here. H0 can be rejected whenever LRT_2p-H0 > 10, i.e. whenever (β0, σ0, τ0) is outside the neighbourhood N10 of the fitted parameters (β̂, σ̂, τ̂). N10 is thus an appropriate neighbourhood, as defined in Methods.
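The conservatism of t.95 = 10 can be illustrated with the typical fitted parameters quoted earlier (s ~ 2.24, r ~ 0.63 on the σ0 = 0 hyperplane; s ~ 1.5, r ~ 0.5, i.e. χ² on 3 d.f., elsewhere); both typical 95th percentiles fall well below 10.

```python
from scipy.stats import gamma

# Typical fitted null distributions for LRT_2p-H0 (scale = 1/rate).
t95_sigma_zero = gamma.ppf(0.95, 2.24, scale=1 / 0.63)  # sigma_0 = 0 case
t95_wilks = gamma.ppf(0.95, 1.5, scale=1 / 0.5)         # sigma_0 >> 0 case

# Both lie below the conservative critical value t.95 = 10, so rejecting
# H0 whenever LRT > 10 is conservative at these typical parameters.
assert t95_sigma_zero < 10.0 and t95_wilks < 10.0
```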
Inside this neighbourhood, Additional File 1, Table S1 shows that gamma distributions generally fit the simulated null distributions. For the stomach, liver, and lung, confidence intervals for ERR obtained by the bootstrap (Table 2) and by LRT (Table 4) are broadly similar, whichever critical value or bootstrap method is chosen. LRT intervals using t″.95 show closer agreement with the bootstrap methods. For these 3 cancers, all the LRT-based intervals for ERR_1 and ERR_0.025 exclude 0, as do the bootstrap intervals, which are tighter. This validates the claim in [2] that ERR_1 is significantly elevated for the stomach, liver, and lung.
All five cancers show significant non-linearity of the dose response by LRT comparison of the two-phase and linear models (Table 5). For the stomach, liver, and leukaemia the two-phase model is preferable with p << 0.001, while for the lung and pancreas p < 0.01.
The previous discussion concerns behaviour at the optimal latency Φ_m, an unknown parameter estimated from the data. BCa intervals, comparable at Φ_m to CIs obtained by other methods, were used to investigate ERR across a range of latencies, restricted to integers 5, 6, ..., 44. Each cancer site has latency ranges in which BCa CIs for ERR_1 and ERR_0.025 are strictly positive. Figure 5 illustrates this for the stomach and also shows that ERR is significantly raised when the dose is not modified by latency (Φ = 6 is below the minimum time-since-exposure in the 0-20 mSv cohort). With the linear model, liver and urinary cancers are significantly elevated over a range of long latencies, showing similar variation (Figures 1 and 2).
Fitting the two-phase model at Φ_m, simulating new data, and finding the simulated optimal latency φ_m gives an estimate of the probability that the true value of Φ_m lies in any given region. For the liver and stomach, this approach gives very strong evidence that Φ_m lies in the regions where ERR_1 and ERR_0.025 are strictly positive, and where the two-phase model is clearly superior (by LRT) to the linear model. For the lung, the evidence on these points is still valid, though weaker. For leukaemia, there is strong evidence that Φ_m lies in the region where the dose response is non-linear, and likewise in the region where the 95% CI for ERR_0.025 is strictly positive. However, with the variation in Φ_m, we cannot be confident that the 95% CI for ERR_1 is strictly positive.
Many results for the pancreas are considerably weaker than for other sites. Even at the optimal latency, the confidence intervals are extremely wide. With the BCa method, the lower limit is attained as the minimum bootstrap value, and is assigned a probability > 0.025. The "bias" and "acceleration" of the BCa bootstrap sample are both large, unlike those for other sites. When the uncertainty in the optimal latency is determined by simulation at Φ_m, we cannot be confident that the true optimal latency for the pancreas lies within the range where ERR > 0 or where non-linearity is detected by LRT_2p-lin.
Up to this point, the primary aim was to test the validity of the conclusions in [2], and accordingly "city" was not included as a control covariate and the dose range remained at 0 -20 mSv. A limited reanalysis including "city" in the baseline model and comparing the 0 -20 mSv and 5 -500 mSv dose ranges is shown in Tables 7 and 8. The higher dose range was chosen to include 10 mSv, but to exclude the lowest dose category in the LSS12 dataset. Above 5 mSv the revised dosimetry DS02 generally agrees with the DS86 dosimetry used here (and in [2]). Comparing these ranges tests whether the results from 0 -20 mSv simply reflect some anomaly in the lowest dose category. With the exception of the pancreas, there is a striking similarity between results from the 0 -20 mSv and 5 -500 mSv dose ranges, which show comparable estimates of the optimal latency, latency ranges, and the Excess Relative Risk from 10 mSv. I do not know if any other approach to baseline risk may explain these effects. However, significant non-linearity and positive ERR are also found for the stomach, liver and lung when the data is restricted to male or female subcohorts, with "city" included in the model (Table 9). Any possible interaction effects involving gender would disappear with the all-male or all-female data. As mentioned in [2] the use of log mean attained age was an adequate alternative to the full set of attained age categories, and alternative controls affected the estimates of ERR by no more than a factor of 2.
The results at 0.25 mSv (ERR 0.025 ) show that the risk detected here arises at extremely low doses. If there are doubts about the accuracy of the DS86 dosimetry, the question remains as to why the DS86 dose is a significant predictor of risk.
Overall, the results have several possible interpretations. They may reflect confounding by other covariates not shown in the public dataset, which may be correlated with the external dose used here (and in LSS12), may interact with radiation, and may vary over time.
Differences between the rural and urban populations may contribute to the non-linear effects. Or, these may be caused by gross underestimates of the external dose (unlikely) or by internal doses arising, for example, from "black rain". Such interpretations would be specific to the Japanese cohort, and would therefore cast doubt on using this data to predict radiation response elsewhere.
However, the results may also indicate that low doses of radiation do impact on cancer mortality, provided that latency is included in the model and the dose response is not constrained to be linear. In radiation cell biology, non-linear dose response is known for the bystander effect using a variety of endpoints and the two-phase model was proposed as a simplified form of a model for the bystander effect [16].
This paper does not reconsider any of these possible interpretations [2]. However, I hope it establishes the statistical validity of this analysis of the available public data for the 1950-1990 mortality cohort.

Conclusion
This reanalysis validates the main conclusions of [2]. Bootstrap methods and Monte Carlo simulation of LRT show that Excess Relative Risk for cancer mortality in Japanese A-bomb survivors exposed to external radiation doses below 20 mSv is positive, large, and significant for various cancers, as detected by models incorporating latency. The dose response is highly non-linear for the stomach, liver, lung, pancreas, and leukaemia. In each case the two-phase model shows large Excess Relative Risk at 10 mSv external dose lagged by the optimal latency for the cancer. LRT-based 95% Confidence Intervals are strictly positive, except for leukaemia. Bootstrap BC a 95% CIs are strictly positive for all five cancers over a range of latencies. Large, positive Excess Relative Risk is also found when the male and female data is analysed separately for the stomach, liver and lung.
When the optimal latency varies by simulation, the stomach, liver, lung, and leukaemia still show non-linear dose response, and likewise ERR > 0 at the 95% level for the stomach, liver, and lung.
With "city" included as a control covariate in the two-phase model, similar estimates of latency and ERR at 10 mSv are obtained for the stomach, liver, lung, and leukaemia whether the dose range is 0-20 mSv or 5-500 mSv. Dose response for the liver, lung, and leukaemia is significantly non-linear in the 5-500 mSv range, particularly for the lung. Such results cannot be explained by any anomaly of the 0-5 mSv data.
The linear model finds significant results for the liver and urinary tract over a range of long latencies.
This analysis of cancer mortality in Japanese A-bomb survivors exposed to low doses of external radiation in 1945 shows significant non-linearity of dose response and significant large Excess Relative Risk over a range of latencies. These findings do not support the current ICRP recommended annual occupational dose limit of 20 mSv.