A new approach of nonparametric estimation of incidence and lifetime risk based on birth rates and incident events

Henrik Støvring; Mei-Cheng Wang

doi:10.1186/1471-2288-7-53

Published: 2007-12-20

Background: Incidence and lifetime risk of diabetes are important public health measures. Traditionally, nonparametric estimates are obtained from survey data by means of a Nelson-Aalen estimator, which requires information on both incident events and risk sets from the entire cohort. Such information is rarely available in real studies.

Methods: We compare two different approaches for obtaining nonparametric estimates of age-specific incidence and lifetime risk, with emphasis on the required assumptions. The first and novel approach considers only incident cases occurring within a fixed time window (we have termed this cohort-of-cases data), which is linked explicitly to the birth process in the past. The second approach is the usual Nelson-Aalen estimate, which requires knowledge of the observed time at risk for the entire cohort and of their incident events. Both approaches are applied to data on anti-diabetic medications obtained from the Odense Pharmacoepidemiological Database, which covers a population of approximately 470,000 over the period 1993–2003. For both methods we investigate if and how incidence rates can be projected.

Results: Both the new and the standard method yield similar sigmoidal estimates of the cumulative distribution function of age-specific incidence. The Nelson-Aalen estimator gives somewhat higher estimates of lifetime risk (15.65% (15.14%; 16.16%) for females, and 17.91% (17.38%; 18.44%) for males) than the estimate based on cohort-of-cases data (13.77% (13.74%; 13.81%) for females, 15.61% (15.58%; 15.65%) for males). Accordingly, the projected incidence rates are higher when based on the Nelson-Aalen estimate, and also too high when compared to observed rates. In contrast, the cohort-of-cases approach gives projections that fit observed rates better.
Conclusion: The developed methodology for analysis of cohort-of-cases data has potential to become a cost-effective alternative to a traditional survey based study of incidence. To allow more general use of the methodology, more research is needed on how to relax stationarity assumptions.

BMC Medical Research Methodology 2007, 7:53 http://www.biomedcentral.com/1471-2288/7/53

Background

Diabetes is a severe disease, which is becoming increasingly prevalent in countries throughout the world [1-6]. From a public health perspective it is vital to get good estimates of the present and future burden of diabetes. One measure of primary interest is diabetes incidence, both with respect to calendar time and age [7]. If combined with a model for mortality, it allows estimating lifetime risk of diabetes, another important public health measure [8]. Also, if combined with data on birth rates, it is possible to obtain a projection of future incidence, often needed for planning of health care services.

As the annual risk of developing diabetes is low in a general population, only very few follow-up studies exist on a general population level. Alternatively, various types of surveys have been conducted [8,9], which have then been analyzed to estimate age-specific incidence rates. Obviously, subjects of different ages in a survey originate from different birth cohorts, but this has received little attention in this context. As a consequence, the lifetime risk estimated from such approaches pertains to a hypothetical cohort subjected to the current age-specific incidence and mortality rates. Likewise, future incidence is predicted by assuming birth cohorts of a given size and then subjecting these to the same age-specific incidence and mortality rates observed in the survey.

In this paper we propose a different approach which from the outset links past birth rates to the occurrence of incident events in a (often relatively short) time window. We will term this type of data cohort-of-cases data, as it is a cohort consisting entirely of cases. More specifically, we require the sample to include all subjects who have advanced to a certain end-point (failure event) within a given calendar time period, and only these cases. Further, we assume that the time origin (initiating event, birth time) of each case can be retrospectively identified. So far, statistical methods for this type of doubly truncated data have not (to the best of the authors' knowledge) been extensively studied when the rate of initiating events is not assumed constant over calendar time.

It should be noted that cohort-of-cases data are different from case-cohort data (see for example [10], where the phrase case-cohort was coined), as the latter refers to a study comparing cases to a random sample from the corresponding cohort. In contrast, the cohort-of-cases data studied here comprise a study population consisting only of cases, but possibly supplemented with additional information on the process of initiating events. Cohort-of-cases designs, in this sense, are generally considered efficient, in particular for diseases with a low rate of occurrence; see [11-15], and references therein. We also want to point out that cohort-of-cases data provide information different from the information of the cases in case-cohort studies, although the two types of data do share common characteristics. As pointed out in ([10], p4), the failure time in case-cohort studies is usually defined as time from the beginning of follow-up to a failure event, whereas the failure time in cohort-of-cases data is time from initiating event to failure event.

To illustrate how the model can be applied we will use data from Odense Pharmaco-Epidemiological Database (OPED). Briefly, this database contains information on all redemptions of medications prescribed by a physician and subsidized by the national health insurance at any pharmacy within a well-defined geographical area holding nearly 500,000 inhabitants. The drug class of interest here is that used to treat diabetes. While such data by definition only concern pharmacologically treated diabetes, they do offer the opportunity for comparing the proposed approach with the traditional approach, which is the main purpose of the present paper.

The paper is organized as follows. We first describe the data, both on births and incident events. We then introduce a methodology which yields a non-parametric maximum likelihood estimate of the age-specific incidence distribution based solely on cohort-of-cases data, possibly supplemented with a known birth rate. The non-parametric method does not directly provide measures of the uncertainty of the estimate, and so we propose a bootstrap method for obtaining measures of this uncertainty. We then briefly outline the traditional analysis, before we present and compare results when applying the two methods to the data. We finally discuss implications in the last Section.

Methods

Cohort-of-cases data on anti-diabetic treatment

For the period 1992–2003 the Odense Pharmaco-epidemiological Database (OPED) contains subject specific information on all prescriptions for subsidized medications redeemed at any pharmacy in the County of Fyn, as well as information on births, deaths and migration into and out of the County of Fyn. The tracking of individuals is based on the Civil Registration Number (CRN), which is assigned to all at birth or first immigration into Denmark, and which uniquely identifies all residents of Denmark. For each individual we identified all prescriptions of anti-diabetic agents in OPED. The anti-diabetic drugs are characterized by the first three characters of the so-called ATC-code being A10 [16]. We will not distinguish between the various types of anti-diabetic treatments, such as for example insulin (A10A) and oral anti-diabetics (A10B). Incident events are defined to be the first treatment event observed in the time window for subjects who did not have any previous events during a one year run-in period. The run-in period was started either at the start of the database or at the time of first immigration into Fyn of the subject, if the subject immigrated into Fyn during the observation period. Note that this may well introduce a calendar-time-dependent misclassification and hence bias [17], but this will be ignored in the following as we are not studying secular trends in incidence. Also note that, by definition, these data will only allow us to study incidence of pharmacologically treated diabetes. We will use the words "treated" and "diseased" interchangeably, and ask the reader to keep in mind that the present analysis only pertains to pharmacologically treated diabetes.

Birth rates

For the period 1891–2003, available data from Statistics Denmark were used to determine annual, national birth counts for each gender. To estimate the number of births within the county of Fyn, data were obtained on population size for Denmark as a whole, as well as for Fyn, with the objective of rescaling. Population counts were available roughly at five year intervals (1901, 1906, ..., 1921, 1925, 1930, ..., 1970, 1976, 1981, 1986, 1990, 1995, 1998, 1999, ..., 2003) for Fyn, whereas nationwide data were available annually from 1970 and onward, and otherwise similar to those for Fyn given above. Only from 1970 can all members of a given birth cohort be followed up individually, and hence we only rely on annual counts that are available throughout.

To estimate the number of births in the county of Fyn, we scaled national birth rates by the relative population size in the county of Fyn compared to the total population of Denmark. The underlying assumption is that the fertility rate on Fyn is similar to national rates, which seems plausible given the small size of Denmark and the relatively homogeneous composition of the population. As population counts are not available annually, we interpolated the population data based on piecewise linear regression with cut points at 1920, 1970, and 1996, cf. Figure 1.

Figure 1: Observed and predicted fractions of the Danish population living in the county of Fyn during 1900–2003. (Horizontal axis: calendar year; series: proportion of Danes living in Fyn, fitted values.)

Overall, Fyn held 9%–10% of the Danish population during most of the twentieth century and the fit seems very good. The sudden drop in 1920 is due to the reunion of North Slesvig with Denmark after having been part of Germany from 1864.

In subsequent analyses the missing proportions were replaced with the predicted, while the observed proportions were retained. When we combined this with the national birth rates, we could compute the number of births in the county of Fyn as the product of the number of births in Denmark and the proportion of the Danish population living in the County of Fyn. Since no observations were available for the ten year period 1891–1900, we predicted the annual number of births in this period from a linear extrapolation of the birth counts in the period 1901–1910. The resulting gender specific annual birth rates in the County of Fyn are presented in Figure 2.

Figure 2: Annual number of births in the county of Fyn during 1891–2003. (Horizontal axis: birth year; vertical axis: number of births; panels by child gender, F = Female, M = Male.)

For the birth rates to be of value, we must assume that migration balances in the sense that immigration and emigration for each birth cohort prior to and within the observation window are expected to be of similar size. This is, however, reasonable in the present context as the relative size of the studied population is nearly constant compared to the entire Danish population. In all subsequent analyses, estimated numbers of births are treated as fixed.

Methodological set-up

Let us now introduce the notation used in the paper. Let $U$ be the calendar time of the initiating events (births). Let $Y$ be age at onset if disease occurs before death, and infinity in the absence of disease before death. Let the probability density function (pdf) of $Y$ be $f(y \mid u)$, and the associated cumulative distribution function (cdf) $F(y \mid u)$. Further, let $Z_0$ be age at death if $Z_0 < Y$, that is, disease does not occur before death. If $Y > Z_0$, we let $Y = \infty$, and otherwise we let $Z_0 = \infty$. To avoid ambiguity, we will at times denote $F$ as $F_Y$.

Since not all subjects will experience disease prior to death, the pdf of $Y$, $f(y \mid u)$, is a mixture distribution with two components:

$$f(y \mid u) = \pi_\infty(u)\, f^*(y \mid u) + (1 - \pi_\infty(u))\, I(y = \infty) \tag{1}$$

where $\pi_\infty(u)$ is defined as $P(Y < \infty \mid u)$, i.e., it is the probability of disease occurring before death, $I(\cdot)$ is an indicator function, and $f^*(y \mid u)$ is the conditional pdf of $Y$ given that $Y < \infty$, i.e., $Y \le Z_0$. Note that since $\pi_\infty(u)$ is the probability of disease occurring before death for a subject with birth at $u$, it is the lifetime risk for subjects with birth time $u$.

Cohort-of-cases data

Assume that we observe all ages of onset, $Y$, occurring within the calendar time observation window $[0; \tau_0)$, cf. Figure 3 for a graphical presentation of the sampling scheme. Assume that the occurrence of births follows a Poisson process with intensity $\phi(u)$ for $u \le \tau_0$, and that $y^+ = \sup\{y : F^*(y \mid u) < 1\}$ exists and is finite for all $u \le \tau_0$, i.e., $y^+$ is the maximal observable age at onset before death. We can then normalize the birth intensity $\phi(u)$ to a density $g$ on $[-y^+; \tau_0)$,

$$g(u) = \frac{\phi(u)}{\int_{-y^+}^{\tau_0} \phi(s)\, ds} \tag{2}$$

with associated cumulative distribution function $G$. In principle the birth intensity could well depend on covariates, but since we consider it either known or constant, we will ignore this.

Figure 3: Lexis diagram with observation window (gray area). Dotted lines indicate lifetime without disease until age of onset ($Y$), or age at death ($Z_0$), full lines lifetime with disease. Only age at onset times within the observation window are observed (blue points) in a cohort-of-cases study. (Horizontal axis: calendar time; vertical axis: age.)

We will in the following assume $(U_1, Y_1), \ldots, (U_n, Y_n)$ to be independent and identically distributed (iid). Two crucial assumptions must be considered. First, whether or not we have calendar time stationarity with respect to age of onset, i.e.,

(S1) Age of onset is independent of time of birth, i.e., $F(y \mid u) = F(y)$.

Secondly, knowledge about the birth process will not be available in many applications. Hence we also consider the situation with calendar time stationarity of the birth process:

(S2) Assume that the occurrence of initiating events, births, started in the distant past and that this birth rate has been stabilized. Or, quantitatively, assume that $u_0 = \inf\{u : \phi(u) > 0\}$ is small enough so that $u_0 \le -y^+$, and that $g$ is uniform on $[-y^+; \tau_0)$.

Stationary incidence, known birth process ((S1) only)

When only (S1) holds, the joint density of the observed $(u, y)$ can be written as follows:

$$p(u, y \mid -U \le Y \le \tau_0 - U) = \left[ \frac{g(u)\, I(-y \le u \le \tau_0 - y)}{G(\tau_0 - y) - G(-y)} \right] \left[ \frac{\{G(\tau_0 - y) - G(-y)\}\, f^*(y)\, I(y \le y^+)}{\int_0^{y^+} \{G(\tau_0 - s) - G(-s)\}\, f^*(s)\, ds} \right] \tag{3}$$

$$\equiv p_c(u \mid y)\, p_m(y) \tag{4}$$

where $p_c(u \mid y)$ and $p_m(y)$ are defined by the expressions in each bracket in (3), respectively. Thus $p_c(u \mid y)$ can be interpreted as the density of birth times conditional on $y$ being observed in $[0; \tau_0)$, and $p_m(y)$ as the marginal density for the observed $y$ weighted with $w_i = G(\tau_0 - y_i) - G(-y_i)$, i.e., the probability of birth occurring within the interval $[-y_i; \tau_0 - y_i)$. When $g$ is known, then so is $p_c$, as are the weights in $p_m$. It is thus straightforward to compute the maximum likelihood estimate of $F^*$ based on the weighted observations:

$$\hat{F}^*(y) = \frac{\sum_{i:\, y_i \le y} w_i^{-1}}{\sum_{i=1}^{n} w_i^{-1}} \tag{5}$$

The estimate thus places mass $w_j^{-1} / \sum_i w_i^{-1}$ at each jump point $j$, where $j$ corresponds to the observation number in the ordered set of $Y_i$. That the estimate in (5) is the non-parametric maximum likelihood estimate (NPMLE) follows directly from standard results on NPMLE, see for example the paper by Turnbull [18], who covers the general case of which this is a special case. If all weights are equal, the above formula reduces to the ordinary formula for non-parametric estimation of a cdf in the uncensored case, putting mass $n^{-1}$ at each jump point.

With the estimate of the conditional cdf $F^*$ it is possible to obtain an estimator of the unconditional $F$ utilizing their relationship given in Equation (1). What we need is an estimate of $\pi_\infty$, which may be obtained from noting that the occurrence rate of incident events, $I^{tr}$, at any calendar time point is given by

$$I^{tr}(t) = \int_{-\infty}^{t} \phi(u)\, f(t - u)\, I(t - u \le y^+)\, du \tag{6}$$

$$= \pi_\infty \int_{-\infty}^{t} \phi(u)\, f^*(t - u)\, du \tag{7}$$

where the indicator function $I(t - u \le y^+)$ is needed, since the occurrence rate does not include those for which onset never happens, that is when $y = t - u > y^+$ or equivalently that $y = t - u = \infty$. Integrating this over the observation window, we find

$$\int_0^{\tau_0} I^{tr}(t)\, dt = \pi_\infty \int_0^{\tau_0} \int_{-\infty}^{t} \phi(u)\, f^*(t - u)\, du\, dt \tag{8}$$

$$= \pi_\infty \int_{-\infty}^{\tau_0} \phi(u) \left\{ \int_{\max(u, 0)}^{\tau_0} f^*(t - u)\, dt \right\} du \tag{9}$$

$$= \pi_\infty \int_{-\infty}^{\tau_0} \phi(u) \left\{ \int_{\max(0, -u)}^{\tau_0 - u} f^*(t)\, dt \right\} du \tag{10}$$

$$= \pi_\infty \int_{-\infty}^{\tau_0} \phi(u) \left\{ F^*(\tau_0 - u) - F^*(\max(0, -u)) \right\} du \tag{11}$$

from which it follows that

$$\pi_\infty = \int_0^{\tau_0} I^{tr}(t)\, dt \left[ \int_{-\infty}^{\tau_0} \phi(u) \left\{ F^*(\tau_0 - u) - F^*(\max(0, -u)) \right\} du \right]^{-1} \tag{12}$$

Although the estimate is intuitively attractive, it is not clear whether it is the MLE. However, if we fill in the MLE of $F^*$ in Equation (12), we do obtain an estimate of $\pi_\infty$, since $\phi$ is known and $\int_0^{\tau_0} I^{tr}(t)\, dt$ is estimated by the total number of observed incidences over the interval $[0; \tau_0)$. Having obtained the MLE of $F^*$ together with an estimate of $\pi_\infty$, we can use Equation (1) to compute an estimate of $F$, the unconditional cdf of age at onset.

Stationary incidence, stationary birth process (both (S1) and (S2))

When both (S1) and (S2) hold, the marginal density of the observed $y$'s can be further simplified by substitution of the uniform birth density in the corresponding expression in Equation (3), i.e.,

$$p_m(y) = \left[ \frac{\{\tau_0 / (\tau_0 + y^+)\}\, f^*(y)\, I(y \le y^+)}{\tau_0 / (\tau_0 + y^+)} \right] = f^*(y)\, I(y \le y^+)$$

Note that here the density function of the observed $y$'s coincides with the population density function $f^*$ of the observable onset times, $Y$. In the case when only the age at onset distribution is of interest, and not lifetime risk, the 'usual methods' are thus applicable to the case data to estimate $f^*$ by putting equal weights on all observations as noted above.

If, however, we are also interested in the unconditional density, $f(y)$, we need an estimate of $\pi_\infty$ to be able to proceed. Above, this was obtained from our knowledge of the birth process, and in principle we could exploit this again. However, in situations where a stationary birth process is assumed, this is typically because we lack information on the birth process. Thus it may in such situations be necessary to use alternative approaches. One obvious way to proceed is the following: in the time window where information is collected on incident cases, we also collect information on deaths (either for all or for a random sample) and classify them according to whether or not they had experienced disease. The relative frequency of diseased deaths will then be an estimate of $\pi_\infty$ under stationarity assumptions with respect to the birth process, the incidence process, and the mortality. This estimate is valid if age-specific mortality is assumed stationary both among diseased and non-diseased; these strong assumptions reflect the lack of available information in such situations. With this estimate of $\pi_\infty$ we may then estimate the unconditional $F$.

Non-stationary incidence, known birth process (neither (S1) nor (S2))

When neither (S1) nor (S2) holds, the likelihood becomes substantially more complicated. In principle, this can be handled by introducing a parameter vector $\theta$ which relates the incidence density to the time of birth. The rewriting presented in Equation (4) is still valid with the modification that the density term $p_m(y)$ now depends on the parameter vector $\theta$, i.e.,

$$p_m(y \mid \theta) = \frac{\{G(\tau_0 - y \mid \theta) - G(-y \mid \theta)\}\, f^*(y \mid \theta)}{\int_0^{y^+} \{G(\tau_0 - s \mid \theta) - G(-s \mid \theta)\}\, f^*(s \mid \theta)\, ds}$$

Unfortunately this density does not directly permit use of the approach presented above for finding a non-parametric estimate of $f^*(y \mid \theta)$, nor for finding the corresponding estimate of $\pi_\infty(\theta)$.

One alternative is to set up a full likelihood by considering a full parametric model of both age of onset and age of death, but we will not go into further details here and instead recommend this as a topic for future research.

Ordinary non-parametric analysis

The ordinary Nelson-Aalen analysis based on observed events and time at risk is well described elsewhere, see [19] for an extensive treatment of the subject, or [20] for a more focused treatment. In short, we use age as the fundamental time scale, and we then have delayed entry due to the fact that not all subjects are followed from birth. Rather, they enter the observation window and capture area at a certain age and are then followed until either event or censoring, whichever comes first.

We let subjects become at-risk one year after the start of the observation period if they resided in Fyn County in this period, or one year after entrance to the capture area, if they immigrated to Fyn during the study period. In both cases the one year run-in period is used to identify subjects not already in treatment (those without filled prescriptions in the period), as only they are at risk for becoming incident. Subjects cease to be under observation either at onset, death, emigration from Fyn, or end of follow-up, whichever comes first.

As above we require calendar time stationarity for estimation of $F$. The second assumption in this setup is that entry is independent of disease onset, i.e., age at immigration to Fyn County is not informative for the subsequent distribution of $Y$. The final assumption is that censoring is non-informative. The two latter assumptions are similar, but not identical, to the assumption of balance of migration made in the analysis of doubly truncated data. The difference is that independent delayed entry and censoring only concern the time within the observation period. On the other hand, the balancing assumption does not require independence, i.e., migrating subjects may well have a different morbidity than non-migrating subjects (which is indeed the case [21]), as long as the distribution of onset ages is similar among immigrating and emigrating subjects.

Thus we get a non-parametric estimate of $\Lambda_Y$, the cumulative hazard for onset of disease. Similarly, a non-parametric estimate of the cumulative hazard of death among non-diseased, $A_Z$, can be obtained by simply exchanging the event indicator from onset of disease to death and maintaining the at-risk time.

From $\Lambda_Y$ and $A_Z$ an estimate of $F$ is given by (cf. [22] for a theoretical discussion, while [23] gives an example of its application)

$$F(y) = \int_0^{y} \lambda_Y(s) \exp[-\Lambda_Y(s)] \exp[-A_Z(s)]\, ds + I(y = \infty)(1 - \pi_\infty)$$

where $\lambda_Y$ is the hazard associated with $\Lambda_Y$, and $\pi_\infty$ is the lifetime risk given by

$$\pi_\infty = \int_0^{\infty} \lambda_Y(s) \exp[-\Lambda_Y(s)] \exp[-A_Z(s)]\, ds$$

As no analytic confidence intervals are available for the lifetime risk, we obtained them using bootstrap as above. This can also be applied to obtain age-specific confidence intervals for $F$.

Projection of incidence

Based on an estimate of $F$, projection of incidence is possible both inside and outside the observation window by application of the formula in Equation (6), when the birth process is known and incidence is assumed stationary. In the application studied here, the birth process is known for $u \le \tau_0$. For $u > \tau_0$ it must be projected. Hence, we carry the last observed value of the birth process forward, i.e., let $\phi(u) = \phi(\tau_0)$ for $u > \tau_0$.

Results and Discussion

Table 1 gives basic descriptive statistics of the studied population, as it shows the number of incidence events tabulated by gender, birth period and calendar year, which is used for estimating age-specific incidence.

Cohort-of-cases data

Complete stationarity

Although the birth process is known in our setting, we present for comparison an analysis based on assuming stationarity for the birth process, the incidence process, as well as the mortality process among treated. We first classified all deaths according to whether or not a previous redemption of anti-diabetics had been observed, considering all with such a redemption to be diabetics. The lifetime risk, $\pi_\infty$, was for females estimated at 9.68% (95% Confidence Interval: 9.35%; 10.02%) and for males at 10.86% (10.51%; 11.22%), where both confidence intervals are binomial exact. The estimated incidence distribution, $F$, stratified on gender is shown in Figure 4(a).
Table 1: Number of incidence events by gender, calendar year of event and calendar year of birth

Event year:      1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003

Females, by birth period
-1909:   60 24 30 22 7 19 9 7 6 4 4
1910-9:  103 79 74 101 81 69 66 58 65 51 47
1920-9:  110 123 99 122 98 112 118 109 106 119 118
1930-9:  65 86 78 86 96 82 96 108 111 132 140
1940-9:  51 51 55 70 65 64 88 100 115 117 137
1950-9:  41 31 32 26 32 30 38 48 51 64 70
1960-9:  18 18 13 20 8 18 28 29 30 34 48
1970-9:  17 10 9 6 13 5 9 9 18 19 34
1980-9:  4 3 24 5 6 8 7 9 5 11
1990-:   4 6 3 3 5 7 10 10

Males, by birth period
-1909:   29 19 1887 74 222
1910-9:  94 80 68 71 65 58 42 45 37 21 28
1920-9:  126 145 106 118 96 116 99 93 82 123 123
1930-9:  107 95 114 132 126 119 131 156 143 126 174
1940-9:  104 102 83 102 111 129 140 166 183 191 214
1950-9:  49 52 38 52 42 77 65 70 77 106 113
1960-9:  16 19 19 21 27 23 28 29 41 55 53
1970-9:  12 11 17 8 5 7 10 9 15 14 12
1980-9:  8 2 3 28 6 2 7 12 12 12 10
1990-:   53 267 895

Stationarity of incidence, known birth process

When only stationarity of the incidence distribution is assumed, a non-parametric analysis based on the weighted likelihood given in Equation (4) and the estimator of $\pi_\infty$ in Equation (12) can be conducted. With the gender specific birth rates, we obtained gender specific estimates of $F^*$, $\pi_\infty$, and hence $F$, from the observed events and associated ages at the events. The resulting estimates of the incidence distribution $F$ are displayed in Figure 4(b).

Figure 4: Estimated incidence distribution F for pharmacological treatment with any anti-diabetic drug with respect to age, and stratified on gender. (a) Stationary birth and incidence process. (b) Stationary incidence process, known birth process. (Horizontal axis: age; series: Females, Males.)

We see that the incidence distribution for both genders is made up of two components: the first component is a more or less constant density for ages below 40 years (the linear part in $F$), whereas the second is a much higher, unimodal density for ages above 40 years which vanishes for ages above 80 (the sigmoid shaped part of $F$). For females the lifetime risk, $\pi_\infty$, was estimated at 13.77% (13.74%; 13.81%), for males at 15.61% (15.58%; 15.65%). Both confidence intervals are computed using bootstrap with a thousand replications. The confidence intervals are very narrow, which reflects the high statistical efficiency of the weighted likelihood approach, which in turn partly comes from the strong assumption of stationarity. As birth counts are assumed known, this too contributes to the narrow confidence intervals, although to a lesser degree.

The shape of $F$ is quite similar to the unweighted estimate, whereas the estimated lifetime risks are substantially higher than those estimated above. The major explanation is of course lack of stationarity of the true lifetime risk and/or the disease duration: the estimate of $\pi_\infty$ based on disease status among observed deaths takes most of its information from the older cohorts, as they are the ones with high mortality. If the older cohorts had lower lifetime risk and/or previously had relatively higher mortality among diseased compared to non-diseased (both of these scenarios are very realistic, but contrary to assumptions of the previous analysis), this will result in a decreased estimate of $\pi_\infty$. This would be amplified if older cohorts are larger than younger cohorts, as is indeed the case here, cf. Figure 2.

Contrastingly, when indirectly estimating $\pi_\infty$ based on weighting with the birth process, the estimate can be viewed as a weighted average of $\pi_\infty(u)$ over the entire interval for the birth process $[-y^+; \tau_0 - y)$.

Projection of diabetes incidence

In the completely stationary situation, where (S1) and (S2) are both assumed to hold, the projected annual incidence is a constant number equaling the lifetime risk multiplied by the annual number of births. As the annual number of births is usually not observable in such settings, an alternative is needed. In the spirit of estimating $\pi_\infty$ from the treatment status among deaths, one could take the total annual number of deaths as an estimate of the number of births. The reasoning for this is that if the population is in a completely stationary state, the annual number of deaths must on average equal the average annual number of births. In our setting the observed numbers of deaths over the 11 year period are 29,871 for females and 29,816 for males, yielding projected annual incidences of 262.8 for females and 294.3 for males. In Figure 5 the incidence is projected based on the weighted, non-parametric estimate of $F$ obtained above, i.e., with known birth intensity and stationary incidence. All annual birth counts after 2003 are set to the number of births observed in 2003. Note that the observed incidence strongly suggests a departure from stationarity, and so future actual incidences are likely to be higher than those projected from a stationarity assumption. The projected incidences show a small but persistent decline for 2004–2013 due to declining birth rates in the last half of the twentieth century. The general level is much higher than above, reflecting the higher estimate of $\pi_\infty$ obtained from using the known birth distribution, but corresponds well with observed incidences.

Figure 5: Projected and observed annual numbers of incident events in the county of Fyn based on an assumption of a stationary incidence and using a weighted, non-parametric estimate of F. (Horizontal axis: year, 1993–2013; series: Females predicted, Females observed, Males predicted, Males observed.)

Ideally, projections should be accompanied by confidence intervals, but we have been unable to compute them. While in principle some variant of bootstrap might be employed, this is numerically very demanding as the entire cdf of age-specific incidence must be bootstrapped. Judged from the confidence intervals of the lifetime risks, the confidence intervals of the projections will be very small, reflecting both the high efficiency of the method and its strong assumptions.

Ordinary non-parametric analysis

The gender specific estimates of $F$ are shown in Figure 6. The shape of the estimated cdf is very similar to the one obtained above using a known birth process for weighting. The estimated lifetime risks are 15.65% (15.14%; 16.16%) for females, and 17.91% (17.38%; 18.44%) for males, where confidence intervals were found from bootstrap with 1,000 replications. This is somewhat higher than when analyzing data as doubly truncated. The explanation is that mortality has generally declined substantially over the past century, and hence an estimate based on the mortality rates observed within the observation window leads to a higher risk of diabetes onset prior to death than an estimate which implicitly accounts for past mortality.

Figure 6: Estimated incidence distribution F for pharmacological treatment with any anti-diabetic drug with respect to age, and stratified on gender. Ordinary non-parametric estimate with independent delayed entry. (Horizontal axis: age; series: Females, Males.)

Projection of diabetes incidence

Projections are obtained as above, except that the ordinary non-parametric estimate of $F$ is used, and results are shown in Figure 7. Due to the elevated lifetime risk the projected incidences are now higher, and also too high compared to observed incidences. Also for this projection we have been unable to provide confidence intervals, for the same reasons as above.

Figure 7: Projected and observed annual numbers of incident events in the county of Fyn based on the ordinary non-parametric estimate of F with independent delayed entry. (Horizontal axis: year, 1993–2013; series: Females predicted, Females observed, Males predicted, Males observed.)

Conclusion

In this paper we have developed and implemented methods for estimating and projecting incidence, as well as the lifetime risk of a disease, based on observation of incident events in an observation window, i.e., what we termed cohort-of-cases data. The developed methodology yields non-parametric estimates comparable to those of a standard Nelson-Aalen analysis based on independent delayed entry, but it gives slightly better projections of incidence due to its implicit accounting for the unobserved mortality among untreated in the past.

In its simplest form, i.e., assuming both a stationary birth process and incidence, a simple non-parametric estimate of the age of onset distribution is obtained.

[...]tively fast computational procedures developed, confidence intervals for the lifetime risk could be obtained from direct application of bootstrap methodology. We were however unable to provide confidence intervals for projection of incidence.

As stated by Narayan et al. in 2003, lifetime risk of diabetes appears not to have been estimated prior to their paper [8], and only one subsequent paper has reported comparable estimates of lifetime risk [24]. The directly comparable estimates for the US population found in [8] are substantially higher (39% for females, 33% for males) than ours (14% for females, 16% for males). The two major reasons for the difference are a generally lower diabetes incidence in Denmark [4], as well as the fact that our estimates only pertain to pharmacologically treated diabetes. It would however be interesting to explore if part of the difference is due to their use of the traditional method, as the traditional method in our material leads to an elevated estimate of lifetime risk of 16% for females and 18% for males. It is further interesting that the gender differences are in opposite directions in the two countries.

Several papers have used estimates of incidence to project the future burden of diabetes, most prominently [2,5,6].
When alterna- For all three, it would be interesting to re-analyze their tively the birth process is considered known, this is taken data using our developed method for cohort-of-cases into account by a weighted, non-parametric estimate with data, if possible, to see if a similar discrepancy exist weights based on the relative sizes of the relevant birth between the two analytical methods as we have found, cohorts. Both approaches directly provide estimates of where the traditional method lead to an inflated projec- age-specific incidence as well as of lifetime risk, which are tion of the number of incident events of diabetes, when of considerable public health interest. Due to the rela- compared to the observed count. Page 9 of 11 (page number not for citation purposes) 0 .05 .1 .15 .2 0 200 400 600 800 BMC Medical Research Methodology 2007, 7:53 http://www.biomedcentral.com/1471-2288/7/53 For the theoretical developments, assumptions (S1) and References 1. Mokdad A, Bowman B, Ford E, Vinicor F, Marks J, Koplan J: The con- (S2) have been crucial, but from an applied perspective tinuing epidemics of obesity and diabetes in the United the assumptions are very restrictive. In our application States. JAMA 2001, 286:1195-200. concerning diabetes, the assumptions are likely not satis- 2. Honeycutt AA, Boyle JP, Broglio KR, Thompson TJ, Hoerger TJ, Geiss LS, Narayan KMV: A dynamic Markov model for forecasting fied, as it is questionable that both age-specific incidence diabetes prevalence in the United States through 2050. and age-specific mortality among diabetics have been Health Care Manag Sci 2003, 6(3):155-164. 3. Wild S, Roglic G, Green A, Sicree R, King H: Global prevalence of constant since 1900–rather, changes in incidence due to diabetes: estimates for the year 2000 and projections for altered lifestyle, and changes in mortality due to improved 2030. Diabetes Care 2004, 27(5):1047-53. treatment and general health are reasonable. Indeed, it is 4. 
Støvring H, Andersen M, Beck-Nielsen H, Green A, Vach W: Count- ing drugs to understand the disease: The case ofmeasuring known that within the observation window of 1993 and the diabetes epidemic. Popul Health Metr 2007, 5:2. 2003, statistically significant trends exist for both quanti- 5. Narayan KMV, Boyle JP, Geiss LS, Saaddine JB, Thompson TJ: Impact ties [4]. Yet the predictions based on the developed model of recent increase in incidence on future diabetes burden: U.S., 2005–2050. Diabetes Care 2006, 29(9):2114-2116. are at least as good as those based on the ordinary non- 6. Evans JMM, Barnett KN, Ogston SA, Morris AD: Increasing preva- parametric method, showing the potential of the devel- lence of type 2 diabetes in a Scottish population: effect of increasing incidence or decreasing mortality? Diabetologia oped model. More work on relaxing the assumptions is 2007, 50(4):729-732. however mandated before the model can be used more 7. Fox CS, Pencina MJ, Meigs JB, Vasan RS, Levitzky YS, D'Agostino RB: generally. Trends in the incidence of type 2 diabetes mellitus from the 1970s to the 1990s: the Framingham Heart Study. Circulation 2006, 113(25):2914-2918. Although we in principle showed how the stationarity 8. Narayan KMV, Boyle JP, Thompson TJ, Sorensen SW, Williamson DF: assumption could be relaxed by formulating a full, para- Lifetime risk for diabetes mellitus in the United States. JAMA 2003, 290(14):1884-1890. metric likelihood, we did not give a detailed analysis of 9. Harris M, Flegal K, Cowie C, Eberhardt M, Goldstein D, Little R, this situation due to its complexity. Also, the data consid- Wiedmeyer H, Byrd-Holt D: Prevalence of diabetes, impaired fasting glucose, and impaired glucose tolerance in U.S. ered in this paper are rather limited since, first, the obser- adults. The Third National Health and Nutrition Examina- vation window is short compared to typical disease tion Survey, 1988–1994. Diabetes Care 1998, 21(4):518-24. 
duration, and second, no information is available on age 10. Prentice RL: A Case-Cohort Design for Epidemiologic Cohort Studies and Disease Prevention Trials. Biometrika 1986, of onset outside the observation window. As a result, we 73:1-11. have been unable to allow for trends in incidence and 11. Mantel N: Synthetic retrospective studies and related topics. mortality, the absence of which must be considered unre- Biometrics 1973, 29(3):479-86. 12. Prentice RL, Breslow NE: Retrospective Studies and Failure alistic. In some epidemiological settings it will, however, Time Models. Biometrika 1978, 65(1153-158 [http://links.jstor.org/ be possible to obtain data on age of onset for subjects sici?sici=0006- 3444%28197804%2965%3A1%3C153%3ARSAFTM%3E2.0.CO%3B2- prevalent at start of the time window or for diseased sub- E]. jects dying in the observation window [25]. While such 13. Oakes D: Survival times: aspects of partial likelihood. Internat information is valuable and needs to be incorporated in Statist Rev 1981, 49(3):235-264. With discussion and a reply by the author the analysis to allow relaxation of assumptions, it requires 14. Thomas DC: General Relative-Risk Models for Survival Time knowledge about the past mortality among diabetics. In and Matched Case-Control Analysis. Biometrics 1981, 37(4673-686 [http://links.jstor.org/sici?sici=0006- contrast, we have tried to develop a methodology that 341X%28198112%2937%3A4%3C673%3AGRMFST%3E2.0.CO%3B2 only rely on observation of incident events and past birth -V]. rates, which are often easier to obtain. There is, however, 15. Lubin J, Gail M: Biased selection of controls for case-control analyses of cohort studies. Biometrics 1984, 40:63-75. a need for further research on the applicability and exten- 16. The WHO Collaborating Centre for Drug Statistics Methodology: sions of the method before its potential can be more ATC index with DDDs and Guidelines for ATC classification and DDD clearly appreciated. assignment. 
Oslo 2001. 17. Støvring H, Andersen M, Beck-Nielsen H, Green A, Vach W: Rising prevalence of diabetes: evidence from a Danish pharmaco- Competing interests epidemiological database. Lancet 2003, 362(9383):537-8. 18. Turnbull BW: The empirical distribution function with arbi- The author(s) declare that they have no competing inter- trarily grouped, censored and truncated data. J R Statist Soc A ests. 1976, 38(3):290-295. 19. Andersen PK, Borgan O, Gill RD, Keiding N: Statistical models based on counting processes New York: Springer-Verlag; 1997. Corrected Authors' contributions second printing HS had the original idea for the study, carried out all anal- 20. Keiding N: Independent delayed entry. In Survival Analysis: State of the Art Edited by: Klein JP, Goel PK. Kluwer Academic Publishers; yses and drafted the original and revised manuscripts with 1992:309-326. substantial input from MCW. The planning of analyses 21. Støvring H: Selection bias due to immigration in pharmacoep- and interpretation of the data was the joint product of dis- idemiologic studies. Pharmacoepidemiol Drug Saf 2007, 16(6):681-686. cussions between MCW and HS. Both authors have seen 22. Keiding N: Age-specific Incidence and Prevalence: a Statistical and approved the final version. Perspective. J R Statist Soc A 1991, 154(3):371-412. 23. Faergemann C, Lauritsen JM, Brink O, Stovring H: What is the life- time risk of contact with an A&E Department or an Institute Page 10 of 11 (page number not for citation purposes) BMC Medical Research Methodology 2007, 7:53 http://www.biomedcentral.com/1471-2288/7/53 of Forensic Medicine following violent victimisation? Injury 2006, 39(1):121-127. 24. Narayan KMV, Boyle JP, Thompson TJ, Gregg EW, Williamson DF: Effect of BMI on lifetime risk for diabetes in the U.S. Diabetes Care 2007, 30(6):1562-1566. 25. Keiding N, Holst C, Green A: Retrospective estimation of diabe- tes incidence from information in a prevalent population and historical mortality. 
Am J Epidemiol 1989, 130(3):588-600. Pre-publication history The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/7/53/prepub Publish with Publish with Bio Bio Med Med Central Central and and e ev ver ery y scientist can scientist can r read ead y your our w work ork fr free of ee of charge charge "BioMed Centr "BioMed Central al will will be be the the most most signif significant icant de development velopment f for or disseminating the disseminating the r results esults of of biomedical biomedical r researc esearc h h in in our our lif lifetime etime." ." Sir Paul Nurse, Cancer Research UK Sir Paul Nurse, Cancer Research UK Y Your research papers will be: our research papers will be: a available fr vailable free ee of of charge charge to the to the entir entire e biomedical biomedical comm community unity peer r peer re evie view wed ed and and published published immediatel immediately y upon upon acceptance acceptance cited in cited in PubMed PubMed and and ar archiv chived ed on PubMed on PubMed Central Central y yours — ours — y you ou k keep eep the the cop copyright yright Bio BioMed Medcentral central Submit your manuscript here: Submit your manuscript here: http://www http://www.biomedcentral.com/info/publishing_adv .biomedcentral.com/info/publishing_adv.asp .asp Page 11 of 11 (page number not for citation purposes) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Medical Research Methodology Springer Journals http://www.deepdyve.com/lp/springer-journals/a-new-approach-of-nonparametric-estimation-of-incidence-and-lifetime-Vvr40HFwZ2


References (32)

D. Oakes (1981)
Survival Times: Aspects of Partial Likelihood
International Statistical Review, 49
N. Keiding, M. Moeschberger (1992)
Independent Delayed Entry
H. Støvring, M. Andersen, H. Beck-Nielsen, A. Green, W. Vach, E. Gale (2003)
Rising prevalence of diabetes: evidence from a Danish pharmaco-epidemiological database. Commentary
The Lancet, 362
C. Fox, M. Pencina, J. Meigs, R. Vasan, Yamini Levitzky, R. D'Agostino (2006)
Trends in the Incidence of Type 2 Diabetes Mellitus From the 1970s to the 1990s: The Framingham Heart Study
Circulation, 113
K. Narayan, J. Boyle, L. Geiss, J. Saaddine, T. Thompson (2006)
Impact of Recent Increase in Incidence on Future Diabetes Burden
Diabetes Care, 29
(2007)
Effect of BMI on lifetime risk for diabetes in the U.S. Diabetes Care
C. Faergemann, J. Lauritsen, O. Brink, H. Støvring (2008)
What is the lifetime risk of contact with an A&E Department or an Institute of Forensic Medicine following violent victimisation?
Injury, 39 1
Organización Salud (1996)
Guidelines for ATC classification and DDD assignment
H. Støvring, M. Andersen, H. Beck-Nielsen, A. Green, W. Vach (2007)
Counting drugs to understand the disease: The case of measuring the diabetes epidemic
Population Health Metrics, 5
H. Støvring, M. Andersen, H. Beck-Nielsen, A. Green, W. Vach (2003)
Rising prevalence of diabetes: evidence from a Danish pharmacoepidemiological database.
The Lancet, 362
Niels Keiding, Claus Holst, Anders Green (1989)
Retrospective estimation of diabetes incidence from information in a prevalent population and historical mortality.
American journal of epidemiology, 130 3
M. Harris, K. Flegal, C. Cowie, M. Eberhardt, D. Goldstein, R. Little, H. Wiedmeyer, D. Byrd-Holt (1998)
Prevalence of Diabetes, Impaired Fasting Glucose, and Impaired Glucose Tolerance in U.S. Adults: The Third National Health and Nutrition Examination Survey, 1988–1994
Diabetes Care, 21
D. Oakes (1981)
[Survival Times: Aspects of Partial Likelihood]: Reply to Discussion
International Statistical Review, 49
N Keiding (1992)
Survival Analysis: State of the Art
S. Wild, G. Roglić, A. Green, R. Sicree, H. King (2004)
Global prevalence of diabetes: estimates for the year 2000 and projections for 2030.
Diabetes care, 27 5
KMV Narayan, JP Boyle, LS Geiss, JB Saaddine, TJ Thompson (2006)
Impact of recent increase in incidence on future diabetes burden: U.S., 2005–2050
Diabetes Care, 29
J. Lubin, M. Gail (1984)
Biased selection of controls for case-control analyses of cohort studies.
Biometrics, 40 1
N. Keiding (1991)
Age‐Specific Incidence and Prevalence: A Statistical Perspective
Journal of The Royal Statistical Society Series A-statistics in Society, 154
KMV Narayan, JP Boyle, TJ Thompson, EW Gregg, DF Williamson (2007)
Effect of BMI on lifetime risk for diabetes in the U.S
Diabetes Care, 30
F. Coolen, Per Andersen, Ørnulf Borgan, Richard Gill, Niels Keiding (1996)
Statistical Models Based on Counting Processes.
The Statistician, 45
A. Honeycutt, J. Boyle, Kristine Broglio, T. Thompson, T. Hoerger, L. Geiss, K. Narayan (2003)
A Dynamic Markov Model for Forecasting Diabetes Prevalence in the United States through 2050
Health Care Management Science, 6
(2001)
ATC index with DDDs and Guidelines for ATC classification and DDD assignment. Oslo
R. Prentice, N. Breslow (1978)
Retrospective studies and failure time models
Biometrika, 65
N. Mantel (1973)
Synthetic retrospective studies and related topics.
Biometrics, 29 3
R. Prentice (1986)
A case-cohort design for epidemiologic cohort studies and disease prevention trials
Biometrika, 73
B. Turnbull (1976)
The Empirical Distribution Function with Arbitrarily Grouped, Censored, and Truncated Data
Journal of the royal statistical society series b-methodological, 38
A. Mokdad, B. Bowman, E. Ford, F. Vinicor, J. Marks, J. Koplan (2001)
The continuing epidemics of obesity and diabetes in the United States.
JAMA, 286 10
J. Evans, K. Barnett, S. Ogston, Andrew Morris (2007)
Increasing prevalence of type 2 diabetes in a Scottish population: effect of increasing incidence or decreasing mortality?
Diabetologia, 50
Duncan Thomas (1981)
General relative-risk models for survival time and matched case-control analysis
Biometrics, 37
K. Narayan, J. Boyle, T. Thompson, Stephen Sorensen, D. Williamson (2003)
Lifetime risk for diabetes mellitus in the United States.
JAMA, 290 14
H. Støvring (2007)
Selection bias due to immigration in pharmacoepidemiologic studies
Pharmacoepidemiology and Drug Safety, 16

Publisher: Springer Journals
Copyright: Copyright © 2007 by Støvring and Wang; licensee BioMed Central Ltd.
Subject: Medicine & Public Health; Theory of Medicine/Bioethics; Statistical Theory and Methods; Statistics for Life Sciences, Medicine, Health Sciences
eISSN: 1471-2288
DOI: 10.1186/1471-2288-7-53
pmid: 18096045

Abstract

Background: Incidence and lifetime risk of diabetes are important public health measures. Traditionally, nonparametric estimates are obtained from survey data by means of a Nelson-Aalen estimator which requires data information on both incident events and risk sets from the entire cohort. Such data information is rarely available in real studies. Methods: We compare two different approaches for obtaining nonparametric estimates of age-specific incidence and lifetime risk with emphasis on required assumptions. The first and novel approach only considers incident cases occurring within a fixed time window (we have termed this cohort-of-cases data), which is linked explicitly to the birth process in the past. The second approach is the usual Nelson-Aalen estimate which requires knowledge on observed time at risk for the entire cohort and their incident events. Both approaches are used on data on anti-diabetic medications obtained from Odense Pharmacoepidemiological Database, which covers a population of approximately 470,000 over the period 1993–2003. For both methods we investigate if and how incidence rates can be projected. Results: Both the new and standard method yield similar sigmoidal shaped estimates of the cumulative distribution function of age-specific incidence. The Nelson-Aalen estimator gives somewhat higher estimates of lifetime risk (15.65% (15.14%; 16.16%) for females, and 17.91% (17.38%; 18.44%) for males) than the estimate based on cohort-of-cases data (13.77% (13.74%; 13.81%) for females, 15.61% (15.58%; 15.65%) for males). Accordingly the projected incidence rates are higher based on the Nelson-Aalen estimate, and also too high when compared to observed rates. In contrast, the cohort-of-cases approach gives projections that fit observed rates better. Conclusion: The developed methodology for analysis of cohort-of-cases data has potential to become a cost-effective alternative to a traditional survey based study of incidence. 
To allow more general use of the methodology, more research is needed on how to relax stationarity assumptions.

Background

Diabetes is a severe disease, which is becoming increasingly prevalent in countries throughout the world [1-6]. From a public health perspective it is vital to get good estimates of the present and future burden of diabetes. One measure of primary interest is diabetes incidence, both with respect to calendar time and age [7]. If combined with a model for mortality, it allows estimating lifetime risk of diabetes, another important public health measure [8]. Also, if combined with data on birth rates, it is possible to obtain a projection of future incidence, often needed for planning of health care services.

As the annual risk of developing diabetes is low in a general population, only very few follow-up studies exist on a general population level. Alternatively, various types of surveys have been conducted [8,9], which have then been analyzed to estimate age-specific incidence rates. Obviously, subjects of different ages in a survey originate from different birth cohorts, but this has received little attention in this context. As a consequence, the lifetime risk estimated from such approaches pertains to a hypothetical cohort subjected to the current age-specific incidence and mortality rates. Likewise, future incidence is predicted by assuming birth cohorts of a given size and then subjecting these to the same age-specific incidence and mortality rates observed in the survey.

In this paper we propose a different approach which from the outset links past birth rates to the occurrence of incident events in a (often relatively short) time window. We will term this type of data cohort-of-cases data, as it is a cohort consisting entirely of cases. More specifically, we require the sample to include all subjects who have advanced to a certain end-point (failure event) within a given calendar time period, and only these cases. Further, we assume that the time origin (initiating event, birth time) of each case can be retrospectively identified. So far, statistical methods for this type of doubly truncated data have not (to the extent of the authors' knowledge) been extensively studied when the rate of initiating events is not assumed constant over calendar time.

It should be noted that cohort-of-cases data are different from case-cohort data (see for example [10], where the phrase case-cohort was coined), as the latter refers to a study comparing cases to a random sample from the corresponding cohort. In contrast, the cohort-of-cases data studied here comprise a study population consisting only of cases, but possibly supplemented with additional information on the process of initiating events. Cohort-of-cases designs, in this sense, are generally considered efficient, in particular for diseases with a low rate of occurrence; see [11-15], and references therein. We also want to point out that cohort-of-cases data provide information different from the information of the cases in the case-cohort studies, although the two types of data do share common characteristics. As pointed out in ([10], p4), the failure time in case-cohort studies is usually defined as time from the beginning of follow-up to a failure event, whereas the failure time in cohort-of-cases is time from initiating event to failure event.

To illustrate how the model can be applied we will use data from Odense Pharmaco-Epidemiological Database (OPED). Briefly, this database contains information on all redemptions of medications prescribed by a physician and subsidized by the national health insurance at any pharmacy within a well-defined geographical area holding nearly 500,000 inhabitants. The drug class of interest here is that used to treat diabetes. While such data by definition only concern pharmacologically treated diabetes, they do offer the opportunity for comparing the proposed approach with the traditional approach, which is the main purpose of the present paper.

The paper is organized as follows. We first describe the data, both on births and incident events. We then introduce a methodology which yields a non-parametric maximum likelihood estimate of the age-specific incidence distribution based solely on cohort-of-cases data, possibly supplemented with a known birth rate. The non-parametric method does not directly provide measures of the uncertainty of the estimate, and so we propose a bootstrap method for obtaining measures of this uncertainty. We then briefly outline the traditional analysis, before we present and compare results when applying the two methods to the data. We finally discuss implications in the last Section.

Methods

Cohort-of-cases data on anti-diabetic treatment

For the period 1992–2003 the Odense Pharmaco-epidemiological Database (OPED) contains subject specific information on all prescriptions for subsidized medications redeemed at any pharmacy in the County of Fyn, as well as information on births, deaths and migration into and out of the County of Fyn. The tracking of individuals is based on the Civil Registration Number (CRN), which is assigned to all at birth or first immigration into Denmark, and which uniquely identifies all residents of Denmark. For each individual we identified all prescriptions of anti-diabetic agents in OPED. The anti-diabetic drugs are characterized by the first three characters of the so-called ATC-code being A10 [16]. We will not distinguish between the various types of anti-diabetic treatments, such as for example insulin (A10A) and oral anti-diabetics (A10B). Incident events are defined to be the first treatment event observed in the time window for subjects who did not have any previous events during a one year run-in period. The run-in period was either started at the start of the database or at the time of first immigration into Fyn of the subject, if the subject immigrated into Fyn during the observation period. Note that this may well introduce a calendar-time-dependent misclassification and hence bias [17], but this will be ignored in the following as we are not studying secular trends in incidence. Also note that, by definition, these data will only allow us to study incidence of pharmacologically treated diabetes. We will use the words "treated" and "diseased" interchangeably, and ask the reader to keep in mind that the present analysis only pertains to pharmacologically treated diabetes.

Birth rates

For the period 1891–2003, available data from Statistics Denmark were used to determine annual, national birth counts for each gender. To estimate the number of births within the county of Fyn, data were obtained on population size for Denmark as a whole, as well as for Fyn, with the objective of rescaling. Population counts were available roughly at five year intervals (1901, 1906, ..., 1921, 1925, 1930, ..., 1970, 1976, 1981, 1986, 1990, 1995, 1998, 1999, ..., 2003) for Fyn, whereas nationwide data were available annually from 1970 and onward, and otherwise similar to those for Fyn given above. Only from 1970 can all members of a given birth cohort be followed up individually, and hence we only rely on annual counts that are available throughout.

To estimate the number of births in the county of Fyn, we scaled national birth rates by the relative population size in the county of Fyn compared to the total population of Denmark. The underlying assumption is that the fertility rate on Fyn is similar to national rates, which seems plausible given the small size of Denmark and the relatively homogeneous composition of the population. As population counts are not available annually we interpolated the population data based on piecewise linear regression with cut points at 1920, 1970, and 1996, cf. Figure 1.

Figure 1. Observed and predicted fractions of the Danish population living in the county of Fyn during 1900–2003. (x-axis: calendar year; y-axis: proportion of Danes living in Fyn, with fitted values.)

Figure 2. Annual number of births in the county of Fyn during 1891–2003. (Panels by child gender, F = female, M = male; x-axis: birth year; y-axis: number of births.)

Overall, Fyn held 9%–10% of the Danish population during most of the twentieth century and the fit seems very good. The sudden drop in 1920 is due to the reunion of North Slesvig with Denmark after having been part of Germany from 1864.

In subsequent analyses the missing proportions were replaced with the predicted, while the observed proportions were retained. When we combined this with the national birth rates, we could compute the number of births in the county of Fyn as the product of the number of births in Denmark and the proportion of the Danish population living in the County of Fyn. Since no observations were available for the ten year period 1891–1900, we predicted the annual number of births in this period from a linear extrapolation of the birth counts in the period 1901–1910. The resulting gender specific annual birth rates in the County of Fyn are presented in Figure 2.

For the birth rates to be of value, we must assume that migration balances in the sense that immigration and emigration for each birth cohort prior to and within the observation window are expected to be of similar size. This is, however, reasonable in the present context as the relative size of the studied population is nearly constant compared to the entire Danish population. In all subsequent analyses, estimated numbers of births are treated as fixed.

Methodological set-up

Let us now introduce the notation used in the paper. Let $U$ be the calendar time of the initiating events (births). Let $Y$ be age at onset if disease occurs before death, and infinity in the absence of disease before death. Let the probability density function (pdf) of $Y$ be $f(y|u)$, and the associated cumulative distribution function (cdf) $F(y|u)$. Further, let $Z_0$ be age at death if $Z_0 < Y$, that is, disease does not occur before death. If $Y > Z_0$, we let $Y = \infty$, and otherwise we let $Z_0 = \infty$. To avoid ambiguity, we will at times denote $F$ as $F_\infty$.

Since not all subjects will experience disease prior to death, the pdf of $Y$, $f(y|u)$, is a mixture distribution with two components:

$$f(y|u) = \pi_\infty(u)\, f^*(y|u) + \bigl(1 - \pi_\infty(u)\bigr)\, I(y = \infty) \qquad (1)$$

where $\pi_\infty(u)$ is defined as $P(Y < \infty \mid u)$, i.e., it is the probability of disease occurring before death, $I(\cdot)$ is an indicator function, and $f^*(y|u)$ is the conditional pdf of $Y$ given that $Y < \infty$, i.e., $Y \le Z_0$. Note that since $\pi_\infty(u)$ is the probability of disease occurring before death for a subject with birth at $u$, it is the lifetime risk for subjects with birth time $u$.

Cohort-of-cases data

Assume that we observe all ages of onset, $Y$, occurring within the calendar time observation window $[0; \tau)$, cf. Figure 3 for a graphical presentation of the sampling scheme. Assume that the occurrence of births follows a Poisson process with intensity $\varphi(u)$ for $u \le \tau$, and that $y^+ = \sup\{y : F^*(y|u) < 1\}$ exists and is finite for all $u \le \tau$, i.e., $y^+$ is the maximal observable age at onset before death. We can then normalize the birth intensity $\varphi(u)$ to a density $g$ on $[-y^+; \tau)$,

$$g(u) = \frac{\varphi(u)}{\int_{-y^+}^{\tau} \varphi(s)\, ds} \qquad (2)$$

with associated cumulative distribution function $G$. In principle $\varphi$ could well depend on covariates, but since we consider $\varphi$ either known or constant, we will ignore this.

Figure 3. Lexis diagram with observation window (gray area). Dotted lines indicate lifetime without disease until age of onset ($Y$) or age at death ($Z_0$), full lines lifetime with disease. Only age at onset times within the observation window are observed (blue points) in a cohort-of-cases study. (x-axis: calendar time; y-axis: age.)

We will in the following assume $(U_1, Y_1), \ldots, (U_n, Y_n)$ to be independent and identically distributed (iid). Two crucial assumptions must be considered. First, whether or not we have calendar time stationarity with respect to age of onset, i.e.,

(S1) Age of onset is independent of time of birth, i.e., $F(y|u) = F(y)$.

Secondly, knowledge about the birth process will not be available in many applications. Hence we also consider the situation with calendar time stationarity of the birth process:

(S2) Assume that the occurrence of initiating events, births, started in the distant past and that this birth rate has been stabilized. Or, quantitatively, assume that $u_0 = \inf\{u : \varphi(u) > 0\}$ is small enough so that $u_0 \le -y^+$, and that $g$ is uniform on $[-y^+; \tau)$.

Stationary incidence, known birth process ((S1) only)

When only (S1) holds, the joint density of the observed $(u, y)$ can be written as follows:

$$p(u, y \mid -U \le Y \le \tau - U) = \left[\frac{g(u)\, I(-y \le u \le \tau - y)}{G(\tau - y) - G(-y)}\right] \left[\frac{\{G(\tau - y) - G(-y)\}\, f^*(y)\, I(y \le y^+)}{\int_0^{y^+} \{G(\tau - s) - G(-s)\}\, f^*(s)\, ds}\right] \qquad (3)$$

$$\equiv p_c(u|y)\, p_m(y) \qquad (4)$$

where $p_c(u|y)$ and $p_m(y)$ are defined by the expressions in each bracket in (3), respectively. Thus $p_c(u|y)$ can be interpreted as the density of birth times conditional on $y$ being observed in $[0; \tau)$, and $p_m(y)$ as the marginal density for the observed $y$ weighted with $w_i = G(\tau - y_i) - G(-y_i)$, i.e., the probability of birth occurring within the interval $[-y_i; \tau - y_i)$. When $g$ is known, then so is $p_c$, as are the weights in $p_m$. It is thus straightforward to compute the maximum likelihood estimate of $F^*$ based on the weighted observations:

$$\hat F^*(y) = \frac{\sum_{i:\, y_i \le y} w_i^{-1}}{\sum_{i=1}^{n} w_i^{-1}} \qquad (5)$$

The estimate thus places mass $w_j^{-1} / \sum_{i} w_i^{-1}$ at each jump point $j$, where $j$ corresponds to the observation number in the ordered set of $Y_i$. That the estimate in (5) is the non-parametric maximum likelihood estimate (NPMLE) follows directly from standard results on NPMLE, see for example the paper by Turnbull [18], who covers the general case of which this is a special case. If all weights are equal, the above formula reduces to the ordinary formula for non-parametric estimation of a cdf in the uncensored case, putting mass $n^{-1}$ at each jump point.

With the estimate of the conditional cdf $F^*$ it is possible to obtain an estimator of the unconditional $F$ utilizing their relationship given in Equation (1). What we need is an estimate of $\pi_\infty$, which may be obtained from noting that the occurrence rate of incident events, $I^{tr}$, at any calendar time point is given by

$$I^{tr}(t) = \int_{-\infty}^{t} \varphi(u)\, f(t - u)\, I(t - u \le y^+)\, du = \pi_\infty \int_{-\infty}^{t} \varphi(u)\, f^*(t - u)\, du$$

where the indicator function $I(t - u \le y^+)$ is needed, since the occurrence rate does not include those for which onset never happens, that is when $y = t - u > y^+$, or equivalently $y = t - u = \infty$. Integrating this over the observation window, we find … can use Equation (1) to compute an estimate of $F$, the unconditional cdf of age at onset.

Stationary incidence, stationary birth process (both (S1) and (S2))

When both (S1) and (S2) hold, the marginal density of the observed $y$'s can be further simplified by substitution of the uniform birth density in the corresponding expression in Equation (3), i.e.,

$$p_m(y) = \left[\frac{\{\tau/(\tau + y^+)\}\, f^*(y)\, I(y \le y^+)}{\tau/(\tau + y^+)}\right] = f^*(y)\, I(y \le y^+)$$

Note that here the density function of the observed $y$'s coincides with the population density function $f^*$ of the observable onset times, $Y$. In the case when only the age at onset distribution is of interest, and not lifetime risk, the 'usual methods' are thus applicable to the case data to estimate $f^*$ by putting equal weights on all observations, as noted above.

If, however, we are also interested in the unconditional density, $f(y)$, we need an estimate of $\pi_\infty$ to be able to proceed. Above, this was obtained from our knowledge of the birth process, and in principle we could exploit this again. However, in situations where a stationary birth process is assumed, this is typically because we lack information on the birth process. Thus alternative approaches may be necessary in such situations.
One obvious way to pro- ττ t tr ∗ I () t dt=− πφ(u) f (t u)du dt ceed is the following: In the time window where ∫∫ ∫ 00 −∞ information is collected on incident cases, we also collect information on deaths–either for all or a random sample– τ τ 0 0 ⎧ ⎫ and classify them according to whether or not they had =− πφ() u f (t u)dt du ⎨ ⎬ ∫ ∫ experienced disease. The relative frequency of diseased −∞ max(u,0) ⎩ ⎭ deaths will then be an estimate of under stationarity assumptions with respect to the birth process, the incidence τ τ −u 0 0 ⎧ ⎫ ∗ process, and the mortality. This estimate is valid if age-spe- =πφ()uf (t)dtdu ⎨ ⎬ ∫ ∫ cific mortality is assumed stationary both among diseased −∞ max(0,−u) ⎩ ⎭ and non-diseased–these strong assumptions reflect the lack of available information in such situations. With this esti- ∗∗ mate of we may then estimate the unconditional F. =− πφ(uF ){} (τ u)−F (max(0,−u)) du ∞ 0 −∞ Non-stationary incidence, known birth process (Neither (S1) nor (S2)) from which it follows that When neither (S1) nor (S2) hold, the likelihood becomes substantially more complicated. In principle, this can be ττ ⎡ ⎤ tr ∗∗ handled by introducing a parameter vector which relates πφ=− It ( )dt (u){} F (τ u)−F (max(0,−u)) du ∞ ⎢ 0 ⎥ ∫∫ 0 −∞ ⎣ ⎦ the incidence density to the time of birth. The rewriting presented in Equation (4) is still valid with the modifica- Although the estimate is intuitely attractive it is not clear tion that the density term p (y) now depends on the whether it is the MLE. However, if we fill in the MLE of F* parameter vector , i.e., in Equation (12), we do obtain an estimate of , since is τ ∗ tr {(Gy τθ−− | )G(−y|θ)}f (y|θ) It ()dt 0 py(| θ) = 0 m known and is estimated by the total number y ∗ {(Gs τθ−− | )G(−s|θ)}f (s|θ θ )ds of observed incidences over the interval [0; ). 
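The weighted estimate in Equation (5) and the plug-in estimate of π∞ from Equation (12) for the stationary-incidence, known-birth-process case can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes time is measured in whole years, that the window covers years 0, ..., tau - 1, and that births are given as yearly counts. The function names, the `births` dictionary and the toy data are all hypothetical. On this yearly grid the denominator of Equation (12) reduces to the sum over observed ages of the estimated mass times the number of births that could have produced an onset at that age inside the window.

```python
from collections import Counter


def cohort_weights(ages, births, tau):
    """Unnormalised weight per onset age y: births in the tau birth years
    -y, ..., tau - 1 - y, i.e. those from which onset at age y falls in the window."""
    return {y: sum(births.get(u, 0) for u in range(-y, tau - y)) for y in set(ages)}


def weighted_onset_masses(ages, births, tau):
    """Mass placed by the weighted estimate of F* at each observed age (Equation (5))."""
    weights = cohort_weights(ages, births, tau)
    inverse = {y: n / weights[y] for y, n in Counter(ages).items()}
    total = sum(inverse.values())
    return {y: inverse[y] / total for y in sorted(inverse)}


def weighted_cdf(ages, births, tau):
    """Cumulative version of the masses, evaluated at the observed ages."""
    running, cdf = 0.0, {}
    for y, mass in weighted_onset_masses(ages, births, tau).items():
        running += mass
        cdf[y] = running
    return cdf


def lifetime_risk(ages, births, tau):
    """Yearly-grid plug-in version of Equation (12): observed cases divided by
    the cases expected if every birth eventually led to onset."""
    weights = cohort_weights(ages, births, tau)
    masses = weighted_onset_masses(ages, births, tau)
    expected_if_certain = sum(masses[y] * weights[y] for y in masses)
    return len(ages) / expected_if_certain


# Hypothetical toy data: larger birth cohorts before year -50, a 10-year window,
# onset at age 40 or 60 with equal probability and a true lifetime risk of 0.2.
births = {u: (2000 if u < -50 else 1000) for u in range(-100, 10)}
ages = [40] * 1000 + [60] * 2000

print(weighted_cdf(ages, births, tau=10))
print(lifetime_risk(ages, births, tau=10))
```

In the toy data two thirds of the observed cases have onset at age 60 only because the cohorts born before year -50 are twice as large. An unweighted estimate would therefore put one third of the mass at age 40, whereas the weights restore equal mass at the two ages.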
Unfortunately, the density p_m(y|θ) does not directly permit use of the approach presented above for finding a non-parametric estimate of f*(y|θ), nor for finding the corresponding estimate of π∞(θ).

One alternative is to set up a full likelihood by considering a full parametric model of both age of onset and age of death, but we will not go into further details here and instead commend this as a topic for future research.

Ordinary non-parametric analysis

The ordinary Nelson-Aalen analysis based on observed events and time at risk is well described elsewhere, see [19] for an extensive treatment of the subject, or [20] for a more focused treatment. In short, we use age as the fundamental time scale, and we then have delayed entry due to the fact that not all subjects are followed from birth. Rather, they enter the observation window and capture area at a certain age and are then followed until either event or censoring, whichever comes first.

We let subjects become at-risk one year after the start of the observation period if they resided in Fyn County in this period, or one year after entrance to the capture area, if they immigrated to Fyn during the study period. In both cases the one year run-in period is used to identify subjects not already in treatment (those without filled prescriptions in the period), as only they are at risk for becoming incident. Subjects cease to be under observation either at onset, death, emigration from Fyn, or end of follow-up, whichever comes first.

As above we require calendar time stationarity for estimation of F. The second assumption in this setup is that entry is independent of disease onset, i.e., age at immigration to Fyn County is not informative for the subsequent distribution of Y. The final assumption is that censoring is non-informative. The two latter assumptions are similar, but not identical, to the assumption of balance of migration made in the analysis of doubly truncated data. The difference is that independent delayed entry and censoring only concern the time within the observation period. On the other hand, the balancing assumption does not require independence, i.e., migrating subjects may well have a different morbidity than non-migrating subjects (which is indeed the case [21]) as long as the distribution of onset ages is similar among immigrating and emigrating subjects.

Thus we get a non-parametric estimate of Λ_Y, the cumulative hazard for onset of disease. Similarly, a non-parametric estimate of the cumulative hazard of death among non-diseased, A_Z, can be obtained by simply exchanging the event indicator from onset of disease to death and maintaining the at-risk time.

From Λ_Y and A_Z an estimate of F is given by (cf. [22] for a theoretical discussion, while [23] gives an example of its application)

$$
F(y) = \int_0^{y} \lambda_Y(s)\, \exp[-\Lambda_Y(s)]\, \exp[-A_Z(s)]\, ds + I(y = \infty)\,(1 - \pi_\infty) \tag{15}
$$

where λ_Y is the hazard associated with Λ_Y, and π∞ is the lifetime risk given by

$$
\pi_\infty = \int_0^{\infty} \lambda_Y(s)\, \exp[-\Lambda_Y(s)]\, \exp[-A_Z(s)]\, ds \tag{16}
$$

As no analytic confidence intervals are available for the lifetime risk, we obtained them using bootstrap as above. This can also be applied to obtain age-specific confidence intervals for F.

Projection of incidence

Based on an estimate of F, projection of incidence is possible both inside and outside the observation window by application of the formula in Equation (6), when the birth process is known and incidence is assumed stationary. In the application studied here, the birth process is known for u ≤ τ0. For u > τ0 it must be projected. Hence, we carry the last observed value of the birth process forward, i.e., let φ(u) = φ(τ0) for u > τ0.

Results and Discussion

Table 1 gives basic descriptive statistics of the studied population, as it shows the number of incidence events tabulated by gender, birth period and calendar year, which is used for estimating age-specific incidence.

Cohort-of-cases data

Complete stationarity

Although the birth process is known in our setting, we present for comparison an analysis based on assuming stationarity for the birth process, the incidence process, as well as the mortality process among treated. We first classified all deaths according to whether or not a previous redemption of anti-diabetics had been observed, considering all with such a redemption to be diabetics. The lifetime risk, π∞, was for females estimated at 9.68% (95% Confidence Interval: 9.35%; 10.02%) and for males at 10.86% (10.51%; 11.22%), where both confidence intervals are binomial exact. The estimated incidence distribution, F, stratified on gender is shown in Figure 4(a).
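As a hedged illustration of the ordinary non-parametric analysis described in the methods (Nelson-Aalen estimates with delayed entry, combined into a lifetime risk as in the integral formula for π∞), the sketch below works on the age scale with a handful of made-up subjects. It is not the authors' code. The tuple layout `(entry_age, exit_age, status)`, the status labels and the function name are assumptions, and the integral is approximated by a sum over event ages using the cumulative hazards just before each age.

```python
import math


def nelson_aalen_lifetime_risk(subjects):
    """subjects: list of (entry_age, exit_age, status) with status in
    {'onset', 'death', 'censored'}. Returns one tuple per event age:
    (age, cumulative onset hazard, cumulative death hazard, cumulative risk)."""
    event_ages = sorted({exit_age for _, exit_age, status in subjects
                         if status != "censored"})
    cum_onset = cum_death = risk = 0.0
    steps = []
    for t in event_ages:
        # Delayed entry: a subject is at risk at age t only after entry.
        at_risk = sum(1 for entry, exit_age, _ in subjects if entry < t <= exit_age)
        d_onset = sum(1 for _, exit_age, s in subjects if exit_age == t and s == "onset")
        d_death = sum(1 for _, exit_age, s in subjects if exit_age == t and s == "death")
        # Probability of being alive and disease-free just before age t.
        free_before = math.exp(-cum_onset - cum_death)
        risk += free_before * d_onset / at_risk
        cum_onset += d_onset / at_risk   # Nelson-Aalen increment for onset
        cum_death += d_death / at_risk   # same risk set, event switched to death
        steps.append((t, cum_onset, cum_death, risk))
    return steps


# Hypothetical subjects; the last one immigrates at age 55 and is censored at 90.
subjects = [
    (0, 50, "onset"),
    (0, 60, "death"),
    (0, 70, "onset"),
    (0, 80, "death"),
    (55, 90, "censored"),
]
steps = nelson_aalen_lifetime_risk(subjects)
lifetime_risk_estimate = steps[-1][3]
print(lifetime_risk_estimate)
```

The immigrant contributes to the risk sets at ages 60, 70 and 80 but not at age 50, which is how independent delayed entry enters the estimate. A bootstrap confidence interval, as used in the paper, would amount to resampling `subjects` with replacement and recomputing the final value.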
Table 1: Number of incidence events by gender, calendar year of event and calendar year of birth.

Gender   Birth    1993  1994  1995  1996  1997  1998  1999  2000  2001  2002  2003
Females  -1909      60    24    30    22     7    19     9     7     6     4     4
Females  1910-9    103    79    74   101    81    69    66    58    65    51    47
Females  1920-9    110   123    99   122    98   112   118   109   106   119   118
Females  1930-9     65    86    78    86    96    82    96   108   111   132   140
Females  1940-9     51    51    55    70    65    64    88   100   115   117   137
Females  1950-9     41    31    32    26    32    30    38    48    51    64    70
Females  1960-9     18    18    13    20     8    18    28    29    30    34    48
Females  1970-9     17    10     9     6    13     5     9     9    18    19    34
Females  1980-9      4     3     2     4     5     6     8     7     9     5    11
Females  1990-    4 6 3 3 5 7 10 10 [8 values; event-year alignment not recoverable]
Males    -1909    29 19 1887 74 222 [digit grouping and event-year alignment not recoverable]
Males    1910-9     94    80    68    71    65    58    42    45    37    21    28
Males    1920-9    126   145   106   118    96   116    99    93    82   123   123
Males    1930-9    107    95   114   132   126   119   131   156   143   126   174
Males    1940-9    104   102    83   102   111   129   140   166   183   191   214
Males    1950-9     49    52    38    52    42    77    65    70    77   106   113
Males    1960-9     16    19    19    21    27    23    28    29    41    55    53
Males    1970-9     12    11    17     8     5     7    10     9    15    14    12
Males    1980-9      8     2     3    28     6     2     7    12    12    12    10
Males    1990-    53 267 895 [digit grouping and event-year alignment not recoverable]

Stationarity of incidence, known birth process

When only stationarity of the incidence distribution is assumed, a non-parametric analysis based on the weighted likelihood given in Equation (4) and the estimator of π∞ in Equation (12) can be conducted. With the gender specific birth rates, we estimated gender specific estimates of F*, π∞, and hence F, from the observed events and associated ages at the events. The resulting estimates of the incidence distribution F are displayed in Figure 4(b).

Figure 4: Estimated incidence distribution F for pharmacological treatment with any anti-diabetic drug with respect to age, and stratified on gender. Panels: (a) Stationary birth and incidence process; (b) Stationary incidence process, known birth process. (x-axis: age; series: Females, Males.)

We see that the incidence distribution for both genders is made up of two components: The first component is a more or less constant density for ages below 40 years (the linear part in F), whereas the second is a much higher, unimodal density for ages above 40 years which vanishes for ages above 80 (the sigmoid shaped part of F). For females the lifetime risk, π∞, was estimated at 13.77% (13.74%; 13.81%), for males at 15.61% (15.58%; 15.65%). Both confidence intervals are computed using bootstrap with a thousand replications. The confidence intervals are very narrow, which reflects the high statistical efficiency of the weighted likelihood approach, which in turn partly comes from the strong assumption of stationarity. As birth counts are assumed known, this too contributes to the narrow confidence intervals, although to a lesser degree.

The shape of F is quite similar to the unweighted estimate, whereas the estimated lifetime risks are substantially higher than those estimated above. The major explanation is of course lack of stationarity of the true lifetime risk and/or the disease duration: The estimate of π∞ based on disease status among observed deaths takes most of its information from the older cohorts as they are the ones with high mortality. If the older cohorts had lower lifetime risk and/or previously had relatively higher mortality among diseased compared to non-diseased (both of these scenarios are very realistic, but contrary to assumptions of the previous analysis), this will result in a decreased estimate of π∞. This would be amplified if older cohorts are larger than younger cohorts, as is indeed the case here, cf. Figure 2.

Contrastingly, when indirectly estimating π∞ based on weighting with the birth process, the estimate can be viewed as a weighted average of π∞ over the entire interval for the birth process [-y+; τ0 - y).

Projection of diabetes incidence

In the completely stationary situation, where (S1) and (S2) are both assumed to hold, the projected annual incidence is a constant number equaling the lifetime risk multiplied by the annual number of births. As the annual number of births is usually not observable in such settings, an alternative is needed. In the spirit of estimating π∞ from the treatment status among deaths, one could take the total annual number of deaths as an estimate of the number of births. The reasoning for this is that if the population is in a completely stationary state, the annual number of deaths must on average equal the average annual number of births. In our setting the observed numbers of deaths over the 11 year period are 29,871 for females and 29,816 for males, yielding projected annual incidences of 262.8 for females and 294.3 for males. In Figure 5 the incidence is projected based on the weighted, non-parametric estimate of F obtained above, i.e., with known birth intensity and stationary incidence. All annual birth counts after 2003 are set to the number of births observed in 2003. Note that the observed incidence strongly suggests a departure from stationarity, and so future actual incidences are likely to be higher than those projected from a stationarity assumption. The projected incidences show a small but persistent decline for 2004–2013 due to declining birth rates in the last half of the twentieth century. The general level is much higher than above, reflecting the higher estimate of π∞ obtained from using the known birth distribution, but corresponds well with observed incidences.

Figure 5: Projected and observed annual numbers of incident events in the county of Fyn based on an assumption of a stationary incidence and using a weighted, non-parametric estimate of F. (x-axis: year, 1993–2013; series: Females predicted, Females observed, Males predicted, Males observed.)

Ideally, projections should be accompanied by confidence intervals, but we have been unable to compute them. While in principle some variant of bootstrap might be employed, this is numerically very demanding as the entire cdf of age-specific incidence must be bootstrapped. Judged from the confidence intervals of the lifetime risks, the confidence intervals of the projections will be very small, reflecting both high efficiency of the method, as well as its strong assumptions.

Ordinary non-parametric analysis

The gender specific estimates of F are shown in Figure 6. The shape of the estimated cdf is very similar to the one obtained above using a known birth process for weighting. The estimated lifetime risks are 15.65% (15.14; 16.16) for females, and 17.91% (17.38; 18.44) for males, where confidence intervals were found from bootstrap with 1,000 replications. This is somewhat higher than when analyzing data as doubly truncated. The explanation is that mortality has generally declined substantially over the past century, and hence an estimate based on the mortality rates observed within the observation window leads to a higher risk of diabetes onset prior to death than an estimate which implicitly accounts for past mortality.
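The two projection rules used for the cohort-of-cases data can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the function names, the yearly grid and the toy birth series are hypothetical. The first function is the completely stationary rule (lifetime risk times average annual deaths). The second is a yearly-grid version of Equation (6) under (S1), with birth counts after the last observed year carried forward.

```python
def stationary_projection(lifetime_risk, deaths_in_window, window_years):
    """Completely stationary case: annual incidence = lifetime risk x annual
    births, with average annual deaths standing in for annual births."""
    return lifetime_risk * deaths_in_window / window_years


def project_incidence(year, births, onset_masses, lifetime_risk, last_birth_year):
    """Yearly-grid version of Equation (6) under (S1): sum over onset ages of
    births in the matching birth year times the mass of F* at that age,
    scaled by the lifetime risk."""
    total = 0.0
    for age, mass in onset_masses.items():
        # Carry the last observed birth count forward for future birth years.
        birth_year = min(year - age, last_birth_year)
        total += births[birth_year] * mass
    return lifetime_risk * total


# Published figures from the completely stationary analysis (rounded inputs).
females = stationary_projection(0.0968, 29871, 11)
males = stationary_projection(0.1086, 29816, 11)
print(round(females, 1), round(males, 1))

# Hypothetical birth series and onset distribution for the known-birth case.
toy_births = {u: 1000 for u in range(1900, 2004)}
toy_births[2003] = 800
print(project_incidence(2010, toy_births, {40: 0.5, 60: 0.5}, 0.2, 2003))
print(project_incidence(2010, toy_births, {5: 1.0}, 0.2, 2003))
```

With the rounded lifetime risks the stationary rule gives values within about 0.1 of the published 262.8 and 294.3. The small difference is presumably due to rounding of the percentages. The last call shows the carry-forward rule: onset at age 5 in 2010 refers to births in 2005, which are replaced by the 2003 count.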
Figure 6: Estimated incidence distribution F for pharmacological treatment with any anti-diabetic drug with respect to age, and stratified on gender. Ordinary non-parametric estimate with independent delayed entry. (x-axis: age; series: Females, Males.)

Figure 7: Projected and observed annual numbers of incident events in the county of Fyn based on the ordinary non-parametric estimate of F with independent delayed entry. (x-axis: year, 1993–2013; series: Females predicted, Females observed, Males predicted, Males observed.)

Projection of diabetes incidence

Projections are obtained as above, except that the ordinary non-parametric estimate of F is used, and results are shown in Figure 7. Due to the elevated lifetime risk the projected incidences are now higher, and also too high compared to observed incidences. Also for this projection we have been unable to provide confidence intervals for the same reasons as above.

Conclusion

In this paper we have developed and implemented methods for estimating and projecting incidence, as well as the lifetime risk of a disease based on observation of incident events in an observation window, i.e., what we termed cohort-of-cases data. The developed methodology yields non-parametric estimates comparable to those of a standard Nelson-Aalen analysis based on independent delayed entry, but it gives slightly better projections of incidence due to its implicit accounting for the unobserved mortality among untreated in the past.

In its simplest form (i.e., assuming both a stationary birth process and incidence) a simple non-parametric estimate of the age of onset distribution is obtained. When alternatively the birth process is considered known, this is taken into account by a weighted, non-parametric estimate with weights based on the relative sizes of the relevant birth cohorts. Both approaches directly provide estimates of age-specific incidence as well as of lifetime risk, which are of considerable public health interest. Due to the relatively fast computational procedures developed, confidence intervals for the lifetime risk could be obtained from direct application of bootstrap methodology. We were however unable to provide confidence intervals for projection of incidence.

As stated by Narayan et al. in 2003, lifetime risk of diabetes appears not to have been estimated prior to their paper [8], and only one subsequent paper has reported comparable estimates of lifetime risk [24]. The directly comparable estimates for the US population found in [8] are substantially higher (39% for females, 33% for males) than ours (14% for females, 16% for males). The two major reasons for the difference are a generally lower diabetes incidence in Denmark [4], as well as the fact that our estimates only pertain to pharmacologically treated diabetes. It would however be interesting to explore if part of the difference is due to their use of the traditional method, as the traditional method in our material leads to an elevated estimate of lifetime risk of 16% for females and 18% for males. It is further interesting that the gender differences are in opposite directions in the two countries.

Several papers have used estimates of incidence to project the future burden of diabetes, most prominently [2,5,6]. For all three, it would be interesting to re-analyze their data using our developed method for cohort-of-cases data, if possible, to see if a similar discrepancy exists between the two analytical methods as we have found, where the traditional method leads to an inflated projection of the number of incident events of diabetes, when compared to the observed count.

For the theoretical developments, assumptions (S1) and (S2) have been crucial, but from an applied perspective the assumptions are very restrictive. In our application concerning diabetes, the assumptions are likely not satisfied, as it is questionable that both age-specific incidence and age-specific mortality among diabetics have been constant since 1900; rather, changes in incidence due to altered lifestyle, and changes in mortality due to improved treatment and general health are reasonable. Indeed, it is known that within the observation window of 1993 to 2003, statistically significant trends exist for both quantities [4]. Yet the predictions based on the developed model are at least as good as those based on the ordinary non-parametric method, showing the potential of the developed model. More work on relaxing the assumptions is however mandated before the model can be used more generally.

Although we in principle showed how the stationarity assumption could be relaxed by formulating a full, parametric likelihood, we did not give a detailed analysis of this situation due to its complexity. Also, the data considered in this paper are rather limited since, first, the observation window is short compared to typical disease duration, and second, no information is available on age of onset outside the observation window. As a result, we have been unable to allow for trends in incidence and mortality, the absence of which must be considered unrealistic. In some epidemiological settings it will, however, be possible to obtain data on age of onset for subjects prevalent at start of the time window or for diseased subjects dying in the observation window [25]. While such information is valuable and needs to be incorporated in the analysis to allow relaxation of assumptions, it requires knowledge about the past mortality among diabetics. In contrast, we have tried to develop a methodology that only relies on observation of incident events and past birth rates, which are often easier to obtain. There is, however, a need for further research on the applicability and extensions of the method before its potential can be more clearly appreciated.

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

HS had the original idea for the study, carried out all analyses and drafted the original and revised manuscripts with substantial input from MCW. The planning of analyses and interpretation of the data was the joint product of discussions between MCW and HS. Both authors have seen and approved the final version.

References

1. Mokdad A, Bowman B, Ford E, Vinicor F, Marks J, Koplan J: The continuing epidemics of obesity and diabetes in the United States. JAMA 2001, 286:1195-200.
2. Honeycutt AA, Boyle JP, Broglio KR, Thompson TJ, Hoerger TJ, Geiss LS, Narayan KMV: A dynamic Markov model for forecasting diabetes prevalence in the United States through 2050. Health Care Manag Sci 2003, 6(3):155-164.
3. Wild S, Roglic G, Green A, Sicree R, King H: Global prevalence of diabetes: estimates for the year 2000 and projections for 2030. Diabetes Care 2004, 27(5):1047-53.
4. Støvring H, Andersen M, Beck-Nielsen H, Green A, Vach W: Counting drugs to understand the disease: The case of measuring the diabetes epidemic. Popul Health Metr 2007, 5:2.
5. Narayan KMV, Boyle JP, Geiss LS, Saaddine JB, Thompson TJ: Impact of recent increase in incidence on future diabetes burden: U.S., 2005–2050. Diabetes Care 2006, 29(9):2114-2116.
6. Evans JMM, Barnett KN, Ogston SA, Morris AD: Increasing prevalence of type 2 diabetes in a Scottish population: effect of increasing incidence or decreasing mortality? Diabetologia 2007, 50(4):729-732.
7. Fox CS, Pencina MJ, Meigs JB, Vasan RS, Levitzky YS, D'Agostino RB: Trends in the incidence of type 2 diabetes mellitus from the 1970s to the 1990s: the Framingham Heart Study. Circulation 2006, 113(25):2914-2918.
8. Narayan KMV, Boyle JP, Thompson TJ, Sorensen SW, Williamson DF: Lifetime risk for diabetes mellitus in the United States. JAMA 2003, 290(14):1884-1890.
9. Harris M, Flegal K, Cowie C, Eberhardt M, Goldstein D, Little R, Wiedmeyer H, Byrd-Holt D: Prevalence of diabetes, impaired fasting glucose, and impaired glucose tolerance in U.S. adults. The Third National Health and Nutrition Examination Survey, 1988–1994. Diabetes Care 1998, 21(4):518-24.
10. Prentice RL: A Case-Cohort Design for Epidemiologic Cohort Studies and Disease Prevention Trials. Biometrika 1986, 73:1-11.
11. Mantel N: Synthetic retrospective studies and related topics. Biometrics 1973, 29(3):479-86.
12. Prentice RL, Breslow NE: Retrospective Studies and Failure Time Models. Biometrika 1978, 65(1):153-158 [http://links.jstor.org/sici?sici=0006-3444%28197804%2965%3A1%3C153%3ARSAFTM%3E2.0.CO%3B2-E].
13. Oakes D: Survival times: aspects of partial likelihood. Internat Statist Rev 1981, 49(3):235-264. With discussion and a reply by the author.
14. Thomas DC: General Relative-Risk Models for Survival Time and Matched Case-Control Analysis. Biometrics 1981, 37(4):673-686 [http://links.jstor.org/sici?sici=0006-341X%28198112%2937%3A4%3C673%3AGRMFST%3E2.0.CO%3B2-V].
15. Lubin J, Gail M: Biased selection of controls for case-control analyses of cohort studies. Biometrics 1984, 40:63-75.
16. The WHO Collaborating Centre for Drug Statistics Methodology: ATC index with DDDs and Guidelines for ATC classification and DDD assignment. Oslo 2001.
17. Støvring H, Andersen M, Beck-Nielsen H, Green A, Vach W: Rising prevalence of diabetes: evidence from a Danish pharmacoepidemiological database. Lancet 2003, 362(9383):537-8.
18. Turnbull BW: The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Statist Soc A 1976, 38(3):290-295.
19. Andersen PK, Borgan O, Gill RD, Keiding N: Statistical models based on counting processes. New York: Springer-Verlag; 1997. Corrected second printing.
20. Keiding N: Independent delayed entry. In Survival Analysis: State of the Art. Edited by: Klein JP, Goel PK. Kluwer Academic Publishers; 1992:309-326.
21. Støvring H: Selection bias due to immigration in pharmacoepidemiologic studies. Pharmacoepidemiol Drug Saf 2007, 16(6):681-686.
22. Keiding N: Age-specific Incidence and Prevalence: a Statistical Perspective. J R Statist Soc A 1991, 154(3):371-412.
23. Faergemann C, Lauritsen JM, Brink O, Stovring H: What is the lifetime risk of contact with an A&E Department or an Institute of Forensic Medicine following violent victimisation? Injury 2006, 39(1):121-127.
24. Narayan KMV, Boyle JP, Thompson TJ, Gregg EW, Williamson DF: Effect of BMI on lifetime risk for diabetes in the U.S. Diabetes Care 2007, 30(6):1562-1566.
25. Keiding N, Holst C, Green A: Retrospective estimation of diabetes incidence from information in a prevalent population and historical mortality. Am J Epidemiol 1989, 130(3):588-600.

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/7/53/prepub

Journal

BMC Medical Research Methodology – Springer Journals

Published: Dec 20, 2007
