How many mailouts? Could attempts to increase the response rate in the Iraq war cohort study be counterproductive?

Background: Low response and reporting errors are major concerns for survey epidemiologists. However, while nonresponse is commonly investigated, the effects of misclassification are often ignored, possibly because they are hard to quantify. We investigate both sources of bias in a recent study of the effects of deployment to the 2003 Iraq war on the health of UK military personnel, and attempt to determine whether improving response rates by multiple mailouts was associated with increased misclassification error and hence increased bias in the results.

Methods: Data for 17,162 UK military personnel were used to determine factors related to response, and inverse probability weights were used to assess nonresponse bias. The percentages of inconsistent and missing answers to health questions from the 10,234 responders were used as measures of misclassification in a simulation of the 'true' relative risks that would have been observed if misclassification had not been present. Simulated and observed relative risks of multiple physical symptoms and post-traumatic stress disorder (PTSD) were compared across response waves (number of contact attempts).

Results: Age, rank, gender, ethnic group, enlistment type (regular/reservist) and contact address (military or civilian), but not fitness, were significantly related to response. Weighting for nonresponse had little effect on the relative risks. Of the respondents, 88% had responded by wave 2. Missing answers (total 3%) increased significantly (p < 0.001) between waves 1 and 4, from 2.4% to 7.3%, and the percentage with discrepant answers (total 14%) increased from 12.8% to 16.3% (p = 0.007). However, the adjusted relative risks decreased only slightly, from 1.24 to 1.22 for multiple physical symptoms and from 1.12 to 1.09 for PTSD, and showed a similar pattern to those simulated.
Conclusion: Bias due to nonresponse appears to be small in this study, and increasing the response rates had little effect on the results. Although misclassification is difficult to assess, the results suggest that bias due to reporting errors could be greater than bias caused by nonresponse. Resources might be better spent on improving and validating the data, rather than on increasing the response rate.

BMC Medical Research Methodology 2007, 7:51. http://www.biomedcentral.com/1471-2288/7/51

Background

Poor response is a major source of concern in epidemiological surveys, and much effort is often spent on chasing up initial non-responders [1], with the implicit assumption that a higher response rate is associated with a more representative sample and hence lower bias. However, there is increasing evidence that this assumption may not always be true. Several reports have found little difference in the risk estimates obtained from the first wave of response and later waves [2-5]. In addition, a recent simulation study by Stang et al. [6] suggests that if misclassification error increases with the number of contact attempts, or the prevalence of the exposure decreases, then, if misclassification is non-differential (i.e. independent of exposure status), the estimates after each attempt will become successively biased towards the null hypothesis. Their results are consistent with the long-known fact that non-differential independent misclassification error of a dichotomous outcome will always bias a relative risk estimate for a binary exposure towards the null value (i.e. no difference) [7-9].

While there is an extensive literature on evaluating and dealing with the effects of survey nonresponse (e.g. the collection of articles in [10]), misclassification bias is mostly ignored in the survey literature, particularly in relation to attempts to increase response. We could find only a few studies which reported the effect of increasing response on the relative risks, e.g. [2-4], and none that explicitly examined whether increasing response rates increased the bias. This was surprising, since the proportion of missing information has been found to be greater for late responders [5,11], which suggests that late responders may take less care in answering a questionnaire and hence make more errors.

To help redress this imbalance, we report an empirical evaluation of the effect of nonresponse bias and outcome misclassification on the relative risks of two health outcomes obtained from a recent large study of the health of United Kingdom (UK) military personnel deployed to the 2003 Iraq war [12]. In the first part of this study we attempt to assess the effect of nonresponse bias on the results by comparing the known characteristics of responders and non-responders. In the second part we investigate the pattern of misclassification and the prevalence of health risk factors in those who responded. We compare the relative risks that were observed with those simulated using Stang's algorithm, in an attempt to ascertain the effect of reporting errors across successive waves of response, and whether increasing the initial response rate of 43% to 60%, by numerous and diligent attempts at contact, could possibly have been counterproductive.

Methods

Data and measures used

For investigation of nonresponse bias
We examined data on 17,370 personnel who had been sampled for the first wave of data collection of the Iraq war cohort study. All personnel had been employed in the military between January 18th and June 28th 2003: 7,621 (labelled Op TELIC 1) were recorded as having been deployed in Iraq during this period, and 9,749 (labelled Era) were not recorded as having been deployed on Op TELIC 1. Participants were contacted by post, or were asked to complete a questionnaire during military unit visits made by the research team. Up to 5 further attempts were made to recruit initial non-responders. Reservist personnel were over-sampled by a ratio of 2:1. The study received approval from the Ministry of Defence (Navy) personnel research ethics committee and the King's College Hospital local research ethics committee. Full details of the study design, the participants and the questionnaire are described in [12].

129 personnel who appeared never to have received a questionnaire (i.e. all mailings were listed as return to sender, or they had been recorded as absent during a military unit visit) were excluded, as were 42 who were recorded as having died during the study and 166 (1%) who refused to take part in the study. Of the remaining 17,162 personnel, 10,256 (60%) were listed as having returned the questionnaire and were labelled 'responders'.

Demographic information, including age, rank, Service and address, for individuals in our sample was provided by the Defence Analytical Services Agency (DASA), who also provided a monthly fitness category for each person, indicating whether or not they were fit for active duty during that month, known in military jargon as "downgrading status". This study is unusual in that we were able to ascertain the health of non-responders for over two years following the start of the study. Fitness data were available for 99% of regulars and for 55% of reservists. For the purpose of this study, 'fit' was defined as fit to deploy at all times between May 2003 (end of TELIC 1) and August 2005. Reservists were excluded from all analyses using the fitness data because of the large percentage with missing data. They were, however, included in all other analyses, since reservists showed the biggest health differences between TELIC 1 and Era.

For investigation of bias across response waves
For this part of the analysis we used data on the response patterns, fitness indicators and replies to health questions of 10,234 survey participants (labelled 'full responders'), after excluding 18 responders who completed only the first page of the questionnaire. These respondents had been sent (or believed they had been sent) the incorrect questionnaire, i.e. a questionnaire tailored for the TELIC 1 group when they had not been deployed on TELIC 1. A further 57 responders were re-assigned from the TELIC 1 to the Era group, and 22 individuals from Era to the TELIC 1 group, after establishing that they had been wrongly classified [12].

The paper by Stang et al., on which we based the simulations, considers error in the exposure variable, for example alcohol consumption, and assumes that the outcome, for example liver cancer, is known. Since the exposure (deployment on TELIC 1) is known in the Iraq war study, we are concerned with misclassification of outcome, but the same principles apply [13]. We consider two health outcomes: multiple physical symptoms (18 or more physical symptoms) and post-traumatic stress disorder (PTSD), defined as having a score of 50 or more on the Post-traumatic Check List (PCL), a commonly used measure of PTSD [14]. We have defined outcome misclassification as "errors caused by carelessness in completing the questionnaire". Another possibility would have been to define misclassification as under- or over-reporting of multiple physical symptoms. However, since the purpose of the Iraq war study was to identify people who perceived that they had a health problem, rather than to identify those that had some quantifiable disease, the first definition seemed more apt for this investigation.

We used two measures for assessing the extent of misclassification: 1. the percentage of discrepant answers to a question on health that asked a similar question in a different way; and 2. the percentage of missing answers to PTSD and other health questions. For the first measure, respondents were labelled 'discrepant' if they gave the same (contradictory) answer to the two questions "I'm as healthy as anyone I know" and "I seem to get ill more easily than other people", where the choice of answers was "definitely true", "mostly true", "mostly false" or "definitely false" [15]. For this measure two variables were constructed: 'discrepant 1' excluded any missing values for the two questions, and 'discrepant 2' labelled those with missing values for both questions as discrepant. For the second measure, having missing health data was defined as falling into at least one of the following categories: 1. having at least 4 missing answers to either the PTSD or General Health Questionnaire 12 [16]; 2. not answering either of the two questions described above; 3. not answering a question on general health. The questions on multiple physical symptoms were not included in this measure, since participants were only required to respond to this question if they had at least one symptom. Full details of all the questions on health are provided in [12].

As in [6], wave was defined as the number of contacts that were needed before a successful response, after excluding any attempts where the questionnaire was returned to sender, or the person was listed as being not present at a unit visit (e.g. wave 1 respondents are those that responded at first contact). Two measures were used to assess prevalence of the outcomes: those obtained from the questionnaires, and the fitness category for each person. Although previous evidence has shown that the correlation between fitness status and perceived health may be quite weak [17], fitness status will provide some indication of the likely physical and mental health levels of respondents at each wave.

Analysis

Statistical analysis
Statistical analyses were carried out using Stata 9 (Stata Corporation, Texas, USA), using the svy commands and sampling weights to adjust for the oversampling of reservists.

The factors which differed between responders and non-responders were identified using the chi-squared test, and a multivariable logistic regression model based on these factors (including any significant interactions) was used to predict the probability of response. These probabilities were used to construct an inverse probability weight for each responder, which was then multiplied by the sampling weight. Relative risks for the main health outcomes were estimated with and without response weights and compared, in order to determine the extent of nonresponse bias.

All relative risks were estimated using Poisson regression [18]. The estimates of relative risks across response waves were adjusted for age, sex, rank, service type, and reservist status, but (in contrast to [12]) we excluded any covariates that might be misclassified and hence cause extra bias [13]. The Rao and Scott second-order correction was used for chi-squared tests, and an extension of the Wilcoxon rank-sum test was used to test for trends. Sample weights were used for all analyses (and reported percentages) except tests for trend and the Spearman correlation. All reported p values are two-sided.

Simulations
The equation presented on page 206 of [6] was used 1. to simulate the 'true' (unbiased) relative risks that would have been observed at wave 4 (for all responders) if there had been no misclassification, and 2. to simulate the biased 'observed' relative risks for waves 1 to 3 that would result from these 'true' relative risks, for a range of 'true' prevalence rates. We compared the simulated observed relative risks with those estimated from the data. We used the proportion of discrepant answers and missing data as measures of misclassification (unlike [6], who used hypothesised specificity and sensitivity). Full details of the calculations are provided in the additional material (see Additional file 1). The R programming language was used for all the simulations [19].

Results

Comparison of responders with non-responders
The response rate to the survey was 60%. All of the factors we investigated were related to response (Table 1) except fitness status (p = 0.5), with 22.6% of responders and 22.3% of non-responders labelled as being unfit at any time between May 2003 and August 2005.
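The inverse-probability-weighting step described in the Methods can be sketched as follows. This is an illustrative reconstruction, not the authors' Stata code: the stratum labels, response indicator and counts are invented, and a single categorical predictor stands in for the full multivariable logistic response model.

```python
from collections import defaultdict

def inverse_probability_weights(strata, responded):
    """Weight each responder by 1 / (response rate in their stratum).

    With a single categorical predictor, a saturated logistic response
    model predicts exactly the observed stratum response rates, so
    dividing by those rates mirrors the weighting step without needing
    a regression library.
    """
    totals = defaultdict(int)
    responses = defaultdict(int)
    for stratum, r in zip(strata, responded):
        totals[stratum] += 1
        responses[stratum] += 1 if r else 0
    # Assumes every stratum contains at least one responder.
    return {s: totals[s] / responses[s] for s in totals}

# Hypothetical data: officers respond at 70%, other ranks at 50%.
rank = ["officer"] * 10 + ["other"] * 10
resp = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 5
weights = inverse_probability_weights(rank, resp)
```

In the study itself, each responder's nonresponse weight was additionally multiplied by the sampling weight correcting for the 2:1 over-sampling of reservists, and relative risks were then compared with and without the combined weight.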
Weighting to account for these factors (except ethnic group, which had 14% missing) had little effect on the relative risks. The relative risk for multiple physical symptoms by deployment status was 1.19 (95% confidence interval: 1.07, 1.34) using sample weights alone and 1.19 (1.06, 1.33) when nonresponse weights were employed. For PTSD, the relative risks were 1.17 (0.96, 1.43) and 1.15 (0.94, 1.42) respectively.

Table 1: Response rates according to demographic and other factors. Response differed significantly for all factors shown (p < 0.001).

Factor                        Total N    Percentage (weighted) who responded
Age (years) at 01/01/05
  <25                           3,442    50
  25–29                         3,347    58
  30–34                         3,432    64
  35–39                         3,173    67
  40–49                         3,137    64
  50–60                           631    69
Gender
  Male                         15,585    60
  Female                        1,577    67
Service
  Naval Service                 2,943    57
  Army                         10,936    61
  RAF                           3,283    61
Rank
  Officer                       2,718    70
  Rank                         14,444    59
Status
  Reservist                     2,987    53
  Regular                      14,175    61
Deployment
  Era                           9,627    58
  TELIC 1                       7,535    63
Ethnic group*
  White                        14,142    62
  Non-white                       882    56
All addresses military
  Yes                          12,705    70
  No                            4,457    30
Military unit visited
  Yes                           5,252    68
  No                           11,910    57
Total                          17,162    60
*Ethnic group was missing for 2,456 personnel.

Investigation of responses
72% of the participants responded at first contact, and 88% had responded after one reminder (wave 2). 11% of individuals were classified as having multiple physical symptoms and 4% were categorized as having PTSD. Those labelled as unfit were two and a half times as likely to have multiple physical symptoms and 3 times as likely to be classified with PTSD. However, the number of symptoms and the PTSD score were only weakly correlated with fitness status (Spearman correlation coefficients of -0.2).

The percentage of full respondents who gave the same answer to the two health questions was 11.8, increasing to 13.2 when those with missing answers to both questions were included. These percentages were the same for mail and unit visit responses. The most common pair of discrepant answers to the two questions "I get ill more easily than other people" and "I am as healthy as anyone I know" was "mostly false" (6.5%), followed by "definitely false" and "mostly true" (both 2.6%), with only 0.2% answering both questions "definitely true". There were 2.7% with missing answers for at least one of the two questions and 1.7% with both. There were slightly fewer discrepancies in the TELIC 1 cohort: 10.9% TELIC 1 vs. 12.6% Era (p = 0.01). This difference was mainly due to the smaller percentage of TELIC 1 personnel answering "definitely false" to both questions (1.7% vs. 3.3%). These differences held after adjustment for the only other factors found to be related to discrepancies, i.e. lower rank and Service (the Army had the highest percentage). However, the percentage with missing answers to both questions was significantly (p = 0.02) greater for TELIC 1 than Era (2.1% vs. 1.5%).

When the discrepancy variable was recalculated to include those with missing data for both questions, the difference between TELIC 1 and Era was reduced to 12.6% versus 13.7% and became less significant (p = 0.11). For the purpose of this study, we shall assume that this measure is non-differential between TELIC 1 and Era.

Investigation of misclassification bias across response waves
The percentage of people giving discrepant answers to the health questions did not change significantly with the number of contact attempts, unless those who had missing data for both of the two questions were included as discrepant, when there was a significant upward trend (Table 2). There was also an upward trend in missing answers to any health question (Table 2).

Since there was no apparent trend between the number of attempts at contact and fitness status, PTSD or multiple physical symptoms (Table 2), we assumed that the true and observed prevalence of both outcomes was constant across wave.

Table 2: Trends in discrepancies, PTSD data, fitness status and health outcomes by response wave (number of times a person was contacted before response).

Number of contacts                          1       2       3       4+      All      p-value
Full responders (N)                       7,384   1,651     758     441   10,234
Indicators for misclassification
  Discrepant 1 (%)                         11.8    11.1    13.1    12.3     11.8    0.5
  Discrepant 2 (%)                         12.8    12.9    15.8    16.3     13.2    0.007
  At least 1 health question missing (%)    2.4     4.0     5.6     7.3      3.1    <0.001
Indicators for outcome prevalence
  Unfit (%)                                22.8    21.8    22.3    21.7     22.4    0.4
  PTSD 50+ (%)                              3.7     4.7     4.1     3.3      3.9    0.6
  Symptoms 18+ (%)                         10.8    11.7    11.1     9.8     10.9    0.97
Estimates are weighted to allow for oversampling of reservists. p-values are for trend. Those with missing values to either question are excluded from discrepant 1, but included as discrepancies in discrepant 2.

Comparison of observed and simulated relative risks across response wave
Table 3 shows the (adjusted) observed cumulative relative risks of the two health outcomes by response wave, showing that these risks are slightly higher at wave 1 than at wave 4.

Since the main aim was to assess the change in relative risk by response wave, and because we needed a non-differential measure, we chose to use the percentage of discrepancies which included missing answers (discrepant 2) at each wave as the hypothesised misclassification rate. This measure had the advantage that it represents the worst-case scenario and provides an upper bound for the percentage of true misclassification. The true relative risks that would lead to the observed relative risks at wave 4 (i.e. 1.22 for multiple physical symptoms and 1.09 for PTSD) if misclassification was 13.2% are shown in Table 4, column 3, for a range of true prevalence rates. This shows that the effect of misclassification decreases with increased true prevalence: for example, a true prevalence of 8% for multiple physical symptoms would mean that the true relative risk was nearly double that observed, while a true prevalence of 16% would only increase it by 20%.

The lowest true prevalence rate for PTSD compatible with 13.2% misclassification was 11%. Since it seems unlikely that the true prevalence of PTSD was over three times that observed, we repeated the simulations using a more conservative estimate for misclassification of 6.5%, i.e. half that of discrepant 2 (Table 4, column 4). The prevalence range compatible with this percentage was more plausible (3–9%). Even though the difference between the true and observed relative risks is much smaller, the difference is still large when the true prevalence is small, most notably for PTSD at the lowest end of the compatible range (3%), which is associated with a low (and possibly implausible) true positive rate of 0.2%.

The simulated true relative risks shown in column 3 of Table 4 were then used to calculate the cumulative relative risks that would be expected at each wave if the percentage of misclassification was the same as discrepant 2 (Table 5). The simulated observed relative risks show a similar pattern of changes as the actual observed relative risks across wave, with the differences across wave becoming smaller as the true prevalence increases. The same pattern was observed when the percentages of missing data at each wave (which caused the increase in discrepancies by wave) were used to simulate the 'observed' relative risks (data not shown).

Table 3: Cumulative observed relative risks* and 95% confidence intervals for health outcomes over response wave.

Outcome        Number of contacts
               1                   2                   3                   4+ (all)**
Symptoms >17   1.24 (1.09, 1.42)   1.23 (1.09, 1.39)   1.21 (1.08, 1.36)   1.22 (1.09, 1.36)
PTSD           1.12 (0.89, 1.43)   1.15 (0.93, 1.41)   1.13 (0.92, 1.38)   1.09 (0.89, 1.33)
N              7,384               9,035               9,793               10,234
Estimates are weighted to allow for oversampling of reservists. *Adjusted for age, sex, rank, service type, and reservist status. **These estimates are slightly different from those reported in [12] because we use relative risks rather than odds ratios, and also because we have excluded any confounders that were likely to be misclassified, which could cause extra bias.
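The attenuation that the simulations quantify can be illustrated with a minimal model. The sketch below is not the algorithm of [6] (which works from sensitivity and specificity and was solved iteratively); it simply flips a binary outcome non-differentially with the same error probability in both cohorts, which is enough to show the bias towards the null growing with the error rate. The prevalence and relative-risk inputs are loosely based on Table 4 but will not reproduce its entries.

```python
def misclassified_risk(true_risk, error):
    # With probability `error` the recorded outcome is flipped,
    # independently of exposure (non-differential misclassification);
    # equivalently, sensitivity = specificity = 1 - error.
    return true_risk * (1 - error) + (1 - true_risk) * error

def observed_rr(baseline_risk, true_rr, error):
    """Relative risk that would be observed for a binary exposure
    when the outcome is non-differentially misclassified."""
    risk_exposed = baseline_risk * true_rr
    return (misclassified_risk(risk_exposed, error)
            / misclassified_risk(baseline_risk, error))

# Loosely Table-4-like inputs: true prevalence 8%, true RR 2.15.
rr_no_error = observed_rr(0.08, 2.15, 0.00)
rr_13pct = observed_rr(0.08, 2.15, 0.13)    # heavy misclassification
rr_6pct = observed_rr(0.08, 2.15, 0.065)    # half the error rate
```

In this model the observed relative risk always lies between 1 and the true relative risk, and halving the error rate moves it back towards the truth: the same qualitative behaviour as columns 3 and 4 of Table 4.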
Discussion
We could find no evidence of nonresponse bias in the Iraq war study. In common with most surveys [20], response rate differed significantly according to age, rank (a measure of socio-economic status), gender and ethnic group, and also according to cohort, enlistment type (regular/reservist), the address type (military or civilian) and whether or not the unit was visited. However, the level of fitness (assessed from downgrading status) was not related to response, and adjustment for the factors listed above, using nonresponse weights, made little difference to the results. Although the use of response weights to estimate bias is based on the assumption that the data missing due to nonresponse are ignorable (i.e. that they do not depend on non-measured factors), our findings seem plausible since they are supported by other studies, including that of Klesges et al. [21], who asked US Air Force personnel required to complete a questionnaire on health whether they would have participated if it had not been compulsory. They found that the risk estimates were similar for those classed as possible responders compared with definite non-responders.

Although difficult to quantify, the percentage of missing answers to health questions suggests that outcome misclassification is at least 3%, and the percentage of responders who gave contradictory answers to two questions asking essentially the same thing suggests that it could be as high as 14%. A significant upward trend in missing answers suggests that carelessness in answering the questionnaire (our definition of misclassification) increased with response wave. However, simulations based on the percentage of discrepancies and missing answers resulted in only a slight decrease in the relative risks towards the null across response wave. A similar small decrease was observed for the relative risks obtained from the data.

The results of this investigation suggest that, if the assumption of non-differential misclassification and constant prevalence of outcome is correct, the relative risks for health outcomes may be becoming slightly more biased towards the null with each contact. We are aware that the assumption of non-differential error may be unrealistic, since the percentages of both missing answers and discrepancies differ according to deployment status, even though the differences cancelled each other out to some extent. This might be because personnel deployed on TELIC 1 take slightly more care in answering the questions, but have more doubts about how to complete them. We are also aware that using the discrepant answers to the health questions to assess misclassification was unusual (we could find no other reports that do so). However, the fact that the actual relative risks change little with increasing response does suggest that increasing the response rate using multiple follow-up attempts does not change the bias.

Of greater concern is the extent of misclassification bias. If misclassification is non-differential, the relative risks may be considerably biased towards the null. The simulations demonstrate how a relatively low rate of classification error can cause a large bias in the observed relative risk. For example, if misclassification is 6.5%, the simulated true relative risk for PTSD, for a true and observed prevalence of 4%, is 50% larger than that observed. If misclassification is differential, there is still likely to be bias, but it could go in either direction. Estimating the effects of differential misclassification was beyond the scope of this study.

Although there have been various attempts to quantify and correct for misclassification, for example by validating survey answers using data from another source [22,23], such attempts are beset with problems, as not only will there be error in the 'gold standard', but it is often difficult to obtain measures that represent exactly the same thing. Indeed, a study on a sample similar to that of the Iraq war study [24] found poor correspondence between the questionnaire responses and the reports of the medical officers for the same patients.

Table 4: Simulated true relative risks (RRs) for multiple physical symptoms and PTSD for a range of hypothesised true prevalence rates. The calculations are based on Stang's algorithm (using an iterative approach to obtain the true RRs).

Multiple symptoms (observed prevalence 10.84%, observed RR 1.22)
True prevalence (%)   True RR (13.22% error)   True RR (6.5% error)
6                     3.05                     1.48
8                     2.15                     1.41
10                    1.83                     1.35
12                    1.65                     1.31
14                    1.52                     1.28
16                    1.44                     1.26
PTSD (observed prevalence 3.90%, observed RR 1.09)
True prevalence (%)   True RR (13.22% error)   True RR (6.5% error)
3                     N/A                      5.5
4                     N/A                      1.64
5                     N/A                      1.35
7                     N/A                      1.18
9                     N/A                      1.12
11                    1.90                     N/A
13                    1.25                     N/A
15                    1.16                     N/A
N/A indicates that the true prevalence for this row is not compatible with the specified error rate.

Table 5: Simulated true and cumulative relative risks (RRs) for TELIC 1/Era that would be observed at each wave if the misclassification rates correspond to the percentage of discrepancies*, and the relative risks and prevalence rates correspond to those observed for multiple physical symptoms and PTSD.

Multiple symptoms (observed prevalence 10.84%)
True prevalence (%)   True RR   Wave 1   Wave 2   Wave 3   Wave 4+ (observed)
6                     3.05      1.26     1.26     1.24     1.22
8                     2.15      1.24     1.24     1.23     1.22
10                    1.83      1.24     1.24     1.23     1.22
12                    1.65      1.24     1.24     1.23     1.22
14                    1.52      1.23     1.23     1.23     1.22
16                    1.44      1.23     1.23     1.22     1.22
PTSD (observed prevalence 3.9%)
True prevalence (%)   True RR   Wave 1   Wave 2   Wave 3   Wave 4+ (observed)
11                    1.90      1.13     1.13     1.11     1.09
12                    1.40      1.11     1.11     1.10     1.09
13                    1.25      1.10     1.10     1.09     1.09
*Simulations are based on the cumulative percentages of discrepant 2: 12.84, 12.85, 13.07, 13.22.

Conclusion
In summary, the results suggest that multiple mailouts were not associated with an increased bias. The estimates changed little over wave, and nearly 90% of participants had responded after one reminder, suggesting that the extra effort to recruit after the second mailing was probably not worthwhile. Although efforts to increase response rates are desirable in order to gain a larger sample and more precise estimates, we suggest that at least equal, if not greater, efforts should be made to assess and correct for the effects of misclassification bias, for example by using validation data from another source of information, or by including items within the questionnaire that can be used to check for inconsistent answering.

Competing interests
The author(s) declare that they have no competing interests.

Authors' contributions
AR Tate conceived and wrote the paper, and carried out all the analyses. M Jones participated in the conduct of the research, the analysis, and the writing of the paper. L Hull coordinated the study, and was involved in planning the study and writing the paper. NT Fear participated in the planning of the study, made comments on the analysis and contributed to the writing of this paper. R Rona, as a principal investigator, sought funding and participated in the planning, supervision of data collection, and writing the paper. S Wessely, as a principal investigator, sought funding, led the planning of the study and supervision of data collection, and made comments on the analysis and writing of this paper. M Hotopf, as a principal investigator, planned and supervised aspects of data collection, and participated in writing the paper.

Additional material
Additional file 1: Supplementary Material: Methods for Simulations. The data provided present the algorithms used for the simulations. http://www.biomedcentral.com/content/supplementary/1471-2288-7-51-S1.pdf

Acknowledgements
We thank the UK Ministry of Defence for their cooperation; in particular we thank the Defence Analytical Services Agency, the Veterans Policy Unit, the Armed Forces Personnel Administration Agency, and the Defence Medical Services Department.

References
1. Edwards P, Roberts I, Clarke M, DiGuiseppi C, Pratap S, Wentz R, Kwan I: Increasing response rates to postal questionnaires: systematic review. British Medical J 2002, 324(7347):1183-1185.
2. Siemiatycki J, Campbell S: Nonresponse bias and early versus all responders in mail and telephone surveys. Am J Epidemiol 1984, 120(2):291-301.
3. Kreiger N, Nishri ED: The effect of nonresponse on estimation of relative risk in a case-control study. Annals Epidemiology 1997, 7(3):194-199.
4. Brogger J, Bakke P, Eide GE, Gulsvik A: Contribution of follow-up of nonresponders to prevalence and risk estimates: a Norwegian respiratory health survey. Am J Epidemiology 2003, 157:558-566.
5. de Winter A, Oldehinkel AJ, Veenstra R, Brunnekreef JA, Verhulst FC, Ormel J: Evaluation of non-response bias in mental health determinants and outcomes in a large sample of pre-adolescents. European J Epidemiology 2005, 20(2):173-181.
6. Stang A, Jockel KH: Studies with low response proportions may be less biased than studies with high response proportions. Am J Epidemiology 2004, 159:204-210.
7. Bross I: Misclassification in 2 × 2 tables. Biometrics 1954, 10(4):478-486.
8. Newell DJ: Errors in interpretation of errors in epidemiology. Am J Public Health Nations Health 1962, 52(11):1925-1928.
9. Greenland S, Gustafson P: Accounting for independent nondifferential misclassification does not increase certainty that an observed association is in the correct direction. Am J Epidemiology 2006, 164:63-68.
10. Groves RM, Dillman DA, Eltinge JL, Little RJA (Eds): Survey Nonresponse. New York: Wiley; 2002.
11. Helasoja V, Prattala R, Dregval L, Pudule I, Kasmel A: Late response and item nonresponse in the Finbalt Health Monitor Survey. Eur J Public Health 2002, 12(2):117-123.
12. Hotopf M, Hull L, Fear NT, Browne T, Horn O, Iversen A, Jones M, Murphy D, Bland D, Earnshaw M, Greenberg N, Hughes JH, Tate AR, Dandeker C, Rona R, Wessely S: The health of UK military personnel who deployed to the 2003 Iraq war: a cohort study. The Lancet 2006, 367:1731-1741.
13. Rothman KJ, Greenland S: Modern Epidemiology. 2nd edition. Lippincott-Raven; 1998.
14. Blanchard EB, Jones-Alexander J, Buckley TC, Forneris CA: Psychometric properties of the PTSD checklist (PCL). Behaviour Research Therapy 1996, 34(8):669-673.
15. Ware J, Snow K, Kosinski M, Gandek B: SF-36 Health Survey: Manual and Interpretation Guide. Boston, Mass: The Health Institute, New England Medical Center; 1993.
16. Goldberg D, Williams P: A users' guide to the General Health Questionnaire. Windsor: NFER-Nelson; 1988.
17. Rona RJ, Hooper R, Greenberg N, Jones M, Wessely S: Medical downgrading, self-perception of health, and psychological symptoms in the British Armed Forces. Occupational Environmental Medicine 2006, 63(4):250-254.
18. Zou GY: A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 2004, 159:702-706.
19. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2007 [http://www.R-project.org]. ISBN 3-900051-07-0.
20. Chretien JP, Chu LK, Smith TC, Smith B, Ryan MA, the Millennium Cohort Study Team: Demographic and occupational predictors of early response to a mailed invitation to enroll in a longitudinal health study. BMC Medical Research Methodology 2007, 7:6.
21. Klesges RC, Williamson JE, Somes GW, Talcott GW, Lando HA, Haddock CK: A population comparison of participants and non-participants in a health survey. Am J Public Health 1999, 89(8):1228-1231.
22. Greenland S: Variance estimation for epidemiologic effect estimates under misclassification. Statistics In Medicine 1988, 7(7):745-757.
23. Savoca E: Sociodemographic correlates of psychiatric diseases: accounting for misclassification in survey diagnoses of major depression, alcohol and drug use disorders. Health Services and Outcomes Research Methodology 2004, 5(17):175-191.
24. Rona RJ, Hooper R, Jones M, French C, Wessely S: Screening for physical and psychological illness in the British Armed Forces: III: The value of a questionnaire to assist a Medical Officer to decide who needs help. J Medical Screening 2004, 11(3):158-161.

Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/7/51/prepub
Page 8 of 8 (page number not for citation purposes) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Medical Research Methodology Springer Journals

How many mailouts? Could attempts to increase the response rate in the Iraq war cohort study be counterproductive?


Publisher: Springer Journals
Copyright: © 2007 Tate et al; licensee BioMed Central Ltd.
Subject: Medicine & Public Health; Theory of Medicine/Bioethics; Statistical Theory and Methods; Statistics for Life Sciences, Medicine, Health Sciences
eISSN: 1471-2288
DOI: 10.1186/1471-2288-7-51
PMID: 18045472

Abstract

Background: Low response and reporting errors are major concerns for survey epidemiologists. However, while nonresponse is commonly investigated, the effects of misclassification are often ignored, possibly because they are hard to quantify. We investigate both sources of bias in a recent study of the effects of deployment to the 2003 Iraq war on the health of UK military personnel, and attempt to determine whether improving response rates by multiple mailouts was associated with increased misclassification error and hence increased bias in the results.

Methods: Data for 17,162 UK military personnel were used to determine factors related to response, and inverse probability weights were used to assess nonresponse bias. The percentages of inconsistent and missing answers to health questions from the 10,234 responders were used as measures of misclassification in a simulation of the 'true' relative risks that would have been observed if misclassification had not been present. Simulated and observed relative risks of multiple physical symptoms and post-traumatic stress disorder (PTSD) were compared across response waves (number of contact attempts).

Results: Age, rank, gender, ethnic group, enlistment type (regular/reservist) and contact address (military or civilian), but not fitness, were significantly related to response. Weighting for nonresponse had little effect on the relative risks. Of the respondents, 88% had responded by wave 2. Missing answers (total 3%) increased significantly (p < 0.001) between waves 1 and 4 from 2.4% to 7.3%, and the percentage with discrepant answers (total 14%) increased from 12.8% to 16.3% (p = 0.007). However, the adjusted relative risks decreased only slightly, from 1.24 to 1.22 for multiple physical symptoms and from 1.12 to 1.09 for PTSD, and showed a similar pattern to those simulated.

Conclusion: Bias due to nonresponse appears to be small in this study, and increasing the response rates had little effect on the results.
Although misclassification is difficult to assess, the results suggest that bias due to reporting errors could be greater than bias caused by nonresponse. Resources might be better spent on improving and validating the data, rather than on increasing the response rate.

BMC Medical Research Methodology 2007, 7:51 http://www.biomedcentral.com/1471-2288/7/51

Background
Poor response is a major source of concern in epidemiological surveys, and much effort is often spent on chasing up initial non-responders [1], with the implicit assumption that a higher response rate is associated with a more representative sample and hence lower bias. However, there is increasing evidence that this assumption may not always be true. Several reports have found little difference in the risk estimates obtained from the first wave of response and later waves [2-5]. In addition, a recent simulation study by Stang et al [6] suggests that if misclassification error increases with the number of contact attempts, or the prevalence of the exposure decreases, then, if misclassification is non-differential (i.e. independent of exposure status), the estimates after each attempt will become successively biased towards the null hypothesis. Their results are consistent with the long-known fact that non-differential independent misclassification error of a dichotomous outcome will always bias a relative risk estimate for a binary exposure towards the null value (i.e. no difference) [7-9].

While there is an extensive literature on evaluating and dealing with the effects of survey nonresponse (e.g. the collection of articles in [10]), misclassification bias is mostly ignored in the survey literature, particularly in relation to attempts to increase response. We could find only a few studies which reported the effect of increasing response on the relative risks, e.g. [2-4], and none that explicitly examined whether increasing response rates increased the bias. This was surprising, since the proportion of missing information has been found to be greater for late responders [5,11], which suggests that late responders may take less care in answering a questionnaire and hence make more errors.

To help redress this imbalance, we report an empirical evaluation of the effect of nonresponse bias and outcome misclassification on the relative risks of two health outcomes which were obtained from a recent large study of the health of United Kingdom (UK) military personnel deployed to the 2003 Iraq war [12]. In the first part of this study we attempt to assess the effect of nonresponse bias on the results by comparing the known characteristics of responders and non-responders. In the second part we investigate the pattern of misclassification and the prevalence of health risk factors in those who responded. We compare relative risks that were observed with those simulated using Stang's algorithm, in an attempt to ascertain the effect of reporting errors across successive waves of response, and whether increasing the initial response rate of 43% to 60%, by numerous and diligent attempts at contact, could possibly have been counterproductive.

Methods
Data and measures used

For investigation of nonresponse bias
We examined data on 17,370 personnel who had been sampled for the first wave of data collection of the Iraq war cohort study. All personnel had been employed in the military between January 18th and June 28th 2003: 7,621 (labelled Op TELIC 1) were recorded as having been deployed in Iraq during this period and 9,749 (labelled Era) were not recorded as having been deployed on Op TELIC 1. Participants were contacted by post, or were asked to complete a questionnaire during military unit visits made by the research team. Up to 5 further attempts were made to recruit initial non-responders. Reservist personnel were over-sampled by a ratio of 2:1. The study received approval from the Ministry of Defence (Navy) personnel research ethics committee and the King's College Hospital local research ethics committee. Full details of the study design, the participants and the questionnaire are described in [12].

129 personnel who appeared never to have received a questionnaire (i.e. all mailings were listed as return to sender, or they had been recorded as absent during a military unit visit) were excluded, as were 42 who were recorded as having died during the study and 166 (1%) who refused to take part in the study. Of the remaining 17,162 personnel, 10,256 (60%) were listed as having returned the questionnaire and were labelled 'responders'.

Demographic information, including age, rank, Service and address, for individuals in our sample was provided by the Defence Analytical Services Agency (DASA), who also provided a monthly fitness category for each person, indicating whether or not they were fit for active duty during that month, known in military jargon as "downgrading status". This study is unusual in that we were able to ascertain the health of non-responders for over two years following the start of the study. Fitness data were available for 99% of regulars and for 55% of reservists. For the purpose of this study, 'fit' was defined as fit to deploy at all times between May 2003 (end of TELIC 1) and August 2005. Reservists were excluded from all analyses using the fitness data because of the large percentage with missing data. They were, however, included in all other analyses, since reservists showed the biggest health differences between TELIC 1 and Era.
For investigation of bias across response waves
For this part of the analysis we used data on the response patterns, fitness indicators and replies to health questions of 10,234 survey participants (labelled 'full responders'), after excluding 18 responders who completed only the first page of the questionnaire. These respondents had been sent (or believed they had been sent) the incorrect questionnaire, i.e. a questionnaire tailored for the TELIC 1 group when they had not been deployed on TELIC 1. A further 57 responders were re-assigned from the TELIC 1 to the Era group, and 22 individuals from Era to the TELIC 1 group, after establishing that they had been wrongly classified [12].

The paper by Stang et al [6], on which we based the simulations, considers error in the exposure variable, for example alcohol consumption, and assumes that the outcome, for example liver cancer, is known. Since the exposure (deployment on TELIC 1) is known in the Iraq war study, we are concerned with misclassification of outcome, but the same principles apply [13]. We consider two health outcomes: multiple physical symptoms (18 or more physical symptoms) and post-traumatic stress disorder (PTSD), defined as having a score of 50 or more on the Post-traumatic Check List (PCL), a commonly used measure of PTSD [14]. We have defined outcome misclassification as "errors caused by carelessness in completing the questionnaire." Another possibility would have been to define misclassification as under- or over-reporting of multiple physical symptoms. However, since the purpose of the Iraq war study was to identify people who perceived that they had a health problem, rather than to identify those who had some quantifiable disease, the first definition seemed more apt for this investigation.

We used two measures for assessing the extent of misclassification: 1. the percentage of discrepant answers to a question on health that asked a similar question in a different way; and 2. the percentage of missing answers to PTSD and other health questions. For the first measure, respondents were labelled 'discrepant' if they gave the same (contradictory) answer to the two questions "I'm as healthy as anyone I know" and "I seem to get ill more easily than other people," where the choice of answers was "definitely true", "mostly true", "mostly false" or "definitely false" [15]. For this measure two variables were constructed: 'discrepant 1' excluded any missing values for the two questions, and 'discrepant 2' labelled those with missing values for both questions as discrepant. For the second measure, having missing health data was defined as falling into at least one of the following categories: 1. having at least 4 missing answers to either the PTSD checklist or the General Health Questionnaire 12 [16]; 2. not answering either of the two questions described above; 3. not answering a question on general health. The questions on multiple physical symptoms were not included in this measure, since participants were only required to respond to this question if they had at least one symptom. Full details of all the questions on health are provided in [12].

As in [6], wave was defined as the number of contacts that were needed before a successful response, after excluding any attempts where the questionnaire was returned to sender or the person was listed as being not present at a unit visit (e.g. wave 1 respondents are those that responded at first contact). Two measures were used to assess the prevalence of the outcomes: those obtained from the questionnaires, and the fitness category for each person. Although previous evidence has shown that the correlation between fitness status and perceived health may be quite weak [17], fitness status will provide some indication of the likely physical and mental health levels of respondents at each wave.

Analysis

Statistical analysis
Statistical analyses were carried out using Stata 9 (Stata Corporation, Texas, USA), using the svy commands and sampling weights to adjust for the oversampling of reservists. The factors which differed between responders and non-responders were identified using the chi-squared test, and a multivariable logistic regression model based on these factors (including any significant interactions) was used to predict the probability of response. These probabilities were used to construct an inverse probability weight for each responder, which was then multiplied by the sampling weight. Relative risks for the main health outcomes were estimated with and without response weights and compared in order to determine the extent of nonresponse bias.

All relative risks were estimated using Poisson regression [18]. The estimates of relative risks across response waves were adjusted for age, sex, rank, service type, and reservist status, but (in contrast to [12]) we excluded any covariates that might be misclassified and hence cause extra bias [13]. The Rao and Scott second-order correction was used for chi-squared tests, and an extension of the Wilcoxon rank-sum test was used to test for trends. Sample weights were used for all analyses (and reported percentages) except tests for trend and the Spearman correlation. All reported p values are two-sided.

Simulations
The equation presented on page 206 of [6] was used 1. to simulate the 'true' (unbiased) relative risks that would have been observed at wave 4 (for all responders) if there had been no misclassification, and 2. to simulate the biased 'observed' relative risks for waves 1–3 that would result from these 'true' relative risks for a range of 'true' prevalence rates. We compared the simulated observed relative risks with those estimated from the data. We used the proportion of discrepant answers and missing data as measures of misclassification (unlike [6], who used hypothesised specificity and sensitivity). Full details of the calculations are provided in the additional material (see Additional file 1). The R programming language was used for all the simulations [19].
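The core idea of such a simulation — relating a true relative risk to the relative risk that would be observed after non-differential misclassification of a binary outcome — can be sketched as follows. This is a simplified illustration, not the authors' algorithm (which is given in Additional file 1): it assumes equal false-positive and false-negative rates (sensitivity = specificity = 1 minus the error rate), and the function names and the bisection search are our own.

```python
def observed_prev(true_prev, error):
    # Non-differential misclassification with equal false-positive and
    # false-negative rates: sensitivity = specificity = 1 - error.
    return true_prev * (1.0 - error) + (1.0 - true_prev) * error

def observed_rr(prev_unexposed, true_rr, error):
    # Relative risk that would be observed after misclassifying the
    # outcome at the same rate in both exposure groups.
    p1 = observed_prev(prev_unexposed * true_rr, error)
    p0 = observed_prev(prev_unexposed, error)
    return p1 / p0

def true_rr_for_observed(prev_unexposed, obs_rr, error, lo=1.0, hi=50.0):
    # Invert observed_rr by bisection: find the true RR that would
    # yield the observed RR under the assumed misclassification rate.
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if observed_rr(prev_unexposed, mid, error) < obs_rr:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Because non-differential error of this kind always attenuates a relative risk towards the null, the recovered 'true' RR is at least as large as the observed one, which is the qualitative pattern reported in the Results; the exact figures there come from the algorithm in Additional file 1, not from this sketch.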
Results
Comparison of responders with non-responders
The response rate to the survey was 60%. All of the factors we investigated were related to response (Table 1), except fitness status (p = 0.5), with 22.6% of responders and 22.3% of non-responders labelled as being unfit at any time between May 2003 and August 2005.

Table 1: Response rates according to demographic and other factors. Response differed significantly for all factors shown (p < 0.001).

  Factor                    Category        Total N   % (weighted) who responded
  Age (years) at 01/01/05   <25               3,442   50
                            25–29             3,347   58
                            30–34             3,432   64
                            35–39             3,173   67
                            40–49             3,137   64
                            50–60               631   69
  Gender                    Male             15,585   60
                            Female            1,577   67
  Service                   Naval Service     2,943   57
                            Army             10,936   61
                            RAF               3,283   61
  Rank                      Officer           2,718   70
                            Other rank       14,444   59
  Status                    Reservist         2,987   53
                            Regular          14,175   61
  Deployment                Era               9,627   58
                            TELIC 1           7,535   63
  Ethnic group*             White            14,142   62
                            Non-white           882   56
  All addresses military    Yes              12,705   70
                            No                4,457   30
  Military unit visited     Yes               5,252   68
                            No               11,910   57
  Total                                      17,162   60
  *Ethnic group was missing for 2,456 personnel.

Weighting to account for these factors (except ethnic group, which had 14% missing) had little effect on the relative risks. The relative risk for multiple physical symptoms by deployment status was 1.19 (95% confidence interval: 1.07, 1.34) using sample weights alone and 1.19 (1.06, 1.33) when nonresponse weights were employed. For PTSD, the relative risks were 1.17 (0.96, 1.43) and 1.15 (0.94, 1.42) respectively.

Investigation of responses
72% of the participants responded at first contact, and 88% had responded after one reminder (wave 2). 11% of individuals were classified as having multiple physical symptoms and 4% were categorized as having PTSD. Those labelled as unfit were two and a half times as likely to have multiple physical symptoms and 3 times as likely to be classified with PTSD. However, the number of symptoms and the PTSD score were only weakly correlated with fitness status (with Spearman correlation coefficients of -0.2).

The percentage of full respondents who gave the same answer to the two health questions was 11.8%, increasing to 13.2% when those with missing answers to both questions were included. These percentages were the same for mail and unit-visit responses. The most common pair of discrepant answers to the two questions "I get ill more easily than other people" and "I am as healthy as anyone I know" was "mostly false" (6.5%), followed by "definitely false" and "mostly true" (both 2.6%), with only 0.2% answering both questions "definitely true". There were 2.7% with missing answers for at least one of the two questions and 1.7% with both. There were slightly fewer discrepancies in the TELIC 1 cohort: 10.9% in TELIC 1 vs. 12.6% in Era (p = 0.01). This difference was mainly due to the smaller percentage of TELIC 1 personnel answering "definitely false" to both questions (1.7% vs. 3.3%). These differences held after adjustment for the only other factors found to be related to discrepancies, i.e. lower rank and Service (the Army had the highest percentage). However, the percentage with missing answers to both questions was significantly (p = 0.02) greater for TELIC 1 than for Era (2.1% vs. 1.5%).

When the discrepancy variable was recalculated to include those with missing data for both questions, the difference between TELIC 1 and Era was reduced to 12.6% versus 13.7% and became less significant (p = 0.11). For the purpose of this study, we shall assume that this measure is non-differential between TELIC 1 and Era.

Investigation of misclassification bias across response waves
The percentage of people giving discrepant answers to the health questions did not change significantly with the number of contact attempts, unless those who had missing data for both of the two questions were included as discrepant, when there was a significant upward trend (Table 2). There was also an upward trend in missing answers to any health question (Table 2).

Table 2: Trends in discrepancies, PTSD data, fitness status and health outcomes by response wave (number of times a person was contacted before response). Estimates are weighted to allow for oversampling of reservists; p-values are for trend. Those with missing values to either question are excluded from discrepant 1, but included as discrepancies in discrepant 2.

  Number of contacts                        1       2      3      4+     All    p-value
  Full responders (N)                     7,384  1,651    758    441  10,234
  Indicators for misclassification
  Discrepant 1 (%)                         11.8   11.1   13.1   12.3    11.8   0.5
  Discrepant 2 (%)                         12.8   12.9   15.8   16.3    13.2   0.007
  At least 1 health question missing (%)    2.4    4.0    5.6    7.3     3.1   <0.001
  Indicators for outcome prevalence
  Unfit (%)                                22.8   21.8   22.3   21.7    22.4   0.4
  PTSD 50+ (%)                              3.7    4.7    4.1    3.3     3.9   0.6
  Symptoms 18+ (%)                         10.8   11.7   11.1    9.8    10.9   0.97

Since there was no apparent trend between the number of attempts at contact and fitness status, PTSD or multiple physical symptoms (Table 2), we assumed that the true and observed prevalence of both outcomes was constant across waves.

Comparison of observed and simulated relative risks across response wave
Table 3 shows the (adjusted) observed cumulative relative risks of the two health outcomes by response wave, showing that these risks are slightly higher at wave 1 than at wave 4.

Table 3: Cumulative observed relative risks* and 95% confidence intervals for health outcomes over response wave. Estimates are weighted to allow for oversampling of reservists.

  Outcome        1 contact          2                  3                  4+ (all)**
  Symptoms >17   1.24 (1.09, 1.42)  1.23 (1.09, 1.39)  1.21 (1.08, 1.36)  1.22 (1.09, 1.36)
  PTSD           1.12 (0.89, 1.43)  1.15 (0.93, 1.41)  1.13 (0.92, 1.38)  1.09 (0.89, 1.33)
  N              7,384              9,035              9,793              10,234
  *Adjusted for age, sex, rank, service type, and reservist status.
  **These estimates are slightly different from those reported in [12] because we use relative risks rather than odds ratios, and also because we have excluded any confounders that were likely to be misclassified, which could cause extra bias.

Since the main aim was to assess the change in relative risk by response wave, and because we needed a non-differential measure, we chose to use the percentage of discrepancies which included missing answers (discrepant 2) at each wave as the hypothesised misclassification rate. This measure had the advantage that it represents the worst-case scenario and provides an upper bound for the percentage of true misclassification. The true relative risks that would lead to the observed relative risks at wave 4, i.e. 1.22 for multiple physical symptoms and 1.09 for PTSD if misclassification was 13.6%, are shown in column 3 of Table 4 for a range of true prevalence rates. This shows that the effect of misclassification decreases with increased true prevalence: for example, a true prevalence of 8% for multiple physical symptoms would mean that the true relative risk was nearly double that observed, while a true prevalence of 16% would only increase it by 20%.

The lowest true prevalence rate for PTSD compatible with 13.6% misclassification was 11%. Since it seems unlikely that the true prevalence of PTSD was over three times that observed, we repeated the simulations using a more conservative estimate for misclassification of 6.5%, i.e. half that of discrepant 2 (column 4 of Table 4). The prevalence range compatible with this percentage was more plausible (3–9%). Even though the difference between the true and observed relative risks is much smaller, the difference is still large when the true prevalence is small, most notably for PTSD at the lowest end of the compatible range (3%), which is associated with a low (and possibly implausible) true positive rate of 0.2%.

Table 4: Simulated true relative risks (RRs) for multiple physical symptoms and PTSD for a range of hypothesised true prevalence rates. The calculations are based on Stang's algorithm (using an iterative approach to obtain the true RRs). N/A indicates that the true prevalence for that row is not compatible with the specified error rate.

  Multiple symptoms (observed prevalence 10.84%, observed RR 1.22)
  True prevalence (%)   True RR (13.22% error)   True RR (6.5% error)
   6                    3.05                     1.48
   8                    2.15                     1.41
  10                    1.83                     1.35
  12                    1.65                     1.31
  14                    1.52                     1.28
  16                    1.44                     1.26

  PTSD (observed prevalence 3.90%, observed RR 1.09)
  True prevalence (%)   True RR (13.22% error)   True RR (6.5% error)
   3                    N/A                      5.5
   4                    N/A                      1.64
   5                    N/A                      1.35
   7                    N/A                      1.18
   9                    N/A                      1.12
  11                    1.90                     N/A
  13                    1.25                     N/A
  15                    1.16                     N/A

The simulated true relative risks shown in column 3 of Table 4 were then used to calculate the cumulative relative risks that would be expected at each wave if the percentage of misclassification was the same as discrepant 2 (Table 5). The simulated observed relative risks show a similar pattern of changes to the actual observed relative risks across waves, with the differences across waves becoming smaller as the true prevalence increases. This same pattern was observed when the percentages of missing data at each wave (which caused the increase in discrepancies by wave) were used to simulate the 'observed' relative risks (data not shown).

Table 5: Simulated true and cumulative relative risks (RRs) for TELIC 1/Era that would be observed at each wave if the misclassification rates correspond to the percentage of discrepancies* and the relative risks and prevalence rates correspond to those observed for multiple physical symptoms and PTSD.

  Outcome (observed prevalence %)   True prevalence (%)   True RR   Wave 1   Wave 2   Wave 3   Wave 4+ (observed)
  Multiple symptoms (10.84)          6                    3.05      1.26     1.26     1.24     1.22
                                     8                    2.15      1.24     1.24     1.23     1.22
                                    10                    1.83      1.24     1.24     1.23     1.22
                                    12                    1.65      1.24     1.24     1.23     1.22
                                    14                    1.52      1.23     1.23     1.23     1.22
                                    16                    1.44      1.23     1.23     1.22     1.22
  PTSD (3.9)                        11                    1.90      1.13     1.13     1.11     1.09
                                    12                    1.40      1.11     1.11     1.10     1.09
                                    13                    1.25      1.10     1.10     1.09     1.09
  *Simulations are based on the cumulative percent misclassification of 12.84, 12.85, 13.07 and 13.22, i.e. the cumulative percentages of discrepant 2.

Discussion
We could find no evidence of nonresponse bias in the Iraq war study. In common with most surveys [20], the response rate differed significantly according to age, rank (a measure of socio-economic status), gender and ethnic group, and also according to cohort, enlistment type (regular/reservist), address type (military or civilian) and whether or not the unit was visited. However, the level of fitness (assessed from downgrading status) was not related to response, and adjustment for the factors listed above, using nonresponse weights, made little difference to the results. Although the use of response weights to estimate bias is based on the assumption that the data missing due to nonresponse are ignorable (i.e. that missingness does not depend on non-measured factors), our findings seem plausible, since they are supported by other studies, including that of Klesges et al [21], who asked US Air Force personnel, who were required to complete a questionnaire on health, whether they would have participated had it not been compulsory. They found that the risk estimates were similar for those classed as possible responders compared with definite non-responders.

Although difficult to quantify, the percentage of missing answers to health questions suggests that outcome misclassification is at least 3%, and the percentage of responders who gave contradictory answers to two questions asking essentially the same thing suggests that it could be as high as 14%. A significant upward trend in missing answers suggests that carelessness in answering the questionnaire (our definition of misclassification) increased with response wave. However, simulations based on the percentages of discrepancies and missing answers resulted in only a slight decrease in the relative risks towards the null across response waves. A similar small decrease was observed for the relative risks obtained from the data.

The results of this investigation suggest that, if the assumption of non-differential misclassification and constant prevalence of outcome is correct, the relative risks for health outcomes may be becoming slightly more biased towards the null with each contact. We are aware that the assumption of non-differential error may be unrealistic, since the percentages of both missing answers and discrepancies differ according to deployment status, even though the differences cancelled each other out to some extent. This might be because personnel deployed on TELIC 1 took slightly more care in answering the questions but had more doubts about how to complete them. We are also aware that using the discrepant answers to the health questions to assess misclassification was unusual (we could find no other reports that do so). However, the fact that the actual relative risks change little with increasing response does suggest that increasing the response rate using multiple follow-up attempts does not change the bias.

Of greater concern is the extent of misclassification bias. If misclassification is non-differential, the relative risks may be considerably biased towards the null. The simulations demonstrate how a relatively low rate of classification error can cause a large bias in the observed relative risk. For example, if misclassification is 6.5%, the simulated true relative risk for PTSD, for a true and observed prevalence of 4%, is 50% larger than that observed. If misclassification is differential, there is still likely to be bias, but it could go in either direction. Estimating the effects of differential misclassification was beyond the scope of this study.

Although there have been various attempts to quantify and correct for misclassification, for example by validating survey answers using data from another source [22,23], such attempts are beset with problems: not only will there be error in the 'gold standard', but it is often difficult to obtain measures that represent exactly the same thing. Indeed, a study on a sample similar to that of the Iraq war study [24] found poor correspondence between the questionnaire responses and the reports of the medical officers for the same patients.
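For readers who want to experiment with the nonresponse weighting evaluated above, the basic construction can be sketched as follows. This is an illustrative reconstruction, not the authors' Stata analysis: the response probabilities would come from the multivariable logistic regression described in the Analysis section (here they are simply taken as given), and the function names are our own.

```python
def nonresponse_weights(response_probs, sampling_weights):
    # Inverse probability weight for each responder: the design
    # (sampling) weight divided by the modelled probability of response.
    return [sw / p for p, sw in zip(response_probs, sampling_weights)]

def weighted_risk_ratio(exposed, outcome, weights):
    # Weighted risk in the exposed group divided by the weighted risk
    # in the unexposed group: a design-weighted relative risk.
    num = {True: 0.0, False: 0.0}
    den = {True: 0.0, False: 0.0}
    for e, y, w in zip(exposed, outcome, weights):
        num[e] += w * y
        den[e] += w
    return (num[True] / den[True]) / (num[False] / den[False])
```

Comparing the risk ratio computed with the sampling weights alone against the one computed with the combined weights gives the kind of contrast reported in the Results (1.19 with and without nonresponse weights for multiple physical symptoms): a small difference indicates little nonresponse bias, under the ignorability assumption discussed above.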
Conclusion
In summary, the results suggest that multiple mailouts were not associated with an increased bias. The estimates changed little over waves, and nearly 90% of participants had responded after one reminder, suggesting that the extra effort to recruit after the second mailing was probably not worthwhile. Although efforts to increase response rates are desirable in order to gain a larger sample and more precise estimates, we suggest that at least equal, if not greater, efforts should be made to assess and to correct for the effects of misclassification bias, for example by using validation data from another source of information, or by including items within the questionnaire that can be used to check for inconsistent answering.

Competing interests
The author(s) declare that they have no competing interests.

Authors' contributions
AR Tate conceived and wrote the paper, and carried out all the analyses. M Jones participated in the conduct of the research, the analysis, and the writing of the paper. L Hull coordinated the study, and was involved in planning the study and writing the paper. NT Fear participated in the planning of the study, made comments on the analysis and contributed to the writing of this paper. R Rona, as a principal investigator, sought funding and participated in the planning, supervision of data collection, and writing the paper. S Wessely, as a principal investigator, sought funding, led the planning of the study and supervision of data collection, and made comments on the analysis and writing of this paper. M Hotopf, as a principal investigator, planned and supervised aspects of data collection, and participated in writing the paper.

Additional material
Additional file 1: Supplementary Material: Methods for Simulations. The data provided present the algorithms used for the simulations. [http://www.biomedcentral.com/content/supplementary/1471-2288-7-51-S1.pdf]

Acknowledgements
We thank the UK Ministry of Defence for their cooperation; in particular we thank the Defence Analytical Services Agency, the Veterans Policy Unit, the Armed Forces Personnel Administration Agency, and the Defence Medical Services Department.

References
1. Edwards P, Roberts I, Clarke M, DiGuiseppi C, Pratap S, Wentz R, Kwan I: Increasing response rates to postal questionnaires: systematic review. British Medical Journal 2002, 324(7347):1183-1185.
2. Siemiatycki J, Campbell S: Nonresponse bias and early versus all responders in mail and telephone surveys. Am J Epidemiol 1984, 120(2):291-301.
3. Kreiger N, Nishri ED: The effect of nonresponse on estimation of relative risk in a case-control study. Annals of Epidemiology 1997, 7(3):194-199.
4. Brogger J, Bakke P, Eide GE, Gulsvik A: Contribution of follow-up of nonresponders to prevalence and risk estimates: a Norwegian respiratory health survey. Am J Epidemiol 2003, 157:558-566.
5. de Winter A, Oldehinkel AJ, Veenstra R, Brunnekreef JA, Verhulst FC, Ormel J: Evaluation of non-response bias in mental health determinants and outcomes in a large sample of pre-adolescents. European Journal of Epidemiology 2005, 20(2):173-181.
6. Stang A, Jockel KH: Studies with low response proportions may be less biased than studies with high response proportions. Am J Epidemiol 2004, 159:204-210.
7. Bross I: Misclassification in 2 × 2 tables. Biometrics 1954, 10(4):478-486.
8. Newell DJ: Errors in interpretation of errors in epidemiology. Am J Public Health Nations Health 1962, 52(11):1925-1928.
9. Greenland S, Gustafson P: Accounting for independent nondifferential misclassification does not increase certainty that an observed association is in the correct direction. Am J Epidemiol 2006, 164:63-68.
10. Groves R, Eltinge J, Little RJA (Eds): Survey Nonresponse. New York: Wiley; 2002.
11. Helasoja V, Prattala R, Dregval L, Pudule I, Kasmel A: Late response and item nonresponse in the Finbalt Health Monitor Survey. Eur J Public Health 2002, 12(2):117-123.
12. Hotopf M, Hull L, Fear NT, Browne T, Horn O, Iversen A, Jones M, Murphy D, Bland D, Earnshaw M, Greenberg N, Hughes JH, Tate AR, Dandeker C, Rona R, Wessely S: The health of UK military personnel who deployed to the 2003 Iraq war: a cohort study. The Lancet 2006, 367:1731-1741.
13. Rothman KJ, Greenland S: Modern Epidemiology. 2nd edition. Lippincott-Raven; 1998.
14. Blanchard EB, Jones-Alexander J, Buckley TC, Forneris CA: Psychometric properties of the PTSD checklist (PCL). Behaviour Research and Therapy 1996, 34(8):669-673.
15. Ware J, Snow K, Kosinski M, Gandek B: SF-36 Health Survey: Manual and Interpretation Guide. Boston, Mass: The Health Institute, New England Medical Center; 1993.
16. Goldberg D, Williams P: A User's Guide to the General Health Questionnaire. Windsor: NFER-Nelson; 1988.
17. Rona RJ, Hooper R, Greenberg N, Jones M, Wessely S: Medical downgrading, self-perception of health, and psychological symptoms in the British Armed Forces. Occupational and Environmental Medicine 2006, 63(4):250-254.
18. Zou GY: A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 2004, 159:702-706.
19. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2007. [http://www.R-project.org] ISBN 3-900051-07-0.
20. Chretien JP, Chu LK, Smith TC, Smith B, Ryan MA, the Millennium Cohort Study Team: Demographic and occupational predictors of early response to a mailed invitation to enroll in a longitudinal health study. BMC Medical Research Methodology 2007, 7:6.
21. Klesges RC, Williamson JE, Somes GW, Talcott GW, Lando HA, Haddock CK: A population comparison of participants and nonparticipants in a health survey. Am J Public Health 1999, 89(8):1228-1231.
22. Greenland S: Variance estimation for epidemiologic effect estimates under misclassification. Statistics in Medicine 1988, 7(7):745-757.
23. Savoca E: Sociodemographic correlates of psychiatric diseases: accounting for misclassification in survey diagnoses of major depression, alcohol and drug use disorders. Health Services and Outcomes Research Methodology 2004, 5(17):175-191.
24. Rona RJ, Hooper R, Jones M, French C, Wessely S: Screening for physical and psychological illness in the British Armed Forces: III: The value of a questionnaire to assist a Medical Officer to decide who needs help. Journal of Medical Screening 2004, 11(3):158-161.

Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/7/51/prepub
Rona RJ, Hooper R, Jones M, French C, Wessely S: Screening for References physical and psychological illness in the British Armed Forces: III: The value of a questionnaire to assist a Medical 1. Edwards P, Roberts I, Clarke M, DiGuiseppi C, Pratap S, Wentz R, Kwan I: Increasing response rates to postal questionnaires: Officer to decide who needs help. J Medical Screening 2004, 11(3):158-161. systematic review. British Medical J 2002, 324(7347):1183-1185. 2. Siemiatycki J, Campbell S: Nonresponse bias and early versus all responders in mail and telephone surveys. Am J Epidemiol 1984, Pre-publication history 120(2):291-301. The pre-publication history for this paper can be accessed 3. Kreiger N, Nishri ED: The effect of nonresponse on estimation of relative risk in a case-control study. Annals Epidemiology 1997, here: 7(3):194-199. 4. Brogger J, Bakke P, Eide GE, Gulsvik A: Contribution of follow-up of nonresponders to prevalence and risk estimates: a Norwe- http://www.biomedcentral.com/1471-2288/7/51/prepub gian respiratory health survey. Am J Epidemiology 2003, 157:558-566. 5. de Winter A, Oldehinkel AJ, Veenstra R, Brunnekreef JA, Verhulst FC, Ormel J: Evaluation of non-response bias in mental health determinants and outcomes in a large sample of pre-adoles- cents. European J Epidemiology 2005, 20(2):173-181. Page 8 of 8 (page number not for citation purposes)
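The attenuation discussed above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' simulation algorithm (which is described in Additional file 1): it generates a cohort with a binary outcome, flips each recorded outcome with the same small probability in both exposure groups (non-differential error), and compares the relative risks computed from the true and from the misclassified outcomes. The sample sizes, prevalences, and error rate are illustrative assumptions, not figures from the study.

```python
import random

def simulate_rr(n_exposed=5000, n_unexposed=5000,
                p_exposed=0.25, p_unexposed=0.20,
                error_rate=0.03, seed=1):
    """Return (true RR, observed RR) for a cohort whose binary outcome
    is recorded incorrectly with probability `error_rate` in BOTH
    exposure groups, i.e. non-differential misclassification."""
    rng = random.Random(seed)

    def prevalences(n, p):
        true = [rng.random() < p for _ in range(n)]
        # Each recorded answer is flipped with the same small probability,
        # regardless of exposure status: non-differential error.
        observed = [(not t) if rng.random() < error_rate else t for t in true]
        return sum(true) / n, sum(observed) / n

    true_e, obs_e = prevalences(n_exposed, p_exposed)
    true_u, obs_u = prevalences(n_unexposed, p_unexposed)
    return true_e / true_u, obs_e / obs_u

rr_true, rr_obs = simulate_rr()
print(f"true RR = {rr_true:.3f}, observed RR = {rr_obs:.3f}")
```

Even this modest error rate pulls the observed relative risk towards 1, the mechanism described by Bross [7]; with differential error (different flip probabilities per exposure group) the bias could go in either direction.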

Journal: BMC Medical Research Methodology (Springer Journals)

Published: Nov 28, 2007
