In 1965, Sir Austin Bradford Hill published nine "viewpoints" to help determine if observed epidemiologic associations are causal. Since then, the "Bradford Hill Criteria" have become the most frequently cited framework for causal inference in epidemiologic studies. However, when Hill published his causal guidelines (just 12 years after the double-helix model for DNA was first suggested and 25 years before the Human Genome Project began), disease causation was understood on a more elementary level than it is today. Advancements in genetics, molecular biology, toxicology, exposure science, and statistics have increased our analytical capabilities for exploring potential cause-and-effect relationships, and have resulted in a greater understanding of the complexity behind human disease onset and progression. These additional tools for causal inference necessitate a re-evaluation of how each Bradford Hill criterion should be interpreted when considering a variety of data types beyond classic epidemiology studies. Herein, we explore the implications of data integration on the interpretation and application of the criteria. Using examples of recently discovered exposure-response associations in human disease, we discuss novel ways by which researchers can apply and interpret the Bradford Hill criteria when considering data gathered using modern molecular techniques, such as epigenetics, biomarkers, mechanistic toxicology, and genotoxicology.
Epidemiological studies often require measures of socio-economic position (SEP). The application of principal components analysis (PCA) to data on asset-ownership is one popular approach to household SEP measurement. Proponents suggest that the approach provides a rational method for weighting asset data in a single indicator, captures the most important aspect of SEP for health studies, and is based on data that are readily available and/or simple to collect. However, the use of PCA on asset data may not be the best approach to SEP measurement. There remains concern that this approach can obscure the meaning of the final index and is statistically inappropriate for use with discrete data. In addition, the choice of assets to include and the level of agreement between wealth indices and more conventional measures of SEP such as consumption expenditure remain unclear. We discuss these issues, illustrating our examples with data from the Malawi Integrated Household Survey 2004-5. Wealth indices were constructed using the assets on which data are collected within Demographic and Health Surveys. Indices were constructed using five weighting methods: PCA, PCA using dichotomised versions of categorical variables, equal weights, weights equal to the inverse of the proportion of households owning the item, and Multiple Correspondence Analysis. Agreement between indices was assessed. Indices were compared with per capita consumption expenditure, and the difference in agreement assessed when different methods were used to adjust consumption expenditure for household size and composition. All indices demonstrated similarly modest agreement with consumption expenditure. The indices constructed using dichotomised data showed strong agreement with each other, as did the indices constructed using categorical data. Agreement was lower between indices using data coded in different ways. 
The level of agreement between wealth indices and consumption expenditure did not differ when different consumption equivalence scales were applied. This study questions the appropriateness of wealth indices as proxies for consumption expenditure. The choice of data included had a greater influence on the wealth index than the method used to weight the data. Despite the limitations of PCA, alternative methods also all had disadvantages.
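Two of the weighting schemes compared above are simple enough to sketch directly. The following is a minimal, hypothetical illustration (made-up asset data, not the Malawi survey) of weighting binary asset indicators by the inverse of the proportion of households owning each item, so that rarer assets contribute more to the index.

```python
# Hypothetical sketch of one asset-weighting scheme discussed above:
# weights equal to the inverse of the proportion of households owning
# each item. Data are illustrative only.

def inverse_proportion_weights(households):
    """Weight each asset by 1 / (proportion of households owning it)."""
    n = len(households)
    n_assets = len(households[0])
    weights = []
    for j in range(n_assets):
        owners = sum(h[j] for h in households)
        weights.append(n / owners if owners else 0.0)
    return weights

def wealth_index(household, weights):
    """Simple linear index: weighted sum of asset indicators."""
    return sum(a * w for a, w in zip(household, weights))

# Rows: households; columns: ownership (1/0) of radio, bicycle, car.
assets = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
w_inv = inverse_proportion_weights(assets)   # rarer items weigh more
scores = [wealth_index(h, w_inv) for h in assets]
```

With these data the car (owned by one household) gets the largest weight, and the household owning all three assets ranks highest on the index.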
Empirical findings show that morbidity and mortality risks of migrants can differ considerably from those of populations in the host countries. However, while several explanatory models have been developed, most migrant studies still do not explicitly consider the situation of migrants before migration. Here, we discuss an extended approach to understanding migrant health that incorporates a life course epidemiology perspective. The incorporation of a life course perspective into a conceptual framework of migrant health enables the consideration of risk factors and disease outcomes over the different life phases of migrants, which is necessary to understand the health situation of migrants and their offspring. Comparison populations need to be carefully selected depending on the study questions under consideration within the life course framework. Migrant health research will benefit from an approach using a life course perspective. A critique of the theoretical foundations of migrant health research is essential for further developing both the theoretical framework of migrant health and related empirical studies.
BACKGROUND: Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses. METHODS: Eight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Using each study's questionnaires, standard operating procedures, and data dictionaries, harmonization potential was assessed. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets located on servers in each research centre across Europe were interconnected through a federated database system to perform statistical analyses. RESULTS: Retrospective harmonization led to the generation of common-format variables for 73% of the matches considered (96 targeted variables across 8 studies). Authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers, without actually sharing individual-level data, using the DataSHIELD method. CONCLUSION: New Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-center research in an efficient and secure manner.
The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open source tools presented herein.
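The federated principle described above can be illustrated with a toy sketch (this is not DataSHIELD's actual API): each site shares only aggregate summaries, and the coordinating centre recovers pooled statistics without ever seeing individual-level records.

```python
# Toy illustration (not DataSHIELD itself) of federated analysis:
# each site reports only (n, sum, sum of squares); the coordinating
# centre recovers the pooled mean and variance without access to
# individual-level data.

def site_summary(values):
    """Aggregate statistics a site is willing to share."""
    return len(values), sum(values), sum(v * v for v in values)

def pooled_mean_var(summaries):
    """Recompose pooled mean and (population) variance centrally."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    var = total_sq / n - mean ** 2
    return mean, var

site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0]
mean, var = pooled_mean_var([site_summary(site_a), site_summary(site_b)])
# identical to analysing the pooled individual-level data directly
```

The same decomposition-into-sufficient-statistics idea underlies more complex federated analyses, such as fitting regression models across sites.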
Systematic reviews based on the critical appraisal of observational and analytic studies on HIV prevalence and risk factors for HIV transmission among men who have sex with men are very useful for health care decisions and planning. Such appraisal is particularly difficult, however, as the quality assessment tools available for use with observational and analytic studies are poorly established. We reviewed the existing quality assessment tools for systematic reviews of observational studies and developed a concise quality assessment checklist to help standardise decisions regarding the quality of studies, with careful consideration of issues such as external and internal validity. A pilot version of the checklist was developed based on epidemiological principles, reviews of study designs, and existing checklists for the assessment of observational studies. The Quality Assessment Tool for Systematic Reviews of Observational Studies (QATSO) Score consists of five items: external validity (1 item), reporting (2 items), bias (1 item) and confounding factors (1 item). Expert opinions were sought, and the checklist was tested on manuscripts that fulfilled the inclusion criteria of a systematic review. Like all assessment scales, QATSO may oversimplify and generalise information, yet it is inclusive, simple and practical to use, and allows comparability between papers. A specific tool that allows researchers to appraise and guide the quality of observational studies has been developed and can be modified for similar studies in the future.
Background Epidemiology has contributed in many ways to identifying various risk factors for disease and to promoting population health. However, there is a continuing debate about the ability of epidemiology not only to describe, but also to provide results which can be better translated into public health practice. It has been proposed that participatory research approaches be applied to epidemiology as a way to bridge this gap between description and action. A systematic account of what constitutes participatory epidemiology practice has, however, been lacking. Methods A scoping review was carried out focused on the question of what constitutes participatory approaches to epidemiology for the purpose of demonstrating their potential for advancing epidemiologic research. Relevant databases were searched, including both the published and non-published (grey) literature. The 102 identified sources were analyzed in terms of comparing common epidemiologic approaches to participatory counterparts regarding central aspects of the research process. Exemplary studies applying participatory approaches were examined more closely. Results A highly diverse, interdisciplinary body of literature was synthesized, resulting in a framework comprised of seven aspects of the research process: research goal, research question, population, context, data synthesis, research management, and dissemination of findings. The framework specifies how participatory approaches not only differ from, but also how they can enhance common approaches in epidemiology. Finally, recommendations for the further development of participatory approaches are given. These include: enhancing data collection, data analysis, and data validation; advancing capacity building for research at the local level; and developing data synthesis. Conclusion The proposed framework provides a basis for systematically developing the emergent science of participatory epidemiology.
The 21st century has seen the rise of Internet-based participatory surveillance systems for infectious diseases. These systems capture voluntarily submitted symptom data from the general public and can aggregate and communicate those data in near real-time. We reviewed participatory surveillance systems currently running in 13 different countries. These systems have a growing evidence base showing a high degree of accuracy and increased sensitivity and timeliness relative to traditional healthcare-based systems. They have also proven useful for assessing risk factors, vaccine effectiveness, and patterns of healthcare utilization, while being less expensive, more flexible, and more scalable than traditional systems. Nonetheless, they present important challenges, including biases associated with the population that chooses to participate, difficulty in adjusting for confounders, and limited specificity owing to reliance on syndromic definitions of disease alone. Overall, participatory disease surveillance provides unique disease information that is not available through traditional surveillance sources.
Vulnerability has become a key concept in emergency response research and is being critically discussed across several disciplines. While the concept has been adopted into global health, its conceptualisation, and especially its role in the conceptualisation of risk and therefore in risk assessments, remains underdeveloped. This paper uses the risk concept pioneered in hazard research, which assumes that risk is a function of the interaction between hazard and vulnerability, rather than the neo-liberal conceptualisation of vulnerability and of vulnerable groups and communities. By seeking to modify the original pressure and release model, the paper unpacks the representation, or lack of representation, of vulnerability in risk assessments in global health emergency response, and discusses what can be gained from making explicit the underlying assumptions about vulnerability, which are present whether or not vulnerability is sufficiently conceptualised and consciously included. The paper argues that discussions about risk in global health emergencies should be better grounded in a theoretical understanding of the concept of vulnerability, and that this theoretical understanding needs to inform risk assessments that implicitly use the concept. By adopting the hazard research approach to vulnerability, it offers an alternative narrative with new perspectives on the value and limits of vulnerability as a concept and a tool.
The relationship between collapsibility and confounding has been subject to an extensive and ongoing discussion in the methodological literature. We discuss two subtly different definitions of collapsibility, and show that by considering causal effect measures based on counterfactual variables (rather than measures of association based on observed variables) it is possible to separate out the component of non-collapsibility which is due to the mathematical properties of the effect measure, from the components that are due to structural bias such as confounding. We provide new weights such that the causal risk ratio is collapsible over arbitrary baseline covariates. In the absence of confounding, these weights may be used for standardization of the risk ratio.
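The component of non-collapsibility the abstract attributes to the mathematical properties of the effect measure, rather than to confounding, can be shown numerically. The sketch below uses the classic demonstration with the odds ratio: illustrative risks, exposure randomized so there is no confounding by construction, yet the stratum-specific and marginal odds ratios disagree.

```python
# Classic numeric demonstration of non-collapsibility of the odds
# ratio in the absence of confounding. Risks are illustrative.

def odds(p):
    return p / (1 - p)

# Two equal-sized strata of a baseline covariate; exposure is
# randomized, so the covariate is not a confounder.
or_s1 = odds(0.9) / odds(0.7)   # stratum 1: risks 0.9 vs 0.7
or_s2 = odds(0.3) / odds(0.1)   # stratum 2: risks 0.3 vs 0.1

# Marginal risks: average over the two equal strata.
p1 = (0.9 + 0.3) / 2
p0 = (0.7 + 0.1) / 2
or_marginal = odds(p1) / odds(p0)

# or_s1 == or_s2 == 27/7 ~ 3.86, but or_marginal == 2.25:
# the odds ratio fails to collapse even with no confounding.
```

The risk ratio and risk difference behave differently here, which is why the choice of effect measure matters when separating non-collapsibility from structural bias as the abstract describes.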
Background The increasing availability of the Internet makes online-only data collection feasible for more epidemiological studies. We compare response patterns in a population-based health survey using two survey designs: mixed-mode (choice between paper-and-pencil and online questionnaires) and online-only (no choice). Methods We used data from a longitudinal panel, the Hygiene and Behaviour Infectious Diseases Study (HaBIDS), conducted in 2014/2015 in four regions in Lower Saxony, Germany. Individuals were recruited using address-based probability sampling. In two regions, individuals could choose between paper-and-pencil and online questionnaires. In the other two regions, individuals were offered online-only participation. We compared sociodemographic characteristics of respondents who filled in all panel questionnaires between the mixed-mode group (n = 1110) and the online-only group (n = 482). Using 134 items, we performed multinomial logistic regression to compare responses between survey designs in terms of type (missing, “do not know” or valid response) and ordinal regression to compare responses in terms of content. We applied the false discovery rate (FDR) to control for multiple testing and investigated the effects of adjusting for sociodemographic characteristics. To validate the differential response patterns between the mixed-mode and online-only designs, we compared response patterns between paper and online modes among the respondents in the mixed-mode group in one region (n = 786). Results Respondents in the online-only group were older than those in the mixed-mode group, but the groups did not differ regarding sex or education. Type of response did not differ between the online-only and the mixed-mode group. Survey design was associated with different content of response in 18 of the 134 investigated items, which decreased to 11 after adjusting for sociodemographic variables.
In the validation within the mixed-mode group, only two of the items differing between paper and online modes were among the 11 significantly different items. The probability of observing by chance the same two or more significant differences in this setting was 22%. Conclusions We found similar response patterns in both survey designs, with only a few items being answered differently, likely attributable to chance. Our study supports the equivalence of the compared survey designs and suggests that, in the studied setting, using an online-only design does not cause strong distortion of the results.
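As context for the multiple-testing correction used above, the following is a minimal sketch of the Benjamini–Hochberg step-up procedure for controlling the false discovery rate. The p-values are illustrative, not the HaBIDS results.

```python
# Minimal Benjamini-Hochberg step-up procedure for FDR control.
# Illustrative p-values only.

def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    # Find the largest rank whose p-value is under its step-up threshold.
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(p, q=0.05)
```

Unlike a Bonferroni correction, the threshold grows with the rank of the p-value, which is what makes FDR control less conservative when many items are tested.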
[...] studying this internal ecosystem independently and relative to external factors in the obesity epidemic is needed, and will help us understand the nature of the obesity epidemic and the novel role of the gut microbiome. [...] this review provides a rationale for employing a global epidemiologic model for studying the associations between the gut microbiota and the development of obesity, which allows capturing geographically diverse external environmental factors.
Dengue is the most important arthropod-borne viral disease of public health significance. Compared with nine reporting countries in the 1950s, today the geographic distribution includes more than 100 countries worldwide. Many of these had not reported dengue for 20 or more years and several have no known history of the disease. The World Health Organization estimates that more than 2.5 billion people are at risk of dengue infection. First recognised in the 1950s, it has become a leading cause of child mortality in several Asian and South American countries. This paper reviews the changing epidemiology of the disease, focusing on host and societal factors and drawing on national and regional journals as well as international publications. It does not include vaccine and vector issues. We have selected areas where the literature raises challenges to prevailing views and those that are key for improved service delivery in poor countries. Shifts in modal age, rural spread, and social and biological determinants of race- and sex-related susceptibility have major implications for health services. Behavioural risk factors, individual determinants of outcome and leading indicators of severe illness are poorly understood, compromising the effectiveness of control programmes. Early detection and case management practices were noted as a critical factor for survival. The lack of sound statistical methods compromised conclusions on case-fatality or disease-specific mortality rates, especially since the data were often based on hospitalised patients who actively sought care in tertiary centres. Well-targeted operational research, such as population-based epidemiological studies with clear operational objectives, is urgently needed to make progress in control and prevention.
The instrumental variable method has been employed within economics to infer causality in the presence of unmeasured confounding. Emphasising the parallels to randomisation may increase understanding of the underlying assumptions within epidemiology. An instrument is a variable that predicts exposure but, conditional on exposure, shows no independent association with the outcome. The random assignment in trials is an example of what would be expected to be an ideal instrument, but instruments can also be found in observational settings with a naturally varying phenomenon, e.g. geographical variation, physical distance to a facility, or physician's preference. The fourth identifying assumption has received less attention, but is essential for the generalisability of estimated effects. The instrument identifies the group in which exposure is pseudo-randomly assigned, leading to exchangeability with regard to unmeasured confounders. Underlying assumptions can only partially be tested empirically and require subject-matter knowledge. Future studies employing instruments should carefully seek to validate all four assumptions, possibly drawing on parallels to randomisation.
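For a binary instrument, the simplest estimator consistent with the description above is the Wald ratio: the difference in outcome means across instrument levels divided by the difference in exposure means. A toy sketch with fabricated data follows; it illustrates the arithmetic only, not the validation of the identifying assumptions.

```python
# Wald (instrumental variable) estimator for a binary instrument.
# Data are fabricated for illustration.

def wald_iv_estimate(z, x, y):
    """effect = (E[Y|Z=1] - E[Y|Z=0]) / (E[X|Z=1] - E[X|Z=0])."""
    def mean(vals):
        return sum(vals) / len(vals)
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    x1 = mean([xi for zi, xi in zip(z, x) if zi == 1])
    x0 = mean([xi for zi, xi in zip(z, x) if zi == 0])
    return (y1 - y0) / (x1 - x0)

# Toy data: the instrument z shifts exposure x; x shifts outcome y.
z = [0, 0, 0, 0, 1, 1, 1, 1]
x = [0, 0, 1, 0, 1, 1, 0, 1]
y = [1, 0, 2, 0, 3, 2, 1, 3]
effect = wald_iv_estimate(z, x, y)
```

The denominator is the strength of the instrument's effect on exposure; a weak instrument (small denominator) makes the estimate unstable, which is one practical reason the identifying assumptions deserve the scrutiny the abstract calls for.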
Background: The study of non-atopic asthma/wheeze in children separately from atopic asthma is relatively recent. Studies have focused on single risk factors and have had inconsistent findings. Objective: To review evidence on factors associated with non-atopic asthma/wheeze in children and adolescents. Methods: A review of studies of risk factors for non-atopic asthma/wheeze which had a non-asthmatic comparison group and assessed atopy by skin-prick test or allergen-specific IgE. Results: Studies of non-atopic asthma/wheeze used a wide diversity of definitions of asthma/wheeze, comparison groups and methods to assess atopy. Among 30 risk factors evaluated in the 43 studies, only 3 (family history of asthma/rhinitis/eczema, dampness/mold in the household, and lower respiratory tract infections in childhood) showed consistent associations with non-atopic asthma/wheeze. No or a limited period of breastfeeding was less consistently associated with non-atopic asthma/wheeze. The few studies examining the effects of overweight/obesity and psychological/social factors showed consistent associations. We used a novel graphical presentation of different risk factors for non-atopic asthma/wheeze, allowing a more complete perception of the complex pattern of effects. Conclusions: More research using standardized methodology is needed on the causes of non-atopic asthma.
Methods of diagrammatic modelling have been greatly developed in the past two decades. Outside the context of infectious diseases, systematic use of diagrams in epidemiology has been mainly confined to the analysis of a single link: that between a disease outcome and its proximal determinant(s). Transmitted causes ("causes of causes") tend not to be systematically analysed. The infectious disease epidemiology modelling tradition models the human population in its environment, typically with the exposure-health relationship and the determinants of exposure being considered at individual and group/ecological levels, respectively. Some properties of the resulting systems are quite general, and are seen in unrelated contexts such as biochemical pathways. Confining analysis to a single link misses the opportunity to discover such properties. The structure of a causal diagram is derived from knowledge about how the world works, as well as from statistical evidence. A single diagram can be used to characterise a whole research area, not just a single analysis (although this depends on the degree of consistency of the causal relationships between different populations), and can therefore be used to integrate multiple datasets. Additional advantages of system-wide models include: the use of instrumental variables, now emerging as an important technique in epidemiology in the context of mendelian randomisation but under-used in the exploitation of "natural experiments"; the explicit use of change models, which have advantages with respect to inferring causation; and the detection and elucidation of feedback.
Background In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. Main text We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression Tree (CART) technique and the newer Conditional Inference Tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Conclusions Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
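The core step that CART repeats at every node, an exhaustive search for the covariate split minimizing within-node error, can be sketched briefly. This is not the full algorithm (no recursion, pruning, or CTree's permutation-test framework), and the data are illustrative.

```python
# Sketch of the core CART step for a continuous outcome: find the
# single split on a covariate that minimizes total within-node
# sum of squared errors. Data are illustrative.

def best_split(x, y):
    """Return (cut point, total SSE) for the best binary split on x."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = (None, float("inf"))
    for cut in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (cut, total)
    return best

# Outcome jumps once the covariate exceeds 3, so 3 is the best cut.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9]
cut, _ = best_split(x, y)
```

A full tree applies this search recursively to each resulting subgroup; CTree differs by choosing splits via formal hypothesis tests, which gives a principled stopping rule.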
Background The use of meta-analysis to aggregate multiple studies has increased dramatically over the last 30 years. For meta-analysis of homogeneous data, where the effect sizes of the studies contributing to the meta-analysis differ only by statistical error, the Mantel–Haenszel technique has typically been utilized. If homogeneity cannot be assumed or established, the most popular technique is the inverse-variance DerSimonian–Laird technique. However, both of these techniques are based on large-sample, asymptotic assumptions and are, at best, approximations, especially when the number of cases observed in any cell of the corresponding contingency tables is small. Results This paper develops an exact, non-parametric test based on a maximum likelihood test statistic as an alternative to the asymptotic techniques. Further, the test can be used across a wide range of heterogeneity. Monte Carlo simulations show that, for the homogeneous case, the ML-NP-EXACT technique is generally more powerful than the DerSimonian–Laird inverse-variance technique for realistic, smaller values of disease probability, and across a large range of odds ratios, numbers of contributing studies, and sample sizes. Possibly most important, for large values of heterogeneity, the pre-specified level of Type I Error is much better maintained by the ML-NP-EXACT technique than by the DerSimonian–Laird technique. A fully tested implementation in the R statistical language is freely available from the author. Conclusions This research has developed an exact test for the meta-analysis of dichotomous data. The ML-NP-EXACT technique was strongly superior to the DerSimonian–Laird technique in maintaining a pre-specified level of Type I Error. As shown, the DerSimonian–Laird technique demonstrated many large violations of this level.
Given the various biases towards finding statistical significance prevalent in epidemiology today, a strong focus on maintaining a pre-specified level of Type I Error would seem critical.
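For context, the DerSimonian–Laird procedure the paper benchmarks against can be sketched: a moment estimate of the between-study variance tau-squared from Cochran's Q, followed by inverse-variance re-weighting. The effect sizes and variances below are illustrative, not taken from any study.

```python
# Sketch of the DerSimonian-Laird random-effects meta-analysis step.
# Effect sizes (e.g. log odds ratios) and variances are illustrative.

def dersimonian_laird(effects, variances):
    """Return (pooled random-effects estimate, tau^2)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    # Cochran's Q and the moment estimate of between-study variance.
    q = sum(wi * (ei - fixed) ** 2 for wi, ei in zip(w, effects))
    k = len(effects)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-weight with the between-study variance added in.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * ei for wi, ei in zip(w_re, effects)) / sum(w_re)
    return pooled, tau2

effects = [0.4, 0.1, 0.7]
variances = [0.04, 0.09, 0.06]
pooled, tau2 = dersimonian_laird(effects, variances)
```

The paper's point is precisely that this moment-based, asymptotic procedure can misbehave with small cell counts and high heterogeneity, which is what motivates the exact alternative.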
Background Representative surveys collecting weight, height and MUAC are used to estimate the prevalence of acute malnutrition. The results are then used to assess the scale of malnutrition in a population and the type of nutritional intervention required. There have been changes in methodology over recent decades; the objective of this study was to determine whether these have resulted in higher quality surveys. Methods In order to examine the change in reliability of such surveys, we analysed the statistical distributions of the derived anthropometric parameters from 1843 surveys conducted by 19 agencies between 1986 and 2015. Results With the introduction of standardised guidelines and software by 2003, and their more general application from 2007, the mean standard deviation, kurtosis and skewness of the parameters used to assess nutritional status have each converged toward the distribution of the WHO standards when outliers are excluded from analysis using the SMART flagging procedure. Where WHO flags, which exclude only data incompatible with life, are used, the quality of anthropometric surveys has improved, and the results now approach those seen with SMART flags and the WHO standards distribution. Agencies vary in their uptake of and adherence to standard guidelines. Those agencies that fully implement the guidelines achieve the most consistently reliable results. Conclusions Standard methods should be universally used to produce reliable data, and tests of data quality and SMART-type flagging procedures should be applied and reported to ensure that the data are credible and can therefore inform appropriate intervention. Use of SMART guidelines has coincided with reliable anthropometric data since 2007.
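The two flagging rules contrasted above can be sketched as follows. WHO flags drop only values outside fixed, biologically implausible limits, while SMART flags drop values far from the observed survey mean. The specific limits vary by anthropometric indicator; the ±6 range and ±3 width used here are illustrative of the approach rather than the exact published cut-offs for every index.

```python
# Sketch of two outlier-flagging rules for anthropometric z-scores.
# Limits are illustrative of the approach, not exact published values.

def who_flags(zscores, lo=-6.0, hi=6.0):
    """WHO-style flags: drop only values outside fixed plausible limits."""
    return [z for z in zscores if lo <= z <= hi]

def smart_flags(zscores, width=3.0):
    """SMART-style flags: drop values more than `width` z-scores
    away from the observed survey mean."""
    m = sum(zscores) / len(zscores)
    return [z for z in zscores if abs(z - m) <= width]

# A survey clustered near -1, plus two extreme values.
whz = [-1.0] * 10 + [-5.0, -6.5]
kept_who = who_flags(whz)      # drops only -6.5
kept_smart = smart_flags(whz)  # drops -5.0 as well, relative to the mean
```

The sketch shows why SMART flagging pulls the retained distribution closer to the reference standards: it removes plausible-but-anomalous values relative to the survey itself, not just values incompatible with life.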
Background Participation in epidemiologic studies has declined, raising concerns about selection bias. While estimates derived from epidemiologic studies have been shown to be robust under a wide range of scenarios, additional empiric study is needed. The Georgia Study to Explore Early Development (GA SEED), a population-based case–control study of risk factors for autism spectrum disorder (ASD), provided an opportunity to explore factors associated with non-participation and potential impacts of non-participation on association studies. Methods GA SEED recruited preschool-aged children residing in metropolitan Atlanta during 2007–2012. Children with ASD were identified from multiple schools and healthcare providers serving children with disabilities; children from the general population (POP) were randomly sampled from birth records. Recruitment was via mailed invitation letter with follow-up phone calls. Eligibility criteria included birth and current residence in the study area and an English-speaking caregiver. Many children identified for potential inclusion could not be contacted. We used data from birth certificates to examine demographic and perinatal factors associated with participation in GA SEED and completion of the data collection protocol. We also compared ASD-risk factor associations for the final sample of children who completed the study with the initial sample of all likely ASD and POP children invited to potentially participate in the study, had they been eligible. Finally, we derived post-stratification sampling weights for participants who completed the study and compared weighted and unweighted associations between ASD and two factors collected via post-enrollment maternal interview: infertility and reproductive stoppage. Results Maternal age and education were independently associated with participation in the POP group. Maternal education was independently associated with participation in the ASD group.
Numerous other demographic and perinatal factors were not associated with participation. Moreover, unadjusted and adjusted odds ratios for associations between ASD and several demographic and perinatal factors were similar between the final sample of study completers and the total invited sample. Odds ratios for associations between ASD and infertility and reproductive stoppage were also similar in unweighted and weighted analyses of the study completion sample. Conclusions These findings suggest that effect estimates from SEED risk factor analyses, particularly those of non-demographic factors, are likely robust.
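The post-stratification weighting described above amounts to scaling each stratum by the ratio of its known population share to its observed sample share. A hypothetical sketch (strata and shares are invented for illustration):

```python
# Hypothetical sketch of post-stratification weighting: weight each
# stratum by (population share) / (sample share). Data are invented.

def post_stratification_weights(sample_strata, population_shares):
    """Return a weight per stratum that rebalances the sample to
    match the known population composition."""
    n = len(sample_strata)
    counts = {}
    for s in sample_strata:
        counts[s] = counts.get(s, 0) + 1
    return {s: population_shares[s] / (counts[s] / n) for s in counts}

# e.g. a maternal-education stratum that is over-represented.
sample = ["high"] * 60 + ["low"] * 40       # sample: 60% "high"
population = {"high": 0.5, "low": 0.5}      # population: 50/50
w = post_stratification_weights(sample, population)
# "low" respondents are weighted up, "high" respondents down
```

Comparing weighted and unweighted estimates, as the study did for infertility and reproductive stoppage, is a direct check of how much differential participation by these strata could move the results.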
Postmenstrual and/or gestational age (GA)-corrected age (CA) is required to apply child growth standards to children born preterm (<37 weeks of gestation). Yet CA is rarely used in epidemiologic studies in low- and middle-income countries (LMICs), which may bias population estimates of childhood undernutrition. To evaluate the effect of accounting for GA in the application of growth standards, we estimated mean length-for-age (LAZ) and weight-for-age (WAZ) scores at 0, 3, 12, 24, and 48 months of age in the 2004 Pelotas (Brazil) Birth Cohort, comparing two strategies: GA-specific standards at birth (INTERGROWTH-21st newborn size standards) in conjunction with CA for preterm-born children in the postnatal application of the World Health Organization Child Growth Standards (the 'CA' strategy), versus postnatal age for all children. At birth (n = 4066), mean LAZ was higher and the prevalence of stunting (LAZ < -2) was lower using CA versus postnatal age (mean ± SD): -0.36 ± 1.19 versus -0.67 ± 1.32; and 8.3% versus 11.6%, respectively. The odds ratio (OR) and population attributable risk (PAR) of stunting due to preterm birth were attenuated, and inferences changed, using CA versus postnatal age at birth [OR 1.32 (95% confidence interval (CI) 0.95, 1.82) vs 14.7 (95% CI 11.7, 18.4); PAR 3.1% vs 42.9%]; differences in inferences persisted at 3 months. At 12, 24, and 48 months, preterm birth was associated with stunting, but ORs/PARs remained attenuated using CA compared to postnatal age. Findings were similar for weight-for-age scores. Population-based epidemiologic studies in LMICs in which GA is unused or unavailable may overestimate the prevalence of early childhood undernutrition and inflate the fraction of undernutrition attributable to preterm birth.
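The corrected-age calculation underlying the 'CA' strategy is simple: subtract the weeks of prematurity (relative to a 40-week term) from chronological age before looking up the growth standards. A sketch follows; the weeks-per-month conversion factor is an assumption for illustration, not a value specified by the study.

```python
# Sketch of gestational-age-corrected age for preterm-born children.
# The weeks-per-month conversion is an illustrative assumption.

def corrected_age_months(chronological_months, ga_weeks, term_weeks=40.0):
    """Chronological age minus weeks of prematurity, in months.
    One month is taken as 365.25 / 12 / 7 ~ 4.35 weeks (assumption)."""
    weeks_per_month = 365.25 / 12 / 7
    correction = max(0.0, term_weeks - ga_weeks) / weeks_per_month
    return chronological_months - correction

# A child born at 32 weeks who is 12 months old chronologically
# is about 10.2 months old in corrected age.
ca = corrected_age_months(12.0, 32.0)
```

Applying the standards at the corrected rather than the chronological age is what attenuates the apparent length and weight deficits of preterm-born children in the analyses above.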