Recent Past Seminars

The Bio3 Seminar Series meets every second and fourth Friday of the month during the academic year. Students are expected to attend the biweekly seminar series.

Seminars from Spring 2019

Friday, January 11, 2019 at 10:00 am

Faming Liang, Ph.D.
Professor, Department of Statistics, Purdue University

Abstract: Stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms, such as stochastic gradient Langevin dynamics and stochastic gradient Hamiltonian Monte Carlo, have recently received much attention in Bayesian computing for large-scale data, for which the sample size can be very large, the dimension very high, or both. However, these algorithms can only be applied to a small class of problems in which the parameter space has a fixed dimension and the log-posterior density is differentiable with respect to the parameters. We propose a class of extended SGMCMC algorithms which, by introducing appropriate latent variables and utilizing Fisher’s identity, can be applied to more general large-scale Bayesian computing problems, such as those involving dimension jumping and missing data. For a large-scale dataset with sample size N and dimension p, the proposed algorithms can achieve a computational complexity of O(N^{1+\epsilon} p^{1-\epsilon'}) for some small constants \epsilon and \epsilon', which is quite comparable with the computational complexity O(N p^{1-\epsilon'}) achieved in general by the stochastic gradient descent (SGD) algorithm. The proposed algorithms are illustrated using high-dimensional variable selection, sparse deep learning with large-scale data, and a large-scale missing-data problem. The numerical results show that the proposed algorithms have a significant computational advantage over traditional MCMC algorithms and can be highly scalable when mini-batch samples are used in simulations. Compared to frequentist methods, they can produce more accurate variable selection and prediction results while exhibiting similar CPU costs when the dataset contains a large number of samples. The proposed algorithms greatly alleviate the computational burden of Bayesian methods in large-scale computing.
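As a quick illustration of the base algorithm the extended methods build on, here is a minimal sketch of stochastic gradient Langevin dynamics for a toy Gaussian-mean posterior; the data, step size, and batch size are illustrative assumptions, not the speaker's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N observations from N(2, 1); purely illustrative.
N = 100_000
data = rng.normal(2.0, 1.0, size=N)

def sgld_gaussian_mean(data, n_iter=5000, batch=100, step=1e-5, prior_var=100.0):
    """Stochastic gradient Langevin dynamics for the posterior of a
    Gaussian mean with known unit variance and a N(0, prior_var) prior."""
    N = len(data)
    theta, samples = 0.0, []
    for _ in range(n_iter):
        idx = rng.integers(0, N, size=batch)
        # Unbiased mini-batch estimate of the log-posterior gradient:
        # grad log prior + (N / batch) * sum of per-observation terms.
        grad = -theta / prior_var + (N / batch) * np.sum(data[idx] - theta)
        # Langevin update: half-step along the gradient plus injected noise.
        theta += 0.5 * step * grad + rng.normal(0.0, np.sqrt(step))
        samples.append(theta)
    return np.array(samples)

samples = sgld_gaussian_mean(data)
print(samples[2500:].mean())
```

With mini-batches of 100, each update touches 0.1% of the data, which is the source of the scalability discussed in the abstract; the extended SGMCMC algorithms in the talk go well beyond this basic scheme.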

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, January 25, 2019 at 10:00 am

Hui Quan, Ph.D.
Associate VP & Global Head, Methodology Group, Department of Biostatistics and Programming, Sanofi

Abstract: Extensive research has been conducted in the Multi-Regional Clinical Trial (MRCT) area. To effectively apply an appropriate approach to an MRCT, we need to synthesize and understand the features of the different approaches. In this presentation, numerical and real-data examples are used to illustrate considerations regarding the design, conduct, analysis, and interpretation of results of MRCTs. We compare different models as well as their corresponding interpretations of the trial results. We highlight the importance of paying special attention to trial monitoring and conduct to prevent potential issues with the final trial results. Besides evaluating the overall treatment effect for the entire MRCT, we also consider other key analyses, including quantification of regional treatment effects within an MRCT and assessment of the consistency of these regional treatment effects.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, February 8, 2019 at 10:00 am

Yanxun Xu, Ph.D.
Assistant Professor, Department of Applied Mathematics & Statistics, Whiting School of Engineering, Johns Hopkins University

Abstract: Developing targeted therapies based on patients’ baseline characteristics and genomic profiles such as biomarkers has gained growing interest in recent years. Depending on patients’ clinical characteristics and the expression of specific biomarkers or their combinations, different patient subgroups could respond differently to the same treatment. An ideal design, especially at the proof-of-concept stage, should search for such subgroups and adapt dynamically as the trial goes on. When no prior knowledge is available on whether the treatment works on the all-comer population or only on a subgroup defined by one or several biomarkers, it is necessary to estimate the subgroup effect adaptively, based on the response outcomes and biomarker profiles of all treated subjects at interim analyses. To address this problem, we propose an Adaptive Subgroup-Identification Enrichment Design (ASIED) to simultaneously search for predictive biomarkers, identify the subgroups with differential treatment effects, and modify study entry criteria at interim analyses when justified. More importantly, we construct robust quantitative decision-making rules for population enrichment when the interim outcomes are heterogeneous. Through extensive simulations, ASIED is demonstrated to achieve desirable operating characteristics and to compare favorably against the alternatives.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, February 22, 2019 at 10:00 am

Steven G. Heeringa, Ph.D.
Senior Research Scientist and Associate Director, Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor

Abstract: Neuroscience has a strong research tradition that employs experimental and observational studies in laboratory settings and controlled testing and evaluation in clinical, educational, and volunteer populations. In the past two decades, there has been increasing interest in conducting population-scale epidemiological studies of early-age brain development and functioning as well as later-age neurological functioning, including cognitive impairment, dementias, and Alzheimer’s disease. The data collected in these population-based studies are not restricted to observations on neurological systems and functioning but are collected in parallel with a wide array of information on participants’ life events, medical history, social and environmental exposures, genetics, and genomics. This rich array of observational data has the potential to greatly advance our understanding of how complex neurological systems develop, are modified by internal or external factors, or otherwise change over the life course. This growing field of epidemiological research also presents many challenging problems in design, measurement, data integration, and analysis that those of us trained in biostatistics, bioinformatics, and biomathematics will be called on to help solve.

This presentation will use two case studies to illustrate the nature of the statistical challenges in conducting population-scale neuroscientific research, describe current best practices, and outline opportunities for future research. The first case study will be the Adolescent Brain Cognitive Development (ABCD) project, a 12-year longitudinal investigation of brain morphology and functional development in U.S. adolescents and teens. The second case study will focus on the challenges in design, measurement, and analysis faced in special supplemental investigations of dementia and Alzheimer’s disease conducted under the auspices of the larger Health and Retirement Study (HRS). Each case study review will include a description of the specific study challenges and current solutions. The major aim of this presentation is to increase awareness of these emerging lines of research and to promote interest on the part of the next generation of statisticians and data scientists, who will be called upon to advance the methodologies required to better understand complex neurological systems and how they relate to our individual attributes and the world around us.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, April 12, 2019 at 10:00 am

Abera Wouhib, Ph.D.
Program Chief, Statistical Methods in Psychiatry Program, Adult Psychopathology & Psychosocial Interventions Research Branch, Division of Translational Research (DTR), NIH

Abstract: Similar to its univariate counterpart, multivariate meta-analysis is a method to synthesize multiple outcome effects by taking into account the available variance-covariance structure. It can improve efficiency over separate univariate syntheses and enables joint inferences across the outcomes. Multivariate meta-analysis is often required to address the complexity of the research questions. Multivariate data can arise in meta-analysis for several reasons. The primary studies can be multivariate in nature by measuring multiple outcomes for each subject, typically known as multiple-endpoint studies, or multivariate data may arise when primary studies involve several comparisons among groups based on a single outcome or measure several parameters. Although it possesses many advantages over the more established univariate counterpart, multivariate meta-analysis has some challenges, including modeling and estimating the parameters of interest. Under a random-effects model assumption, we discuss methods for estimating the heterogeneity parameters and effect sizes of multivariate data and their application, using an illustrative example and simulation results.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, April 26, 2019 at 10:00 am

Ming Yuan, Ph.D.
Professor, Department of Statistics, Columbia University

Abstract: “I see yellow; therefore, there is colocalization.” Is it really so simple when it comes to colocalization studies? Unfortunately, and fortunately, no. Colocalization is in fact a supremely powerful technique for scientists who want to take full advantage of what optical microscopy has to offer: quantitative, correlative information together with spatial resolution. Yet, methods for colocalization have been put into doubt now that images are no longer considered simple visual representations. Colocalization studies have notoriously been subject to misinterpretation due to difficulties in robust quantification and, more importantly, reproducibility, which is a constant source of confusion, frustration, and error. In this talk, I will share some of our efforts and progress toward easing these challenges using novel statistical and computational tools.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Seminars from Fall 2018

Friday, September 14, 2018 at 10:00 am

Ying Zhang, Ph.D.
Professor and Director of Biostatistics Education, Department of Biostatistics, School of Public Health, Indiana University

Abstract: Causal inference is a key component of comparative effectiveness research in observational studies. The inverse-propensity weighting (IPW) technique and the augmented inverse-propensity weighting (AIPW) technique, known as a double-robust method, are common methods for making causal inference in observational studies. However, these methods are known to be unstable, particularly when the models for the propensity score and the study outcome are misspecified. In this work, we propose a model-free approach for causal inference. While possessing standard asymptotic properties, this method also enjoys excellent finite-sample performance and robustness. Simulation studies were conducted to compare it with the well-known IPW and AIPW methods for causal inference. A real-life example from an ongoing Juvenile Idiopathic Arthritis study is used to illustrate the proposed method.

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057

Friday, September 28, 2018 at 10:00 am

Yangxin Huang, Ph.D.
Professor, Departments of Epidemiology and Biostatistics, College of Public Health, University of South Florida – Tampa

Abstract: Joint modeling of longitudinal and survival data is an active area of statistics research and is becoming increasingly essential in most epidemiological and clinical studies. As a result, a considerable number of statistical models and analysis methods have been suggested for analyzing such longitudinal-survival data. However, the following issues may stand out. (i) A common assumption for longitudinal variables is that model errors are normally distributed, due to mathematical tractability and computational convenience; this requires the variables to be “symmetrically” distributed, and a violation of this assumption could lead to misleading inferences. (ii) In practice, many studies collect multiple longitudinal exposures, which may be significantly correlated; ignoring their correlations may introduce bias and reduce efficiency in estimation. (iii) The longitudinal responses may be subject to nonignorable missingness. (iv) Repeatedly measured observations in time are often interrelated with a time-to-event of interest. Inferential procedures can become dramatically more complicated when one analyzes data with these features together. Under the umbrella of Bayesian inference, this talk explores multivariate mixed-effects joint models with skewed distributions for longitudinal measures that accommodate correlation among multiple responses, adjust for departures from normality, account for nonignorable missingness, and address uncertainty in specifying the time-to-event model. A dataset arising from a diabetes study is analyzed to demonstrate the methodology. Simulation studies are conducted to assess the performance of the proposed joint models and method under various scenarios.

Location: Proctor Harvey Amphitheater, Med-Dent Building C105 
3900 Reservoir Rd. NW, Washington, DC 20057

Friday, October 12, 2018 at 10:00 am

Kosuke Imai, Ph.D.
Professor of Government and of Statistics, Department of Statistics, Harvard University

Abstract: In many social science experiments, subjects interact with each other, and as a result one unit’s treatment influences the outcome of another unit. Over the last decade, significant progress has been made toward causal inference in the presence of such interference between units. Researchers have shown that two-stage randomization of treatment assignment enables the identification of average direct and spillover effects. However, much of the literature has assumed perfect compliance with treatment assignment. In this paper, we establish the nonparametric identification of the complier average direct and spillover effects in two-stage randomized experiments with interference and noncompliance. In particular, we consider the spillover effect of the treatment assignment on the treatment receipt as well as the spillover effect of the treatment receipt on the outcome. We propose consistent estimators and derive their randomization-based variances under the stratified interference assumption. We also prove the exact relationship between the proposed randomization-based estimators and the popular two-stage least squares estimators. Our methodology is motivated by and applied to the randomized evaluation of India’s National Health Insurance Program (RSBY), where we find some evidence of spillover effects on both treatment receipt and outcome. The proposed methods are implemented via an open-source software package.

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057

Friday, October 26, 2018 at 10:00 am

Vernon Chinchilli, Ph.D.
Distinguished Professor & Chair, Department of Public Health Sciences, Hershey College of Medicine, Penn State University

Abstract: Physicians frequently use N-of-1 (single-patient) trial designs in an informal manner to identify an optimal treatment for an individual patient. An N-of-1 clinical trial that focuses exclusively on optimizing the primary outcome for a specific patient clearly may not be very useful for generalizability to a population of patients. A series or collection of N-of-1 clinical trials, however, could be generalizable. We review current literature on the design and analysis of N-of-1 trials, and this includes Bayesian approaches as well. We next describe the “Best African American Response to Asthma Drugs (BARD)” trial, which invokes a four-way crossover design and has the flavor of a series of N-of-1 trials. We propose a nonlinear mixed-effects model with a quadrinomial logistic regression for the analysis of the BARD data that constructs six pairwise comparisons of the four asthma treatments to (1) assess the optimal treatment for each study participant and (2) estimate population-level treatment comparisons.

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057

Wednesday, November 7, 2018 at 10:00 am

Michael Proschan, Ph.D.
Mathematical Statistician, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases (NIAID), NIH

Abstract: Probability books sometimes present “cooked” counterexamples to warn students of the lurking dangers of compromising rigor.  Can such anomalies actually occur in practice?  This talk is proof that they can!  I present actual examples from my clinical trials experience in which I either fell into, or almost fell into, probability traps.  These experiences were a major motivation for our book, “Essentials of Probability Theory for Statisticians” (Proschan and Shaw, 2016, CRC Press, Taylor & Francis Group).   

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057

Seminars from Spring 2018

Friday, January 12, 2018 at 10:00 am

Yixin Fang, Ph.D.
Associate Professor, Department of Mathematical Sciences, New Jersey Institute of Technology

Abstract: In many applications involving large datasets or online updating, stochastic gradient descent (SGD) provides a scalable way to compute parameter estimates and has gained increasing popularity due to its numerical convenience and memory efficiency. While the asymptotic properties of SGD-based estimators were established decades ago, statistical inference such as interval estimation remains largely unexplored. Traditional resampling methods such as the bootstrap are not computationally feasible, since they require repeatedly drawing independent samples from the entire dataset. The plug-in method is not applicable when there is no explicit formula for the covariance matrix of the estimator. In this paper, we propose a scalable inferential procedure for stochastic gradient descent which, upon the arrival of each observation, updates the SGD estimate as well as a large number of randomly perturbed SGD estimates. The proposed method is easy to implement in practice. We establish its theoretical properties for a general class of models that includes generalized linear models and quantile regression models as special cases. The finite-sample performance and numerical utility are evaluated by simulation studies and two real-data applications.
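The idea of maintaining randomly perturbed replicates alongside the SGD path can be sketched in the simplest possible setting, online estimation of a mean; the Exponential(1) weights and step sizes below are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

B, n = 200, 5000          # number of perturbed replicates, stream length
theta = 0.0               # SGD estimate
perturbed = np.zeros(B)   # B randomly perturbed SGD estimates

for i in range(1, n + 1):
    x = rng.normal(3.0, 2.0)   # one arriving observation (illustrative stream)
    step = 1.0 / i             # Robbins-Monro step size
    theta -= step * (theta - x)   # gradient of 0.5 * (theta - x)^2
    # Each replicate scales the same gradient by a random mean-one weight,
    # mimicking a bootstrap without revisiting past data.
    w = rng.exponential(1.0, size=B)
    perturbed -= step * w * (perturbed - x)

# Percentile interval from the spread of the perturbed replicates
lo, hi = np.percentile(perturbed, [2.5, 97.5])
print(theta, (lo, hi))
```

Each arriving observation updates all B + 1 estimates once and is then discarded, which is what makes such a procedure scalable relative to the bootstrap.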

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, January 26, 2018 at 10:00 am

Janet Sinsheimer, Ph.D.
Professor, Departments of Human Genetics, Biomathematics and Biostatistics, University of California, Los Angeles, School of Medicine

Abstract: Linear mixed-effects models (LMMs) have a long history in genetics, going back at least as far as R. A. Fisher’s polygenic model, and they have become a mainstay of statistical modeling. Because these models can be computationally intensive, LMMs were dropped in favor of simpler statistical tests in the whole-genome era of genetics. Quite recently, however, LMMs have surged in popularity for -omic studies, and in particular for genome-wide association studies. In my talk, I will review what makes these models so popular now in genomics, discuss my group’s recent work using LMMs to detect maternal gene by offspring gene interactions, and then touch on some open questions that deserve consideration.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, February 9, 2018 at 10:00 am

Kirk Wolter, Ph.D.
Senior Fellow, Chief Statistician and Executive Vice President of Statistics and Methodology, Survey Research; Professor of Statistics, NORC at University of Chicago

Abstract: I will discuss the National Immunization Survey (NIS) conducted by the Centers for Disease Control and Prevention. NIS-Child monitors vaccination coverage for the population of children 19-35 months, NIS-Teen focuses on vaccination coverage of adolescents 13-17 years, and NIS-Flu monitors influenza vaccination coverage for the population of children 6 months - 17 years, all living in the U.S.A. This family of surveys is conducted annually in two phases: 1) a large survey of telephone numbers to identify households with age-eligible children, collect socio-demographic information about the child, and obtain contact information for the child’s immunization providers; and 2) a mail survey to collect immunization histories from the identified providers for whom consent to contact has been obtained. Independent samples are selected quarterly in each of the 50 states and in specified urban areas. The first-phase sample is obtained by random digit dialing from a national sampling frame consisting of landline and cell-phone numbers. Because of the insufficient validity of estimates derived from household-retained vaccination cards and parental recall, the NIS uses children with adequate provider-reported data to estimate vaccination coverage rates. I discuss the NIS dual-frame estimation procedure.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, February 23, 2018 at 10:00 am

Alex Sverdlov, Ph.D.
Director Statistical Scientist, Early Development Biostatistics – Translational Medicine, Novartis Pharmaceuticals Corporation

Abstract: Dose-response studies play an important role in clinical drug development. In this presentation, I will give an overview of some of my research on optimal adaptive designs for dose-response studies with time-to-event outcomes. These designs use response-adaptive allocation to direct patients to the most informative dose levels according to pre-defined statistical criteria. The designs can significantly improve the estimation efficiency of the trial, which can potentially translate into a reduction in study sample size. I will also highlight some open topics in this field that can motivate interesting research on optimal adaptive designs.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, March 23, 2018 at 10:00 am

Rajeshwari Sundaram, Ph.D.
Senior Investigator, Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, National Institute of Child Health and Human Development, NIH

Abstract: Defining labor progression in women has been a long-standing challenge for obstetricians. Cervical dilation, an integer-valued measurement, is a key indicator of progression through the first stage of labor. Assessing the distribution of the time to per-unit increments of cervical dilation is of considerable interest in aiding obstetricians with better management of labor. Given that women are observed only intermittently for cervical dilation after they are admitted to the hospital, and that the observation frequency is very likely correlated with how fast or slowly a woman dilates, one can view such data as panel count data with informative observation times and unknown time zero. We propose semiparametric proportional rate models for the cervical dilation process and the observation process, with a multiplicative subject-specific frailty variable capturing the correlation between the two processes. Inference procedures for the gap times between consecutive events are proposed for scenarios with both known and unknown time zero, using a maximum likelihood approach and estimating equations. The methodology is assessed through simulation studies and its large-sample properties. A detailed analysis applying the proposed method to longitudinal cervical dilation data from the National Collaborative Perinatal Project of the 1960s and the Consortium on Safe Labor of the 2000s will be presented, providing interesting comparisons across time. We will discuss other statistical challenges in studying labor progression, including the second stage of labor as well as neonatal and maternal morbidities.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, April 13, 2018 at 10:00 am

Gengsheng Qin, Ph.D.
Professor, Department of Mathematics and Statistics, Georgia State University, Atlanta

Abstract:  The correlation coefficient (CC) is a standard measure of a possible linear association between two continuous random variables. The CC plays a significant role in many scientific disciplines. For a bivariate normal distribution, there are many types of confidence intervals for the CC, such as Z-transformation and maximum likelihood-based intervals. However, when the underlying bivariate distribution is unknown, the construction of confidence intervals for the CC is not well-developed. In this paper, we discuss various interval estimation methods for the CC. We propose a generalized confidence interval for the CC when the underlying bivariate distribution is a normal distribution, and two empirical likelihood-based intervals for the CC when the underlying bivariate distribution is unknown. We also conduct extensive simulation studies to compare the new intervals with existing intervals in terms of coverage probability and interval length. Finally, two real examples are used to demonstrate the application of the proposed methods.
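As a baseline for the bivariate-normal case, the classical Fisher z-transformation interval mentioned in the abstract can be computed in a few lines; the synthetic data and sample size below are arbitrary choices for illustration.

```python
import math
import numpy as np

def fisher_z_ci(x, y, z_crit=1.959964):
    """95% confidence interval for the correlation coefficient via
    Fisher's z-transformation, assuming bivariate normality."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    z = math.atanh(r)               # Fisher's z-transform, 0.5*log((1+r)/(1-r))
    se = 1.0 / math.sqrt(n - 3)     # approximate standard error of z
    # Back-transform the symmetric interval on the z scale to the r scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Simulated bivariate-normal data with true correlation 0.6
rng = np.random.default_rng(2)
x, y = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=500).T
ci = fisher_z_ci(x, y)
print(ci)
```

The generalized and empirical likelihood intervals discussed in the talk are aimed precisely at the settings where the bivariate-normal assumption behind this construction fails.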

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, April 27, 2018 at 10:00 am

Ying Yuan, Ph.D.
Professor, Department of Biostatistics, Division of Science, University of Texas, MD Anderson Cancer Center

Abstract: In this talk, I will introduce the Bayesian optimal interval (BOIN) design as a novel platform for designing various types of early-phase clinical trials, including single-agent, combination, toxicity-grade, and late-onset-toxicity trials, under a unified framework. The BOIN design belongs to a class of new designs, known as model-assisted designs, that use a model for efficient decision making, like model-based designs, while their dose escalation and de-escalation rules can be tabulated before the onset of a trial, as with algorithm-based designs. The BOIN design is as easy to implement as the 3+3 design, but is more flexible in the choice of target toxicity rate and cohort size, and yields substantially better performance, comparable to that of more complex model-based designs such as the continual reassessment method. The BOIN design possesses intuitive Bayesian and frequentist interpretations with desirable finite-sample and large-sample properties, i.e., long-memory coherence and consistency. Software with a graphical user interface will be demonstrated.
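The tabulated escalation/de-escalation rule can be illustrated with the published BOIN boundary formulas; the default sub-toxic and over-toxic rates (0.6φ and 1.4φ) follow the BOIN papers, and this sketch is for illustration, not for trial conduct.

```python
import math

def boin_boundaries(phi, phi1=None, phi2=None):
    """BOIN escalation (lam_e) and de-escalation (lam_d) boundaries for a
    target toxicity rate phi; phi1/phi2 default to the standard 0.6*phi, 1.4*phi."""
    phi1 = 0.6 * phi if phi1 is None else phi1
    phi2 = 1.4 * phi if phi2 is None else phi2
    lam_e = math.log((1 - phi1) / (1 - phi)) / math.log(phi * (1 - phi1) / (phi1 * (1 - phi)))
    lam_d = math.log((1 - phi) / (1 - phi2)) / math.log(phi2 * (1 - phi) / (phi * (1 - phi2)))
    return lam_e, lam_d

def decide(n_tox, n_treated, phi):
    """Dose decision from the observed toxicity rate at the current dose."""
    lam_e, lam_d = boin_boundaries(phi)
    p_hat = n_tox / n_treated
    if p_hat <= lam_e:
        return "escalate"
    if p_hat >= lam_d:
        return "de-escalate"
    return "stay"

# For a 30% target the boundaries are roughly 0.236 and 0.358, so with a
# cohort of 6: 1 toxicity escalates, 2 stay, and 3 de-escalate.
print(boin_boundaries(0.30))
print(decide(1, 6, 0.30), decide(2, 6, 0.30), decide(3, 6, 0.30))
```

Because the two boundaries depend only on the target rate, the entire decision table can be printed before the trial starts, which is the "model-assisted" property the abstract emphasizes.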

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Seminars from Fall 2017

Friday, September 8, 2017 at 10:00 am

Hao Wang, Ph.D.
Associate Professor of Oncology, Division of Oncology – Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins Medicine

Abstract: Despite the wide use of designs with statistical stopping guidelines to stop a randomized clinical trial (RCT) early for efficacy, there are unsettled debates about the potentially harmful consequences of such designs. These concerns include possible over-estimation of treatment effects in early-stopped trials, and a newer argument of a “freezing effect” whereby an early-stopped trial halts future RCTs of the same comparison, since it represents an effective declaration that randomization to the un-favored arm is unethical. We determine the degree of bias in designs that allow for early stopping, and assess the impact on estimation if indeed future experimentation is “frozen” by an early-stopped trial. We discuss methods to correct for the over-estimate in early-stopped trials. We demonstrate that superiority established in an RCT that stops early under appropriate statistical stopping rules is likely a valid inference, even if the estimate may be slightly inflated.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, September 22, 2017 at 10:00 am

Jianhui Zhou, Ph.D.
Associate Professor, Department of Statistics, University of Virginia

Abstract: Quantile regression has been receiving more attention recently in survival analysis due to its robustness and interpretability, and it is considered a powerful alternative to the Cox proportional hazards model and the accelerated failure time (AFT) model. Allowing a nonlinear relationship between survival time and risk factors, we study a single-index model for censored quantile regression and employ a B-spline approximation for estimation. To avoid estimation bias caused by censoring, we consider the redistribution-of-mass technique to obtain a weighted quantile regression estimator. For high-dimensional covariates, a dimension-reduction approach is adopted to alleviate the “curse of dimensionality”. Furthermore, we penalize the developed estimator for variable selection. The proposed methods can be efficiently implemented using existing weighted linear quantile regression algorithms. The asymptotic properties of the developed estimators are investigated, and their numerical performance is evaluated in simulation studies. We apply the proposed methods to a dataset from a kidney transplant study.

Location: Proctor Harvey Amphitheater, Med-Dent Building C105
3900 Reservoir Rd. NW, Washington, DC 20057

Friday, October 13, 2017 at 10:00 am

Feng Cheng, Ph.D.
Assistant Professor, Department of Pharmaceutical Science, College of Pharmacy; Department of Epidemiology & Biostatistics, College of Public Health, University of South Florida – Tampa

Background: The oral cavity contains a diverse microbiome with over 700 bacterial species, many of which influence human health.
Objective: We hypothesize that features of the salivary microbiome will distinguish gingival health from disease and that these attributes will be more prevalent in those with Type 1 Diabetes (T1D). Our aims are to: 1) characterize the composition of the salivary microbiome from 16S sequencing, and 2) identify features of the salivary microbiome that distinguish those with T1D from those without T1D.
Methods: Passive drool saliva samples and clinical data were obtained from 197 adults (97 with T1D and 100 without diabetes) attending the 12-year visit for the CACTI study. Salivary DNA was extracted and 16S amplicons were sequenced. 16S reads were mapped and clustered into operational taxonomic units (OTUs). Multiple-testing analysis was used to identify associations between the taxonomic microbial profiles and T1D status.
Results: At the phylum level, the main constituents of the salivary microbiome in both T1D and non-T1D subjects were Bacteroidetes, Firmicutes, and Proteobacteria. However, we did find a significantly increased abundance of Firmicutes in the saliva of T1D subjects (29%) compared to non-T1D subjects (25%; false-discovery-rate (FDR)-adjusted p=0.019). At the genus level, the relative abundances of several genera differed between T1D subjects and non-diabetics: the relative abundance of Prevotella was lower, and the relative abundances of Campylobacter and Streptococcus were higher, in those with T1D compared to those without.
Conclusion: The composition of the salivary microbiome was largely made up of Bacteroidetes, Firmicutes, and Proteobacteria, and there is an association between taxonomic profiles of the salivary microbiome and Type 1 Diabetes. At the phylum level, T1D subjects were enriched for Firmicutes compared to non-T1D subjects. At the genus level, the relative abundances of several genera were higher or lower in T1D subjects compared to non-diabetics.

Location: New Research Building Auditorium
3800 Reservoir Rd. NW, Washington, DC 20057

Friday, October 27, 2017 at 10:00 am

Michael Stoto, PhD
Professor, Department of Health Systems Administration and Population Health, School of Nursing & Health Studies, Georgetown University

Abstract: Meta-analysis has increasingly been used to identify adverse effects of drugs and vaccines, but the results have often been controversial. In one respect, meta-analysis is an especially appropriate tool in these settings. Efficacy studies are often too small to reliably assess risks that become important when a medication is in widespread use, so meta-analysis, which is a statistically efficient way to pool evidence from similar studies, seems like a natural approach. But, as the examples in this paper illustrate, different syntheses can come to qualitatively different conclusions, and the results of any one analysis are usually not as precise as they seem to be. There are three reasons for this: the adverse events of interest are rare, standard meta-analysis methods may not be appropriate for the clinical and methodological heterogeneity that is common in these studies, and adverse effects are not always completely or consistently reported. To address these problems, analysts should explore heterogeneity, use random-effects or more complex statistical methods, and fit multiple statistical models to see how sensitive the results are to the choice of model.
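
As a concrete sketch of the random-effects approach the talk recommends, here is the classic DerSimonian-Laird estimator applied to two hypothetical study effects (illustrative numbers, not data from the talk's examples):

```python
import math

def dersimonian_laird(effects, variances):
    """Pool study effect estimates with the DerSimonian-Laird
    random-effects model; returns (pooled effect, SE, tau^2)."""
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures between-study heterogeneity
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# two heterogeneous studies: tau^2 > 0 widens the pooled interval
est, se, tau2 = dersimonian_laird([0.0, 0.6], [0.04, 0.04])
```

Note how the between-study variance `tau2` inflates the standard error relative to a fixed-effect pooling, which is exactly the extra (and often overlooked) uncertainty the abstract warns about.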

Location: New Research Building Auditorium
3800 Reservoir Rd. NW, Washington, DC 20057

Friday, November 10, 2017 at 10:00 am

Eric Vance, PhD
Director, Laboratory for Interdisciplinary Statistical Analysis (LISA); Associate Professor, Department of Applied Mathematics, University of Colorado Boulder

Abstract: Statistics, analytics, and data science provide powerful methods, tools, and ways of thinking for solving problems and making decisions, but not everyone who could benefit from applying statistics and data science to their research has the knowledge or skills to apply it correctly. The Laboratory for Interdisciplinary Statistical Analysis (LISA) is a statistical collaboration laboratory recently created at the University of Colorado Boulder that generates, applies, and spreads new knowledge throughout the state, the nation, and the world. LISA’s mission is to train statisticians to become interdisciplinary collaborators, provide research infrastructure to enable and accelerate high impact research, and engage with the community in outreach activities to improve statistical literacy. LISA has learned how to create statistical collaboration laboratories to train students to become effective statistical collaborators and to provide research infrastructure for the university. LISA is spreading this knowledge globally through the LISA 2020 Program to help scientists, government officials, businesses, and NGOs in developing countries discover local solutions to local problems through collaborations with statisticians from newly created statistical collaboration laboratories. The LISA 2020 goal is to build a global network of 20 statistical collaboration laboratories in developing countries by 2020. So far seven stat labs have been created in developing countries to train students to become effective interdisciplinary collaborators and enable researchers and government officials to solve problems and make better decisions.

Location: Proctor Harvey Amphitheater, Med-Dent Building C105
3900 Reservoir Rd. NW, Washington, DC 20057

Seminars from Spring 2017

Friday, April 28, 2017 at 10:00 am

Yuelin Li, Ph.D.
Associate Attending Behavioral Scientist, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center

Abstract: The Rasch Model (RM) is a classic IRT (Item Response Theory) model in psychometrics. RM is used to solve various applied problems, including the measurement of a psychological construct, the scoring of patient-reported outcomes, and the understanding of politics in the high court. I will begin with the measurement of aggression as an example of how to fit a Rasch Model using Gibbs sampling. Next, the RM can be extended to quantify the politics of the Supreme Court. These first two examples will be brief. I will spend most of the time investigating a paradox—that many cancer survivors report memory deficits, yet their memory seems intact by standard neurocognitive tests. A Bayesian latent regression RM (Li, et al. 2016) helps to make sense of this apparent contradiction. I will provide a practical guide on how to fit the Bayesian latent RM and how to use the MCMC chains to derive empirical estimates that are harder to get with non-Bayesian methods. The overall goal is to show how psychometric methods have many useful applications across diverse disciplines. Hopefully, a brief introduction will stir discussions on how IRT models may be useful in your own research.
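
As a rough illustration of fitting an RM by MCMC, here is a toy sampler for a single person's ability given binary item responses. It uses a Metropolis step with a standard-normal prior as a stand-in for the Gibbs scheme discussed in the talk; all names and values are hypothetical:

```python
import math, random

def rasch_prob(theta, b):
    """P(correct) under the Rasch model: logistic in ability minus difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_lik(theta, responses, difficulties):
    """Bernoulli log-likelihood of one person's 0/1 responses."""
    return sum(
        x * math.log(rasch_prob(theta, b)) + (1 - x) * math.log(1 - rasch_prob(theta, b))
        for x, b in zip(responses, difficulties)
    )

def sample_theta(responses, difficulties, n_iter=2000, seed=0):
    """Toy Metropolis sampler for ability theta with a N(0, 1) prior."""
    rng = random.Random(seed)
    theta, draws = 0.0, []
    lp = log_lik(theta, responses, difficulties) - 0.5 * theta ** 2
    for _ in range(n_iter):
        prop = theta + rng.gauss(0, 0.5)            # random-walk proposal
        lp_prop = log_lik(prop, responses, difficulties) - 0.5 * prop ** 2
        if math.log(rng.random()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        draws.append(theta)
    return draws
```

The posterior draws can then be summarized directly (mean, intervals, or derived quantities), which is the kind of empirical estimate from MCMC chains the talk highlights.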

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484

Friday, March 24, 2017 at 10:00 am

Jenna Krall, Ph.D.
Assistant Professor, Department of Global and Community Health, College of Health and Human Services, George Mason University

Abstract: Exposure to particulate matter (PM) air pollution has been associated with increased mortality and morbidity. PM is a complex chemical mixture, and associations between PM and health vary by its chemical composition. Identifying which sources of PM, such as motor vehicles or wildfires, emit the most toxic pollution can lead to a better understanding of how PM impacts health. However, exposure to source-specific PM is not directly observed and must be estimated from PM chemical component data. Source apportionment models aim to estimate source-specific concentrations of PM and the chemical composition of PM emitted by each source. These models, while useful, have some limitations. Specifically, the models are not identifiable without additional information, the estimated source chemical compositions may not match known source compositions, and the models are difficult to apply in multicity studies. In this talk, I introduce source apportionment models and discuss current challenges and opportunities in their application. I estimate sources and their health effects in two studies: a study of commuters in Atlanta, GA and a multicity time series study of four U.S. cities.
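
At their core, source apportionment models factor a samples-by-species concentration matrix into nonnegative source contributions and source profiles (X ≈ GF). A toy multiplicative-update factorization, purely illustrative and not the models used in the talk:

```python
import random

def nmf(X, k, n_iter=500, seed=0):
    """Tiny multiplicative-update nonnegative matrix factorization:
    X (n samples x m species) ~= G (n x k contributions) F (k x m profiles)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    G = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    F = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]

    def matmul(A, B):
        return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def transpose(A):
        return [list(col) for col in zip(*A)]

    eps = 1e-9
    for _ in range(n_iter):
        GF, Gt = matmul(G, F), transpose(G)
        num, den = matmul(Gt, X), matmul(Gt, GF)          # F <- F * (G^T X)/(G^T G F)
        F = [[F[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        GF, Ft = matmul(G, F), transpose(F)
        num, den = matmul(X, Ft), matmul(GF, Ft)          # G <- G * (X F^T)/(G F F^T)
        G = [[G[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return G, F
```

The nonnegativity constraint alone does not make the factorization unique, which is one face of the identifiability problem the abstract raises.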

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484

Friday, February 24, 2017 at 10:00 am

Peter Song, Ph.D.
Professor, Department of Biostatistics, University of Michigan at Ann Arbor

Abstract: As data sets of related studies become more easily accessible, combining data sets of similar studies is often undertaken in practice to achieve a larger sample size and higher power. A major challenge arising from data integration pertains to data heterogeneity in terms of study population, study design, or study coordination. Ignoring such heterogeneity in data analysis may result in biased estimation and misleading inference. Traditional remedies for data heterogeneity include the use of interactions and random effects, which fall short of achieving desirable statistical power or providing a meaningful interpretation, especially when a large number of smaller data sets are combined. In this paper, we propose a regularized fusion learning method that allows us to identify and merge inter-model homogeneous parameter clusters in regression analysis, without the use of a hypothesis testing approach. Using the fused lasso, we establish a computationally efficient procedure to deal with large-scale integrated data. Incorporating the estimated parameter ordering in the fused lasso improves computing speed with no loss of statistical power. We conduct extensive simulation studies and provide an application example to demonstrate the performance of the new method with a comparison to the conventional methods. This is joint work with Lu Tang.
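
The fusion idea can be caricatured as follows: per-study coefficient estimates whose gaps in sorted order are small get merged into one cluster and replaced by the cluster mean, which is roughly what a fused-lasso penalty on ordered parameters achieves. A hypothetical greedy sketch, not the proposed estimator:

```python
def fuse_estimates(estimates, threshold):
    """Greedy illustration of fusion: cluster sorted per-study estimates whose
    adjacent gaps fall below `threshold`, then replace each cluster by its mean."""
    order = sorted(range(len(estimates)), key=lambda i: estimates[i])
    clusters, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if estimates[nxt] - estimates[prev] <= threshold:
            current.append(nxt)          # gap small: same homogeneous cluster
        else:
            clusters.append(current)     # gap large: start a new cluster
            current = [nxt]
    clusters.append(current)
    fused = list(estimates)
    for cl in clusters:
        mean = sum(estimates[i] for i in cl) / len(cl)
        for i in cl:
            fused[i] = mean
    return fused, len(clusters)

# five studies collapse to two shared parameter values
fused, k = fuse_estimates([0.98, 1.02, 3.0, 1.00, 2.95], threshold=0.1)
```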

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484

Friday, February 10, 2017 at 9:30 am

Yinglei Lai, Ph.D.
Professor of Statistics, Department of Statistics, George Washington University

Abstract: The development of microarray and sequencing technologies enables biomedical researchers to collect and analyze large-scale molecular data. We will introduce our recent studies on the concordant integrative approach to the analysis of multiple related two-sample genome-wide expression data sets. A mixture model is developed and yields concordant integrative differential expression analysis as well as concordant integrative gene set enrichment analysis. As the number of data sets increases, it is necessary to reduce the number of parameters in the model. Motivated by the well-known generalized estimating equations (GEEs) for longitudinal data analysis, we focus on the concordant components and assume some special structures for the proportions of non-concordant components in the mixture model. The advantage and usefulness of this approach are illustrated on experimental data.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484

Thursday, February 2, 2017 at 10:45 am

Kelly H. Zou, PhD, PStat, ASA Fellow
Senior Director and Analytic Science Lead, Real World Data & Analytics, Global & Health Impact

Abstract: Given the desire to enhance the effectiveness and efficiency of health care systems, it is important to understand and evaluate the risk factors for disease progression, treatment patterns such as medication uses, and utilizations such as hospitalization. Statistical analyses via observational studies and data mining may help evaluate patients’ diagnostic and prognostic outcomes, as well as inform policies to improve patient outcomes and to control costs. In the era of big data, real-world longitudinal patient-level databases containing the insurance claims of commercially insured adults, electronic health records, or cross-sectional surveys provide useful insights for such analyses. Within the pharmaceutical industry, executing rapid queries to inform development and commercialization strategies, as well as pre-specified non-interventional observational studies, are commonly performed. In addition, pragmatic studies are increasingly being conducted to examine health-related outcomes. In this presentation, selected published examples of real-world data analyses are illustrated. Results typically suggest that paying attention to patient comorbidities and pre-index or at-index health care service utilization may help identify patients at higher risk and unmet needs for treatments. Finally, fruitful collaborative opportunities exist across different sectors among academia, industry and the government.

Location: Med-Dent C-104, W. Proctor Harvey Amphitheater
3900 Reservoir Rd, Washington, DC 20057-1484

Friday, January 27, 2017 at 10:00 am

Goodarz Danaei, M.D.
Associate Professor, Department of Epidemiology, School of Public Health, Harvard University

Abstract: This presentation reviews methods for comparative effectiveness research using observational data. The basic idea is using an observational study to emulate a hypothetical randomised trial by comparing initiators versus non-initiators of treatment. After adjustment for measured baseline confounders, one can then conduct the observational analogue of an intention-to-treat analysis. We also explain two approaches to conduct the analogues of per-protocol and as-treated analyses after further adjusting for measured time-varying confounding and selection bias using inverse-probability weighting. As an example, we implemented these methods to estimate the effect of statins for primary prevention of coronary heart disease (CHD) using data from electronic medical records in the UK. Despite strong confounding by indication, our approach detected a potential benefit of statin therapy. The analogue of the intention-to-treat hazard ratio (HR) of CHD was 0.89 (0.73, 1.09) for statin initiators versus non-initiators. The HR of CHD was 0.84 (0.54, 1.30) in the per-protocol analysis and 0.79 (0.41, 1.41) in the as-treated analysis for 2 years of use versus no use. In contrast, a conventional comparison of current users versus never users of statin therapy resulted in a HR of 1.31 (1.04, 1.66). We provide a flexible and annotated SAS program to implement the proposed analyses.
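
A toy version of the inverse-probability weighting step, with treatment probabilities estimated within strata of a single confounder as a stand-in for the logistic propensity model used in practice (the data layout and numbers are hypothetical):

```python
from collections import defaultdict

def ipw_effect(data):
    """Inverse-probability-weighted mean difference in outcomes.
    `data` is a list of (treated, confounder, outcome) tuples; treatment
    probabilities are estimated within confounder strata. Assumes every
    stratum has both treated and untreated subjects (positivity)."""
    counts = defaultdict(lambda: [0, 0])          # stratum -> [n_treated, n]
    for a, l, y in data:
        counts[l][0] += a
        counts[l][1] += 1
    ps = {l: t / n for l, (t, n) in counts.items()}
    num1 = den1 = num0 = den0 = 0.0
    for a, l, y in data:
        w = 1 / ps[l] if a else 1 / (1 - ps[l])   # inverse-probability weight
        if a:
            num1 += w * y; den1 += w
        else:
            num0 += w * y; den0 += w
    return num1 / den1 - num0 / den0
```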

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484

Friday, January 13, 2017 at 9:30 am

Mei-Ling Ting Lee, Ph.D.
Professor, Department of Epidemiology and Biostatistics; Director, Biostatistics and Risk Assessment Center, University of Maryland at College Park

Abstract: Cox regression methods for survival data are well known. They carry, however, a strong proportional hazards assumption. In many medical contexts, a disease progresses until a failure event (such as death) is triggered when the health level first reaches a failure threshold. I’ll present the Threshold Regression (TR) model for a patient’s latent health process, which requires few assumptions and, hence, is quite general in its potential application. We use TR to analyze data from a randomized clinical trial of treatment for multiple myeloma. A comparison is made with a Cox proportional hazards regression analysis of the same data.
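
The TR idea can be illustrated by simulation: latent health follows a Wiener process, and failure occurs the first time it crosses the threshold at zero. A Monte Carlo sketch under assumed parameters (not the myeloma analysis):

```python
import math, random

def fht_survival(h0, drift, sigma, t, n_paths=20000, n_steps=100, seed=1):
    """Monte Carlo survival probability P(T > t) for a first-hitting-time
    model: latent health starts at h0 > 0, follows a Wiener process with
    the given drift and volatility, and failure occurs at the first
    (discretely monitored) crossing of zero."""
    rng = random.Random(seed)
    dt = t / n_steps
    surviving = 0
    for _ in range(n_paths):
        h, alive = h0, True
        for _ in range(n_steps):
            h += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
            if h <= 0:
                alive = False              # threshold reached: failure event
                break
        surviving += alive
    return surviving / n_paths
```

Covariates enter by letting the starting level `h0` and the `drift` depend on them, which is how TR avoids the proportional hazards assumption.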

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484

Seminars from Fall 2016

Friday, November 11, 2016 at 10:00 am

Keith Muller, Ph.D.
Associate Chair and Professor, Institute for Child Health Policy, University of Florida

Abstract: Concerns about reproducibility in science are widespread. In response, the National Institutes of Health has changed review procedures and training requirements for applicants; the Director of NIH and his deputy outlined their plans in Collins and Tabak (2014). Key methodological concerns include poor study designs, incorrect statistical analyses, inappropriate sample size selection, and misleading reporting. Planners can avoid the concerns by following four statistical guidelines. 1) Explicitly control both Type I errors (false positives) and Type II errors (false negatives). 2) Align the scientific goals, study design, data analysis plan, and the sample size analysis. 3) Vary inputs to the sample size analysis to determine the sensitivity to the values assumed. 4) Account for statistical uncertainty in inputs to sample size computations. Extending the guidelines to sequences of studies requires careful allocation of exploratory and confirmatory analyses (leapfrog designs) and allows some forms of adaptive designs. We give examples in the talk for a variety of designs and hypotheses. Case studies include a randomized drug trial in kidney disease, an observational study of quality of care in Medicaid, and a neurotoxicology experiment in rats. Analytic and simulation results provide the foundation for the conclusions.
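
Guideline 1), controlling both error types, underlies the usual sample-size formula; a minimal normal-approximation sketch for a two-sample mean comparison (illustrative only):

```python
import math

def z_quantile(p, lo=-10.0, hi=10.0):
    """Standard normal quantile by bisection on the erf-based CDF."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sample mean
    comparison: controls Type I error alpha (two-sided) and Type II
    error 1 - power for a true difference delta with common SD sigma."""
    z_a = z_quantile(1 - alpha / 2)
    z_b = z_quantile(power)
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)
```

Re-running `n_per_group` over a grid of plausible `delta` and `sigma` values is one simple way to implement guideline 3), checking sensitivity to the assumed inputs.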

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484

Friday, October 28, 2016 at 10:00 am

Felix Elwert, Ph.D.
Associate Professor of Sociology, University of Wisconsin-Madison

Abstract: This talk introduces the three central uses of directed acyclic graphs (DAGs) for causal inference in the observational biomedical and social sciences.  First, DAGs provide clear notation for the researcher’s theory of data generation, against which all causal inferences must be judged. Second, DAGs reveal to what extent the researcher’s data-generating model can be tested. Third, researchers can inspect the DAG to determine whether a given causal question can be answered (“identified”) from the data. After introducing basic building blocks, we will discuss a number of real examples to demonstrate how DAGs help solve thorny practical problems in causal inference.
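
As a small worked example of what a DAG licenses, here is the backdoor adjustment formula for a single measured confounder L of a binary treatment A and outcome Y, applied to a made-up joint distribution:

```python
from collections import defaultdict

def backdoor_risk(joint, a):
    """P(Y=1 | do(A=a)) via the backdoor formula with one confounder L:
    sum over l of P(Y=1 | A=a, L=l) * P(L=l).
    `joint` maps (l, a, y) tuples to probabilities summing to 1."""
    p_l = defaultdict(float)
    p_lay = defaultdict(float)
    for (l, aa, y), p in joint.items():
        p_l[l] += p
        p_lay[(l, aa, y)] += p
    total = 0.0
    for l in p_l:
        denom = p_lay[(l, a, 0)] + p_lay[(l, a, 1)]
        if denom > 0:
            total += p_l[l] * p_lay[(l, a, 1)] / denom
    return total

# hypothetical joint distribution where L affects both A and Y
joint = {
    (0, 1, 1): 0.075, (0, 1, 0): 0.075, (0, 0, 1): 0.07, (0, 0, 0): 0.28,
    (1, 1, 1): 0.315, (1, 1, 0): 0.035, (1, 0, 1): 0.09, (1, 0, 0): 0.06,
}
effect = backdoor_risk(joint, 1) - backdoor_risk(joint, 0)
```

The adjustment is only valid if the DAG says {L} blocks every backdoor path from A to Y, which is exactly the identification check the talk describes.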

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484

Friday, October 14, 2016 at 10:00 am

Dennis Lin, Ph.D.
University Distinguished Professor of Statistics, Pennsylvania State University

Abstract: Dimensional Analysis (DA) is a fundamental method in the engineering and physical sciences for analytically reducing the number of experimental variables prior to experimentation. The principal use of dimensional analysis is to deduce, from a study of the dimensions of the variables, the form of any possible relationship between those variables. The method is of great generality. In this talk, an overview/introduction of DA will first be given. A basic guideline for applying DA will be proposed, using examples for illustration. Some initial ideas on using DA for data analysis and data collection will be discussed. Future research issues will be proposed.
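
The bookkeeping behind DA can be sketched by representing each quantity as a vector of (mass, length, time) exponents and checking dimensional homogeneity; the helpers below are hypothetical illustrations:

```python
from fractions import Fraction

def dim_mul(a, b):
    """Multiply two quantities: exponent vectors add."""
    return tuple(x + y for x, y in zip(a, b))

def dim_pow(a, p):
    """Raise a quantity to a power: exponent vectors scale."""
    return tuple(Fraction(x) * p for x in a)

# dimensions as (mass, length, time) exponent tuples
LENGTH = (0, 1, 0)
TIME = (0, 0, 1)
ACCEL = (0, 1, -2)   # gravitational acceleration g has dimension L T^-2

# candidate pendulum-period formula T ∝ sqrt(L / g):
# check that it is dimensionally a time
candidate = dim_pow(dim_mul(LENGTH, dim_pow(ACCEL, -1)), Fraction(1, 2))
```

Because mass never appears in a dimensionally consistent combination of L and g, DA alone tells us the pendulum period cannot depend on the bob's mass, which is the kind of variable reduction the abstract describes.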

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484

Friday, September 23, 2016 at 10:00 am

Yongzhao Shao, Ph.D.
Professor, Population Health and Environmental Medicine and Deputy Director of New York University Cancer Institute Biostatistics Shared Resources 

Abstract: A significant unmet challenge in the treatment of many early-stage cancers is the lack of effective prognostic models to identify patients who are at high risk of disease progression among a large number of potentially cured patients. Semi-parametric mixture cure models can account for latent cure fractions in patient populations and thus are more suitable prognostic models than standard survival models, such as Cox proportional hazards models or proportional odds models, that ignore the existence of latent cure fractions. Without the requirement of knowing who is surely cured, semiparametric mixture cure models can be used to evaluate the predictive utility of biomarkers on cure probability and on survival of uncured subjects. However, appropriate statistical metrics to evaluate prognostic efficiency in the presence of cured patients have been lacking. In this paper, we introduce concordance-based prognostic metrics for semi-parametric mixture cure models and develop consistent estimates. The asymptotic normality and confidence intervals of these estimates are also established. Finite-sample applicability of the developed indices and estimates is investigated using numerical simulations and illustrated using a melanoma data set. This talk is based on joint work with Dr. Yilong Zhang at Merck.
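
The basic mixture cure structure is S_pop(t) = π + (1 − π) S_u(t), where π is the cure fraction and S_u is the survival of the uncured. A minimal sketch with an exponential latency distribution standing in for the semiparametric component discussed in the talk:

```python
import math

def mixture_cure_survival(t, cure_prob, rate):
    """Population survival under a simple mixture cure model: a fraction
    `cure_prob` is cured (never fails); the uncured fraction follows an
    exponential survival with hazard `rate`."""
    uncured_survival = math.exp(-rate * t)
    return cure_prob + (1 - cure_prob) * uncured_survival
```

The plateau of the survival curve at `cure_prob` as t grows is exactly the latent cure fraction that a Cox or proportional odds model, which forces survival to 0, cannot represent.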

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484

Friday, September 9, 2016 at 10:00 am

Jianguo Sun, Ph.D.
Professor, Department of Statistics, University of Missouri

Abstract: The analysis of failure time data plays an important and essential role in many studies, especially medical studies such as clinical trials and follow-up studies. One key feature that separates failure time data analysis from other fields is censoring, which can occur in different forms. In this talk, we will discuss and review a general form, interval censoring, and the existing literature for the analysis of interval-censored data, as well as some research topics.
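
For a parametric illustration of interval censoring, each observation contributes S(l) − S(r) to the likelihood, since the failure is known only to lie in (l, r]. A sketch with an exponential model and a grid-search MLE (the intervals are hypothetical):

```python
import math

def interval_censored_loglik(rate, intervals):
    """Log-likelihood of an exponential failure-time model under interval
    censoring: each (l, r) pair says only that the failure occurred in
    (l, r]; r = float('inf') encodes right censoring at l."""
    def surv(t):
        return math.exp(-rate * t) if t != float("inf") else 0.0
    return sum(math.log(surv(l) - surv(r)) for l, r in intervals)

# grid-search MLE over the hazard rate for three illustrative observations
intervals = [(0.0, 1.0), (1.0, 2.0), (0.5, float("inf"))]
best = max((r / 100 for r in range(1, 300)),
           key=lambda r: interval_censored_loglik(r, intervals))
```

Right censoring is recovered as the special case r = ∞, which is why interval censoring is described in the talk as the general form.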

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd, Washington, DC 20057-1484