Recent Past Seminars

The Bio3 Seminar Series meets every second and fourth Friday of the month, during the academic year. MS & PhD biostatistics students are expected to attend the bi-weekly seminar series, as part of their curriculum.

Seminars from Spring 2025

Friday, April 25, 2025 at 10:00 am

Neil J. Perkins, Ph.D.
Staff Scientist, Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institute of Health (NIH)

“Missing Data in Epidemiologic Research: Contemporary Approaches for Enduring Problems”

Missing data is a common challenge in epidemiology and other data analytic areas that can affect the validity of analysis and research findings. Common methods for addressing missing data are complete case analysis, multiple imputation, and inverse probability weighting. However, there are nuances to using these tools and as statistical and epidemiological techniques continue to evolve, it is important to also consider newer and cutting-edge methods for addressing this persistent problem. This talk will give and overview of the potential impacts of missing data, common methods to reduce the impact and explore new methods for handling missing data. The session will provide an opportunity to understand the benefits and drawbacks of different strategies and help you decide which methods are best suited for your analysis.

Location: Building D, Warwick Evans Conference Room

Friday, March 14, 2025 at 10:30 am

Gang Han, Ph.D.
Professor, Department of Epidemiology and Biostatistics, School of Public Health, Texas A&M University

“Bayesian-Frequentist Hybrid Inference in Research Applications”

Abstract: The Bayesian-frequentist hybrid model and associated inference can combine the advantages of both Bayesian and frequentist methods and avoid their limitations. However, the computation under the hybrid model is generally nontrivial or even unsolvable. We develop a computation algorithm for hybrid inference under any general loss functions. Three simulation examples demonstrate that hybrid inference can improve upon frequentist inference and Bayesian inference based on non-informative priors. The proposed method is illustrated in applications including an RNA single cell sequencing study, a biomechanical engineering design, and an analysis of the HIV viral load dynamics.

Location: Medical Dental Building, SW107

Friday, February 14, 2025 at 10:00 am

Robert Podolsky, Ph.D.
Manager, Center for Biostatistics, Informatics, and Data Science (CBIDS), MedStar Health Research Institute

“Trying to See Clearly: Statistical Challenges in Retinal Imaging”

Analysis of imaging data typically involves examining spatially structured data. Many analytical methods rely on a functional data approach, which enables the evaluation of changes in means and derivatives. However, these methods often are not able to account for more complex random effects encountered in designed experiments. Further complicating analysis, key markers of retinal morphology and physiology in optical coherence tomography are characterized in different ways: one by changes in intensity, another by the distance between features, and a third by peak shape. In this talk, I will discuss approaches to model these markers individually, as well our ongoing efforts to develop joint modeling approaches.

Location: Building D, Warwick Evans Conference Room

Friday, January 24, 2025 at 10:00 am

Lizhen Lin, Ph.D.
Professor, Department of Statistics, University of Maryland

“Statistical Foundations of Deep Generative Models”

Abstract: Deep generative models are probabilistic generative models where the generator is parameterized by a deep neural network. They are popular models for modeling high-dimensional data such as texts, images and speeches, and have achieved remarkable empirical success. Despite demonstrated success in empirical performance, theoretical properties of such modelsremain less explored. We investigate theoretical foundations of deep generative models from a nonparametric distribution estimation viewpoint. In the considered model, data are assumed to be observed in some high-dimensional ambient space but concentrate around some low-dimensional structure such as a lower-dimensional manifold structure. This talk will provide a theoretical underpinning of deep generative models from the lens of statistical theory. In particular, I will present theoretical insights into i) how deep generative models can avoid the curse of dimensionality and outperform classical nonparametric estimates, and ii) how likelihood-based approaches work for high-dimensional distribution estimation under a deep generative model, especially in adapting to the intrinsic geometry of the data.

Location: Online via Zoom

Friday, January 10, 2025 at 10:00 am

Sungkyu Jung, Ph.D.
Professor, Department of Statistics, Director, Institute for Data Innovation in Science, Seoul National University, South Korea

“Principal Component Analysis for Zero-Inflated Compositional Data”

Abstract: Recent advances in DNA sequencing technology have heightened interest in microbiome data, which is often high-dimensional and presents challenges due to its compositional nature and zero-inflation. In this talk, I will introduce new PCA methods for zero-inflated compositional data, based on a framework called principal compositional subspace. These methods aim to identify both the principal compositional subspace and corresponding principal scores that best approximate the data while maintaining its compositional properties. Theoretical properties such as existence and consistency of the principal compositional subspace are investigated. Simulation studies show these methods achieve lower reconstruction errors than existing log-ratio PCA methods in linear patterns and perform comparably in curved patterns. The methods successfully uncover the low-rank structure in four microbiome compositional datasets with excessive zeros.

Location: Building D, Warwick Evans Conference Room

Seminars from Fall 2024

Friday, November 8, 2024 at 10:00 am

Seong Jun Yang, Ph.D.
Associate Professor, Department of Statistics, Jeonbuk National University, South Korea
Visiting Scholar, Department of Biostatistics, Bioinformatics, & Biomathematics, Georgetown University

“Varying Coefficient Models for Survival Data Analysis”

Abstract: The varying coefficient model has been extensively studied over the past 20 years since it was first proposed. This gives an easy interpretation due to its direct connectivity to the classical linear model and is very flexible since nonparametric functions which accommodate various nonlinear interaction effects between covariates are admitted in the model. The model has been extended to various situations. In this talk, the model will be introduced specifically in the context of survival data analysis, and some research findings will be presented.

Location: Building D, Warwick Evans Conference Room

Friday, October 25, 2024 at 10:00 am

Yixin Fang, Ph.D.
Director, Medical Affairs and Health Technology Assessment (MA & HTA) Statistics Research Fellow, AbbVie Community of Science

“Causal Inference in Pharmaceutical Statistics”

Abstract: In the literature of causal inference, a variety of statistical methods have been proposed to adjust for confounding bias. However, it is challenging for the users of these methods to understand the statistical properties enjoyed by each method and then explicitly specify its underlying model assumptions. In this presentation, I will discuss with you two basic statistical strategies of conducting causal inference in non-interventional studies, which lead to many commonly used methods. These two strategies are the weighting strategy and the standardization strategy. The weighting strategy defines a target estimand using a propensity-score model (treatment assignment ~ confounders), while the standardization strategy defines an estimand using an outcome-regression model (outcome variable ~ treatment assignment + confounders). Although these two strategies are different at the beginning, at the end they are robust for estimating the treatment effect under the same set of identifiability conditions and therefore the same kind of sensitivity analysis is needed for evaluating the impact caused by the violation of these conditions. The materials in this presentation are selected from my book titled “Causal Inference in Pharmaceutical Statistics” published recently.

Location: Online via Zoom

Friday, September 27, 2024 at 10:00 am

Seo Young Park, Ph.D.
Assistant Professor, Department of Statistics and Data Science, Korea National Open University , South Korea
Visiting Scholar, Department of Biostatistics, Bioinformatics, & Biomathematics, Georgetown University

“Marginal Structural Model to Estimate the Effect of Time-Varying, Nonrandomized Treatment”

Abstract: Estimation of the causal effect of time-varying treatment based on longitudinal data from observational study is a common problem in clinical science. When there are time-varying confounders that are also intermediate factors between the time-varying treatment and outcome, standard approaches to control for confounders can lead to substantial bias in estimates of treatment effect. We describe Marginal Structural Models (MSM) and Inverse-Probability-of-Treatment-Weighted (IPTW) estimators, which can provide unbiased estimates of causal effects when there are time-varying confounders which are also mediators. We apply MSM to the data of patients with ankylosing spondylitis to estimate the effect of biologics on radiographic progression controlling for the effect of inflammation. Here, inflammation affects the subsequent biologics prescription and the radiographic progression, as well as is affected by previous biologics administration. We demonstrate how this method corrects for the imbalance in inflammation at the time of treatment initiation vs. discontinuation, and thus provides an unbiased estimate of the biologic effect.

Location: Building D, Warwick Evans Conference Room

Friday, September 19, 2024 at 10:00 am

Anru Zhang, Ph.D.
Associate Professor, Department of Biostatistics & Bioinformatics, Department of Computer Science, Duke University

“High-order Singular Value Decomposition in Tensor Analysis”

Abstract: The analysis of tensor data, i.e., arrays with multiple directions, is motivated by a wide range of scientific applications and has become an important interdisciplinary topic in data science. In this talk, we discuss the fundamental task of performing singular value decomposition on tensors, exploring both general cases and scenarios with specific structures like smoothness and longitudinally. Through the developed frameworks, we can achieve accurate denoising for 4D scanning transmission electron microscopy images; in longitudinal microbiome studies, we can extract key components in the trajectories of bacterial abundance, identify representative bacterial taxa for these key trajectories, and group subjects based on the change of bacteria abundance over time. We also showcase the development of statistically optimal methods and computationally efficient algorithms that harness valuable insights from high-dimensional tensor data, grounded in theories of computation and non-convex optimization.

Location: Building D, Warwick Evans Conference Room

Friday, September 13, 2024 at 10:45 am

Dylan Cable, Ph.D.
Assistant Professor, School of Public Health, University of Michigan

“A Statistical Framework for Characterizing Cellular Behavior in Spatial Transcriptomics”

Abstract: Spatial transcriptomics technologies are an emerging class of high-throughput sequencing methodologies for measuring gene expression at near single-cell resolution at spatially-defined measurement spots across a biological tissue. We show how measuring cells in their native environment has the potential to identify spatial patterns of cell types, cell-to-cell interactions, and spatial variation in cellular behavior. However, several technical challenges necessitate the development of appropriate statistical methods, including additive mixtures of single cells, overdispersion, and technical platform effects across technologies. We develop a statistical framework accounting for these challenges to identify cell types within spatial transcriptomics datasets. We extend this approach to a general regression framework that can, accounting for multiple replicates, learn cell type-specific differential gene expression (DE) across many scenarios including DE across spatial regions and due to cell-to-cell interactions. We apply our framework to a metastatic tumor clone and discover an association between immune cell localization and an epithelial-mesenchymal transition of cancer cells. We also discuss extensions and future research.

Location: Online via Zoom

Seminars from Spring 2024

Friday, April 26, 2024 at 10:00 am

Peter H. Gruber, Ph.D.
Data Economist and Senior Lecturer, Università della Svizzera Italiana (USI) at Lugano, Switzerland

“How Large Language Models Support Statistical Analysis”

Abstract: It seems like a contradiction: how can a language model help deeply with mathematical tasks in data science and statistical analysis? Yet the Data Analyst GPT promises to do exactly that and has become one of the most widely used functions of ChatGPT. In this seminar, I will discuss why many statistical problems are at their core language and translation problems, why it is important to use precise statistical language, what a statistical tribe is and how data, language and statistical computation can be integrated with the help of generative AI. I will show several practical examples of how ChatGPT changes the data science landscape and conclude by discussing this technology changes the skill set required from a modern data scientist.

Location: Online via Zoom

Friday, April 12, 2024 at 10:00 am

Hana Lee, Ph.D.
Senior Statistical Reviewer, Office of Biostatistics, Center for Drug Evaluation and Research (CDER), Food and Drug Administration (FDA)

Biostatistics at the FDA

Abstract: This talk will provide an introduction to Biostatistics at the FDA and Center for Drug Evaluation and Research (CDER), which includes (1) responsibilities of FDA and CDER in general, (2) new drug regulation process and role of CDER statisticians in the office of Biostatistics (OB), and (3) opportunities to join or work with FDA for Master students, PhD students and faculty in Biostatistics. The OB is actively hiring future statistical reviewers and analysts, and this seminar will be a great opportunity to learn about scientific careers at FDA including fellowship opportunities. This talk will also provide a brief overview of research and other regulatory collaboration opportunities/examples for faculty.

Location: New Research Building, W402

Friday, March 22, 2024 at 10:00 am

Adrian Dobra, Ph.D.
Professor, Department of Statistics, University of Washington

“Statistical Modeling of Human Mobility Data”

Abstract: Human mobility, or movement over short or long spaces for short or long distances of time, is an important yet under-studied phenomenon in the social and demographic sciences. While there have been consistent advances in understanding migration (more permanent movement patterns) and its impact on human well-being, macro-social, political, and economic organization, advances in studies of mobility have been stymied by difficulty in recording and measuring how humans move on a minute and detailed scale. Today a broad range of spatial data are available for studying human mobility, such as geolocated residential histories, high-resolution GPS trajectory data, and large-scale human-generated geospatial data sources such as mobile phone records and geolocated social media data.

Statistical approaches will be presented, that take advantage of these types of geospatial data sources to measure the geometry, size and structure of activity spaces, to assess the temporal stability of human mobility patterns, and to study the complex relationship between population mobility and the risk of HIV acquisition in South Africa.

Location: Online via Zoom

Friday, February 23, 2024 at 10:00 am

Ryan Sun, Ph.D.
Assistant Professor, Department of Biostatistics, University of Texas MD Anderson Cancer Center

“InterpretableLarge-Scale Testing of Composite Null Hypotheses for Translational Genetics Studies in Modern Biobanks”

Abstract: The increasing availability of massive, publicly available biomedical compendiums such as the UK Biobank has generated much interest in genetic study designs that test composite null hypotheses. Specifically, important approaches such as causal mediation analysis, pleiotropy analysis, and replication analysis have become much more feasible with advancements in data access and infrastructure. Although these analyses address different scientific questions, the underlying statistical goal is to determine whether all null hypotheses in a set of individual tests should simultaneously be rejected. In contrast, past genetic studies were much more focused on testing global null hypotheses, with the goal of determining whether at least one individual null should be rejected. Various recent methodology has been proposed for composite null situations, and an appealing empirical Bayes strategy is to apply the well-known two-group model, calculating local false discovery rates (lfdr) for each set of hypotheses. However, in practice, such a strategy is challenged by the need for difficult multivariate density estimation, leading to poor operating characteristics and uninterpretable lfdr-values that contradict standard intuition about statistical significance and p-values. This work proposes a model to simplify two-group testing in composite null settings. The model demonstrates more robust operating characteristics than recently-proposed alternatives while also offering provable interpretability guarantees, harmonizing empirical Bayes lfdr-values and frequentist test statistics. We demonstrate application on a collection of translational lung cancer genetic association studies that motivated this work.

Location: Online via Zoom

Friday, February 9, 2024 at 10:00 am

Sholto David, Ph.D.
Analytical Scientist

“Identifying Errors and Manipulation in the Scientific Literature, with a Focus on Images”

Abstract: The seminar will discuss different types of image errors and how to identify them. Literature related to the scale of the problem will be summarized, and I will provide some examples of errors in different journals, subject areas, research groups, and institutions. I will also discuss my personal experience of identifying over 2000 papers with problematic images, my quixotic efforts to resolve them, and I will offer some thoughts on why I think image (and other) errors matter. Finally, the floor will be open for criticisms (and questions if you like).

Location: Online via Zoom

Friday, January 26, 2024 at 10:00 am

Keegan Hines, Ph.D.
Principal Applied Scientist, Microsoft

“Generative AI: History and Implications for Biological Research”

Abstract: In the span of just a few years, generative models have evolved from a scientific curiosity into an everyday tool used by many. In this talk, I’ll present a historical overview of major developments in the field. In particular, I will focus on generative language models and the key moment in deep learning research that have led to today’s powerful LLMs. I will then close by focusing on nascent research in molecular biology that uses these tools to advance fundamental scientific questions.

Location: Online via Zoom

Seminars from Fall 2023

Friday, November 10, 2023 at 10:00 am

Molin Wang, Ph.D.
Associate Professor, Department of Epidemiology and Biostatistics, Harvard T. H. Chan School of Public Health/ Harvard Medical School, Harvard University

“Survival Analysis Adjusting for Measurement Error in a Cumulative Exposure Variable: Radon Progeny to Lung Cancer Mortality”

Abstract: Exposure measurement error is a common occurrence in various epidemiological fields, with radiation epidemiology at the top of the list. Failure to properly assess and adjust for uncertainties in radiation dosimetry could lead to biased effect estimates. Moreover, characterizations of health impacts obtained without countering error in exposure levels could potentially misinform policy makers, when they are, for example, setting the radiation safety levels in occupational and residential settings referencing unadjusted dose-response relationships between error-prone radiation levels and observed adverse health outcomes. Therefore, from both the statistical advancement and public health policy perspectives, it is of great importance to develop and discuss statistical methods in countering the influences of such exposure measurement error and providing valid health outcome effects into the policy decision pipeline. In this talk, I will present statistical methods for estimating exposure-outcome associations adjusting for the exposure measurement errors, when the exposure takes the form of a cumulative total. The proposed methods will be illustrated using data from the field of radiational epidemiology.

Location: Online via Zoom

Friday, October 27, 2023 at 10:00 am

Daniel Almirall, Ph.D.
Associate Professor, Institute for Social Research, Department of Statistics, University of Michigan

“Multi-level Adaptive Implementation Strategies (MAISYs): Design Principles, Optimization Questions and Choosing the Right Experimental Design”

Abstract: Evidence-based practices often fail to be implemented or sustained due to barriers at multiple levels of an organization (e.g., system-level, practitioner-level). A growing cadre of implementation strategies can help mitigate challenges at these multiple levels, but significant heterogeneity exists in whether, and to what extent, organizations—and the practitioners who deliver treatment within them—respond to different strategies. However, it is impractical to provide all (or even most) of these strategies to all levels, at all times. This suggests the need for an approach that sequences and adapts the provision of implementation strategies to the changing context and needs of practitioners within the multiple levels of an organization. A multilevel adaptive implementation strategy (MAISY) offers a replicable, approach to precision implementation that guides implementers in how best to adapt and re-adapt (e.g., augment, intensify, switch) implementation strategies based on the changing context and changing needs at multiple levels.

Location: Online via Zoom

Friday, October 13, 2023 at 10:00 am

Ding-Geng (Din) Chen, Ph.D.
Executive Director and Professor in Biostatistics, College of Health Solutions, Arizona State University, SARCHI Research Professor in Biostatistics, Department of Statistics, University of Pretoria, South Africa

“Big Data Inference and Statistical Meta-Analysis”

Abstract: Statistical meta-analysis (MA) is a common statistical approach in big data inference to combine meta-data from diverse studies to reach a more reliable and efficient conclusion. It can be performed by either synthesizing study-level summary statistics (MA-SS) or modeling individual participant-level data (MA-IPD), if available. However, it remains not fully understood whether the use of MA-IPD indeed gains additional efficiency over MA-SS. In this talk, we review the classical fixed-effects and random-effects meta-analyses, and further discuss the relative efficiency between MA-SS and MA-IPD under a general likelihood inference setting. We show theoretically that there is no gain of efficiency asymptotically by analyzing MA-IPD. Our findings are further confirmed by extensive Monte-Carlo simulation studies and real data analyses.

*This talk is based on the joint publication: Chen, D.G, Liu, D., Min, X. and Zhang H. (2020). Relative efficiency of using summary and individual information in random-effects meta-analysis. Biometrics, 76(4): 119-1329. (https://doi.org/10.1111/biom.13238).

Location: Online via Zoom

Friday, September 22, 2023 at 10:00 am

Bret Musser, Ph.D.
Executive Director, Head of Biostatistics, Biostatistics & Data Management, Regeneron

“Unlocking the Potential of Next-Generation Clinical Trials Together: An Overview of the Regeneron-Georgetown University Partnership”

Abstract: The increasing complexity of clinical research objectives has fueled the demand for the next-generation clinical trials with more effective designs and analysis strategies. For example, medicines such as gene therapies have features vastly different from those of typical drugs, which presents a new challenge for statisticians in designing their dose-finding, proofs of concept, and confirmatory studies. In addition, regulatory agencies have been promoting complex and innovative designs for the industry, such as clinical trials for rare diseases, leveraging real-world data and real-world evidence in clinical trials, clinical trial with master protocols, and many more. These opportunities for developing next-generation clinical trials also come with unprecedented statistical difficulties. To conquer such difficulties and fully unlock the potential of next-generation clinical trials, a strong partnership between industry and academic stakeholders are needed. In this presentation, we will examine the potential of the Regeneron-Georgetown partnership in modernizing the clinical studies industry are conducting.

Location: Building D, Warwick Evans Conference Room

Friday, September 8, 2023 at 10:00 am

Shelby Haberman, Ph.D.
Educator, Statistician

“Measures of Agreement and Measures of Prediction Accuracy”

Abstract: Measures of agreement are compared to measures of prediction accuracy. Differences in appropriate use are emphasized, and approaches are examined for both numerical and nominal variables. General estimation methods are developed, and their large-sample properties are compared.

Location: Online via Zoom

Seminars from Spring 2023

Friday, January 13, 2023 at 10:00 am

Li C. Cheung, Ph.D.
Stadtman Investigator, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NCI

“Development of Rick-Based Management Guidelines for Abnormal Cervical Screening Results”

Abstract: To address the appropriate management of individuals under an ever-changing screening landscape (i.e., the introduction of new screening technology, electronic health records providing greater access to patient histories, and HPV vaccination), representatives from 19 professional organizations agreed to change from issuing recommendations based on test results to recommendations based on precancer risk and a pre-agreed set of clinical action risk thresholds. Using electronic-health records from nearly 2 million women undergoing routine screening from 2003 to 2017, we estimated precancer risk for combinations of screening test results and relevant past histories. Because there can be precancers prevalent at the initial screen and precancer status is intermittently observed, resulting in left-, interval-, and right-censored time of precancer onset, we fit the data using prevalence-incidence mixture models (i.e., jointly estimated logistic regression and proportional hazards models). To inform the consensus risk thresholds, we provided to the working groups estimates of delayed diagnosis vs. colposcopic efficiency trade-offs. The new risk-based management recommendations were then externally validated using data from 2 trials, the New Mexico HPV-precancer registry, and a CDC program that provided screening for underinsured and uninsured individuals.

Location: Online via Zoom

Friday, January 27, 2023 at 10:00 am

Gary Cline, Ph.D. & Lan-Feng Tsai, M.S.
Early Biometrics & Statistical Innovation Data Science & Artificial Intelligence, R&D AstraZeneca, Gaithersburg, US

“Statisticians’ Career in the Pharmaceutical Industry”

Abstract: Drug development is a complex and expensive scientific endeavor. Statisticians play a unique role in the pharmaceutical industry with their quantitative training. Their work impacts business decisions that drive the success of a drug development. They are involved in all stages of a clinical trial, from trial design, protocol writing, data collection, to data analysis and results interpretation. Their contributions do not stop there. They play a pivotal role in the entire clinical development program and its lifecycle management. They have the opportunities to innovate clinical trials, to advance science, and to create new drugs that benefit millions of patients. In this presentation, we will provide an overview of how statisticians contribute to the drug development and what they do in their daily life from an early phase biometrics point of view. Particularly, we will cover some of the exciting innovations that we are working on at AstraZeneca.

Location: Building D, Warwick Evans Conference Room

Friday, February 10, 2023 at 10:00 am

Yi Zhao, Ph.D.
Assistant Professor, Department of Biostatistics and Health Data Science, Indiana University School of Medicine

“Principal Regression of Covariance Matrix Outcomes”

Abstract: In this study, we consider the problem of regressing covariance matrices on covariates of interest. The goal is to use covariates to explain variation in covariance matrices across units. Building upon our previous work, the Covariate Assisted Principal (CAP) regression, an optimization-based method for identifying components associated with the covariates using a generalized linear model, two approaches for high-dimensional covariance matrix outcomes will be discussed.Our studies are motivated by resting-state functional magnetic resonance imaging (fMRI) studies. In the studies, resting-state functional connectivity is an important and widely used measure of individual and group differences. Yet, extant statistical methods are limited to linking covariates with variations in functional connectivity across subjects, especially at the voxel-wise level of the whole brain. Our work introduces modeling approaches that regress whole-brain functional connectivity on covariates and enable identification of brain sub-networks. The first approach identifies subnetworks that are composite of spatially independent components discovered by a dimension reduction approach (such as whole-brain group ICA) and covariate-related projections determined by the CAP regression. The second approach directly performs generalized linear regression by introducing a well-conditioned linear shrinkage estimator of the high-dimensional covariance matrix outcomes, where the shrinkage coefficients are proposed to be common across matrices. The superior performance of the proposed approaches over existing methods are illustrated through simulation studies and resting-state fMRI data applications.

Location: Online via Zoom

Friday, March 24, 2023 at 10:00 am

James O’Malley, Ph.D.
Peggy Y. Thomson Professorship in the Evaluative Clinical Sciences, Department of Biomedical Data Science, The Dartmouth Institute of Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth

“Constructing Optimal Shared-Patient Physician Networks to Advance Study of the Diffusion of Medical Practices”

Abstract: Social network analysis has created a productive framework for the analysis of the histories of patient-physician interactions and physician collaboration. Notable is the construction of networks based on the data of “referral paths” — sequences of patient-specific temporally linked physician visits — in this case, culled from a large set of Medicare claims data in the United States. Network constructions depend on a range of choices regarding the underlying data. In this talk we first introduce the use of a five-factor experiment that produces 80 distinct projections of the bipartite patient-physician mixing matrix to a unipartite physician network derived from referral path data, which is further analyzed at the level of the 2,219 hospitals in the final analytic sample. We summarize the networks of physicians within a given hospital using a range of directed and undirected network features (quantities that summarize structural properties of the network such as its size, density, and reciprocity). The different projections and their underlying factors are evaluated in terms of the heterogeneity of the network features across the hospitals and association with a hospital-level outcome. In the second part of the talk, we use the findings from the first part to construct a shared-patient network of 10,661 physicians who delivered care to Medicare patients in Ohio that we examine for associations with the physicians’ risky prescribing behaviors. Risky-prescribing is the excessive or inappropriate prescription of drugs (e.g., Opioids) that singly or in combination pose significant risks of adverse health outcomes. This enables the novel decomposition of peer-effects of risky-prescribing into directional (outbound and inbound) and bidirectional (mutual) relationship components. Using this framework, we develop models of peer-effects for contagion in risky-prescribing behavior. Estimated peer-associations were strongest when the patient-sharing relationship was mutual as opposed to directional. Using simulations, we confirmed that our modeling and estimation strategies accurately and precisely estimate each type of peer-effect (mutual and directional). We also show that failing to account for these distinct mechanisms, a form of model misspecification, produces misleading results, demonstrating the importance of retaining directional information in the construction of physician shared-patient networks.

Location: Online via Zoom

Friday, April 14, 2023 at 10:00 am

Ronald L. Wasserstein, Ph.D.
Executive Director, American Statistical Association (ASA)

“Moving to a World Beyond p<0.05”

Abstract: For nearly a hundred years, the concept of “statistical significance” has been fundamental to statistics and to science. And for nearly that long, it has been controversial and misused as well. In a completely non-technical (and generally humorous) way, ASA Executive Director Ron Wasserstein will explain this controversy, and say why he and others have called for an end to the use of statistical significance as means of determining the worth of scientific results. He will talk about why this change is so hard for the scientific community to make, but why it is good for science and for statistics, and will point to alternate approaches.

Location: Warwick Evans Conference Room, Building D

Friday, April 28, 2023 at 10:00 am

David L. Rosen, BS Pharm., JD
Partner, Foley & Lardner LLP

“Understanding Biosimilars and FDA’s Position on Interchangeability”

Abstract: This session will provide you with an understanding of what a biosimilar product is and how it compares to the reference biologic product. The session will include a review of how a company establishes that the proposed biosimilar is highly similar to the reference product. In addition, the session will provide an overview of how FDA has handled interchangeability of biosimilar insulin products for the reference product. The session will also review how someone can find information about biosimilar and reference products in FDA’s Purple Book as well as information on small molecules in FDA’s Orange Book.

Location: Online via Zoom

Seminars from Fall 2022

Friday, September 23, 2022 at 10:00 am

Junrui Di, Ph.D.
Associate Director, Early Clinical Development Digital Sciences & Translational Imaging Quantitative Sciences, Pfizer Inc.

“Wearable and Digital Devices: Methodological Advancements and Their Applications in Clinical and Public Health Research”

Abstract: In recent years, wearable and digital devices are gaining popularity in clinical trials and public health research due to technology advances, thanks to their reduced size, prolonged battery life, larger storage, and faster data transmission. In public health, countless research has been conducted to better understand the complex signals collected by wearable and digital devices and their relationship with human health. In clinical trials, emphasis has been given to the derivation and validation of novel digital endpoints and proper statistical methods to model such digital endpoints to objectively quantify disease progression and treatment effects. Such devices provide us a unique opportunity to reconsider the way data can be collected, from snapshot in-clinic measurements to continuous recording of behaviors of daily life. This presentation will highlight some representative examples of statistical and machine learning methodologies and applications of wearable and digital devices in health research.

Location: Online via Zoom

Friday, October 14, 2022 at 10:00 am

Deepak Parashar, Ph.D.
Associate Professor, University of Warwick, England, UK

“Designing Clinical Trials for Dynamic Precision Medicine in Cancer”

Abstract: Current precision medicine of cancer matches therapies to patients based on average molecular properties of the tumour, resulting in significant patient benefit. However, despite the success of this approach, resistance to drugs develops leading to variability in the duration of response. The approach is based on static molecular patterns observed at diagnosis whereas cancers are constantly evolving. We, therefore, focus on Dynamic Precision Medicine, an evolutionary-guided precision medicine strategy that explicitly considers intra-tumour heterogeneity and subclonal evolution and plans ahead in order to delay or prevent resistance. Clinical validation of such an evolutionary strategy poses challenges and requires bespoke development of clinical trial designs. In this talk, I will present preliminary results on the construction of such trial designs. The work is a joint collaboration with Georgetown (Robert Beckman, Matthew McCoy).

Location: Online via Zoom

Friday, October 28, 2022 at 10:00 am

Aaron C. Courville, Ph.D.
Associate Professor, Department of Computer Science and Operations Research, Université de Montréal, Canada

“Have you tried resetting it? Parameter resets in supervised and reinforcement learning.”

Abstract: In this presentation, I’ll talk about recent work in my group that looked into a strange finding: when training neural networks, periodically resetting some or all parameters can be helpful in promoting better solutions. Beginning with a discussion of our findings in supervised learning where we relate this strategy to Iterated Learning, a method for promoting compositionality in emergent languages. We then show how parameter resets appear to offset a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. We apply this reset mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains and observe consistently improving performance. I conclude with some recent finding and speculation about the underlying causes behind the observed effects of parameter resets.

Location: Online via Zoom

Friday, November 11, 2022 at 10:00 am

Robert Lund, Ph.D.
Professor and Chair, Department of Statistics, University of California, Santa Cruz

“Changepoint Issues and Climate Controversies”

Abstract: This talk introduces changepoint issues in time-ordered data sequences and discusses their uses in resolving climate problems. An asymptotic description of the single mean shift changepoint case is first given. Next, a penalized likelihood method is developed for the multiple changepoint case from minimum description length information theory principles. Optimizing the objective function yields estimates of the changepoint numbers and location time(s). The audience is then walked through an example of a climate precipitation homogenization. The talk closes by addressing the climate hurricane controversy: are North Atlantic Basin hurricanes becoming more numerous and/or stronger?

Location: Online via Zoom

Seminars from Spring 2022

Friday, January 14, 2022 at 10:00 am

Ming-Hui Chen, Ph.D.
Board of Trustees Distinguished Professor, Department of Statistics, University of Connecticut, CT

“Flexible Conditional Borrowing Approaches for Leveraging Historical Data in the Bayesian Design of Superiority Trials”

Abstract: In this paper, we consider the Bayesian design of a randomized, double-blind, placebo-controlled superiority clinical trial. To leverage multiple historical data sets to augment the placebo-controlled arm, we develop three conditional borrowing approaches built upon the borrowing-by-parts prior, the hierarchical prior, and the robust mixture prior. The operating characteristics of the conditional borrowing approaches are examined. Extensive simulation studies are carried out to empirically demonstrate the superiority of the conditional borrowing approaches over the unconditional borrowing or no-borrowing approaches in terms of controlling type I error, maintaining good power, having a large “sweet-spot” region, minimizing bias, and reducing the mean squared error of the posterior estimate of the mean parameter of the placebo-controlled arm. Computational algorithms are also developed for calculating the Bayesian type I error and power as well as the corresponding simulation errors. This is a joint work with Wenlin Yuan and John Zhong.

Location: Online via Zoom

Friday, February 11, 2022 at 10:00 am

Michelle Shardell, Ph.D.
Professor, Department of Epidemiology and Public Health and the Institute for Genome Sciences, University of Maryland, MD

“Joint Mixed-Effects Models for Causal Inference with Longitudinal Data”

Abstract: Causal inference with observational longitudinal data and time-varying exposures is complicated due to the potential for time-dependent confounding and unmeasured confounding. Most causal inference methods that handle time-dependent confounding rely on either the assumption of no unmeasured confounders or the availability of an unconfounded variable that is associated with the exposure (e.g., an instrumental variable). Furthermore, when data are incomplete, validity of many methods often depends on the assumption of missing at random. The proposed approach combines a parametric joint mixed-effects model for the study outcome and the exposure with g-computation to identify and estimate causal effects in the presence of time-dependent confounding and unmeasured confounding. G-computation can estimate participant-specific or population-average causal effects using parameters of the joint model. The joint model is a type of shared parameter model where the outcome and exposure-selection models share common random effect(s). The joint model is also extended to handle missing data and truncation by death when missingness is possibly not at random. Performance of the proposed method is evaluated using simulation studies, and the method is compared to both linear mixed- and fixed-effects models combined with g-computation as well as to targeted maximum likelihood estimation. The method is applied to an epidemiologic study of vitamin D and depressive symptoms in older adults and can be implemented using SAS PROC NLMIXED software, which enhances accessibility of the method to applied researchers.

Location: Online via Zoom

Friday, February 25, 2022 at 10:00 am

Peter Müller, Ph.D.
Professor, Department of Mathematics, University of Texas at Austin, TX

“Single Arm Trials with a Synthetic Control Arm Built from RWD”

Abstract: Randomized clinical trials (RCT) are the gold standard for approvals by regulatory agencies. However, RCT’s are increasingly time consuming, expensive, and laborious with a multitude of bottlenecks involving volunteer recruitment, patient truancy, and adverse events. An alternative that fast tracks clinical trials without compromising quality of scientific results is desirable to more rapidly bring therapies to consumers. We propose a model-based approach using nonparametric Bayesian common atoms models for patient baseline covariates. This specific class of models has two critical advantages in this context: First, the models have full prior support, i.e., allow to approximate arbitrary distributions without unreasonable restrictions or shrinkage in specific parametric families; and second, inference naturally facilitates a re-weighting scheme to achieve equivalent populations. We prove equivalence of the synthetic and other patient cohorts using an independent separate verification. Failure to classify a merged data set using a flexible statistical learning method such as random forests, support vector machines etc. proves equivalence. We implement the proposed approach in two motivating case studies.

Location: Online via Zoom

Friday, March 25, 2022 at 10:00 am

Mehryar Mohri, Ph.D.
Professor, Computer Science, Courant Institute of Mathematical Sciences, New York University, NY

“Multiple-Source Adaptation Theory and Algorithms”

Abstract: We present a general theoretical and algorithmic analysis of the problem of multiple-source adaptation, a key learning problem in applications such as medical diagnosis, sentiment analysis, speech recognition, and object recognition. We will also report the results of several experiments demonstrating the effectiveness of our algorithms and showing that they outperform all previously known baselines.

Location: Online via Zoom

Friday, April 22, 2022 at 10:00 am

Torbjörn Callréus, M.D., Ph.D.
Medical Advisor, Malta Medicines Authority

“Regulatory Pharmacoepidemiology”

Abstract: Science applied in a medicines regulatory context (“regulatory science”) sometimes has features that differs from traditional “academic science”. At the centre of regulatory science is that analyses must be decision-relevant, timely, and occasionally must rely on poor data. This seminar will present examples of biostatistical analyses that can support regulatory decision-making in the post-authorisation surveillance phase (e.g. pharmacoepidemiology). Examples will include analyses relying on data from the network of population-based Nordic healthcare databases. Lastly, the challenges posed with the advent of Advanced Therapeutic Medicinal Products (e.g. gene and cell therapies) will be discussed. These therapies have characteristics that are different from traditional medicinal products thereby having implications for approaches to pre- and post-authorisation evaluation.

Location: Online via Zoom

Seminars from Fall 2021

Friday, September 10, 2021 at 10:00 am

Jared K. Lunceford, Ph.D.
Distinguished Scientist, Merck Research Laboratories, Early Development Statistics

“Statistical Bridging Studies for Companion Diagnostic Development: A case study in lung cancer.”

Abstract: Early or late phase clinical trials that aim to enroll a biomarker selected patient population are often initiated using a clinical trial assay (CTA) which may differ in assay components compared to the final companion diagnostic assay (CDx), potentially necessitating further clinical analysis to bridge study conclusions to results based on the CDx. There may be substantial missing data due to the retrospective nature of CDx sample testing. Key elements of the ideas behind bridging will be reviewed using a case study conducted for a randomized trial of pembrolizumab in second line metastatic non-small cell lung cancer. Emphasis is on methods aimed at constructing an imputation model to (1) confirm the robustness of clinical trial conclusions via the Bayesian posterior predictive distribution of the study’s intention-to-treat testing results and (2) conduct sensitivity analysis for estimands of the intention-to-diagnose population, while capturing all sources of variability.

Location: Online via Zoom

Friday, October 8, 2021 at 10:00 am

Margaret Gamalo, Ph.D.
Senior Director, Biostatistics, Pfizer

“Novel Applications of the use of Synthetic Control for Causal Estimation of Effects of Therapeutic Interventions within in the Framework of Augmented-Control Designs”

Abstract: Since 2014, synthetic controls, defined as a statistical method used to evaluate the comparative effectiveness of an intervention using a weighted combination of external controls in successful regulatory submissions in rare diseases and oncology is growing. In the next few years, the utilization of synthetic controls is likely to increase within the regulatory space owing to concurrent improvements in medical record collection, statistical methodologies, and sheer demand. In this talk, I will focus on existing and new strategies from the applications of synthetic controls in the framework of augmented control designs. This will include (1) matching strategies and use of entrophy balancing; (2) distinction of causal estimands in augmented designs; (3) Bayesian methodologies for incorporating external information; (4) novel adaptive designs incorporating external information.

Location: Online via Zoom

Friday, October 22, 2021 at 10:00 am

Jeanne Kowalski, Ph.D.
Professor, Department of Oncology in Dell Medical School, University of Texas, Austin, TX

“You had me at DNA! The Ongoing Dissection of DNA-Level Variations in Cancer”

Abstract: Big data is best viewed not in terms of its size but in terms of its purpose: if we can measure it all, maybe we can describe it all. In a molecular context, big data challenges the biomedical research community to describing all the levels of variation based on hundreds of thousands simultaneous measures and data types of DNA, RNA, and protein function alongside patient and tumor features. The bigger and more molecular data, the bigger the promise of advances and the greater the opportunities and challenges posed to the biomedical research community in our ability to harness the power of such data to obtain quick and reliable insights from it. Cancer research has been witnessed to many opportunities on the analytical front from the use of big data for big advances to usher in the era of precision medicine. We posit that the use of big data for big advances in cancer starts at the DNA-level and requires the synergy with, not replacement of, classical hypothesis-driven methods. We describe several methods, data types and metrics for detection of DNA variation at both the gene-and sample-level within the context of pancreatic cancer and their fit for purpose to improve our understanding of DNA level heterogeneity associated with phenotype diversity.

Location: Online via Zoom

Friday, November 12, 2021 at 10:00 am

Kristian Kleinke, Ph.D.
Senior Lecturer, Institute of Psychology, University of Siegen, Siegen, North Rhine-Westphalia, Germany

“Applied Multiple Imputation”

Abstract: Empirical data are seldom complete. Missing data pose a threat to the validity of statistical inferences when missingness is not a completely random process. Model based multiple imputation (MI) can make uses of all available information in the data file to predict missing information and can produce valid statistical inferences in many scenarios. In this talk, I give an introduction to MI, discuss pros and cons of MI and demonstrate how to use the popular mice package in R to create model based multiple imputations of missing values. Finally, I also show how to specify more advanced imputation models (using further add-ons to the mice package) for example for longitudinal count data based on piecewise growth curve models assuming a zero-inflated Poisson or negative Binomial data generating process.

Location: Online via Zoom

Seminars from Spring 2021

Friday, January 25, 2021 at 10:00 am

James Zou, Ph.D.
Assistant Professor of Biomedical Data Science, Computer Science and Electrical Engineering at Stanford University, CA

“Computer vision to phenotype human diseases across physiological and molecular scales”

Abstract: In this talk, new computer vision algorithms to learn complex morphologies and phenotypes that are important for human diseases will be presented. I will illustrate this approach with examples that capture physical scales from macro to micro: 1) video-based AI to assess heart function (Ouyang et al Nature 2020), 2) generating spatial transcriptomics from histology images (He et al Nature BME 2020), 3) and learning morphodynamics of immune cells. Throughout the talk I’ll illustrate new design principles/tools for human-compatible and robust AI that we developed to enable these technologies (Ghorbani et al. ICML 2020, Abid et al. Nature MI 2020).

Location: Online via Zoom

Friday, February 12, 2021 at 10:00 am

Yuedon Wang, Ph.D.
Professor, Department of Statistics and Applied Probability at the University of California, Santa Barbara, CA

“Smoothing spline mixed-effects density models for clustered data”

Abstract: Smoothing spline mixed-effects density models are proposed for the nonparametric estimation of density and conditional density functions with clustered data. The random effects in a density model introduce within-cluster correlation and allow us to borrow strength across clusters by shrinking cluster specific density function to the population average, where the amount of shrinkage is decided automatically by data. Estimation is carried out using the penalized likelihood and computed using a Markov chain Monte Carlo stochastic approximation algorithm. We apply our methods to investigate evolution of hemoglobin density functions over time in response to guideline changes on anemia management for dialysis patients. Smoothing spline mixed-effects density models are proposed for the nonparametric estimation of density and conditional density functions with clustered data. The random effects in a density model introduce within-cluster correlation and allow us to borrow strength across clusters by shrinking cluster specific density function to the population average, where the amount of shrinkage is decided automatically by data. Estimation is carried out using the penalized likelihood and computed using a Markov chain Monte Carlo stochastic approximation algorithm. We apply our methods to investigate evolution of hemoglobin density functions over time in response to guideline changes on anemia management for dialysis patients.

Location: Online via Zoom

Friday, February 26, 2021 at 10:00 am

Yimei Li, Ph.D.
Assistant Professor of Biostatistics, Department of Biostatistics, Epidemiology and Informatics at the University of Pennsylvania, PA

“Innovative Bayesian adaptive designs for early phase cancer clinical trials”

Abstract: Phase I oncology trials aim to identify the optimal dose that will be recommended for phase II trials, but the standard designs are inefficient and inflexible. In this talk, I will introduce two innovative Bayesian adaptive designs we developed to improve the efficiency of the trial or incorporate the complex features of the trial. In the first example, we propose PA-CRM, a design for pediatric phase I trials when concurrent adult trials are being conducted. The design automatically and adaptively incorporate information from the concurrent adult trial into the ongoing pediatric trial, and thus greatly improves the efficiency of the pediatric trial. In the second example, we propose a design for early phase platform trials, where multiple doses of multiple targeted therapies are evaluated in patients with different biomarkers, with an objective to identify the best drug at an efficacious and safe dose for the subpopulation defined by a given biomarker. This design addresses complex issues of such platform trials, incorporates information about toxicity and response outcomes from multiple tested doses and biomarkers, and maximizes potential patients benefit from the targeted therapies.

Location: Online via Zoom

Friday, March 12, 2021 at 10:00 am

Michael G. Hudgens, Ph.D.
Professor, Department of Biostatistics; Director, Center for AIDS Research (CFAR) Biostatistics Core at the University of North Carolina, Chapel Hill, NC

“Causal Inference in the Presence of Interference”

Abstract: A fundamental assumption usually made in causal inference is that of no interference between individuals (or units), i.e., the potential outcomes of one individual are assumed to be unaffected by the treatment assignment of other individuals. However, in many settings, this assumption obviously does not hold. For example, in infectious diseases, whether one person becomes infected may depend on who else in the population is vaccinated. In this talk we will discuss recent approaches to assessing treatment effects in the presence of interference.

Location: Online via Zoom

Friday, March 26, 2021 at 10:00 am

Pang Du, Ph.D.
Associate Professor, Department of Statistics at Virginia Tech, VA

“A new change point analysis problem motivated by a liver procurement study”

Abstract: Literature on change point analysis mostly requires a sudden change in the data distribution, either in a few parameters or the distribution as a whole. We are interested in the scenario where the variance of data may make a significant jump while the mean changes in a smooth fashion. The motivation is a liver procurement experiment monitoring organ surface temperature. Blindly applying the existing methods to the example can yield erroneous change point estimates since the smoothly-changing mean violates the sudden-change assumption. We propose a penalized weighted least squares approach with an iterative estimation procedure that integrates variance change point detection and smooth mean function estimation. The procedure starts with a consistent initial mean estimate ignoring the variance heterogeneity. Given the variance components the mean function is estimated by smoothing splines as the minimizer of the penalized weighted least squares. Given the mean function, we propose a likelihood ratio test statistic for identifying the variance change point. The null distribution of the test statistic is derived together with the rates of convergence of all the parameter estimates. Simulations show excellent performance of the proposed method. Application analysis offers numerical support to non-invasive organ viability assessment by surface temperature monitoring. Extension to functional variance change point detection will also be presented if time allows.

Location: Online via Zoom

Friday, April 9, 2021 at 10:00 am

Xiaofei Wang, Ph.D.
Professor, Department of Biostatistics and Bioinformatics at Duke University, NC

“An Integrated Analysis of Randomized Clinical Trials and Real World Evidence Studies: Theory and Application”

Abstract: In this talk, we exploit the complementing features of randomized clinical trials (RCT) and real world evidence (RWE) to estimate the average treatment effect of the target population. We will review existing methods in conducting integrated analysis of the data from RCTs and RWEs. We will then discuss in detail new calibration weighting estimators that are able to calibrate the covariate information between RCTs and RWEs. We will briefly review asymptotic results under mild regularity conditions, and confirm the finite sample performances of the proposed estimators by simulation experiments. In a comparison of existing methods, we illustrate our proposed methods to estimate the effect of adjuvant chemotherapy in early-stage resected non–small-cell lung cancer integrating data from a RCT and a sample from the National Cancer Database.

Location: Online via Zoom

Friday, April 23, 2021 at 10:00 am

Manisha Sengupta, Ph.D.
Acting Chief, Long-Term Care Statistics Branch National Center for Health Statistics (NCHS), MD

“National Post-acute and Long-term Care Study: a rich source of data on long-term care services providers and users.”

Abstract: In 2018, there were 52.4 million Americans aged 65 years and older and represented 16% of the population. By 2040, it is projected that there will be about 80.8 million people in this age group. The 85 and older population is projected to more than double from 6.5 million in 2018 to 14.4 million in 2040. Although people of all ages may need post-acute or long-term care services, the risk of needing these services increases with age.

The National Post-acute and Long-term Care study monitors trends in the supply, provision, and use of the major sectors of paid, regulated long-term care services. NPALS uses survey data on the residential care community and adult day services sectors, and administrative data on the home health, nursing home, hospice, inpatient rehabilitation, and long-term care hospital sectors. This presentation will describe the study, methodological issues, some findings from the study, and a brief discussion of future possibilities.

Location: Online via Zoom

Seminars from Fall 2020

Thursday, September 10, 2020 at 12:00 pm

Jing Qin, Ph.D.
Mathematical Statistician, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases (NIAID)

“Estimation of incubation period distribution of COVID-19 using disease onset forward time: A novel cross-sectional and forward follow-up study”

Abstract: We have proposed a novel, accurate low-cost method to estimate the incubation-period distribution of COVID-19 by conducting a cross-sectional and forward follow-up study. We identified those pre-symptomatic individuals at their time of departure from Wuhan and followed them until the development of symptoms. The renewal process was adopted by considering the incubation period as a renewal and the duration between departure and symptoms onset as a forward time. Such a method enhances the accuracy of estimation by reducing recall bias and using the readily available data. The estimated median incubation period was 7.76 days [95% confidence interval (CI): 7.02 to 8.53], and the 90th percentile was 14.28 days (95% CI: 13.64 to 14.90). By including the possibility that a small portion of patients may contract the disease on their way out of Wuhan, the estimated probability that the incubation period is longer than 14 days was between 5 and 10%.

This is a joint work with Dr. Andrew Zhou’s group at the Biostatistics Department in Peking University.

Location: Online via Zoom

Friday, September 25, 2020 at 10:00 am

Philip Westgate, Ph.D.
Associate Professor, Department of Biostatistics; Associate Director, Biostatistics & Research Design Core, Center for Health Equity Transformation, University of Kentucky

“Study Design and Statistical Analysis Options for Cluster Randomized Trials: Examples Utilizing Stepped Wedge Designs and Generalized Estimating Equations”

Abstract: Cluster randomized trials (CRTs) are often used in public health research in which the intervention is naturally applied to clusters, or groups, of individuals. Common examples include schools, medical care practices, and even entire communities. In this talk, we introduce possible study design and statistical analysis options for cluster randomized trial. Utilizing real-life examples, we focus on stepped wedge designs and the use of generalized estimating equations. Furthermore, because CRTs often involve only a small number of clusters, small-sample adjustments with the GEE approach are explored.

Location: Online via Zoom

Friday, October 9, 2020 at 10:00 am

Yevgen Tymofyeyev, Ph.D.
Scientific Director, Statistical and Decision Sciences, Janssen Research & Development (Johnson & Johnson)

“Multiple Testing Graphical Approaches for Group Sequential Designs, a Case Study”

Abstract: As nowadays confirmatory clinical trials become increasingly complex pursuing several goals simultaneously, the graphical testing approach is often used to address testing of multiple hypotheses of interest. This approach for a strong control of the familywise error rate (FWER) has several features that are appealing to clinical teams. For example, it allows for a structured visualized evaluation while developing strategies based on importance of hypotheses and corresponding success likelihoods.

At this presentation we consider recent extensions of the graphical testing procedure to group sequential designs using a case study involving “dual” primary endpoints (pCR and MFS) along with several key secondary endpoints. This constitutes potential multiple paths to establish efficacy based on results at interim and final analyses that needs to be statistically addressed. Furthermore, hypothesis tests might have different numbers of analysis points and different rates of accumulation of statistical information. Study design optimization and practical implementations will be explored.

Location: Online via Zoom

Friday, October 23, 2020 at 10:00 am

Ying Lu, Ph.D.
Professor, Department of Biomedical Data Science, Stanford University School of Medicine; Co-Director, Biostatistics Core, Stanford Cancer Institute

“A Composite Endpoint Approach to Compare Treatments Based on Patient Preferences of Multivariate Outcomes”

Abstract: Evaluation of medical products needs to consider its totality of evidence for benefits and harms that are measured through multiple relevant endpoints. The importance of these endpoints can vary for different clinical settings, clinicians, and patients. Evans and Follmann (2016) advocated the use the desirability of outcome ranking (DOOR) to integrate multiple outcomes according to their importance for the design and analysis of clinical trials. Wang and Chen (2019) proposed testing procedures for trend in benefit-risk analysis based on the importance of multiple outcomes. In these approaches, the priority levels can be determined by clinicians or based on survey of patients. In this paper, we proposed a stratified randomization design and a composite win ratio endpoint (Pocock et al. 2012) to evaluate the treatment benefits based on patient individually reported outcomes and importance. We introduce motivation examples to illustrate importance of patient heterogeneity in endpoint importance, discuss mathematical properties and difference of our proposed approach with others, and demonstrate the differences and advantages via design examples. We will also discuss our experience using this approach in the development of a shared decision making trial for AFib stroke prevention.

This is a joint work with Drs. Qian Zhao, Shiyan Yan, Lu TIan, Min M. Tu, and Jie Chen.

Location: Online via Zoom

Friday, November 13, 2020 at 10:00 am

Mihai Giurcanu, Ph.D.
Research Associate Professor, Biostatistics Laboratory, University of Chicago Medical Center

“Causal Inference using Generalized Empirical Likelihood”

Abstract: We propose a generalized empirical likelihood (GEL) method for causal inference based on a moment condition model that balances the treatment groups with respect to the potential confounders (e.g., as indicated by a directed acyclic graph). Allowing the number of moment conditions grow with the sample size, we show the asymptotic normality of the causal effects estimates without relying on regression models for the counterfactuals nor for the propensity scores. In a simulation study, we assess the finite sample properties of the proposed estimators in terms of the mean squared error and coverage of confidence intervals. We illustrate an application of the proposed method to data analysis of a training program study.

Authors: Pierre Chausse, Mihai Giurcanu, and George Luta

Location: Online via Zoom

Seminars from Spring 2020

Friday, January 10, 2020 at 10:00 am

Edsel Pena, Ph.D.
Professor, Department of Statistics, University of South Carolina

“The Search for Truth through Data: Fisher, Neyman & Pearson, Bayes, P-Values, and Knowledge Updating”

Abstract: The talk will concern the role of statistical thinking in the Search for Truth using data. This will bring us to a discussion of P-values, which has been, and still is, a central concept in Statistics and Biostatistics and has become a critical and highly controversial tool of scientific research. Recently it has elicited much, sometimes heated, debates and discussions. The American Statistical Association (ASA) was even compelled to release an official statement in early March 2016 regarding this issue, a psychology journal has gone to the extreme of banning the use of P-values in articles appearing in its journal, and a special issue of The American Statistician this year was fully devoted to this issue. The concerns about P-values has also been in relation to important issues of reproducibility and replicability in scientific research. This issue goes to the core of inductive inference and the different schools of thought (Fisher’s null hypothesis significance testing (NHST) approach, Neyman-Pearson paradigm, Bayesian approach, etc.) on how inductive inference (that is, the process of discovering the Truth through sample data) should be done. Some new perspectives on the use of P-values and on the search for truth through data will be discussed. In particular, we will discuss the representation of knowledge and its updating based on observations, and ask the question: “When given the P-value, what does it provide in the context of the updated knowledge of the phenomenon under consideration, and what additional information should accompany it?” This talk is based on the preprint in the Math ArXiV. (http://arxiv.org/abs/1910.05486)

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, January 24, 2020 at 10:00 am

Tiemen Woutersen, Ph.D.
Professor, Department of Economics, Eller College of Management, University of Arizona

“Confidence Sets for Continuous and Discontinuous Functions of Parameters”

Abstract: Applied researchers often need to construct confidence sets for policy effects based on functions of estimated parameters, and often these functions are such that the delta method cannot be used. In this case, researchers currently use one of two bootstrap procedures, but we show that these procedures can produce confidence sets that are too small. We provide two alternatives, both of which produce consistent confidence sets under reasonable assumptions that are likely to be met in empirical work. Our second, more efficient method produces consistent confidence sets in cases when the delta method cannot be used, but is asymptotically equivalent to the delta method if the use of the latter is valid. Finally, we show that our second procedure works well when we conduct counterfactual policy analyses using a dynamic model for employment.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, February 14, 2020 at 10:00 am

Chris Amos, Ph.D.
Associate Director of Quantitative Science; Director for the Institute of Clinical & Translational Medicine, Baylor College of Medicine

“Understanding Complex Etiology of Diseases by Application of Machine Learning Tools”

Abstract: Genetic analyses have identified etiological factors for several complex disease but understanding how genetic factors interact to increase risk is challenging. In this talk I describe the development of novel methods for large-scale identification of population-level attributes that may affect genome-wide association studies. Because studies we conduct are planned for hundreds of thousands of samples, we had to develop machine-learning based procedures that could be generalized for identifying ethnic variability. We also evaluated tools for jointly estimating the effects of multiple genetic factors by using machine learning tools including random forests and classification trees. These particular analytical schemes identified novel interactions among alleles at multiple loci influencing risk for a rare autoimmune disease. We are now studying the efficacy of modeling with classification tree based analysis versus more traditional approaches such as logistic regression with lasso for high dimensional SNP studies.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, February 28, 2020 at 10:00 am

Chen Hu, Ph.D.
Associate Professor, Division of Oncology, Biostatistics & Bioinformatics, School of Medicine, Johns Hopkins University

“Utility of Restricted Mean Survival Time Related Methods in Oncology Clinical Trials”

Abstract: For oncology clinical trials, time-to-event endpoints, such as overall survival or progression-free survival, are used widely as key endpoints and of great interest. While the classic log-rank test and Cox proportional hazards model have been considered as the “default” analysis methods for statistical inference and for quantifying treatment benefit, we have witnessed many challenges and issues when they cannot be readily or properly applied. Furthermore, in cancer treatment trials we also concurrently observe disease-related longitudinal processes or multiple outcomes that are dependently censored by some “terminal” event like death, and there is increasing need and interest in developing and applying alternative and statistically valid metrics and inference tools to assess the overall disease prognosis and benefit-risk profile more efficiently and in a timely fashion. In recent years, restricted mean survival time (RMST) has gained growing interests as an important alternative metric to evaluate survival data. In this talk we will review and discuss these recent methodological developments, as well as their potential use and implications in oncology clinical trials and drug development.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, March 27, 2020 at 10:00 am

Laura Balzer, Ph.D.
Associate Professor, Department of Biostatistics & Epidemiology, School of Public Health & Health Sciences, University of Massachusetts Amherst

“Improving Community Health in East Africa with Causal Inference and Machine Learning”

Abstract: In this talk, we highlight the use of causal inference and machine learning methods to reduce bias when assessing disease burden and to improve precision when estimating intervention effects in randomized trials. We illustrate with data from the SEARCH Study, a large (>320,000-person) cluster randomized trial for HIV prevention and improved community health in rural Kenya and Uganda (NCT01864603).

Location: Zoom only

Seminars from Fall 2019

Friday, September 13, 2019 at 10:00 am

Xiaochun Li, Ph.D.
Associate Professor, Department of Biostatistics, School of Medicine, Fairbanks School of Public Health, University of Indiana

“A Machine Learning Approach to Causal Inference in the Presence of Missing Data”

Abstract: Observational medical databases increasingly find uses for comparative effectiveness and safety research. However, the lack of analytic methods that simultaneously handle the issues of missing data and confounding bias along with the onus of model specification, limit the use of these valuable data sources. We derived a novel machine-learning approach based on trees to estimate the average treatment effect. In order to evaluate causal estimation by model-free machine-learning methods in data with incomplete observations, we conducted a simulation study with data generated from known models of exposure, outcome and missing mechanisms. Thus the true causal effect was known and used as the benchmark for evaluations. Two settings were studied. We compared the bias and standard error of causal estimates from our method to a multiply robust parametric method, the complete case analysis (CC) and a regression analysis after multiple imputations (MI). The proposed methods were applied to a real observational data set of implantable cardioverter defibrillator use.

Location: Proctor Harvey Amphitheater, Med-Dent Building C104
3900 Reservoir Rd. NW, Washington, DC 20057

Friday, September 27, 2019 at 1:00 pm

Ming T. Tan, Ph.D.
Chair and Professor, Department of Biostatistics, Bioinformatics & Biomathematics, Georgetown University Medical Center

“Statistics and Data Science in Biomedicine I: Discovery, Design and Analysis of Multi-drug Combinations: from Experiments to Clinical Trials”

Abstract: This seminar presents applications of statistics and data science in the design and analysis of biomedical research studies including drug combinations, adaptive and/or targeted clinical trials designs, statistical learning and predictive modeling in head and neck cancer with a series of validation in three large randomized studies, and search for patient subgroups in precision medicine. Part I will focus on drug combinations which are the hallmark of therapies for complex diseases such as cancer. A statistical approach for evaluating the joint effect of the combination is necessary because even in vitro experiments often demonstrate significant variation in dose-effect. I will present a novel methodology for efficient design and analysis of preclinical drug combination studies with applications in the combination of Vorinostat and Ara-C and etoposide. Further discussion will be on the utilization of the preclinical data to guide early phase clinical trial design and a novel adaptive phase I design for drug combinations.

Location: Proctor Harvey Amphitheater, Med-Dent Building C104
3900 Reservoir Rd. NW, Washington, DC 20057

Friday, October 11, 2019 at 10:00 am

Robert Makuch, Ph.D.
Professor, Department of Biostatistics, School of Public Health, Yale University

“Treatment Effects Assessment in Multi-Regional Clinical Trials (MRCTs) Using Bayesian Methods”

Abstract: Multi-regional clinical trials (MRCTs) help to synchronize drug development globally, whereby the time lag in marketing authorization among different countries is minimized. However, there are statistical concerns associated with analysis and interpretation of MRCTs. The results of certain countries/regions could vary significantly from the overall results. In this case, controversy exists regarding the extent to which country-specific result should be minimized/ignored and medical scientists/regulators should defer to the overall global treatment effect. Rather than analyzing data separately in each region, our discussion today focuses on developing a Bayesian framework for assessing heterogeneity of regional treatment effects that leverages data collected in other regions. The goal is to make scientifically valid judgments about the play of chance versus real regional differences when comparing results to the overall trial outcome.

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057

Friday, October 25, 2019 at 10:00 am

Kellie J. Archer, Ph.D.
Professor and Chair, Division of Biostatistics, College of Public Health, Ohio State University

“Algorithmic and Statistical Methods for High-Dimensional Variable Selection”

Abstract:

Pathological evaluations are frequently reported on an ordinal scale. Moreover, diseases may progress from less to more advanced stages. For example, HCV infection progresses from normal liver tissue to cirrhosis, dysplastic nodules, and potentially to hepatocellular carcinoma. To elucidate molecular mechanisms associated with disease progression, genomic characteristics from samples procured from these different tissue types are often assayed using a high-throughput platform. Such assays result in a high-dimensional feature space where the number of predictors (P) greatly exceeds the available sample size (N). In this talk, various approaches to modeling an ordinal response when P>N will be described including algorithmic and statistical methods.

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057

Friday, November 8, 2019 at 10:00 am

Sergeant Jannie M. Clipp
Georgetown University Police Department

“RUN, HIDE, FIGHT”

Abstract: Based on best practices from federal law enforcement officials, this training program is designed to increase awareness of the “run, hide, fight” protocols in case of an active shooter or violent incident. The course, which uses the “See Something, Say Something” concept, as well as detailed steps in an active shooter incident, is taught by the Georgetown University Police Department (GUPD).

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057

Thursday, November 14, 2019 at 1:00 pm

Tai Xie, Ph.D.
CEO & Founder, Brightech International & CIMS Global

“Design and Monitor Clinical Trials with a Dynamic Adaptive System”

Abstract: Modern clinical trials need to be designed with much flexibility and efficiency. Flexibility includes adaptation based on new information while trial is ongoing and being monitored dynamically with cumulative data. Efficiency includes adequate study power and timely decision making regarding the future course of the trial. Both aspects require the protection of trial integrity and validity. Adaptive sequential designs have been proposed for clinical trials for the last decades. However, traditional computing environment could not accommodate trial monitoring at any time in a timely fashion. In this high-speed, AI – (artificial intelligence) – everywhere era, we introduce Dynamic Data Monitoring concept in which the cumulative data are monitored continuously, and the treatment effect is examined timely over the trial duration. Changing frequency and schedules of the interim analyses can be monitored at real time. The accumulating data can be viewed whenever new data is available and the timing of efficacy and futility assessments and sample size adaptation is made very flexibly, and the type I error rate is always controlled. Numerical and real study examples and simulations are presented. (Collaborated with Drs. Gordon Lan, Joe Shih, and Peng Zhang)

Location: Pre-Clinical Science Building, LA4
3900 Reservoir Rd. NW, Washington, DC 20057

Friday, November 22, 2019 at 10:00 am

Junyong Park, Ph.D.
Associate Professor, Department of Mathematics & Statistics, University of Maryland, Baltimore County

“Estimation of Species Richness and Rarefaction Curve using Nonparametric Empirical Bayes by Quadratic Optimization”

Abstract: We have proposed new algorithms to estimate the rarefaction curve and accordingly the number of species. The key idea in developing the algorithms is based on an interpolated rarefaction curve through quadratic optimization with linear constraints. Our proposed algorithms are easily implemented and show better performances than existing methods in terms of computational speed and accuracy. We also provide a criterion of model selection to choose tuning parameters and the idea of confidence interval. A broad range of numerical studies including simulations and real date examples are also conducted, and the gain that it produces has been compared to existing methods.

Location: Proctor Harvey Amphitheater, Med-Dent Building C104
3900 Reservoir Rd. NW, Washington, DC 20057

Seminars from Spring 2019

Friday, January 11, 2019 at 10:00 am

Faming Liang, Ph.D.
Professor, Department of Statistics, Purdue University

“Extended Stochastic Gradient MCMC Algorithms for Large-Scale Bayesian Computing”

Abstract: The stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms, such as stochastic gradient Langevin dynamics and stochastic gradient Hamilton Monte Carlo, have recently received much attention in Bayesian computing for large-scale data for which the sample size can be very large, or the dimension can be very high, or both. However, these algorithms can only be applied to a small class of problems for which the parameter space has a fixed dimension and the log-posterior density is differentiable with respect to the parameters. We propose a class of extended SGMCMC algorithms which, by introducing appropriate latent variables and utilizing Fisher’s identity, can be applied to more general large-scale Bayesian computing problems, such as those involving dimension jumping and missing data. For a large-scale dataset with sample size N and dimension p, the proposed algorithms can achieve a computational complexity of O(N^{1+\epsilon} p^{1-\epsilon’}) for some small constants \epsilon and \epsilon’, which is quite comparable with the computational complexity O(N p^{1-\epsilon’}) achieved in general by the stochastic gradient descent (SGD) algorithm. The proposed algorithms are illustrated using high-dimensional variable selection, sparse deep learning with large-scale data, and a large-scale missing data problem. The numerical results show that the proposed algorithms have a significant computational advantage over traditional MCMC algorithms and can be highly scalable when mini-batch samples are used in simulations. Compared to frequentist methods, they can produce more accurate variable selection and prediction results, while exhibiting similar CPU costs when the dataset contains a large number of samples. The proposed algorithms have much alleviated the pain of Bayesian methods in large-scale computing.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, January 25, 2019 at 10:00 am

Hui Quan, Ph.D.
Associate VP & Global Head, Methodology Group, Department of Biostatistics and Programming, Sanofi

“Considerations on trial design and data analysis of multi-regional clinical trials”

Abstract: Extensive research has been conducted in the Multi-Regional Clinical Trial (MRCT) area. To effectively apply an appropriate approach to a MRCT, we need to synthesize and understand the features of different approaches. In this presentation, numerical and real data examples are used to illustrate considerations regarding design, conduct, analysis and interpretation of result of MRCTs. We compare different models as well as their corresponding interpretations of the trial results. We highlight the importance of paying special attention to trial monitoring and conduct to prevent potential issues associated with the final trial results. Besides evaluating the overall treatment effect for the entire MRCT, we also consider other key analyses including quantification of regional treatment effects within a MRCT and the assessment of consistency of these regional treatment effects.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, February 8, 2019 at 10:00 am

Yanxun Xu, Ph.D.
Assistant Professor, Department of Applied Mathematics & Statistics, Whiting School of Engineering, Johns Hopkins University

“ASIED: A Bayesian Adaptive Subgroup-Identification Enrichment Design”

Abstract: Developing targeted therapies based on patients’ baseline characteristics and genomic profiles such as biomarkers has gained growing interests in recent years. Depending on patients’ clinical characteristics, the expression of specific biomarkers or their combinations, different patient subgroups could respond differently to the same treatment. An ideal design, especially at the proof of concept stage, should search for such subgroups and make dynamic adaptation as the trial goes on. When no prior knowledge is available on whether the treatment works on the all-comer population or only works on the subgroup defined by one biomarker or several biomarkers, it’s necessary to estimate the subgroup effect adaptively based on the response outcomes and biomarker profiles from all the treated subjects at interim analyses. To address this problem, we propose an Adaptive Subgroup-Identification Enrichment Design, ASIED, to simultaneously search for predictive biomarkers, identify the subgroups with differential treatment effects, and modify study entry criteria at interim analyses when justified. More importantly, we construct robust quantitative decision-making rules for population enrichment when the interim outcomes are heterogeneous. Through extensive simulations, the ASIED is demonstrated to achieve desirable operating char- acteristics and compare favorably against the alternatives.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, February 22, 2019 at 10:00 am

Steven G. Heeringa, Ph.D.
Senior Research Scientist and Associate Director, Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor

“Neuroscience on a Population Scale: Design, Measurement, Data Integration and Analysis”

Abstract: Neuroscience has a strong research tradition that employs experimental and observational studies in laboratory settings and controlled testing and evaluation in both clinical, educational and volunteer populations. In the past two decades, there has been increasing interest in conducting population-scale epidemiological studies of early age brain development and functioning as well as later age neurological functioning including cognitive impairment, dementias and Alzheimer’s disease. The data collected in these population-based studies is not restricted to observations on neurological systems and functioning but is collected in parallel with a wide array of information on participants’ life events, medical history, social and environmental exposures, genetics and genomics. This rich array of observational data has the potential to greatly advance our understanding of how complex neurological systems develop, are modified by internal or external factors or otherwise change over the life course. The growing field of epidemiological research also presents many challenging problems in design, measurement, data integration and analysis that those of us trained in biostatistics, bioinformatics and biomathematics will be called on to help to solve.

This presentation will use two cases studies to illustrate the nature of the statistical challenges in conducting population-scale neuroscientific research, describe current best practices and outline opportunities for future research. The first case study will be the Adolescent Brain Cognitive Development project (ABCD, https://abcdstudy.org), a 12-year longitudinal investigation of brain morphology and functional development in U.S. adolescents and teens. The second case study will focus on the challenges in design, measurement and analysis faced in special supplemental investigations of dementia and Alzheimer’s disease conducted under the auspices of the larger Health and Retirement Study (HRS, https://hrs.isr.umich.edu/about). Each case study review will include a description of the specific study challenges and current solutions. The major aim of this presentation is to increase awareness of these emerging lines of research and to promote interest on the part of the next generation of statisticians and data scientists who will be called upon to advance the various methodologies that will be required to better understand complex neurological systems and how they relate to our individual attributes and the world around us.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, April 12, 2019 at 10:00 am

Abera Wouhib, Ph.D.
Program Chief, Statistical Methods in Psychiatry Program, Adult Psychpathology & Psychosocial Interventions Research Branch, Division of Translational Research (DTR), NIH

“Estimation of Heterogeneity Parameters in Multivariate Meta-Analysis”

Abstract: Similar to its univariate counterpart, multivariate meta-analysis is a method to synthesize multiple outcome effects by taking in to account the available variance-covariance structure. It can improve efficiency over separate univariate syntheses and enables joint inferences across the outcomes. Multivariate meta-analysis is required to address the complexity of the research questions. Multivariate data can arise in meta-analysis due to several reasons. The primary studies can be multivariate in nature by measuring multiple outcomes for each subject, typically known as multiple-endpoint studies, or it may arise when primary studies involve several comparisons among groups based on a single outcome or measures several parameters. Although it possesses many advantages over the more established univariate counterpart, multivariate meta-analysis has some challenges including modelling and estimating the parameter of interests. Under random-effects model assumption, we discuss the methods of estimating the heterogeneity parameters and effect sizes of the multivariate data and its application by using illustrative example and simulation results.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Friday, April 26, 2019 at 10:00 am

Ming Yuan, Ph.D.
Professor, Department of Statistics, Columbia University

“Quantitation in Colocalization Analysis: Beyond “Red + Yellow = Green”

Abstract: “I see yellow; therefore, there is colocalization.” Is it really so simple when it comes to colocalization studies? Unfortunately, and fortunately, no. Colocalization is in fact a supremely powerful technique for scientists who want to take full advantage of what optical microscopy has to offer: quantitative, correlative information together with spatial resolution. Yet, methods for colocalization have been put into doubt now that images are no longer considered simple visual representations. Colocalization studies have notoriously been subject to misinterpretation due to difficulties in robust quantification and, more importantly, reproducibility, which results in a constant source of confusion, frustration, and error. In this talk, I will share some of our effort and progress to ease such challenges using novel statistical and computational tools.

Location: Warwick Evans Conference Room, Building D
4000 Reservoir Rd. NW, Washington, DC 20057

Seminars from Fall 2018

Friday, September 14, 2018 at 10:00 am

Ying Zhang, Ph.D.
Professor and Director of Biostatistics Education, Department of Biostatistics, School of Public Health, Indiana University

“Model-Free Causal Inference in Observational Studies”

Abstract: Causal inference is a key component for comparative effectiveness research in observational studies. The inverse-propensity weighting (IPW) technique and augmented inverse-propensity weighting (AIPW) technique, which is known as a double-robust method, are the common methods for making causal inference in observational studies. However, these methods are known not stable, particularly when the models for propensity score and the study outcome are wrongly specified. In this work, we propose a model-free approach for causal inference. While possessing standard asymptotic properties, this method also enjoys excellent finite sample performance and robustness. Simulation studies were conducted to compare with the well-known IPW and AIPW methods for causal inference. A real-life example from an ongoing Juvenile Idiopathic Arthritis Study was applied for the illustration of the proposed method.

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057

Friday, September 28, 2018 at 10:00 am

Yangxin Huang, Ph.D.
Professor, Departments of Epidemiology and Biostatistics, College of Public Health, University of South Florida – Tampa

“Bayesian Joint Models for Multivariate Longitudinal and Event Time Data with Multiple Features, with an Application to Diabetes Study”

Abstract: Joint modeling of longitudinal and survival data is an active area of statistics research and becoming increasingly essential in most epidemiological and clinical studies. As a result, a considerable number of statistical models and analysis methods have been suggested for analyzing such longitudinal-survival data. However, the following issues may standout. (i) a common assumption for longitudinal variables is that model errors are normally distributed due to mathematical tractability and computational convenience. This requires the variables to be “symmetrically” distributed. A violation of this assumption could lead to misleading inferences; (ii) in practice, many studies are often to collect multiple longitudinal exposures which may be significantly correlated, ignoring their correlations may lead to bias and reduce efficiency in estimation; (iii) the longitudinal responses may encounter nonignorable missing; (iv) repeatedly measured observations in time are often interrelated with a time-to-event of interest. Inferential procedures may complicate dramatically when one analyzes data with these features together. Under the umbrella of Bayesian inference, this talk explores a multivariate mixed-effects joint models with skewed distributions for longitudinal measures with an attempt to mediate correlation from multiple responses, adjust departure from normality, and tailor accuracy from nonignorable missingness as well as overcome shortage of confidence in specifying a time-to-event model. A data set arising from diabetes study is analyzed to demonstrate the methodology. Simulation studies are conducted to assess the performance of the proposed joint models and method under various scenarios.

Location: Proctor Harvey Amphitheater, Med-Dent Building C105
3900 Reservoir Rd. NW, Washington, DC 20057

Friday, October 12, 2018 at 10:00 am

Kosuke Imai, Ph.D.
Professor of Government and of Statistics, Department of Statistics, Harvard University

“Causal Inference with Interference and Noncompliance in Two-Stage Randomized Experiments”

Abstract: In many social science experiments, subjects often interact with each other and as a result one unit’s treatment influences the outcome of another unit. Over the last decade, a significant progress has been made towards causal inference in the presence of such interference between units. Researchers have shown that the two-stage randomization of treatment assignment enables the identification of average direct and spillover effects. However, much of the literature has assumed perfect compliance with treatment assignment. In this paper, we establish the nonparametric identification of the complier average direct and spillover effects in two-stage randomized experiments with interference and noncompliance. In particular, we consider the spillover effect of the treatment assignment on the treatment receipt as well as the spillover effect of the treatment receipt on the outcome. We propose consistent estimators and derive their randomization-based variances under the stratified interference assumption. We also prove the exact relationship between the proposed randomization-based estimators and the popular two-stage least squares estimators. Our methodology is motivated by and applied to the randomized evaluation of the India’s National Health Insurance Program (RSBY), where we find some evidence of spillover effects on both treatment receipt and outcome. The proposed methods are implemented via an open-source software package.

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057

Friday, October 26, 2018 at 10:00 am

Vernon Chinchilli, Ph.D.
Distinguished Professor & Chair, Department of Public Health Sciences, Hershey College of Medicine, Penn State University

“N-of-1 Trials: Do They Have a Role in Clinical Research?”

Abstract: Physicians frequently use N-of-1 (single-patient) trial designs in an informal manner to identify an optimal treatment for an individual patient. An N-of-1 clinical trial that focuses exclusively on optimizing the primary outcome for a specific patient clearly may not be very useful for generalizability to a population of patients. A series or collection of N-of-1 clinical trials, however, could be generalizable. We review current literature on the design and analysis of N-of-1 trials, and this includes Bayesian approaches as well. We next describe the “Best African American Response to Asthma Drugs (BARD)” trial, which invokes a four-way crossover design and has the flavor of a series of N-of-1 trials. We propose a nonlinear mixed-effects model with a quadrinomial logistic regression for the analysis of the BARD data that constructs six pairwise comparisons of the four asthma treatments to (1) assess the optimal treatment for each study participant and (2) estimate population-level treatment comparisons.

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057

Wednesday, November 7, 2018 at 10:00 am

Michael Proschan, Ph.D.
Mathematical Statistician, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases (NIAID), NIH

“Probability Paradoxes That Actually Happened”

Abstract: Probability books sometimes present “cooked” counterexamples to warn students of the lurking dangers of compromising rigor. Can such anomalies actually occur in practice? This talk is proof that they can! I present actual examples from my clinical trials experience in which I either fell into, or almost fell into, probability traps. These experiences were a major motivation for our book, “Essentials of Probability Theory for Statisticians” (Proschan and Shaw, 2016, CRC Press, Taylor & Francis Group).

Location: New Research Building Auditorium
3970 Reservoir Rd. NW, Washington, DC 20057