Department & Faculty Research

The Department of Biostatistics, Bioinformatics and Biomathematics maintains an active research program, both developing new biostatistical methodology and collaborating on important research projects in the prevention and treatment of cancer and in other areas of biomedical research.

The faculty in the Department have authored over 100 statistics and bioinformatics publications and have co-authored more than 200 publications in various biomedical research areas. Our collaborative research motivates our methodological research, and the statistical methods we develop in turn benefit the scientific investigations we support.

Our Research & Collaboration Objectives:

  • To collaborate with principal investigators and other clients on the biostatistical, bioinformatical, and biomathematical aspects of basic science, clinical, and population science research projects, especially those likely to lead to research grant support
  • To participate effectively in clinical trials by providing biostatistical, bioinformatical, and biomathematical input to the planning of all Lombardi Comprehensive Cancer Center (LCCC) clinical trials, by active membership on the Clinical Research Committee and providing biostatistical reviews of proposed protocols, and by the monitoring of all LCCC trials through the Data and Safety Monitoring Committee and the Protocol Review and Monitoring System
  • To educate investigators, staff, and students engaged in evidence-based research in biostatistical, bioinformatical, and biomathematical methodology for the planning, conduct, analysis, and interpretation of cancer research studies and other medical, population-based, and statistical research
  • To conduct research on biostatistical, bioinformatical, and biomathematical methodology for problems arising from collaborations with investigators on a wide variety of research projects.


Selected Areas of Methodological Research:
1. Design and Analysis of Clinical Trials and Translational Research Studies

The adaptive clinical trial designs developed by the faculty offer an innovative and efficient approach to personalized therapy. Faculty have also worked on adaptive two-stage designs with treatment selection. In addition, a novel experimental design and analysis method for drug combinations has been developed by integrating concepts from modern statistics and pharmacology, and more fundamental research in experimental design offers ways to make laboratory research more efficient. (See publications of Drs. Tan, Luta, Makambi, Fang, and Wang)

2. Statistical Bioinformatics and High-Dimensional Data

Statistical bioinformatics applies statistical and computational methods and tools to the analysis of high-dimensional data such as gene expression microarrays (multiple platforms, including Affymetrix, Agilent, Illumina, and other customized arrays), genome-wide association studies (GWAS), single nucleotide polymorphism (SNP) data, copy number variation (CNV), microRNA profiling, integrated genomic data, next-generation sequencing, proteomics, metabolomics, flow cytometry, and imaging data. Faculty members in the Department are at the forefront of developing statistical bioinformatics methodology. (See publications of Drs. Tan, Luta, Makambi, Fang, Li, Ahn, and Zhong)

3. Predictive Modeling and Evaluation of Biomarkers

Evaluation of biomarkers and predictive modeling are strengths and research interests of many faculty members in the Department. (See publications of Drs. Tan, Makambi, and Fang)

4. Statistical Methods for Population Sciences Research

In population sciences research, statistical methods involve multilevel/hierarchical models, causal inference, and Bayesian models, areas in which the faculty are at the forefront of methodological development. (See publications of Drs. Tan, Luta, and Makambi)

5. Survival Analysis

Survival analysis comprises widely used statistical methods for evaluating treatment effects on time-to-event data with varying follow-up periods and censoring. Faculty have developed statistical methods for settings in which the observed data are interval-censored or take the form of panel counts, which often occur in clinical trials and follow-up studies. (See publications of Dr. Fang)
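To make the role of censoring concrete, here is a minimal Python sketch of the Kaplan-Meier estimator for ordinary right-censored data. It is a textbook illustration only, not the interval-censored or panel-count methods described above; the function name `kaplan_meier` and the toy data are hypothetical. A subject censored at time t still counts as at risk for events at t and leaves the risk set afterward.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored data (illustrative sketch).

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if the subject was right-censored
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    # Every subject leaves the risk set at its own time, whether event or censored.
    exits = Counter(times)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up times; events: 1 = observed, 0 = censored.
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

For these toy data the estimate steps down at times 1, 2, 3 and 5; the subjects censored at times 2 and 4 shrink the risk set without producing a drop of their own.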

6. Incomplete Data Methods

Incomplete data (i.e., missing, censored, or coarsened data) pose particular challenges in population research. Bayesian approaches to missing data problems have recently been summarized in a book. (See publications of Drs. Tan and Ahn)

7. Data Mining / Machine Learning Methods to Handle High-Throughput “Omics” Data

Data mining is the process of discovering patterns in large datasets, which makes it well suited to high-dimensional biomedical “omics” data. A variety of machine learning methods can be used to uncover hidden patterns in high-throughput data, e.g., hierarchical and k-means clustering, principal component analysis (PCA), support vector machines, and decision trees. Faculty members in the Department are highly experienced in applying data mining tools to help investigators extract in-depth knowledge from their data. (See publications of Drs. Tan and Li).
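As one concrete example of the clustering methods listed above, the following is a minimal pure-Python sketch of k-means (Lloyd's algorithm) applied to a hypothetical matrix of expression profiles, one row per sample. The function names and data are illustrative assumptions, and the farthest-first seeding was chosen only to keep the example deterministic; real analyses would typically use an established library with several random restarts.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each sample with its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned samples.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression values: 6 samples x 3 genes.
samples = [
    (1.0, 1.1, 0.9), (0.9, 1.0, 1.2), (1.1, 0.8, 1.0),
    (5.0, 5.2, 4.9), (4.8, 5.1, 5.0), (5.1, 4.9, 5.2),
]
labels, centers = kmeans(samples, k=2)
print(labels)  # -> [0, 0, 0, 1, 1, 1]
```

The two groups of samples separate cleanly here; with real omics data, features are usually normalized and often reduced (for example with PCA) before clustering.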

8. Pathway / Network Analysis

Faculty members in the Department are experienced in supporting investigators with pathway analysis, functional analysis, gene ontology (GO) analysis, and other systems biology methods. (See publications of Drs. Li and Zhong)

9. Next-Generation Sequencing Analysis, RNA-seq, Exome Sequencing

Faculty members in the Department have extensive experience processing next-generation DNA sequencing data, genotyping data, and validation data, as well as in the downstream analysis of these data. Faculty have developed algorithms for these data in medical and population genetics and in cancer applications, and have applied these algorithms to answer fundamental scientific questions. (See publications of Drs. Tan, Li, Ahn, and Zhong)

10. Cross-experimental Processing and Mining on Gene Expression Microarray Data

This ongoing research involves non-small cell lung cancer and obesity. The idea is to use existing publicly available data to perform in-depth, cross-experimental downstream analyses that reveal additional biological knowledge.

11. Privacy Preserving Data Mining (PPDM) for Distributed Bioinformatics Datasets

When collaborators need to mine multiple bioinformatics datasets jointly but the data are privacy-sensitive, PPDM offers a way to obtain results equivalent to those from the physically merged data without sharing the original data with any outside party. The data mining result is therefore built on a global view of all datasets while data privacy is preserved. Dr. Li has developed PPDM approaches that use principal component analysis (PCA) for clustering gene expression data. (See publications of Dr. Li)
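The central idea, obtaining the same result as from merged data without merging the records, can be illustrated for PCA with a minimal sketch. Here each site releases only its sample count, column sums, and cross-product sums; a coordinator combines them into the covariance matrix of the pooled data and extracts the first principal component by power iteration. This is a generic illustration under our own assumptions, not Dr. Li's published protocol; all function names and data are hypothetical, and aggregate statistics can themselves leak some information, so this sketch alone is not a privacy guarantee.

```python
import math


def local_summary(rows):
    """Statistics one site releases: count, column sums, and cross-product sums."""
    p = len(rows[0])
    n = len(rows)
    sums = [sum(r[i] for r in rows) for i in range(p)]
    cross = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    return n, sums, cross


def pooled_covariance(summaries):
    """Sample covariance of the union of all sites, built from summaries only."""
    n = sum(m for m, _, _ in summaries)
    p = len(summaries[0][1])
    sums = [sum(s[i] for _, s, _ in summaries) for i in range(p)]
    cross = [[sum(c[i][j] for _, _, c in summaries) for j in range(p)] for i in range(p)]
    mean = [x / n for x in sums]
    # cov_ij = (sum of x_i * x_j - n * mean_i * mean_j) / (n - 1); this one-pass
    # form can lose precision for large values, which is acceptable for a sketch.
    return [[(cross[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iters=500):
    """Leading eigenvector of cov via power iteration (assumes the start vector is
    not orthogonal to it and the top eigenvalue is well separated)."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Two hypothetical sites holding expression values for the same two genes.
site_a = [[1.0, 2.0], [2.0, 3.9], [3.0, 6.1]]
site_b = [[4.0, 8.2], [5.0, 9.8]]
cov = pooled_covariance([local_summary(site_a), local_summary(site_b)])
print(first_component(cov))
```

Because covariance depends on the data only through these additive sums, the pooled matrix matches the one computed from `site_a + site_b` directly, so the principal components do too.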

12. Ontology-Driven and Knowledge-Based Bioinformatics Workflow Management System

Faculty members in the Department are active in bioinformatics and database management research and have the skills to help investigators build customized systems to manage their unique data and data processing pipelines. (See publications of Dr. Li)

13. Collaborative Cancer Control and Prevention Research

Faculty members have collaborated with population sciences researchers on a large number of important projects. Selected research topics include the evaluation of perceived risk of breast cancer among Latinas, medical providers' willingness to recommend genetic testing, a psychosocial telephone counseling intervention in BRCA1 and BRCA2 mutation carriers, breast cancer adjuvant chemotherapy decisions in older women, cancer screening among Latino immigrants from safety-net clinics, and long-term disease-specific functioning among prostate cancer survivors and controls in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. (See publications of Drs. Luta and Makambi)

14. Gene-Environment Interaction Study

Recent results from large-scale genome-wide association studies indicate that, for many complex diseases, only a limited fraction of the variability in disease traits is explained by confirmed and replicated genetic susceptibility markers. Most complex diseases, such as cancer, have a multifactorial etiology that combines the genetic architecture of the disease with exposure to environmental factors. Characterizing and identifying such gene-environment interactions is statistically challenging because adequate sample sizes are needed for rare gene-exposure configurations. Under case-control designs, multiple approaches have been proposed to control type I error while maximizing power when the gene-environment independence assumption may not hold, but no consensus on the optimal approach has been reached. Developing tools for testing gene-environment interaction in genome-wide association studies is therefore of interest. (See publications of Drs. Makambi and Ahn)
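To show why the gene-environment independence assumption matters, here is a minimal sketch comparing two classical estimates of a multiplicative G x E interaction odds ratio from a case-control study: the standard case-control estimate (the G-E odds ratio among cases divided by that among controls) and the case-only estimate (the G-E odds ratio among cases alone). It is a textbook illustration, not any faculty member's method; the function names and counts are hypothetical, and genotype and exposure are assumed to be binary.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]], i.e. (a * d) / (b * c)."""
    return (a * d) / (b * c)


def ge_odds_ratio(counts):
    """G-E association odds ratio within one group.

    counts maps (g, e) -> number of subjects, with g, e in {0, 1}."""
    return odds_ratio(counts[(1, 1)], counts[(1, 0)], counts[(0, 1)], counts[(0, 0)])


def interaction_estimates(cases, controls):
    """Return (case-control, case-only) estimates of the multiplicative
    G x E interaction odds ratio."""
    case_control = ge_odds_ratio(cases) / ge_odds_ratio(controls)
    # The case-only estimate ignores the controls; it is valid only if G and E are
    # independent in the source population (and the disease is rare).
    case_only = ge_odds_ratio(cases)
    return case_control, case_only


# Hypothetical genotype (g) by exposure (e) counts.
cases = {(1, 1): 40, (1, 0): 20, (0, 1): 30, (0, 0): 60}
controls = {(1, 1): 10, (1, 0): 20, (0, 1): 20, (0, 0): 40}
print(interaction_estimates(cases, controls))  # -> (4.0, 4.0)
```

Here the controls show no G-E association, so the two estimates agree. When G and E are associated in the population, the case-only estimate is biased and tests based on it can have inflated type I error, while the case-control estimate stays valid but is less efficient; balancing these properties motivates the approaches mentioned above.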

15. Spatio-Temporal Studies

Owing to the development of the Global Positioning System (GPS), recent years have seen an explosion in methods and applications for spatial inference problems, ranging from association studies between geographically referenced covariates and outcomes to the prediction of unobserved variables at a desired location. When GPS-referenced data are recorded repeatedly over time, the correlation among the repeated measurements needs to be accounted for. (See publications of Dr. Ahn)