Research Areas


The Department maintains an active research program in both the development of new biostatistical methodology and collaboration on important research projects in the prevention and treatment of cancer and other biomedical research areas.

Biomedical, Statistical, and Informatics Research Areas

The DBBB faculty have expertise in a variety of biomedical research areas and topics such as:

The adaptive clinical trial designs developed by the faculty provide an innovative approach to efficient personalized therapy. Research has also been done on adaptive two-stage designs with treatment selection. In addition, a novel experimental design and analysis method for drug combinations has been developed by integrating concepts from modern statistics and pharmacology, and more fundamental research in experimental design provides a way to make laboratory research more efficient. (See publications of Drs. Tan, Luta, Makambi, Fang, and Wang)

Statistical bioinformatics applies statistical and computational methods and tools to the analysis of high-dimensional data, such as gene expression microarrays (multiple platforms, including Affymetrix, Agilent, Illumina, and other customized arrays), genome-wide association studies (GWAS), single nucleotide polymorphism (SNP) analyses, copy number variant (CNV) analyses, microRNA profiling, integrated genomic data, next-generation sequencing, proteomics, metabolomics, flow cytometry, and imaging data. Faculty members in the Department are at the forefront of developing statistical bioinformatics methodology. (See publications of Drs. Tan, Luta, Makambi, Fang, Li, Ahn, and Zhong)

Evaluation of biomarkers and predictive modeling are areas of strength and interest for many faculty members in the division. (See publications of Drs. Tan, Makambi, and Fang)

In population sciences research, statistical methods involve multi-level/hierarchical models, causal inference, and Bayesian models, areas in which the division faculty are at the forefront of development. (See publications of Drs. Tan, Luta, and Makambi)

Survival analysis is a popular statistical method for evaluating treatment effects in time-to-event data with varying follow-up periods and censoring. Faculty have developed statistical methods for interval-censored and panel-count data, which often arise in clinical trials and follow-up studies. (See publications of Dr. Fang)

Incomplete data (i.e., missing, censored, or coarsened data) pose a unique problem in population research. Methods for missing data problems using a Bayesian approach have recently been summarized in a book. (See publications of Drs. Tan and Ahn)

Data mining is the process of discovering patterns in large datasets, which makes it well suited to high-dimensional biomedical “omics” data. A variety of machine learning methods can be used to recognize hidden patterns in high-throughput data, e.g., hierarchical/k-means clustering, principal component analysis (PCA), support vector machines, and decision trees. Faculty members in the Department are highly experienced in applying data mining methods and tools to help investigators extract in-depth knowledge from their data. (See publications of Drs. Tan and Li)

Faculty members in the DBBB have extensive experience supporting investigators with pathway analysis, functional analysis, gene ontology (GO) identification, and other systems biology methodologies. (See publications of Drs. Li and Zhong)

Faculty members in the Department have extensive experience processing next-generation DNA sequencer data as well as genotyping and validation data, along with downstream analysis of these data. Faculty have developed algorithms for these data for medical and population genetics and cancer applications, and have applied these algorithms to answer fundamental scientific questions. (See publications of Drs. Tan, Li, Ahn, and Zhong)

This is ongoing research involving non-small cell lung cancer and obesity. The idea is to use existing publicly available data to perform in-depth downstream cross-experimental analysis that reveals additional biological knowledge.

When collaborations require data mining across multiple bioinformatics datasets but the data are privacy sensitive, privacy-preserving data mining (PPDM) offers a solution: it produces results equivalent to those obtained from physically merged data without sharing the original data with outside parties. The data mining result is therefore built on a global view of all datasets while data privacy is preserved. Dr. Li has developed PPDM methods that apply principal component analysis (PCA) to gene expression data clustering. (See publications of Dr. Li)
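As a rough illustration of how a pooled analysis can match the merged-data result without exchanging raw records, the sketch below has each site release only aggregate statistics (row count, column sums, and cross-product matrix), from which the pooled covariance matrix and its principal components are recovered. This is a generic textbook construction, not the method published by Dr. Li; the function names and the simulated site data are hypothetical, and releasing aggregates alone does not by itself provide a formal privacy guarantee.

```python
import numpy as np

def local_summary(X):
    """Run privately at each site; only these aggregates leave the site."""
    X = np.asarray(X, dtype=float)
    return X.shape[0], X.sum(axis=0), X.T @ X

def pooled_covariance(summaries):
    """Combine per-site aggregates into the covariance of the merged data."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    cross = sum(s[2] for s in summaries)
    mean = total / n
    # sum_i (x_i - mean)(x_i - mean)^T = X^T X - n * mean mean^T
    return (cross - n * np.outer(mean, mean)) / (n - 1)

def principal_components(cov, k):
    """Top-k eigenvalues and eigenvectors of a symmetric covariance matrix."""
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]

# Simulated stand-ins for two sites' expression matrices (samples x genes).
rng = np.random.default_rng(0)
site_a = rng.normal(size=(40, 5))
site_b = rng.normal(size=(60, 5)) + 1.0

cov_shared = pooled_covariance([local_summary(site_a), local_summary(site_b)])
cov_merged = np.cov(np.vstack([site_a, site_b]), rowvar=False)
top_vals, top_vecs = principal_components(cov_shared, k=2)
```

Here `cov_shared` agrees with `cov_merged` up to floating-point error, so the principal components derived from the shared aggregates are the same as those from the physically merged data.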

Faculty members in the Department are active in bioinformatics and database management research, and have the skills to help investigators build customized systems to manage their unique data and data processing pipelines. (See publications of Dr. Li)

Faculty members have collaborated with population sciences researchers on a large number of important projects. Selected research topics include the evaluation of perceived risk of breast cancer among Latinas, medical providers’ willingness to recommend genetic testing, psychosocial telephone counseling intervention in BRCA1 and BRCA2 mutation carriers, breast cancer adjuvant chemotherapy decisions in older women, cancer screening among Latino immigrants from safety net clinics, and long-term disease-specific functioning among prostate cancer survivors and controls in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. (See publications of Drs. Luta and Makambi)

Recent results from large-scale genome-wide association studies indicate that for many complex diseases, only a limited fraction of the variability in disease traits is explained by the confirmed and replicated genetic susceptibility markers. Most complex diseases, such as cancer, have a multi-factorial etiology that combines the genetic architecture of the disease with exposure to environmental factors. Characterizing and identifying such interactions is statistically challenging, because adequate sample sizes are needed for rare gene-exposure configurations. Under case-control designs, multiple approaches have been proposed to optimize both type I error and power under departures from gene-environment independence, but no consensus on the optimal approach has been reached. Developing tools for testing gene-environment interaction in genome-wide association studies is of interest. (See publications of Drs. Makambi and Ahn)

With the development of GPS, recent years have seen an explosion in methods and applications for spatial inference problems, ranging from association studies between geographically referenced covariates and outcomes to the prediction of unobserved variables at a desired location. If changes in GPS-formatted data are recorded over time, the correlations due to repeated measurements need to be accounted for. (See publications of Dr. Ahn)