Jiang Bian, PhD, MS


bianjiang@ufl.edu, My UF Profile

Chief Data Scientist, University of Florida Health
Chief Data Scientist, OneFlorida+ Clinical Research Consortium
Director of the Cancer Informatics Shared Resource (CISR), University of Florida Health Cancer Center
Associate Director of the Biomedical Informatics Program, University of Florida Clinical and Translational Science Institute

Professor, Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida

Always looking for self-motivated Postdoctoral Research Scientists as well as MS and PhD students interested in Biomedical Informatics. Drop me an email at bianjiang@ufl.edu.

I love doing research in biomedical informatics, but I also enjoy writing high-quality code and producing real-world products that will make a broader impact...

Data-driven Medicine

Applications of data science techniques, including machine-/deep-learning methods, to solve heterogeneous and big data problems in medicine.

Data-driven approaches that aim to discover patterns from massive amounts of data and make clinically relevant predictions are becoming increasingly common in translational research. Although advances have been made in computing systems, the bridge from data processing to knowledge discovery, unfortunately, is still challenging. Unlocking the value of big data computing systems and making optimal use of them demands a strong background in multiple disciplines (e.g., data integration, machine-/deep-learning methods, distributed systems, algorithms). Our work in this area focuses on (1) developing software tools and systems that are capable of handling big data in terms of integration, storage, searching, sharing, analyzing, and visualizing; and (2) using heterogeneous data sources, especially real-world data (RWD) such as electronic health records (EHRs) and administrative claims data, to generate real-world evidence (RWE). For example, our work on clinical trial generalizability assessment has used RWD (e.g., large collections of linked EHRs and claims data from the OneFlorida network) to identify generalizability issues during a trial's design phase.

Clinical and Clinical Research Informatics

Development of novel informatics methods, tools and systems to support clinical and clinical research activities.

It is imperative to have tools and software designed and developed using informatics principles to support clinical and clinical research activities. I have been deeply involved in various software development and data infrastructure research projects. In particular, my efforts have led to the development of the Comprehensive Research Informatics Suite (CRIS). CRIS is now the standard clinical research management system at the University of Arkansas for Medical Sciences (UAMS). Further, I have led the development of the Enterprise Image Repository (EIR) and the Trauma Image Repository (TIR), a medical image exchange platform for telemedicine that has been widely used in the state of Arkansas. Moreover, I have led the development of the CLinicAl Research Administration (CLARA) system, as part of UAMS's CTSA effort, to manage the complex regulatory requirements of clinical trials. At the University of Florida (UF), I have played a crucial role in building the OneFlorida Data Trust, a secure data repository that integrates diverse data sources and currently contains ~15 million linked patient records of different types.

Semantic Web and Ontology

Advancing knowledge representation and creating applications of knowledge bases/graphs.

This line of research started with an interest in using network science and graph theory as a conceptual framework for interpreting health and health care data from a unique perspective. Naturally, we investigated applications and other areas that use graph-based data structures. In particular, we have used ontologies and the semantic web technology stack as a data and knowledge representation framework to solve various informatics challenges: (1) creating ontology-based semantic data integration infrastructure to harmonize heterogeneous datasets; (2) turning free-text clinical trial eligibility criteria into machine-computable form to support cohort discovery and recruitment; and (3) developing knowledge bases/graphs to provide evidence-based information to lay consumers.

eHealth Applications

Design, development, and testing of eHealth applications for management of chronic diseases and promotion of healthy behaviors through multi-level interventions.

eHealth is an emerging field at the intersection of informatics, computer science, health communication, public health, and health care delivery, leveraging rapidly growing technologies (e.g., web- or mobile-based apps and wearables) to enhance and deliver health services and information. Following best practices (e.g., user-centered design, Agile software development), we have designed and developed a wide range of eHealth applications and have shown their effectiveness in funded clinical trials. Further, one key focus area is to identify multi-level interventions (targeting patients, caregivers, providers, and health care systems). Thus, our research also taps into innovations in health information technology (IT), such as integrating a patient-reported outcome (PRO) collection toolkit into EHR workflows and building tools that provide evidence-based clinical decision support.

Social Web and Health

Mining the Internet, including the social web, to provide insights into health-related behavior and health outcomes of various populations and finding ways to develop interventions that promote public and consumer health.

Many health behaviors, despite being individual choices, are often influenced by social and cultural context. Social media such as Twitter and Facebook afford us enormous opportunities to understand the intersections of individual behaviors, social-environmental factors, and social interactions on social media platforms. Over the past several years, our research has shown that we can mine the social web for invaluable insights into public and consumer health. Specifically, we have mined Twitter data for detecting drug-related adverse events, evaluating the adequacy of gender identification terms among transgender populations, assessing the U.S. weekly trends in work stress and emotion, and understanding the impact of promotional health information on laypeople. We have also developed open source software for the collection and analysis of social media data.

Data Privacy in Healthcare

Leveraging state-of-the-art data privacy research to construct technologies that enable privacy in the context of real-world organizational, political, and health information architectures.

Although privacy protection of health information is highly regulated (e.g., HIPAA), there is no consistent and efficient model for developing software systems involving electronic health records (EHRs). I am interested in designing security models and developing patient-centric privacy-preserving software systems such as privacy-preserving record linkage systems, secure health information exchange platforms, and privacy-preserving cohort discovery frameworks.



Principal Investigator (MPI, Dual PI, Co-PI) (Selected)

PANDA-MSD: Predictive Analytics via Networked Distributed Algorithms for Multi-System Diseases (Chen, Merkel, Bian)

This proposal seeks support to develop novel data integration methods using electronic health records from multiple CTSA hubs to create predictive models for multi-system diseases. We propose to develop the Predictive Analytics via Networked Distributed Algorithms (PANDA) framework, which will enable accurate risk prediction to help healthcare providers reach accurate diagnoses earlier.

Computational Drug Repurposing for AD/ADRD with Integrative Analysis of Real World Data and Biomedical Knowledge (Wang, Bian)

The goal of this project is to develop a comprehensive learning-based causal inference framework to generate high-throughput drug repurposing hypotheses for Alzheimer's disease and related dementias (AD/ADRD) from real-world data and biomedical knowledge bases. The generated drug repurposing hypotheses will be validated with diverse data sources and approaches, including (1) multi-omics data based on the network medicine approach, and (2) the trial emulation approach with i) high-quality cohort data from the National Alzheimer's Coordinating Center (NACC), and ii) a prospective RWD-based cohort. The developed algorithms and software tools will be made publicly available and widely disseminated in the AD/ADRD research communities.

Using Real-world Data to Assess the Burden of Diabetes in Children and Adolescents in Florida (Bian, Shao, Guo, Shenkman)

The benefits and harms of lung cancer screening in Florida (Bian, Guo)

  • NIH/NCI 1R01CA246418-01, 01/01/2020 - 12/31/2023
  • Role: Principal Investigator (Contact PI)
  • Other PI(s): Yi Guo@UFL (MPI)

Lung cancer screening using low-dose computed tomography is a promising technique to reduce the burden of lung cancer, the leading cause of cancer-related death in the United States. Concerns over high false-positive rates, invasive diagnostic procedures, postprocedural complications, and downstream health care costs impede the development and promotion of lung cancer screening programs. Our research leverages a large repository of linked electronic health record (EHR) and administrative claims data to understand contemporary use of lung cancer screening and the associated health care outcomes and costs in a real-world setting.

NLP to Connect Social Determinants and Clinical Factors for Outcomes Research (Wu, Bian)

  • PCORI ME-2018C3-14754, 01/01/2020 - 01/31/2023
  • Role: Principal Investigator (Dual PI)
  • Other PI(s): Yonghui Wu@UFL (Contact PI)

This project seeks to develop clinical natural language processing (NLP) methods and systems to extract and connect social & behavioral determinants of health (SDoH & BDoH) and adverse events (AEs) with clinical factors generated by clinical practice in patients’ electronic health records (EHRs) for health outcomes research. We will 1) develop ontologies, corpora, and NLP methods to extract SDoH, BDoH, and AEs with improved handling of abbreviations related to medical concepts; 2) develop methods to integrate medical knowledge with statistical NLP methods; and 3) develop and disseminate an NLP package – SODA, which extracts, standardizes, and populates SDoH, BDoH, and AEs information (in addition to clinical factors) from clinical narratives to the PCORnet CDM.

Systematic Analysis of Clinical Study Generalizability Assessment Methods with Informatics (He@FSU, Bian)

  • NIH/NIA 1 R21 AG061431-01, 01/15/2019 - 11/30/2020
  • Role: Principal Investigator (MPI)
  • Other PI(s): Zhe He@FSU (Contact PI)

We propose to systematically review the extant methods for generalizability assessments, and then use a data-driven strategy to reproduce, evaluate, and compare these methods with our unique data resource, the OneFlorida Data Trust. The success of this R21 project will (1) fill a knowledge gap on the validity and utility of the different generalizability assessment methods; (2) provide the clinical research community with ctGATE, a much-needed, easy-to-use toolbox for assessing study generalizability; and (3) help clinical researchers choose the most appropriate generalizability assessment methods with readily available implementations. We focus on breast, lung, and colorectal cancer trials.

A Person-Centric Prediction Model of Job Loss based on Social Media (Prosperi, Bian, Zhou@UMN)

We propose first to systematically review the extant methods for generalizability assessments, and then use a data-driven strategy to reproduce, evaluate, and compare these methods with our unique data resource, the OneFlorida Data Trust. We aim to answer two key research questions: (1) whether the a priori generalizability can predict the actual clinical outcomes of the interventions on the target population; and (2) how the exclusion criteria that may limit older adults' participation affect study generalizability and outcomes of patients in different age groups. We focus on breast, lung, and colorectal cancer trials.

Co-Investigator (Site PI, Co-I) (Selected)

PCORnet 2.0/PCRnet Clinical Research Network Infrastructure: OneFlorida Clinical Research Consortium (Shenkman, Hogan)

  • PCRF #1239, 10/1/2018-09/30/2020
  • Role: Co-Investigator (Director of Query Coordination, BMI Director of the OneFlorida Surveillance and Linkage to Care Program)
  • PI(s): Betsy Shenkman@UFL (Contact PI), Bill Hogan@UFL (BMI Co-PI)

This project further develops the research infrastructure in OneFlorida as part of PCORnet 2.0. OneFlorida is a Data Research Network that provides an enduring infrastructure for comparative effectiveness research and pragmatic clinical trials.

Improving Patient Reported Outcome Data for Research through seamless integration of the PROMIS Toolkit into EHR Workflows (Starren@NWU, Bian)

  • NIH/NCATS 1U01TR001806-01, 09/15/2016–09/14/2020
  • Role: Co-Investigator (UF Site PI)
  • PI(s): Justin B Starren@NWU (Contact PI)

To improve patient-reported outcome data for research, we will develop software to support tight integration into the two most common academic medical center EHRs, Epic and Cerner. We will develop a generalized integration of the PROMIS toolset, utilizing the SMART-on-FHIR standard, that can be implemented in multiple EHR platforms. Finally, we will implement and evaluate these software solutions across a number of diverse CTSA sites, both within and outside of the project team's CTSA sites.

Implementing a Guidelines-Based M-Health Intervention for High Risk Asthma Patients (Perry@ACH, Bian)

  • NIH/NINR, NIGMS 1R01NR015988-01A1, 07/20/2018 – 05/31/2023
  • Role: Co-Investigator (UF Site PI, lead app development)
  • PI(s): Tamara Perry@ACH (Contact PI)

The overarching goal of the proposed project is to examine the effectiveness of a personalized, interactive asthma smartphone application in reducing asthma morbidity among adolescents who are at increased risk for future asthma exacerbations. Further, we will examine the impact of sharing smartphone asthma-related data with the primary care provider (PCP) on outcomes.

Utilizing Data from the Electronic Medical Record to Predict Alzheimer's and Dementia Risk (Maraganore)

  • FL DOH #9AZ14, 02/01/2019–01/31/2021
  • Role: Co-Investigator (EHR Integration Lead, eHealth)
  • PI(s): Demetrius Maraganore (Contact PI)

First, we will utilize data captured by the EMR at the University of Florida (UF) to develop a cognitive impairment/dementia/AD prediction model (UF AD prediction model). Second, we will replicate the model using historical data captured by the EMRs at OneFlorida Clinical Research Consortium sites. Third, we will integrate it into the UF EMR and build clinical decision support (CDS) tools that identify patients at highest risk for cognitive disorders and guide referral by primary care providers to brain health specialists.

Methods and Software for Lifecourse Epidemiology Data and Sample Size Analysis (Gluech, Muller, Dabelea)

Lifecourse epidemiology is the study of chronic disease risk associated with the long‐term effects of exposures occurring throughout the life course. When repeated measurements of risk factors over time are used to predict disease outcomes, the complex study designs require new data and sample size methods and software. We propose to create new methods and software for risk factor trajectories, and broadly disseminate them to a wide audience of scientists.