bianjiang@ufl.edu, My UF Profile
Chief Data Scientist, University of Florida Health
Chief Data Scientist, OneFlorida+ Clinical Research Consortium
Director of the Cancer Informatics Shared Resource (CISR), University of Florida Health Cancer Center
Associate Director of the Biomedical Informatics Program, University of Florida Clinical and Translational Science Institute
Professor, Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida
I love doing research in biomedical informatics, but I also enjoy writing high-quality code and producing real-world products that will make a broader impact...
Data-driven medicine: applications of data science techniques, including machine-/deep-learning methods, to solve heterogeneous and big data problems in medicine.
Data-driven approaches that aim to discover patterns from massive amounts of data and make clinically relevant predictions are becoming increasingly common in translational research. Although advances have been made in computing systems, bridging data processing and knowledge discovery remains challenging. Making optimal use of big data computing systems demands a strong background in multiple disciplines (e.g., data integration, machine-/deep-learning methods, distributed systems, algorithms). Our work in this area focuses on (1) developing software tools and systems that are capable of handling big data in terms of integration, storage, searching, sharing, analyzing, and visualizing; and (2) using heterogeneous data sources, especially real-world data (RWD) such as electronic health records (EHRs) and administrative claims data, to generate real-world evidence (RWE). For example, our work on clinical trial generalizability assessment has used RWD (e.g., large collections of linked EHRs and claims data from the OneFlorida network) to identify generalizability issues during a trial's design phase.
Development of novel informatics methods, tools and systems to support clinical and clinical research activities.
It is imperative to have tools and software designed and developed using informatics principles to support clinical and clinical research activities. I have been deeply involved in various software development and data infrastructure research projects. In particular, my efforts have led to the development of the Comprehensive Research Informatics Suite (CRIS). CRIS is now the standard clinical research management system at the University of Arkansas for Medical Sciences (UAMS). Further, I have led the development of the Enterprise Image Repository (EIR) and the Trauma Image Repository (TIR), a medical image exchange platform for telemedicine that has been widely used in the state of Arkansas. Moreover, I have led the development of the CLinicAl Research Administration (CLARA) system, as part of UAMS's CTSA effort, to manage the complex regulatory requirements of clinical trials. At the University of Florida (UF), I have played a crucial role in building the OneFlorida Data Trust, a secure data repository that integrates diverse data sources and currently contains ~15 million linked patient records of different types.
Semantic web and ontology: advancing knowledge representation and creating applications of knowledge bases/graphs.
This line of research started with an interest in using network science and graph theory as a conceptual framework for interpreting health and health care data from a unique perspective. Naturally, we investigated applications and other areas that use graph-based data structures. In particular, we have used ontologies and the semantic web technology stack as a data and knowledge representation framework to solve various informatics challenges: (1) creating ontology-based semantic data integration infrastructure to harmonize heterogeneous datasets; (2) turning free-text clinical trial eligibility criteria into machine-computable form to support cohort discovery and recruitment; and (3) developing knowledge bases/graphs to provide evidence-based information to lay consumers.
Design, development, and testing of eHealth applications for management of chronic diseases and promotion of healthy behaviors through multi-level interventions.
eHealth is an emerging field at the intersection of informatics, computer science, health communication, public health, and health care delivery, leveraging booming related technologies (e.g., web- or mobile-based apps and wearables) to enhance and deliver health services and information. Following best practices (e.g., user-centered design, Agile software development), we have designed and developed a wide range of eHealth applications and have shown their effectiveness in funded clinical trials. Further, one key focus area is to identify multi-level interventions (targeting patients, caregivers, providers, and health care systems). Thus, our research also taps into innovations in health information technology (IT), such as integrating a patient-reported outcome (PRO) collection toolkit into EHR workflows and building tools that provide evidence-based clinical decision support.
Mining the Internet, including the social web, to provide insights into health-related behavior and health outcomes of various populations and finding ways to develop interventions that promote public and consumer health.
Many health behaviors, despite being individual choices, are often influenced by social and cultural context. Social media such as Twitter and Facebook afford us enormous opportunities to understand the intersections of individual behaviors, social-environmental factors, and social interactions on social media platforms. Over the past several years, our research has shown that we can mine the social web for invaluable insights into public and consumer health. Specifically, we have mined Twitter data for detecting drug-related adverse events, evaluating the adequacy of gender identification terms among the transgender population, assessing U.S. weekly trends in work stress and emotion, and understanding the impact of promotional health information on laypeople. We have also developed open source software for the collection and analysis of social media data.
Leveraging state-of-the-art data privacy research to construct technologies that enable privacy in the context of real-world organizational, political, and health information architectures.
Although privacy protection of health information is highly regulated (e.g., HIPAA), there is no consistent and efficient model for developing software systems involving electronic health records (EHRs). I am interested in designing security models and developing patient-centric privacy-preserving software systems such as privacy-preserving record linkage systems, secure health information exchange platforms, and privacy-preserving cohort discovery frameworks.
This proposal seeks support to develop novel data integration methods using electronic health records from multiple CTSA hubs to create predictive models for multi-system diseases. We propose to develop the Predictive Analytics via Networked Distributed Algorithms (PANDA) framework, which will enable accurate risk prediction to help healthcare providers reach accurate diagnoses earlier.
The goal of this project is to develop a comprehensive learning-based causal inference framework to generate high-throughput drug repurposing hypotheses for Alzheimer's disease and related dementias (AD/ADRD) from real-world data and biomedical knowledge bases. The generated drug repurposing hypotheses will be validated with diverse data sources and approaches, including (1) using multi-omics data based on the network medicine approach, and (2) using the trial emulation approach with (i) high-quality cohort data from the National Alzheimer's Coordinating Center (NACC), and (ii) a prospective RWD-based cohort. The developed algorithms and software tools will be made publicly available and widely disseminated in the AD/ADRD research communities.
Lung cancer screening using low-dose computed tomography is a promising technique to reduce the burden of lung cancer, the leading cause of cancer-related death in the United States. Concerns over high false-positive rates, invasive diagnostic procedures, postprocedural complications, and downstream health care costs impede the development and promotion of lung cancer screening programs. Our research leverages a large repository of linked electronic health record (EHR) and administrative claims data to understand contemporary use of lung cancer screening and the associated health care outcomes and costs in a real-world setting.
This project seeks to develop clinical natural language processing (NLP) methods and systems to extract and connect social & behavioral determinants of health (SDoH & BDoH) and adverse events (AEs) with clinical factors generated by clinical practice in patients’ electronic health records (EHRs) for health outcomes research. We will 1) develop ontologies, corpora, and NLP methods to extract SDoH, BDoH, and AEs with improved handling of abbreviations related to medical concepts; 2) develop methods to integrate medical knowledge with statistical NLP methods; and 3) develop and disseminate an NLP package – SODA, which extracts, standardizes, and populates SDoH, BDoH, and AEs information (in addition to clinical factors) from clinical narratives to the PCORnet CDM.
We propose to systematically review the extant methods for generalizability assessments, and then use a data-driven strategy to reproduce, evaluate, and compare these methods with our unique data resource, the OneFlorida Data Trust. The success of this R21 project will (1) fill a knowledge gap on the validity and utility of the different generalizability assessment methods; (2) provide the clinical research community with a much-needed, easy-to-use toolbox, ctGATE, for assessing study generalizability; and (3) help clinical researchers choose the most appropriate generalizability assessment methods with readily available implementations. We focus on breast, lung, and colorectal cancer trials.
We propose first to systematically review the extant methods for generalizability assessments, and then use a data-driven strategy to reproduce, evaluate, and compare these methods with our unique data resource, the OneFlorida Data Trust. We aim to answer two key research questions: (1) whether the a priori generalizability can predict the actual clinical outcomes of the interventions on the target population; and (2) how the exclusion criteria that may limit older adults' participation affect study generalizability and outcomes of patients in different age groups. We focus on breast, lung, and colorectal cancer trials.
This project aims to further develop the research infrastructure in OneFlorida as part of PCORnet 2.0. OneFlorida is a Data Research Network that provides an enduring infrastructure for comparative effectiveness research and pragmatic clinical trials.
To achieve this, we will develop software to support tight integration into the two most common academic medical center EHRs, Epic and Cerner. We will develop a generalized integration of the PROMIS toolset, utilizing the SMART-on-FHIR standard, that can be implemented in multiple EHR platforms. Finally, we will implement and evaluate these software solutions across a number of diverse CTSA sites, both within and outside the project team's CTSA sites.
The overarching goal of the proposed project is to examine the effectiveness of a personalized, interactive asthma smartphone application in reducing asthma morbidity among adolescents who are at increased risk for future asthma exacerbations. Further, we will examine the impact of sharing smartphone asthma-related data with the primary care provider (PCP) on outcomes.
First, we will utilize data captured by the EMR at the University of Florida (UF) to develop a cognitive impairment/dementia/AD prediction model (UF AD prediction model). Second, we will replicate the model using historical data captured by the EMRs at OneFlorida Clinical Research Consortium sites. Third, we will integrate the model into the UF EMR and build clinical decision support (CDS) tools that identify patients at highest risk for cognitive disorders and guide referral by primary care providers to brain health specialists.
Lifecourse epidemiology is the study of chronic disease risk associated with the long‐term effects of exposures occurring throughout the life course. When repeated measurements of risk factors over time are used to predict disease outcomes, the complex study designs require new data and sample size methods and software. We propose to create new methods and software for risk factor trajectories, and broadly disseminate them to a wide audience of scientists.