The central theme driving our group's research is the pursuit of “big data, intelligent algorithms, intuitive interfaces” to bridge data gaps, making the data-to-decision process not only accurate and efficient, but also accessible, understandable, and acceptable to users. On the machine side, we build human-centered AI models and systems that can extract useful information from big data quickly (scalability), are steerable by user needs (steerability), learn from human feedback (ability to learn), and produce interpretable results (transparency). On the human side, we work closely with different types of users to learn about the challenges they face and to identify the general human factors that should guide human-AI system development. We then create interactive visualizations that are intuitive to understand, faithful in representing data, and visually scalable, to help users understand and reason with big data and communicate their knowledge and feedback to machines. Please check these talk slides if you want to know more details.

Themes of interest:

We are always exploring new avenues in data visualization and human-centered AI. Prospective students and collaborators are warmly invited to join our group's research journey.

Human-AI Teaming for Time-Varying Data Analytics

One primary focus of our group is the study of various forms of time-varying data observed in critical application domains including sustainability, healthcare, and urban informatics. This encompasses signal data from heavily monitored equipment (e.g., satellites, wind turbines, air monitoring stations), spatio-temporal data from urban environments (e.g., human/vehicle movements), and temporal event records from social-good applications in areas like healthcare, sports, and education.


Time series or signal data (dense temporal observations) are collected by sensors in a variety of industrial equipment. The slides provide a good overview of our series of work in this area. Sintel (SIGMOD'22; w/ SES and Iberdrola) is essentially an ecosystem supporting signal processing/featurizing, various ML tasks (forecasting, classification, anomaly detection), benchmarking, and data visualization to enable human-in-the-loop analytic workflows. One of our libraries, Orion (GitHub stars >850), has been widely used for time series anomaly detection. MTV (CSCW'22) is a visual analytics system for detecting, investigating, and annotating anomalies in multivariate time series in a collaborative way. AER (BigData'22) and TadGAN (BigData'20; covered by MIT News) propose unsupervised deep learning approaches for time series anomaly detection.
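To give a flavor of the task these systems tackle, here is a deliberately simple sketch of unsupervised time series anomaly detection using a rolling z-score. This is an illustration only, not the learned detectors (LSTM- and GAN-based) that Orion, AER, and TadGAN actually use; the window size and threshold below are arbitrary choices.

```python
from statistics import mean, stdev

def rolling_zscore_anomalies(series, window=10, threshold=3.0):
    """Flag indices whose value deviates strongly from the preceding window.

    A toy stand-in for learned detectors: an index is anomalous when its
    value lies more than `threshold` standard deviations from the mean
    of the `window` points before it.
    """
    anomalies = []
    for i in range(window, len(series)):
        ctx = series[i - window:i]
        mu, sigma = mean(ctx), stdev(ctx)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A nearly flat signal with one injected spike at index 25.
signal = [1.0] * 50
signal[25] = 10.0
signal = [v + 0.01 * ((i % 5) - 2) for i, v in enumerate(signal)]  # small noise
print(rolling_zscore_anomalies(signal))  # → [25]
```

Real deployments replace the fixed threshold with error modeling over reconstruction or forecast residuals, which is where the deep learning approaches come in.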


Temporal event records or event sequences frequently appear in applications that serve urgent societal needs. [Healthcare] Cardea (DSAA'20; covered by MIT News; GitHub stars >100) builds ML prediction pipelines on Electronic Health Record (EHR) data with AutoML. VBridge (TVCG'22; VIS'21 honorable mention; w/ ZJU Med School) visually explains prediction models trained on EHR data and creates two analysis workflows for practical use. IBoca (w/ J-Clinic) digitizes the Montreal Cognitive Assessment (MoCA), enabling the collection of fine-grained patient interaction logs for use in machine learning. Check these slides for more details on healthcare data analytics. [Sports] TacticFlow (TVCG'22), RASIPAM (TVCG'22), and BEEP (KDD'23 DSAI4Sports) present a range of interactive pattern mining algorithms, along with novel visualizations, for exploring event sequence patterns in competitive sports data.
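The core idea behind event-sequence pattern mining can be sketched with a minimal frequent-subsequence counter. This is a toy stand-in for the interactive algorithms in TacticFlow, RASIPAM, and BEEP, which additionally handle gapped patterns, ordering constraints, and user feedback; the rally events below are made-up examples, not real data from those projects.

```python
from collections import Counter

def frequent_patterns(sequences, length=2, min_support=2):
    """Count contiguous event subsequences of a fixed length, keeping
    those that occur in at least `min_support` input sequences."""
    support = Counter()
    for seq in sequences:
        # Count each pattern at most once per sequence (set semantics).
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}

# Hypothetical rally event sequences from a racket sport.
rallies = [
    ["serve", "return", "smash"],
    ["serve", "return", "drop", "smash"],
    ["serve", "drop", "smash"],
]
print(sorted(frequent_patterns(rallies).items()))
```

Here ("serve", "return") and ("drop", "smash") each appear in two of the three rallies and survive the support filter; interactive mining lets analysts adjust such thresholds and steer which patterns surface.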


Spatio-temporal (ST) or movement data reveal knowledge about where and when. They represent a special form of time-varying data due to the simultaneous presence of spatial and temporal dimensions. SmartAdP (TVCG'17; among the top 10 most-cited works in TVCG since 2017; w/ MSRA) targets multifactor decision-making for citywide optimal location selection using large-scale taxi trajectories. TPFlow (TVCG'19; best paper award at VIS'18; w/ Bosch) introduces a tensor-based algorithm, along with tailored visual summarizations, to enable automatic and human-steerable exploration of ST patterns (trends, anomalies, etc.). AQEyes (w/ HKU) is a visual analytics system for detecting and examining anomalies in air quality data.
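The intuition behind tensor-based ST pattern extraction can be shown on the simplest possible case: factoring a time-by-location matrix into a temporal profile and a spatial profile via alternating least squares. This is a rank-1, two-dimensional sketch only; TPFlow operates on higher-order tensors with piecewise partitioning and human steering, and the matrix below is a contrived example.

```python
def rank1_factor(matrix, iters=50):
    """Approximate a (time x location) matrix as an outer product t * s
    using alternating least-squares updates."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    t = [1.0] * n_rows          # temporal profile
    s = [1.0] * n_cols          # spatial profile
    for _ in range(iters):
        # Fix s, solve for each t[i] by least squares.
        ss = sum(v * v for v in s)
        t = [sum(matrix[i][j] * s[j] for j in range(n_cols)) / ss
             for i in range(n_rows)]
        # Fix t, solve for each s[j].
        tt = sum(v * v for v in t)
        s = [sum(matrix[i][j] * t[i] for i in range(n_rows)) / tt
             for j in range(n_cols)]
    return t, s

# An exactly rank-1 matrix: temporal trend [1, 2, 3] over location weights [2, 1].
m = [[2, 1], [4, 2], [6, 3]]
t, s = rank1_factor(m)
print([[round(t[i] * s[j], 6) for j in range(2)] for i in range(3)])
# → [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
```

On real data the residual after removing such a dominant pattern is what holds anomalies and secondary trends, which is why steerable, iterative decomposition matters.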

Uncovering and Addressing AI/ML Usability Challenges

The second strand of my research focuses on identifying usability challenges in human-AI collaboration and proposing solutions to enhance the usability of AI across various application scenarios. Sibyl (TVCG'22) has summarized some common challenges (e.g., lack of trust, unhelpful predictions, unclear consequences of actions) and potential mitigating tools/approaches (e.g., local/global explanations, cost-benefit analysis). However, this is just a start. The design choices for these tools and human-AI interfaces are heavily influenced by the specifics of the domains and decision-makers involved. Through user studies such as field observations, interviews, and usability experiments, our group seeks to provide a more nuanced understanding of the contextual factors influencing human-AI collaboration. This will, in turn, inform future research and the development of human-centered AI. Our ultimate aim is to design innovative algorithms, interactive tools, and open-source software that democratize AI, empowering individuals to harness its potential in every facet of their lives.


Build trust with high-stakes decision-makers. Sibyl (TVCG'22; spotlighted on the MIT News homepage; w/ Colorado Department of Human Services) not only reveals common AI usability challenges but also identifies useful XAI algorithms and visual representations in the scenario of child welfare screening. VBridge (TVCG'22; VIS'21 honorable mention; w/ ZJU Med School) visually explains ML prediction outcomes — the risk of developing complications after paediatric cardiac surgery — to clinicians to inform clinical decision-making. Feature space (SIGKDD Explorations'22) is a position paper that distills lessons from our past projects with domain experts, proposing a feature taxonomy and highlighting the need for interpretable features.
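A minimal sketch of the local, feature-attribution style of explanation these tools surface: for a linear scoring model, each feature's contribution relative to a baseline is simply its weight times the feature's deviation from that baseline. The feature names, weights, and patient values below are entirely made up for illustration; systems like Sibyl and VBridge rely on model-agnostic explainers rather than assuming a linear model.

```python
def linear_contributions(weights, x, baseline):
    """Per-feature contribution of instance x relative to a baseline,
    for a linear scorer: contribution_f = w_f * (x_f - baseline_f)."""
    return {f: weights[f] * (x[f] - baseline[f]) for f in weights}

# Hypothetical risk model: weights, one patient, and the cohort average.
weights  = {"age": 0.02, "lab_score": 0.5, "prior_visits": 0.1}
patient  = {"age": 70, "lab_score": 3.0, "prior_visits": 5}
baseline = {"age": 50, "lab_score": 2.0, "prior_visits": 2}
contrib = linear_contributions(weights, patient, baseline)
print(contrib)
```

Presenting such per-feature contributions (e.g., "lab_score raised the risk most") is the kind of evidence clinicians and screeners can interrogate, which is where the visual interface work begins.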


Increase transparency and controllability for ML model developers. DeepTracker (TIST'18; w/ MSRA) enables visual exploration of the intense dynamics of CNN training processes to assist with model debugging and optimization. ATMSeer (CHI'19; covered by MIT News) offers multi-granular visualizations that let users easily analyze and refine the search space of an AutoML system. Cardea (DSAA'20; covered by MIT News; GitHub stars >100) is an open-source framework that streamlines complex ML processes, enabling prediction models to be built on electronic health records with just a few lines of code. Pyreal (ongoing) is a highly extensible framework and software system for generating a variety of interpretable and rapidly understandable ML explanations for any tabular ML model.
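To ground what "analyzing and refining a search space" means, here is a bare-bones random-search loop of the kind an AutoML backend might run. This is a generic sketch, not the actual API of ATMSeer, Cardea, or their underlying systems; the search space and scoring function are stand-ins.

```python
import random

def random_search(score_fn, space, n_trials=50, seed=0):
    """Randomly sample hyperparameter configurations from `space`
    (a dict of name -> candidate values) and keep the best scorer."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(options) for name, options in space.items()}
        score = score_fn(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy scoring function that peaks at depth=4, lr=0.1 (a real system
# would train and cross-validate a model here).
def score(cfg):
    return -abs(cfg["depth"] - 4) - abs(cfg["lr"] - 0.1)

space = {"depth": [2, 3, 4, 5, 6], "lr": [0.01, 0.1, 0.5]}
cfg, s = random_search(score, space)
print(cfg, s)
```

Tools like ATMSeer sit on top of such a loop, visualizing which regions of the space have been tried and letting users prune or expand it mid-search.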