Tianhao Li · ML for Scientific Discovery

Amazon AI PhD Fellow · Johns Hopkins · 2025–2027

Machine Learning for Scientific Discovery

Tianhao Li · Ph.D. Candidate, Johns Hopkins

I build ML/AI systems for expensive scientific search and decision problems: active learning, surrogate modeling, LLM-assisted workflows, and closed-loop pipelines connecting models with high-fidelity simulation backends. Previously M.S. at Duke University.


About Me

I am a Ph.D. candidate in Materials Science and Engineering at Johns Hopkins University, working with Prof. Corey Oses, and an Amazon AI PhD Fellow (2025–2027).

My work focuses on building ML/AI systems for expensive scientific search and decision problems—active learning for data-efficient model adaptation, surrogate modeling over large combinatorial spaces, LLM-assisted discovery workflows, and closed-loop pipelines connecting machine learning with high-fidelity simulation backends.

Previously, I completed my M.S. at Duke University and my B.S. at Changsha University of Science and Technology.

Currently looking for

Summer 2026 applied-science or research internships in ML for scientific discovery, simulation, or agentic workflows. I am also open to research collaborations that bridge ML methods with domain-science backends.

Education & Honors

Amazon AI PhD Fellow

Amazon · Johns Hopkins University

2025 – 2027

Ph.D. · Materials Science & Engineering

Johns Hopkins University

2023 – Present

M.S. · Materials Science & Engineering

Duke University

2021 – 2023

B.S. · Materials Science & Engineering

Changsha University of Science and Technology

2016 – 2020

Research

Current Projects

Active · 2024–Present

Active Learning for Scientific Simulations

Data-efficient active-learning workflow for adapting pretrained ML interatomic potentials to ultra-complex disordered materials. Reduced required simulation labels from 20,000 to 930 (95% reduction) while improving energy MAE from 0.061 to 0.026 eV/atom. Trained on >1.5M atom-step trajectories.

Active · 2023–Present

ML/AI Infrastructure for Scientific Discovery

Built and maintain a structured scientific data asset of 194,760 simulation records for model training, benchmarking, and retrieval. Contributed to LLM-assisted discovery interfaces supporting tool-calling, structured retrieval, and natural-language exploration over scientific datasets.

Active · 2024–2025

ML Screening for Fuel-Cell Catalysts

End-to-end ML workflow for feature engineering, surrogate modeling, and multi-objective ranking across 20,000+ compositions. Physics-informed random-forest models down-select platinum-free catalyst candidates under activity, stability, and cost constraints.
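The multi-objective down-selection step can be pictured as Pareto filtering over surrogate predictions. The sketch below is an illustration under stated assumptions, not the pipeline's code: it takes precomputed activity and stability scores (higher is better, standing in for the random-forest predictions) plus a per-composition cost estimate, drops candidates above a hypothetical cost cap, and keeps the non-dominated set. `pareto_front` and `down_select` are illustrative names, and the numbers in the usage lines are made up.

```python
import numpy as np


def pareto_front(scores: np.ndarray) -> np.ndarray:
    """Boolean mask of non-dominated rows; every column is treated as 'higher is better'."""
    n = scores.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        # Row j dominates row i if it is >= in every objective and > in at least one.
        dominates_i = np.all(scores >= scores[i], axis=1) & np.any(scores > scores[i], axis=1)
        keep[i] = not dominates_i.any()
    return keep


def down_select(activity, stability, cost, max_cost):
    """Indices of Pareto-optimal candidates (activity up, stability up, cost down) within a cost cap.

    The inputs are assumed to be surrogate predictions and cost estimates;
    max_cost is a hypothetical hard constraint.
    """
    activity, stability, cost = map(np.asarray, (activity, stability, cost))
    feasible = np.flatnonzero(cost <= max_cost)
    # Negate cost so that all three objectives are maximized.
    scores = np.column_stack([activity[feasible], stability[feasible], -cost[feasible]])
    return feasible[pareto_front(scores)]


# Toy usage with made-up numbers (not data from this work).
picked = down_select(
    activity=[1.0, 2.0, 3.0, 2.0],
    stability=[3.0, 2.0, 1.0, 1.0],
    cost=[1.0, 1.0, 1.0, 5.0],
    max_cost=2.0,
)
```

The pairwise dominance check is quadratic in the number of feasible candidates; for much larger pools, a sort-based non-dominated sorting algorithm would be the usual replacement.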

Active · 2025–Present

Closed-Loop AI Discovery Pipelines

Developing closed-loop learning workflows combining surrogate models, automated large-scale screening, and escalation to high-fidelity evaluation under uncertainty. Building API-connected infrastructure for experiment–model feedback and campaign-scale prioritization.

Project Highlights

Visual Tour

Key figures from recent publications — each captures a core idea, method, or finding.

Project 1 · Active Learning MLIP

SOAP-guided active learning for disordered materials

A closed-loop fine-tuning workflow adapts pretrained ML interatomic potentials to ultra-complex disordered oxides. Selecting training structures by SOAP (Smooth Overlap of Atomic Positions) similarity reaches the target MAE with ~109 samples, versus thousands with random sampling.

End-to-end workflow: disordered supercell tiling → ML interatomic potential fine-tuning → energy-based formability descriptors that separate high- from low-formability compositions.
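The similarity-guided selection step can be illustrated with a small diversity-driven picker. This is a minimal sketch under stated assumptions, not the workflow's actual code: it assumes each candidate structure already has a fixed-length, nonzero descriptor vector (for example, an averaged SOAP vector), and it applies a hypothetical greedy rule that repeatedly picks the pool structure whose highest cosine similarity to the labeled set is lowest. The function names are illustrative; labeling with a simulation backend and fine-tuning the potential are outside the sketch.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b (rows assumed nonzero)."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T


def select_dissimilar(pool: np.ndarray, labeled: np.ndarray, k: int) -> list[int]:
    """Greedily pick k pool indices that are least similar to everything already labeled.

    pool:    (n_pool, d) descriptor vectors for unlabeled candidate structures
    labeled: (n_lab, d) descriptor vectors for structures already in the training set
    """
    chosen: list[int] = []
    # Highest similarity of each pool entry to the current labeled set.
    max_sim = cosine_similarity_matrix(pool, labeled).max(axis=1)
    for _ in range(k):
        masked = max_sim.copy()
        masked[chosen] = np.inf  # never pick the same entry twice
        idx = int(np.argmin(masked))
        chosen.append(idx)
        # Treat the new pick as labeled and update each entry's best similarity.
        new_sim = cosine_similarity_matrix(pool, pool[idx:idx + 1])[:, 0]
        max_sim = np.maximum(max_sim, new_sim)
    return chosen


# Toy usage with random vectors standing in for real descriptors.
rng = np.random.default_rng(0)
pool = rng.normal(size=(200, 16))
labeled = rng.normal(size=(5, 16))
batch = select_dissimilar(pool, labeled, k=10)
```

Picking the least-similar structure targets local environments the current training set covers poorly, which is the usual rationale for similarity-based selection over random sampling.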

Project 2 · Fuel-Cell HEAs · Nano Futures 2025

AI-driven search for Pt-free fuel-cell catalysts

End-to-end ML pipeline screens 26,334 quinary HEA compositions under activity, stability, and sustainability constraints — matching Pt-like d-band behavior while using earth-abundant elements.

Small-cell tiling captures local disorder; EFA and DEED descriptors quantify synthesizability and catalytic activity; the high-value zone in HEA chemical space identifies earth-abundant Pt-free candidates.

Project 3 · CHAOS Database & CHAOS-GPT

LLM-assisted exploration of ~200K simulation records

The CHAOS database curates 194,760 first-principles records across high-entropy oxides. A natural-language interface (CHAOS-GPT) turns user prompts into SQL-style retrieval and on-the-fly analysis.

Elemental frequency across the CHAOS database: periodic-table heatmaps for high-entropy, binary, and ternary oxides covering 194,760 first-principles records.

Project 4 · Invited Review · npj Computational Materials 2025

High-entropy materials powering green energy

Invited review synthesizing the landscape of high-entropy materials for hydrogen generation & storage, batteries, electronics, catalysis, thermoelectrics, and biofuel applications.

High-entropy materials span six major green-energy sectors, from hydrogen generation and storage to batteries, solar cells, thermoelectrics, catalysis, and biofuels.

Project 5 · Invited Review · Materials Horizons 2025

Revisiting thermoelectrics with a high-entropy design

Invited review examining how configurational entropy — from doping through alloying to full high-entropy compositions — reshapes phonon scattering, band convergence, and thermoelectric performance.

Increasing configurational entropy from doping (Nb-doped MoS₂) through binary alloying (PtNi) to five-component high-entropy oxides; each step introduces new disorder-driven design handles for thermoelectric optimization.

Open Source

Projects & Tools

Dataset ↗

S4E-MatForge

Led end-to-end curation of an open materials dataset with ~200K records, covering schema design, metadata standardization, QA/QC, packaging, and ML-ready release. Hosted on Hugging Face.

Hugging Face · Data Curation · 200K records


Demo ↗

CHAOS-GPT

LLM-assisted screening interface for scientific candidate triage and dataset exploration. Supports tool-calling, structured retrieval, and natural-language queries over large scientific data repositories.

LLM · Tool-calling · Retrieval

GitHub ↗

HE Fuel-Cell Discovery Pipeline

Open ML pipeline for feature engineering, surrogate modeling, and candidate screening in large composition spaces. Reusable code, standardized datasets, and analysis artifacts for multi-objective material discovery.

PyTorch · Random Forest · Screening

Publications

Selected Works

Manuscript in Preparation · Target: 2026

Active-Learning Deep-Learning Models for Scientific Simulations

Li, T.; Oses, C.; et al.

Nano Futures 2025

The Search for High-Entropy Fuel-Cell Catalysts Using Disorder Descriptors

Li, T.; Han, G.; Xu, X.; et al.

npj Computational Materials 2025

High Entropy Powering Green Energy: Hydrogen, Batteries, Electronics, and Catalysis

Qiu, G.; Li, T.; Xu, X.; et al.

Materials Horizons 2025

Beyond the Four Core Effects: Revisiting Thermoelectrics with a High-Entropy Design

Oses, C.; Li, T.; Xu, X.; et al.

Frontiers in Chemistry 2019

Effects of H2–H3 Phase Transition Reversibility on Ni-Rich Layered Oxide Cathodes

Chen, J.; Yang, H.; Li, T.; et al.

Full list on Google Scholar · ORCID: 0009-0008-5486-5940

Selected Presentations

ARPA-E Energy Innovation Summit · National Harbor, MD

High-Throughput Design of High-Entropy Materials

Poster · March 2025

ARPA-E Annual Fission Meeting · Arlington, VA

AI-Driven Design of High-Entropy Materials

Poster · September 2024

Data-Driven Materials Modeling Workshop · Johns Hopkins University

Data-Driven Thermodynamic Modeling for Materials Discovery ↗

Talk · May 2024

Service

Professional Activities

Peer Reviewer

Reviewer for Nano Materials Science, Nanotechnology, and Journal of Alloys and Compounds — topics spanning ML for materials, high-entropy alloys, and energy-materials characterization.

Guest Editor

Co-editing a Special Issue on data-driven design of high-entropy materials — coordinating submissions, peer review, and editorial decisions.

Mentorship

Mentored junior graduate students and research interns on ML-for-science projects at Johns Hopkins — from onboarding to paper-ready deliverables.

Teaching

Teaching Experience

Spring 2025

Teaching Assistant

Gateway Computing: Python (EN.500.113)

Held weekly office hours for 40 engineering students; organized four quizzes and graded programming projects with detailed feedback.

Fall 2023

Teaching Assistant

Introduction to Computational Materials Modeling (EN.510.466/666)

Led five workshops on numerical integration, Monte Carlo, DFT+VASP, high-throughput AFLOW, and AI-driven discovery. Authored starter code and unit tests; graded ~40 submissions.

Skills

ML & AI

PyTorch · TensorFlow · GNN · Active Learning · LLM Workflows · Surrogate Modeling

Data & Systems

Python · Hugging Face · Git · Slurm/HPC · MATLAB · R

Contact

Get in Touch

I'm always happy to discuss research collaborations, questions about my work, or computational materials science in general.

✉ tli114@jh.edu · Google Scholar · LinkedIn · GitHub · ▶ YouTube Talk