Tianhao Li
Amazon AI PhD Fellow · Johns Hopkins · 2025–2027

Machine Learning for Scientific Discovery

Tianhao Li · Ph.D. Candidate, Johns Hopkins

I build ML/AI systems for expensive scientific search and decision problems: active learning, surrogate modeling, LLM-assisted workflows, and closed-loop pipelines connecting models with high-fidelity simulation backends. Previously, I completed an M.S. at Duke University.

I am a Ph.D. candidate in Materials Science and Engineering at Johns Hopkins University, working with Prof. Corey Oses, and an Amazon AI PhD Fellow (2025–2027).

My work focuses on building ML/AI systems for expensive scientific search and decision problems—active learning for data-efficient model adaptation, surrogate modeling over large combinatorial spaces, LLM-assisted discovery workflows, and closed-loop pipelines connecting machine learning with high-fidelity simulation backends.

Previously, I completed my M.S. at Duke University and my B.S. at Changsha University of Science and Technology.

Currently looking for

Summer 2026 applied-science / research internships in ML for scientific discovery, simulation, or agentic workflows. Also open to research collaborations that bridge ML methods with domain-science backends.

Education & Honors

Amazon AI PhD Fellow
Amazon · Johns Hopkins University
2025 – 2027

Ph.D. · Materials Science & Engineering
Johns Hopkins University
2023 – Present

M.S. · Materials Science & Engineering
Duke University
2021 – 2023

B.S. · Materials Science & Engineering
Changsha University of Science and Technology
2016 – 2020

Current Projects

Active · 2024–Present

Active Learning for Scientific Simulations

Data-efficient active-learning workflow for adapting pretrained ML interatomic potentials to ultra-complex disordered materials. Reduced required simulation labels from 20,000 to 930 (95% reduction) while improving energy MAE from 0.061 to 0.026 eV/atom. Trained on >1.5M atom-step trajectories.
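
As a rough illustration of the selection step in a workflow like this, the sketch below picks the unlabeled structures that are least similar to anything already labeled. It is a generic farthest-point heuristic in plain Python, not the method used in the project. The function name, the toy 2-D descriptor vectors, and the batch size are all assumptions made for the example.

```python
import math
import random


def farthest_point_selection(descriptors, labeled_idx, batch_size):
    """Pick unlabeled items that are farthest from the labeled set.

    descriptors: list of equal-length float vectors (hypothetical stand-ins
                 for per-structure descriptors).
    labeled_idx: indices that already have simulation labels (non-empty).
    """
    labeled = set(labeled_idx)
    if not labeled:
        raise ValueError("need at least one labeled item to measure distance from")

    # Distance from every item to its nearest labeled neighbor.
    nearest = [
        min(math.dist(d, descriptors[j]) for j in labeled)
        for d in descriptors
    ]

    picked = []
    for _ in range(batch_size):
        candidates = [i for i in range(len(descriptors)) if i not in labeled]
        if not candidates:
            break
        best = max(candidates, key=lambda i: nearest[i])
        picked.append(best)
        labeled.add(best)
        # The new pick counts as labeled, so shrink the nearest distances.
        for i, d in enumerate(descriptors):
            nearest[i] = min(nearest[i], math.dist(d, descriptors[best]))
    return picked


# Toy pool of random 2-D vectors; real descriptors would be much larger.
random.seed(0)
pool = [[random.random(), random.random()] for _ in range(200)]
batch = farthest_point_selection(pool, labeled_idx=[0], batch_size=5)
print(batch)
```

In a full loop, the picked structures would be sent to the simulation backend for labels, the model would be fine-tuned, and the selection would repeat until the error target is met.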

Active · 2023–Present

ML/AI Infrastructure for Scientific Discovery

Built and maintain a structured scientific data asset of 194,760 simulation records for model training, benchmarking, and retrieval. Contributed to LLM-assisted discovery interfaces supporting tool-calling, structured retrieval, and natural-language exploration over scientific datasets.

Active · 2024–2025

ML Screening for Fuel-Cell Catalysts

End-to-end ML workflow for feature engineering, surrogate modeling, and multi-objective ranking over 20,000+ compositions. Physics-informed random-forest models used to down-select platinum-free catalyst candidates under activity, stability, and cost constraints.
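
Multi-objective ranking of this kind is often built on a Pareto filter: keep only the candidates that no other candidate beats on every objective. The sketch below shows that idea in plain Python. The candidate names and objective values are made up, and the choice of a Pareto filter is an assumption for illustration rather than a description of the ranking used in the project.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one.

    All objectives are expressed so that lower is better.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates.

    candidates: dict mapping name -> tuple of objective values.
    """
    front = []
    for name, objectives in candidates.items():
        beaten = any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not beaten:
            front.append(name)
    return front


# Hypothetical (activity error, instability, cost) triples, lower is better.
toy_candidates = {
    "A": (0.10, 0.5, 3.0),
    "B": (0.20, 0.4, 2.0),
    "C": (0.30, 0.6, 3.5),  # worse than A on all three objectives
    "D": (0.05, 0.9, 5.0),
}
print(pareto_front(toy_candidates))  # ['A', 'B', 'D']
```

The filter is quadratic in the number of candidates, which is fine for a toy set; a pool of tens of thousands would usually be pre-filtered by hard constraints first.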

Active · 2025–Present

Closed-Loop AI Discovery Pipelines

Developing closed-loop learning workflows combining surrogate models, automated large-scale screening, and escalation to high-fidelity evaluation under uncertainty. Building API-connected infrastructure for experiment–model feedback and campaign-scale prioritization.
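
One common way to decide when to escalate is to compare the spread of an ensemble of surrogate predictions against a threshold. The sketch below assumes that setup. The candidate names, prediction values, and threshold are hypothetical, and the pipelines described above may use a different uncertainty estimate.

```python
import statistics


def triage(ensemble_predictions, std_threshold):
    """Split candidates into surrogate-accepted and escalated groups.

    ensemble_predictions: dict mapping candidate name -> list of predictions,
                          one per ensemble member (hypothetical values).
    std_threshold: spread above which the surrogate is not trusted.
    """
    accepted = {}
    escalated = []
    for name, preds in ensemble_predictions.items():
        mean = statistics.fmean(preds)
        spread = statistics.pstdev(preds)
        if spread > std_threshold:
            # Ensemble members disagree: send to high-fidelity evaluation.
            escalated.append(name)
        else:
            accepted[name] = mean
    return accepted, escalated


toy_predictions = {
    "cand_1": [0.10, 0.11, 0.09],   # members agree
    "cand_2": [0.10, 0.40, -0.20],  # members disagree
}
accepted, escalated = triage(toy_predictions, std_threshold=0.05)
print(accepted, escalated)
```

Results returned from the high-fidelity backend can then be added to the training set, which closes the loop.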

Visual Tour

Key figures from recent publications — each captures a core idea, method, or finding.

SOAP-guided active learning for disordered materials

A closed-loop fine-tuning workflow adapts pretrained ML interatomic potentials to ultra-complex disordered oxides. SOAP-similarity selection reaches target MAE with ~109 samples, vs. thousands for random sampling.

Disordered system modeling → ML interatomic potential → energy-based high-entropy descriptor
End-to-end workflow: disordered supercell tiling → ML interatomic potential fine-tuning → energy-based formability descriptors that separate high- vs. low-formability compositions.

AI-driven search for Pt-free fuel-cell catalysts

End-to-end ML pipeline screens 26,334 quinary HEA compositions under activity, stability, and sustainability constraints — matching Pt-like d-band behavior while using earth-abundant elements.

Small-cell tiling, energy distributions, descriptor correlation, and high-value zone in HEA chemical space
Small-cell tiling captures local disorder; EFA and DEED descriptors quantify synthesizability and catalytic activity; the high-value zone identifies earth-abundant Pt-free candidates.

LLM-assisted exploration of ~200K simulation records

The CHAOS database curates 194,760 first-principles records across high-entropy oxides. A natural-language interface (CHAOS-GPT) turns user prompts into SQL-style retrieval and on-the-fly analysis.

Periodic-table frequency maps for high-entropy oxides, binary oxides, and ternary oxides in the CHAOS database
Elemental frequency across the CHAOS database — periodic-table heatmaps for high-entropy oxides, binary oxides, and ternary oxides covering 194,760 first-principles records.
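
To make the idea of prompt-to-retrieval concrete, here is a minimal sketch of a tool-calling pattern over a SQL table: a language model would emit a small JSON tool call, and a dispatcher would map it onto a parameterized query. This is not the CHAOS-GPT implementation. The table name, columns, tool name, and the four rows are all hypothetical placeholders.

```python
import json
import sqlite3


def build_demo_db():
    """Create an in-memory table with a few made-up placeholder records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (formula TEXT, n_elements INTEGER, energy_per_atom REAL)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [
            ("A1B1O2", 3, -1.2),
            ("A1O1", 2, -0.8),
            ("C1D1O3", 3, -0.5),
            ("A1C1O2", 3, -1.5),
        ],
    )
    return conn


def find_records(conn, n_elements, max_energy_per_atom):
    """Structured retrieval: parameterized SQL, never string-built from the prompt."""
    return conn.execute(
        "SELECT formula, energy_per_atom FROM records "
        "WHERE n_elements = ? AND energy_per_atom <= ? "
        "ORDER BY energy_per_atom",
        (n_elements, max_energy_per_atom),
    ).fetchall()


# Registry of tools the model is allowed to call.
TOOLS = {"find_records": find_records}


def dispatch(conn, tool_call_json):
    """Run a tool call of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](conn, **call["arguments"])


conn = build_demo_db()
# A tool call such as a model might produce for "three-element records below -1.0".
tool_call = '{"name": "find_records", "arguments": {"n_elements": 3, "max_energy_per_atom": -1.0}}'
print(dispatch(conn, tool_call))  # [('A1C1O2', -1.5), ('A1B1O2', -1.2)]
```

Keeping the model's output restricted to a tool name and arguments, rather than raw SQL, is one way to keep retrieval auditable.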

High-entropy materials powering green energy

Invited review synthesizing the landscape of high-entropy materials for hydrogen generation & storage, batteries, electronics, catalysis, thermoelectrics, and biofuel applications.

High-entropy materials for green energy: hydrogen, batteries, electronics, catalysis, thermoelectrics, and biofuel
High-entropy materials span six major green-energy sectors — from hydrogen generation and storage to batteries, solar cells, thermoelectrics, catalysis, and biofuels.

Revisiting thermoelectrics with a high-entropy design

Invited review examining how configurational entropy — from doping through alloying to full high-entropy compositions — reshapes phonon scattering, band convergence, and thermoelectric performance.

From doping to alloying to high-entropy: increasing configurational entropy across material classes
Increasing configurational entropy from doping (Nb-doped MoS₂) through binary alloying (PtNi) to five-component high-entropy oxides — each step introduces new disorder-driven design handles for thermoelectric optimization.
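
The trend from doping to alloying to high-entropy compositions can be made quantitative with the ideal-mixing expression for configurational entropy, ΔS_conf / R = −Σ xᵢ ln xᵢ, where xᵢ are the mole fractions on the mixing sublattice. The sketch below evaluates it for three illustrative cases. The 2% dopant level is a hypothetical example, not a composition taken from the review, and the ideal-mixing assumption ignores short-range order.

```python
import math


def config_entropy_over_R(fractions):
    """Ideal configurational entropy of mixing, in units of the gas constant R.

    fractions: mole fractions on the mixing sublattice; must sum to 1.
    """
    if not math.isclose(sum(fractions), 1.0, abs_tol=1e-9):
        raise ValueError("mole fractions must sum to 1")
    # Terms with x = 0 contribute nothing (x ln x -> 0 as x -> 0).
    return -sum(x * math.log(x) for x in fractions if x > 0)


cases = {
    "dilute doping (2%, hypothetical)": [0.98, 0.02],
    "equimolar binary": [0.5, 0.5],
    "equimolar five-component": [0.2] * 5,
}
for label, x in cases.items():
    print(f"{label}: {config_entropy_over_R(x):.3f} R")
# dilute doping (2%, hypothetical): 0.098 R
# equimolar binary: 0.693 R
# equimolar five-component: 1.609 R
```

For N equimolar components the expression reduces to ln N, so five components give ln 5 ≈ 1.61 R, above the 1.5 R value that is commonly used as a working threshold for "high-entropy".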

Projects & Tools

S4E-MatForge

Led end-to-end curation of an open materials dataset with ~200K records: schema design, metadata standardization, QA/QC, packaging, and ML-ready release. Hosted on Hugging Face.

Hugging Face · Data Curation · 200K records

CHAOS-GPT

LLM-assisted screening interface for scientific candidate triage and dataset exploration. Supports tool-calling, structured retrieval, and natural-language queries over large scientific data repositories.

LLM · Tool-calling · Retrieval

HE Fuel-Cell Discovery Pipeline

Open ML pipeline for feature engineering, surrogate modeling, and candidate screening in large composition spaces. Reusable code, standardized datasets, and analysis artifacts for multi-objective material discovery.

PyTorch · Random Forest · Screening

Selected Works

  1. Manuscript in Preparation · Target: 2026

    Active-Learning Deep-Learning Models for Scientific Simulations

    Li, T.; Oses, C.; et al.

  2. Nano Futures 2025

    The Search for High-Entropy Fuel-Cell Catalysts Using Disorder Descriptors

    Li, T.; Han, G.; Xu, X.; et al.

  3. npj Computational Materials 2025

    High Entropy Powering Green Energy: Hydrogen, Batteries, Electronics, and Catalysis

    Qiu, G.; Li, T.; Xu, X.; et al.

  4. Materials Horizons 2025

    Beyond the Four Core Effects: Revisiting Thermoelectrics with a High-Entropy Design

    Oses, C.; Li, T.; Xu, X.; et al.

  5. Frontiers in Chemistry 2019

    Effects of H₂–H₃ Phase Transition Reversibility on Ni-Rich Layered Oxide Cathodes

    Chen, J.; Yang, H.; Li, T.; et al.

Full list on Google Scholar · ORCID: 0009-0008-5486-5940

Selected Presentations

ARPA-E Energy Innovation Summit · National Harbor, MD
High-Throughput Design of High-Entropy Materials
Poster · March 2025

ARPA-E Annual Fission Meeting · Arlington, VA
AI-Driven Design of High-Entropy Materials
Poster · September 2024

Data-Driven Materials Modeling Workshop · Johns Hopkins University
Talk · May 2024

Professional Activities

Peer Reviewer

Reviewer for Nano Materials Science, Nanotechnology, and Journal of Alloys and Compounds — topics spanning ML for materials, high-entropy alloys, and energy-materials characterization.

Guest Editor

Co-editing a Special Issue on data-driven design of high-entropy materials — coordinating submissions, peer review, and editorial decisions.

Mentorship

Mentored junior graduate students and research interns on ML-for-science projects at Johns Hopkins — from onboarding to paper-ready deliverables.

Teaching Experience

Spring 2025
Teaching Assistant
Gateway Computing: Python (EN.500.113)

Weekly office hours for 40 engineering students; organized four quizzes and graded programming projects with detailed feedback.

Fall 2023
Teaching Assistant
Introduction to Computational Materials Modeling (EN.510.466/666)

Led five workshops on numerical integration, Monte Carlo, DFT+VASP, high-throughput AFLOW, and AI-driven discovery. Authored starter code and unit tests; graded ~40 submissions.

Skills

ML & AI
PyTorch · TensorFlow · GNN · Active Learning · LLM Workflows · Surrogate Modeling
Data & Systems
Python · Hugging Face · Git · Slurm/HPC · MATLAB · R

Get in Touch

I'm always happy to discuss research collaborations, questions about my work, or computational materials science in general.
