Hydrogen is quickly being established as a medium for decarbonization across various energy-intensive industries. The aim is ultimately to displace fossil fuels in small- and large-scale power generation, land and sea transportation, and chemical and fertilizer production. Photocatalytic water splitting is a potential low-cost, long-term route to green hydrogen production.
The proposed research aims to conduct a comprehensive data-driven analysis of experimental studies on photocatalytic hydrogen production published between 2020 and 2025. By systematically collecting and analysing this body of work, the project seeks to identify trends, challenges, and opportunities within the field. A key component of this research involves exploring the potential and challenges of using large language models (LLMs) to automate the extraction of data from scientific literature, thereby enhancing the efficiency and accuracy of future machine learning research. To evaluate the effectiveness of LLMs in this context, an initial manual extraction of data will be conducted to serve as a ground-truth dataset. This dataset will be used to assess how accurately LLMs extract relevant information from unstructured text. Subsequently, the constructed database will undergo analytical data mining to explore gaps in reported experiments and assess how these gaps influence the development of consistent research comparison frameworks.
The anticipated outcome of this project is a structured database that can significantly advance machine learning research in photocatalysis. By providing organized training and validation data, this database can support various stages of research, from catalyst design to kinetic studies. Ultimately, this research aims to transform unstructured textual information into structured, actionable insights, thereby facilitating innovative materials design and discovery.
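The comparison of LLM output against the manually extracted ground truth could be scored in several ways; one common option is field-level precision, recall and F1. The sketch below is a minimal illustration only, not the project's method. The field names (`photocatalyst`, `cocatalyst`, `light_source`, `h2_rate`), the example values, and the exact-match rule after lowercasing are all assumptions made for this example.

```python
from typing import Dict, Optional

# One extracted record: field name -> value, or None if not reported/extracted.
Record = Dict[str, Optional[str]]


def _norm(value: Optional[str]) -> Optional[str]:
    """Lowercase and strip a value; treat empty strings as missing."""
    if value is None:
        return None
    text = str(value).strip().lower()
    return text or None


def score_extraction(truth: Record, predicted: Record) -> Dict[str, float]:
    """Field-level precision, recall and F1 for a single paper.

    Assumption: a field counts as correct only if the normalised LLM value
    equals the normalised manual value (no unit conversion, no fuzzy match).
    """
    true_fields = {k for k, v in truth.items() if _norm(v) is not None}
    pred_fields = {k for k, v in predicted.items() if _norm(v) is not None}
    correct = {
        k for k in true_fields & pred_fields
        if _norm(truth[k]) == _norm(predicted[k])
    }
    precision = len(correct) / len(pred_fields) if pred_fields else 0.0
    recall = len(correct) / len(true_fields) if true_fields else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Placeholder values for illustration only, not data from any publication.
manual = {"photocatalyst": "TiO2", "cocatalyst": "Pt",
          "light_source": "Xe lamp", "h2_rate": "120"}
llm = {"photocatalyst": "tio2", "cocatalyst": "Pt",
       "light_source": None, "h2_rate": "210"}

scores = score_extraction(manual, llm)
```

In this toy case the LLM fills three fields, two of them correctly, against four ground-truth fields, so precision is 2/3 and recall is 1/2. A real evaluation would likely need unit-aware and tolerance-based matching for numeric fields.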
Chemical Engineering
Clean energy | Photocatalysis | Data science
- Research environment
- Expected outcomes
- Supervisory team
- Reference material/links
The project will be carried out in the Particles and Catalysis Research group, co-led by Scientia Professor Rose Amal, A/Prof Jason Scott and Dr Emma Lovell. The student will work in a multidisciplinary research environment and learn various functional skills to support a future career in academia or industry. The project will involve the following activities:
- Data Collection (70%): This phase involves the systematic gathering of experimental research data on photocatalytic hydrogen production from scientific publications dated between 2020 and 2025. Utilizing LLMs, the project will automate the extraction of relevant information, including experimental conditions, photocatalyst characteristics, and performance metrics, from unstructured text. This approach aims to create a comprehensive and structured database, enabling efficient analysis and knowledge discovery.
- Computer-Based Analytics (30%): Following data collection, advanced data analytics techniques will be employed to explore the compiled database. The analysis will focus on identifying data gaps, performance trends, and correlations between photocatalyst properties and hydrogen production efficiency. This phase will also involve the use of machine learning algorithms to predict potential avenues for enhancing photocatalytic performance.
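As a rough illustration of the two activities above, the sketch below defines one possible record layout and a simple data-gap measure: the fraction of records in which each field is missing. The class name `ExperimentRecord`, its field names, the placeholder DOIs and all example values are hypothetical and chosen only for this example; the project's actual schema may differ.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class ExperimentRecord:
    """One reported experiment (hypothetical schema for illustration)."""
    doi: str
    photocatalyst: Optional[str] = None       # photocatalyst characteristic
    band_gap_ev: Optional[float] = None       # photocatalyst characteristic
    light_source: Optional[str] = None        # experimental condition
    sacrificial_agent: Optional[str] = None   # experimental condition
    h2_rate_umol_g_h: Optional[float] = None  # performance metric


def missing_fraction(records: List[ExperimentRecord]) -> Dict[str, float]:
    """Return, for each field, the fraction of records where it is None."""
    if not records:
        return {}
    total = len(records)
    return {
        f.name: sum(getattr(r, f.name) is None for r in records) / total
        for f in fields(ExperimentRecord)
    }


# Placeholder records; the DOIs and values are invented for this sketch.
records = [
    ExperimentRecord("10.0000/example-1", "TiO2", 3.2, "Xe lamp", "methanol", 150.0),
    ExperimentRecord("10.0000/example-2", "g-C3N4", 2.7, None, "triethanolamine", 90.0),
    ExperimentRecord("10.0000/example-3", "CdS", None, None, None, 400.0),
]

gaps = missing_fraction(records)
# Fields sorted from most to least frequently unreported.
worst_first = sorted(gaps, key=gaps.get, reverse=True)
```

Here the light source is unreported in two of the three placeholder records, so it ranks first in `worst_first`. On a real database, the same ranking would point to the experimental details that most often prevent like-for-like comparison between studies.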
The student is expected to gain an understanding of photocatalysis research and data analysis, and to contribute to broader innovative materials design and discovery. The project will also allow the student to work with other research students to gain valuable interdisciplinary experience. The generated knowledge and data will result in a scientific journal publication. Continuing the research as a fourth-year honours thesis project is possible.
The expected outcomes of the project are as follows:
- Structured Database: A comprehensive and accessible database of experimental research on photocatalytic hydrogen production (encompassing both water splitting and organic reforming) published between 2020 and 2025.
- Analytical Report: A detailed analysis identifying data gaps, performance trends, and correlations within the compiled research, providing valuable insights into the current state and future directions of photocatalytic hydrogen production.
- Automated Data Extraction Framework: A demonstration of the application of LLMs in automating data extraction and gap identification within the field of photocatalysis, highlighting the potential of artificial intelligence in accelerating research and development.
- Comprehensive Research Report: A final report synthesizing the findings from the data collection and analysis phases, offering recommendations for future research and development efforts in photocatalytic hydrogen production.
- Isazawa, T., Cole, J.M. Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications. Sci Data 10, 651 (2023). https://doi.org/10.1038/s41597-023-02511-6
- Mavracic J, Court CJ, Isazawa T, Elliott SR, Cole JM. ChemDataExtractor 2.0: Autopopulated ontologies for materials science. Journal of Chemical Information and Modeling. 2021 Sep 16;61(9):4280-9.
- Can E, Yildirim R. Data mining in photocatalytic water splitting over perovskites literature for higher hydrogen production. Applied Catalysis B: Environmental. 2019 Mar 1;242:267-83.
- Parrino, F., Loddo, V., Augugliaro, V., Camera-Roda, G., Palmisano, G., Palmisano, L., & Yurdakal, S. (2018). Heterogeneous photocatalysis: guidelines on experimental setup, catalyst characterization, interpretation, and assessment of reactivity. Catalysis Reviews, 61(2), 163–213. https://doi.org/10.1080/01614940.2018.1546445
- Beil SB, Bonnet S, Casadevall C, Detz RJ, Eisenreich F, Glover SD, Kerzig C, Næsborg L, Pullen S, Storch G, Wei N. Challenges and Future Perspectives in Photocatalysis: Conclusions from an Interdisciplinary Workshop. JACS Au. 2024 Aug 8;4(8):2746-66.
- Xu J, Su A, Huang P, Yu W, Du K, Fan Z, et al. PhotoCat: An Artificial Intelligence-Driven Synthesis Planning Platform for Photocatalysis. ChemRxiv. 2023; doi:10.26434/chemrxiv-2023-cc43d-v2 (preprint, not peer-reviewed).
- https://htem.nrel.gov/
- https://next-gen.materialsproject.org/