Databases, AI and Workflow Automation

1 About

Ken Pu, Ph.D., Associate Professor
Computer Science / Faculty of Science / Ontario Tech University

He received his Ph.D. in Computer Science from the University of Toronto in 2006. Dr. Pu’s expertise is in database systems, applied machine learning, and data-driven workflows.

Zikun Fu, M.Sc., Computer Science (2024-now)
Zikun’s research profile

Zikun is pursuing his M.Sc. in Computer Science at Ontario Tech University. His research focuses on language model fine-tuning for database entity recognition. He has worked on data curation, synthetic data generation, model fine-tuning, and the evaluation of end-to-end ML workflows.

During his M.Sc. studies, Zikun has published two peer-reviewed research papers: one on synthetic data augmentation (Fu et al. 2024) and one on a database entity recognition (DBER) pipeline (Fu et al. 2025).

Farees Siddiqui, M.Sc., Computer Science (2026-now)
Ontario Tech University

Farees is working on his M.Sc. in Computer Science at Ontario Tech University. His research interests include database systems, applied machine learning, and data-driven workflows.

Farees’s research profile

2 Research Directions

The projects we pursue are largely driven by the following long-term goals in the AI era.

Fundamental AI research

We aim to advance the foundations of data science, machine learning, and agentic AI.

  • mathematical models of neural network dynamics
  • meta-learning methods for ensemble learning
  • integration of AI with classical computer science

Advanced document understanding

We focus on novel methods for document comprehension and synthesis with deep learning and AI assistance. This involves designing neural networks specific to document tasks, orchestrating LLMs, and building agentic workflows that together support an AI-driven document processing pipeline.

Open-source benchmarks

We are committed to providing the research community with open-source benchmarks to evaluate AI models and technologies. These benchmarks are crucial for ensuring transparency, reproducibility, and progress in the field of AI research.

Databases in the AI era

AI applications require different database architectures for training data and model storage, demanding new query capabilities and storage optimizations. Moreover, AI is itself being integrated into the query lifecycle: it helps users author queries in natural language and optimizes query evaluation through learned cost models.

3 Projects

Title                      Author
Closed-domain NER Dataset  Zikun Fu, Prof. Kourosh Davoudi, Prof. Ken Pu
Document understanding     Prof. Ken Pu, Farees Siddiqui, Levi Willm, Ethan Janovitz

4 Recent Publications

  • Ken Pu, Limin Ma, Bohdan Synytski, “Semantic Relational Types of SQL Queries and Applications to AI Agent Tool Selection”, In 2025 IEEE CASCON, Toronto, Canada

  • Fu, Z., Yang, C., Davoudi, K., & Pu, K. Q. (2025, August). Database Entity Recognition with Data Augmentation and Deep Learning. In 2025 IEEE International Conference on Information Reuse and Integration for Data Science (IRI) (pp. 349-354). IEEE.

  • Ma, L., Pu, K., Zhu, Y., & Taylor, W. (2025). Comparing large language models for generating complex queries. Journal of Computer and Communications, 13(2), 236-249.

  • Fu, Z., Yang, C., Davoudi, H., & Pu, K. (2024). Transforming Text-to-SQL Datasets into Closed-Domain NER Benchmark. Ontario DataBase Day–Program, 12.

  • Ma, Limin, and Ken Q. Pu. “Accelerating Relational Keyword Queries With Embedded Predictive Neural Networks.” 2024 IEEE International Conference on Information Reuse and Integration for Data Science (IRI). IEEE, 2024.

  • Wasti, Syed Mekael, Ken Q. Pu, and Ali Neshati. “Large language user interfaces: Voice interactive user interfaces powered by LLMs.” Intelligent Systems Conference. Cham: Springer Nature Switzerland, 2024.

References

Fu, Zikun, Chen Yang, Kourosh Davoudi, and Ken Q. Pu. 2025. “Database Entity Recognition with Data Augmentation and Deep Learning.” In IEEE 26th International Conference on Information Reuse and Integration, 1–6.
Fu, Zikun, Chen Yang, Heidar Davoudi, and Ken Pu. 2024. “Transforming Text-to-SQL Datasets into Closed-Domain NER Benchmark.” Ontario DataBase Day–Program, 12.