Two colleagues collaborate in an office, with one typing on a laptop displaying code while the other stands nearby observing, illustrating hands‑on database or programming work.
Dr. Yongwei Shan, an associate professor in the School of Civil and Environmental Engineering, and Ife Awotunde, a graduate research assistant, co-authored a paper on the development of an artificial intelligence-based system that uses knowledge graphs for a schema-guided retrieval-augmented network.

Simplifying SQL: CIVE research aims to transform how cities can manage sewer data

Thursday, March 26, 2026

Media Contact: Tanner Holubar | Communications Specialist | 405-744-2065 | tanner.holubar@okstate.edu

Workers who maintain sewer management systems for cities and municipalities oversee thousands of data points. These include every pipe, manhole and pump station, along with every past and future repair on the maintenance schedule.

Managing these systems means handling vast datasets and relying heavily on IT professionals trained in Structured Query Language to retrieve specific data, which slows retrieval and increases decision-making time.

Making the retrieval of this data more efficient was the subject of a research project in the College of Engineering, Architecture and Technology at Oklahoma State University.

Ife Awotunde, a graduate research assistant working under Dr. Yongwei Shan in the School of Civil and Environmental Engineering, co-authored a paper recently published in the journal Advanced Engineering Informatics, titled “Domain-specific SQL generation with LLMs: A hybrid framework combining knowledge graphs and retrieval-augmentation.”

Their research developed an artificial intelligence-based system that combines knowledge graphs with a schema-guided retrieval-augmented generation (RAG) framework.

Knowledge graphs represent data as a connected network of nodes and edges, where nodes represent data points, and edges represent labeled relationships between them.
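As a rough illustration of that idea, the sketch below treats database tables as nodes and join conditions as labeled edges, then searches the graph for a path between two tables. The table names, column names and helper functions are hypothetical and are not taken from the paper; the breadth-first search is only one simple way such a graph could be queried.

```python
from collections import deque

# Hypothetical schema graph: each edge links two tables and is labeled
# with the join condition. None of these names come from the paper.
EDGES = [
    ("pipes", "inspections", "pipes.pipe_id = inspections.pipe_id"),
    ("inspections", "defects", "inspections.inspection_id = defects.inspection_id"),
    ("inspections", "media_files", "inspections.inspection_id = media_files.inspection_id"),
]


def build_graph(edges):
    """Store the graph as an adjacency list, treating edges as undirected."""
    graph = {}
    for left, right, label in edges:
        graph.setdefault(left, []).append((right, label))
        graph.setdefault(right, []).append((left, label))
    return graph


def join_path(graph, start, goal):
    """Breadth-first search; returns the join conditions linking two tables."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for neighbor, label in graph.get(table, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [label]))
    return None  # the two tables are not connected


graph = build_graph(EDGES)
path = join_path(graph, "pipes", "defects")
print(path)
```

Here the path from pipes to defects passes through inspections, which is the kind of relationship a language model would need to be told about before it could write the join correctly.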

To build the RAG framework, the system stores selected questions and their corresponding SQL queries, along with information about the structure of the database, in a searchable store called a vector database.

When a user asks a question, the system looks for the stored examples and schema details that best fit the prompt. The language model then uses this information to guide the user to the correct data.
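The retrieval step can be sketched in a few lines. This example is a simplified stand-in, not the researchers' implementation: it uses plain word counts in place of learned embeddings and a Python list in place of a vector database, and the stored questions and SQL strings are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stored examples: (question, SQL) pairs.
EXAMPLES = [
    ("How many pipes were inspected in 2023?",
     "SELECT COUNT(DISTINCT pipe_id) FROM inspections WHERE inspected_on LIKE '2023%';"),
    ("List all defects with a severe grade",
     "SELECT * FROM defects WHERE grade >= 4;"),
    ("Which pump stations are scheduled for repair?",
     "SELECT station_id FROM repairs WHERE status = 'scheduled';"),
]


def embed(text):
    """Toy embedding: lowercase word counts. A real system would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, examples, k=1):
    """Return the k stored examples whose questions are most similar."""
    query = embed(question)
    ranked = sorted(examples, key=lambda ex: cosine(query, embed(ex[0])), reverse=True)
    return ranked[:k]


top = retrieve("How many pipes were inspected last year?", EXAMPLES, k=1)
print(top[0][0])
```

The closest stored question and its SQL would then be passed to the language model as a worked example.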

“By combining retrieved examples, schema context and knowledge-graph-derived relationships, the model receives structured guidance about how tables connect, which helps it generate more accurate SQL queries and reduces the creation of data that doesn’t exist,” Shan said.

Sewer management systems contain data across many related tables, such as assets, inspections, defect conditions, media files and asset locations. Retrieving specific information requires complex SQL queries and detailed knowledge of the database schema.
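To give a sense of what such a query involves, the following sketch builds a tiny in-memory SQLite database and answers the question "Which pipes have severe defects?" The three tables, their columns, the sample rows and the grade threshold are all assumptions made for this example; a real sewer database would be far larger.

```python
import sqlite3

# Hypothetical miniature schema and data, not taken from the paper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pipes (pipe_id INTEGER PRIMARY KEY, material TEXT);
CREATE TABLE inspections (
    inspection_id INTEGER PRIMARY KEY,
    pipe_id INTEGER REFERENCES pipes(pipe_id),
    inspected_on TEXT
);
CREATE TABLE defects (
    defect_id INTEGER PRIMARY KEY,
    inspection_id INTEGER REFERENCES inspections(inspection_id),
    grade INTEGER
);
INSERT INTO pipes VALUES (1, 'clay'), (2, 'PVC');
INSERT INTO inspections VALUES (10, 1, '2024-05-01'), (11, 2, '2024-06-15');
INSERT INTO defects VALUES (100, 10, 5), (101, 10, 2), (102, 11, 1);
""")

# Answering the question means joining all three tables in the right order.
QUERY = """
SELECT p.pipe_id, p.material, COUNT(*) AS severe_defects
FROM pipes AS p
JOIN inspections AS i ON i.pipe_id = p.pipe_id
JOIN defects AS d ON d.inspection_id = i.inspection_id
WHERE d.grade >= 4
GROUP BY p.pipe_id, p.material
ORDER BY p.pipe_id;
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Even in this three-table case, the person writing the query has to know that defects attach to inspections rather than directly to pipes.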

Two people stand side by side on steps outside an Oklahoma State University engineering building, with brick academic facilities and landscaped walkways visible in the background, illustrating a campus setting.
Dr. Yongwei Shan and Ife Awotunde, of the School of Civil and Environmental Engineering in the College of Engineering, Architecture and Technology at OSU, developed a method to make sewer data more accessible and manageable for cities and municipalities.

“With schema-guided RAG, a user can ask a question, and the system retrieves the most relevant schema elements and examples, then adds them to the prompt for the language model,” Awotunde said. “This helps the model understand which tables, columns and relationships are relevant. Because the schema guides the system, it can generate SQL queries with the correct join paths between entities such as sewer pipes, inspections, and defect records, thereby reducing query errors.”

This allows city staff to ask general questions and receive accurate results without writing SQL, making municipal sewer data more accessible, reliable and useful for decision-making during operations.

The accuracy of the results is also improved by grounding the language model in the database structure before a query is generated. Large language models are proficient at understanding natural language, but when generating SQL, they lack knowledge of the specific database schema unless it is supplied. This can cause the model to invent table names, select incorrect columns or join tables incorrectly.
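Those failure modes are easy to reproduce. The sketch below is not part of the system described in the paper; it is a simple after-the-fact check, using hypothetical table and column names, that asks SQLite to compile a query with EXPLAIN and reports the error when a table or column does not exist.

```python
import sqlite3

# Hypothetical two-table schema used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pipes (pipe_id INTEGER PRIMARY KEY, material TEXT);
CREATE TABLE inspections (inspection_id INTEGER PRIMARY KEY, pipe_id INTEGER, inspected_on TEXT);
""")


def check_sql(conn, sql):
    """Return None if SQLite can compile the query, otherwise the error message."""
    try:
        # EXPLAIN compiles the statement without running the query itself.
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


good = check_sql(conn, "SELECT material FROM pipes WHERE pipe_id = 1")
bad_table = check_sql(conn, "SELECT * FROM sewer_segments")  # table does not exist
bad_column = check_sql(conn, "SELECT diameter FROM pipes")   # column does not exist

print(good, bad_table, bad_column, sep="\n")
```

A check like this catches invented names only after the fact, which is why supplying the schema to the model up front is the more direct remedy.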

“The schema information provides structural guidance,” Awotunde said. “It tells the model exactly which tables exist, what columns they contain and how they relate through foreign keys or join paths. Because the model sees this structured information during generation, it is more likely to produce SQL that matches the database design.”
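One plausible way to hand that structure to a model is to read it from the database itself and place it in the prompt. The sketch below does this for SQLite using its built-in PRAGMA statements; the schema, the prompt wording and the function names are assumptions for illustration, and the paper's actual prompt format is not shown here.

```python
import sqlite3

# Hypothetical schema used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pipes (pipe_id INTEGER PRIMARY KEY, material TEXT);
CREATE TABLE inspections (
    inspection_id INTEGER PRIMARY KEY,
    pipe_id INTEGER REFERENCES pipes(pipe_id),
    inspected_on TEXT
);
""")


def describe_schema(conn):
    """List each table, its columns and its foreign keys as plain text."""
    lines = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        lines.append(f"Table {table}({', '.join(columns)})")
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            lines.append(f"  {table}.{fk[3]} -> {fk[2]}.{fk[4]}")
    return "\n".join(lines)


def build_prompt(question, schema_text):
    """Combine the schema description and the user's question into one prompt."""
    return (
        "Use only the tables and columns listed below.\n\n"
        f"{schema_text}\n\n"
        f"Question: {question}\nSQL:"
    )


schema_text = describe_schema(conn)
prompt = build_prompt("Which pipes were inspected in 2024?", schema_text)
print(prompt)
```

The resulting prompt names every table and column and spells out the foreign key from inspections to pipes, so the model does not have to guess how the two relate.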

Every city has vast amounts of inspection records, maintenance logs and condition data, but much of this data is difficult to access without technical expertise. If a city were to adopt a schema-guided RAG system, that barrier would be lowered.

Asset managers can ask simple questions and diagnose problems more quickly, which supports more proactive and better-prioritized infrastructure repair.

“Cities already collect large volumes of sewer inspection and asset data, but much of it remains underutilized,” Shan said. “Systems like this help transform that data into actionable insights that support planning, budgeting, and long-term infrastructure management.”

Having this paper published in Advanced Engineering Informatics, a high-impact civil engineering journal with an impact factor of 9.9, represents an important career stepping stone for Awotunde.

“It’s a Q1-ranked journal and one of the leading venues for research at the intersection of artificial intelligence and engineering systems,” Awotunde said. “The review process is quite rigorous, so having the work accepted validates the novelty and technical contribution of our approach. Personally, I had always hoped to publish in a top-quartile journal, so seeing our research appear there is very rewarding and a milestone in my research journey.”

Shan said it is a validation of the work on advancing AI applications in civil engineering. He also said it was a pleasure to watch one of his students continue to grow as a researcher.

“Personally, it is very rewarding to witness the professional growth of my graduate student, who led the authorship and is developing into a more independent researcher,” Shan said.