Conversational agents


Conversational agents are arguably the most prominent application of language models. In popular use at least, agents built on language models are where the natural language processing capabilities of artificial intelligence are most fully expressed.

From a computational perspective, conversational agents are highly complex systems. They must interpret and generate natural language statements that take the flow of a conversation into account, which means handling history, intentions, and discourse fluency. To be truly engaging, they also need to express some type of personality.

Historically, conversational agents have been studied within artificial intelligence from different viewpoints and implemented with a variety of theoretical resources. With the advent of large language models and their competence in handling natural language, conversational agents are now heavily based on this technology.

The KEML team has been studying conversational agents for some time. The team began this research in 2020 by proposing the BLAB (Blue Amazon Brain) agent. The original project aimed to build a conversational agent capable of interacting with users, providing information and solving problems related to the Brazilian coast, which the Brazilian Navy calls the Blue Amazon. In this context, the group developed an architecture for a conversational agent endowed with various capabilities. As the agent evolved, an orchestration module was implemented to decide which of the agent's functionalities should be activated in a given conversational context about the Blue Amazon.
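The idea of an orchestration module can be illustrated with a minimal sketch. This is not BLAB's actual implementation: the class names (`Skill`, `Orchestrator`), the skill names, the keyword lists, and the keyword-overlap scoring rule are all hypothetical, chosen only to show how a component might decide which functionality to activate for a given utterance.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Skill:
    """A hypothetical agent capability, triggered by simple keywords."""
    name: str
    keywords: List[str]
    handler: Callable[[str], str]


@dataclass
class Orchestrator:
    """Illustrative router: picks the skill that should answer an utterance."""
    skills: List[Skill]
    fallback: Skill
    history: List[str] = field(default_factory=list)  # utterances seen so far

    def select(self, utterance: str) -> Skill:
        # Tokenize on word characters so punctuation does not block matches.
        words = set(re.findall(r"\w+", utterance.lower()))
        # Score each skill by how many of its keywords occur in the utterance.
        scored = [(len(words & set(s.keywords)), s) for s in self.skills]
        best_score, best = max(
            scored, key=lambda pair: pair[0], default=(0, self.fallback)
        )
        # With no keyword match, hand the turn to the fallback skill.
        return best if best_score > 0 else self.fallback

    def reply(self, utterance: str) -> str:
        self.history.append(utterance)
        skill = self.select(utterance)
        return f"[{skill.name}] {skill.handler(utterance)}"


# Hypothetical skills; the handlers return placeholder strings only.
legal_qa = Skill("legal_qa", ["law", "decree", "regulation"],
                 lambda u: "placeholder answer from a legal QA component")
science_qa = Skill("science_qa", ["species", "ocean", "coral"],
                   lambda u: "placeholder answer from a scientific QA component")
chitchat = Skill("chitchat", [], lambda u: "placeholder small-talk reply")

orchestrator = Orchestrator(skills=[legal_qa, science_qa], fallback=chitchat)

if __name__ == "__main__":
    print(orchestrator.reply("Which law regulates fishing on the coast?"))
    print(orchestrator.reply("Hello there!"))
```

A real orchestrator would likely rely on a learned classifier or a language model rather than keyword overlap, and would use the stored history when choosing a skill. The sketch only fixes the shape of the decision: one entry point, several interchangeable capabilities, and an explicit fallback.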

Currently, the KEML team is dedicated to producing conversational agents and related systems (such as Question & Answering Systems) based on large language models combined with knowledge-based structures. The aim is to propose different styles of conversational agents that can serve as a basis for testing large language models combined with knowledge representation mechanisms, such as knowledge graphs and ontologies.

The main results of this research on conversational agents are the following systems and resources:

BLAB (Blue Amazon Brain): A system featuring a dialogue interface supported by an orchestration architecture for using different question-answering mechanisms in a dialogue production context.
Blabinha: A task-oriented and domain-oriented system, also aimed at exploring content about the Blue Amazon, but consisting of a gamified dialogue supported by GPT-family language models and prompt engineering.
Cocoruta (system): A Question & Answering System oriented towards the Blue Amazon domain and based on legal documents (laws, regulations, ordinances, decrees, bills, etc.).
Cocoruta (corpus): A collection of about 200,000 legal documents extracted from official Brazilian repositories, organized with hierarchical metadata and formatted for computational processing.
Cocoruta (question and answer dataset): A set of questions and answers built from the Cocoruta corpus by applying a large language model (Gemini).
Pirá: A bilingual question and answer dataset, constructed and evaluated by humans, built on a corpus of scientific paper abstracts about the Blue Amazon and excerpts from UN reports about the Global Ocean.

Learn a bit more about ….

Pirozelli, P.; José, M. M.; Silveira, I. C.; Nakasato, F.; Peres, S. M.; Brandão, A. A. F.; Costa, A. H. R.; Cozman, F. G. Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change. Data Intelligence (MIT Press Direct), 2024. v. 6, p. 29-63. https://doi.org/10.1162/dint_a_00245
Matos, V. B.; Grava, R.; Tavares, R.; José, M. M.; Pirozelli, P.; Brandão, A. A. F.; Peres, S. M.; Cozman, F. G. Coordination within Conversational Agents with Multiple Sources. In Proceedings of the 20th National Meeting on Artificial and Computational Intelligence (ENIAC 2023), Belo Horizonte, 2023. p. 939-953. ISSN 2763-9061. https://doi.org/10.5753/eniac.2023.234533