HarpIA
HarpIA is a framework for evaluating large language models that is currently being designed and developed by the KEML team.
The framework is intended to support both quantitative and qualitative evaluations, based on different strategies (indicators and methods), and to make the evaluation process systematic, reproducible, standardized, and transparent.
From a quantitative evaluation standpoint, the framework is being designed to accommodate the implementation of various indicators through a highly cohesive and loosely coupled architecture. It is expected that different individuals will be able to contribute to the available set of indicators by providing implementations that adhere to the framework’s established standards.
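As an illustration of what a loosely coupled indicator architecture could look like, the sketch below defines a small contract and a registry through which independently written indicators are plugged in. HarpIA is not yet published, so every name here (`Indicator`, `register`, `REGISTRY`, `ExactMatch`, `evaluate`) is hypothetical and does not reflect the framework's actual standards.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Indicator(ABC):
    """Hypothetical contract that every quantitative indicator implements."""

    name: str  # unique key under which the indicator is registered

    @abstractmethod
    def score(self, predictions: List[str], references: List[str]) -> float:
        """Return a single number summarizing the model outputs."""


# Hypothetical registry: the evaluation code depends only on this mapping,
# never on concrete indicator classes.
REGISTRY: Dict[str, Type[Indicator]] = {}


def register(cls: Type[Indicator]) -> Type[Indicator]:
    """Class decorator that makes an indicator discoverable by name."""
    REGISTRY[cls.name] = cls
    return cls


@register
class ExactMatch(Indicator):
    """Example contribution: fraction of outputs equal to the reference."""

    name = "exact_match"

    def score(self, predictions: List[str], references: List[str]) -> float:
        if len(predictions) != len(references):
            raise ValueError("predictions and references must have the same length")
        if not predictions:
            return 0.0
        hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
        return hits / len(predictions)


def evaluate(names: List[str], predictions: List[str], references: List[str]) -> Dict[str, float]:
    """Run the requested indicators and collect their scores by name."""
    return {n: REGISTRY[n]().score(predictions, references) for n in names}
```

Under this design, a contributor would add an indicator by writing one class and decorating it, without touching `evaluate` or any other indicator.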
From a qualitative perspective, the framework will offer a “comfortable” environment for human evaluations, with the possibility of configuring the evaluation and implementing an evaluation-aware workflow.
One of the framework’s design goals is to enable some level of automation when conducting evaluations, so that ablation procedures can be built. Statistical robustness and natural language reporting are also development goals.
Although test automation is planned for HarpIA, it remains an open question, as we believe that automating an evaluation is effective only when it follows a well-established and validated method. Our intention is first to facilitate the use of evaluation methods and to validate them; automating them will be the next step.
- The framework is currently under construction and not yet available for use. Some modules will be published soon.
*The name HarpIA refers to the harpy eagle (or royal hawk), the strongest predator among birds. Found in Brazil, especially in the Amazon and the Atlantic Forest, it is also present in other regions of South America. It is a very astute and strong bird, clever and observant. One of its characteristics is to scan its prey, evaluating the prey’s features before deciding to capture it.*
Blabinha 1.0 and Blabinha 2.0
Blabinha 1.0 is a system built on the Robios Go robot. It was designed as an extension of the BLAB orchestrator, aiming to serve as a knowledge dissemination agent about the Blue Amazon for children. It is a system developed with imperative programming and uses speech recognition and synthesis services provided by the Robios Go robot’s API.
You can learn more about Blabinha 1.0 by accessing the code, the promotional video (3’43”, in Portuguese only), and the full application video (22’05”, in Portuguese only).
Blabinha 2.0 is a conversational agent designed specifically as an environment for evaluating large language models and prompt engineering, with or without knowledge representation mechanisms, when they are tasked with conducting task-oriented and domain-oriented dialogues.
Blabinha 2.0 is implemented using GPT-family models and prompt engineering of the chain-of-thought (step-by-step) type, aiming to conduct a dialogue with a child. It promotes engagement in a conversation about the Blue Amazon domain through a gamification strategy: the more knowledge the child extracts from Blabinha during the conversation, the stronger a superhero projected at the end of the dialogue will become.
During the conversation, the language model is subjected to a series of tasks, ranging from introducing itself to the child to performing topic analysis.
Blabinha 2.0 is not yet being used for interaction with children, as interactions with language models are not yet considered entirely safe. Only testers interact with it.
The system implementation is available here.
Scientific article describing Blabinha 2.0 (please cite this article if you make use of the implementation associated with Blabinha 2.0):
- Teodoro Junior, G. S.; Peres, S. M.; Fantinato, M.; Brandão, A. A. F.; Cozman, F. G. A Goal-Oriented Chat-Like System for Evaluation of Large Language Models. In: XXI Encontro Nacional de Inteligência Artificial e Computacional (ENIAC).
CtxKG
CtxKG is a knowledge graph generation method aimed at extracting entities and relationships directly from texts without any external alignment with databases. CtxKG constructs networks of interconnected concepts.
To tackle a domain like the Blue Amazon, which generally remains underexplored in terms of concept formalization and relationships and has limited freely available descriptive content, the method is designed to operate in a low-resource context. This means that, through the use of different language processing tools and the inclusion of grammar concepts, CtxKG maximizes the use of the available textual resources.
The CtxKG method has two versions: the original in English and the Portuguese version called PtxKG (Portuguese CtxKG). The existence of these two versions allows it to handle both scientific documents, which are usually written in English, and more general materials about the various aspects of the Blue Amazon, which tend to be written in Portuguese, given the importance of the Blue Amazon to Brazil.
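To make the idea of a network of interconnected concepts concrete, the sketch below stores (subject, relation, object) triples as a graph and lists the concepts reachable from a starting entity. It does not reproduce the CtxKG extraction pipeline: the triples are hand-written toy examples, and the names `build_graph` and `reachable` are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)
Graph = Dict[str, Set[Tuple[str, str]]]

# Hand-written toy triples; a real run would take them from an extraction step.
TOY_TRIPLES: List[Triple] = [
    ("coral reef", "shelters", "fish"),
    ("fish", "feeds", "seabird"),
    ("mangrove", "protects", "coastline"),
]


def build_graph(triples: Iterable[Triple]) -> Graph:
    """Map each entity to the set of (relation, object) edges leaving it."""
    graph: Graph = defaultdict(set)
    for subject, relation, obj in triples:
        graph[subject].add((relation, obj))
        graph[obj]  # touching the key registers object-only entities as nodes
    return dict(graph)


def reachable(graph: Graph, start: str) -> List[str]:
    """Return, in sorted order, every entity connected to `start` by a path."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for _, neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return sorted(seen - {start})


graph = build_graph(TOY_TRIPLES)
print(reachable(graph, "coral reef"))  # ['fish', 'seabird']
```

Because the graph is built only from the triples themselves, no alignment with an external database is needed to traverse it.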
The implementation of CtxKG and PtxKG is available here. An auxiliary module can also be accessed here.
If you use or mention CtxKG, please cite:
- Ligabue, P. M.; Brandão, A. A. F.; Peres, S. M.; Cozman, F. G.; Pirozelli, P. Applying a Context-based Method to Build a Knowledge Graph for the Blue Amazon. Data Intelligence, 2024, 6(1), p. 64–103. https://doi.org/10.1162/dint_a_00223
dPASP
The dPASP framework presents a powerful high-level language for describing probabilistic tasks in an intuitive and declarative manner. Just like in traditional probabilistic logic programming (PLP), programs in dPASP are written in terms of probabilistic facts or rules, allowing for uncertainty to play a role in the knowledge description of the problem. Notably, the framework extends PLP by leveraging the expressiveness of neural networks for describing probabilities in possibly hybrid domains. In addition, by natively embedding neural expressions within the language, dPASP offers end-to-end training of sophisticated models and loss functions while requiring minimal user knowledge of the inner workings of deep learning systems. More information here.
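For readers new to probabilistic facts and rules, the sketch below shows the basic idea in plain Python: each probabilistic fact is independently true or false, every combination of truth values forms a possible world, and the probability of a query is the total weight of the worlds in which the rules derive it. This is not dPASP syntax or its API, and it covers only simple rules without negation, which is far less than the framework supports. The function name `query_probability` and the burglary/earthquake example are assumptions chosen for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Rule = Tuple[str, List[str]]  # (head, body): head holds if every body atom holds


def query_probability(prob_facts: Dict[str, float], rules: List[Rule], query: str) -> float:
    """Sum the weights of all possible worlds in which `query` is derived."""
    names = list(prob_facts)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        # Weight of this world: product over independent probabilistic facts.
        weight = 1.0
        for name, value in zip(names, values):
            weight *= prob_facts[name] if value else 1.0 - prob_facts[name]
        # Atoms true in this world, closed under the rules (forward chaining).
        world = {name for name, value in zip(names, values) if value}
        changed = True
        while changed:
            changed = False
            for head, body in rules:
                if head not in world and all(atom in world for atom in body):
                    world.add(head)
                    changed = True
        if query in world:
            total += weight
    return total


facts = {"burglary": 0.1, "earthquake": 0.2}
rules = [("alarm", ["burglary"]), ("alarm", ["earthquake"])]
print(round(query_probability(facts, rules, "alarm"), 4))  # 0.28
```

The printed value matches the closed form 1 - (0.9 × 0.8) = 0.28, since the alarm fails to ring only when both facts are false. Enumeration is exponential in the number of facts, so it serves only to explain the semantics.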
dPASP has both a domain-specific language (DSL) and a command-line interpreter (parser) for that language, which can be used as a standalone tool. Alternatively, dPASP can be accessed as a Python library or more directly through its C backend.
The easiest way to get started is by reading the tutorial Learning dPASP Through Examples.
BLAB Orchestrator
Currently, conversational agents can be built with language models, rules, and ontologies to provide fluent dialogue. However, coordinating multiple techniques or strategies supplying input for the agent’s dialogue is a challenge. The BLAB Orchestrator is a mechanism to effectively orchestrate these multiple input sources in a conversational agent. The orchestrator’s architecture follows a client-server approach and consists of:
- A prompt generation module for a language model responsible for making orchestration decisions;
- “Responder” modules that can be implemented by language models, rules, and ontologies focused on specialized domains;
- Resources that allow coupling graphical interfaces for user interaction via text, or a social robot with text-to-speech and speech-to-text capabilities for implementing speech interaction.
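The interplay between the first two components can be sketched as follows: a prompt listing the available responders is built for a deciding model, and the model's answer selects which responder handles the message. This is a minimal sketch and not the BLAB Orchestrator's code. All names (`build_decision_prompt`, `keyword_decider`, `orchestrate`) are hypothetical, and the language model is replaced by a keyword rule so the example runs offline.

```python
from typing import Callable, Dict, Tuple

Responder = Callable[[str], str]


def build_decision_prompt(message: str, descriptions: Dict[str, str]) -> str:
    """Assemble the text that would be sent to the deciding language model."""
    options = "\n".join(f"- {name}: {text}" for name, text in descriptions.items())
    return (
        "Choose the responder best suited to the user message.\n"
        f"Responders:\n{options}\n"
        f"User message: {message}\n"
        "Answer with the responder name only."
    )


def keyword_decider(prompt: str) -> str:
    """Stand-in for the language model: a keyword rule, for illustration only."""
    user_line = prompt.split("User message: ", 1)[1].splitlines()[0].lower()
    return "tides" if "tide" in user_line else "smalltalk"


def orchestrate(
    message: str,
    responders: Dict[str, Responder],
    descriptions: Dict[str, str],
    decider: Callable[[str], str] = keyword_decider,
    fallback: str = "smalltalk",
) -> Tuple[str, str]:
    """Ask the decider which responder to use, then delegate the message."""
    choice = decider(build_decision_prompt(message, descriptions))
    if choice not in responders:  # guard against an unexpected decider answer
        choice = fallback
    return choice, responders[choice](message)


# Hypothetical responders; real ones could wrap models, rules, or ontologies.
RESPONDERS: Dict[str, Responder] = {
    "tides": lambda m: "Here is what I know about tides.",
    "smalltalk": lambda m: "Happy to chat!",
}
DESCRIPTIONS = {"tides": "questions about tides", "smalltalk": "casual conversation"}

print(orchestrate("When is high tide today?", RESPONDERS, DESCRIPTIONS)[0])  # tides
```

Keeping the decider behind a plain callable means a real language model client could be swapped in without changing the responders or the interface code.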
The implementation of the BLAB Orchestrator is available on GitHub.
Scientific papers describing the BLAB Orchestrator (please cite at least one of these papers if you use any implementation associated with the BLAB Orchestrator):
- Matos, V. B.; Grava, R.; Tavares, R.; José, M. M.; Pirozelli, P.; Brandão, A. A. F.; Peres, S. M.; Cozman, F. G. Coordination within Conversational Agents with Multiple Sources. In Proceedings of the 20th National Meeting on Artificial and Computational Intelligence (ENIAC 2023), Belo Horizonte, 2023. p. 939-953. ISSN 2763-9061. https://doi.org/10.5753/eniac.2023.234533
- Pirozelli, P.; Castro, A. B. R.; Oliveira, A. L. C.; Oliveira, A. S.; Cação, F. N.; Silveira, I. C.; Campos, J. G. M.; Motheo, L. C.; Figueiredo, L. F.; Pellicer, L. F. A. O.; José, M. A.; José, M. M.; Ligabue, P. M.; Grava, R. S.; Tavares, R. M.; Matos, V. B.; Sym, Y. V.; Costa, A. H. R.; Brandão, A. A. F.; Mauá, D. D.; Cozman, F. G.; Peres, S. M. The BLue Amazon Brain (BLAB): A Modular Architecture of Services about the Brazilian Maritime Territory. Proceedings of the Workshop: AI Modeling Oceans and Climate Change (AIMOCC 2022), Vienna, 2022, p. 1-11. https://doi.org/10.48550/arXiv.2209.07928
BLAB Reporter
The BLAB Reporter is an application that collects data related to the Blue Amazon from multiple sources and publishes this data on X (formerly Twitter). The publication is always made in natural language (Portuguese), even if the source data for the publication are not textual.
If you want to follow the BLAB Reporter’s publications, follow @BLAB_Reporter on X.
The code that implements the application is available on GitHub.
If you use the code that implements the BLAB Reporter or wish to mention it, please consult and cite one of the references below:
- Sym, Y. V.; Campos, J. G. M.; Cozman, F. G. Blab Reporter: Automated Journalism Covering The Blue Amazon. In Proceedings of the 15th International Conference on Natural Language Generation: System Demonstrations (ACL 2022), Waterville, Maine, USA, Meeting online, 2022. p.1–3. URL: https://aclanthology.org/2022.inlg-demos.1
- Pirozelli, P.; Castro, A. B. R.; Oliveira, A. L. C.; Oliveira, A. S.; Cação, F. N.; Silveira, I. C.; Campos, J. G. M.; Motheo, L. C.; Figueiredo, L. F.; Pellicer, L. F. A. O.; José, M. A.; José, M. M.; Ligabue, P. M.; Grava, R. S.; Tavares, R. M.; Matos, V. B.; Sym, Y. V.; Costa, A. H. R.; Brandão, A. A. F.; Mauá, D. D.; Cozman, F. G.; Peres, S. M. The BLue Amazon Brain (BLAB): A Modular Architecture of Services about the Brazilian Maritime Territory. Proceedings of the Workshop: AI Modeling Oceans and Climate Change (AIMOCC 2022), Vienna, 2022, p. 1-11. https://doi.org/10.48550/arXiv.2209.07928