Since its inception in 2021, the KEML team has created a series of models for various purposes: text2SQL translators, language models predating the advent of large language models, architectures for implementing conversational agents, etc. Here you will find a list of GitHub and Hugging Face repositories that illustrate the work already carried out by the group.
More information about these models can be obtained on this website, in the Resources menu options, or by contacting the team.
- Cocoruta 1.0: Cocoruta is a specialized large language model fine-tuned for legal document-based Question Answering (Q&A), developed to address legal queries related to the “Blue Amazon”, a term used to describe Brazil’s extensive maritime territory. Cocoruta 1.0 is based on the LLaMa 2-7B model, fine-tuned with a corpus of 68,991 legal documents totaling 28.4 million tokens. Despite having fewer parameters than some larger models, Cocoruta demonstrates competitive performance in domain-specific legal discourse.
Scientific paper related to Cocoruta 1.0 (please cite this paper if you use the Cocoruta 1.0 model):
- Espírito Santo, F. O.; Peres, S. M.; Gramacho, G. S.; Brandão, A. A. F.; Cozman, F. G. Legal Document-Based, Domain-Driven Q&A System: LLMs in Perspective. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2024), Japan, 2024.
* “Cocoruta” is the name given to a bird species endemic to the Fernando de Noronha archipelago (Brazil), currently threatened with extinction. The resource’s name was chosen as a tribute to biodiversity and to help promote the conservation of the Blue Amazon (Brazilian coast).