Architecture

11/05/2025

The HarpIA platform is composed of three independent modules and a repository containing resources that are useful in research projects involving the evaluation of large language models (LLMs):

  • HarpIA Survey: assists researchers in evaluating the performance of LLMs using the feedback of human evaluators (human-centric evaluation).
  • HarpIA Lab : provides researchers with tools for evaluating and analyzing the performance of LLMs by computing popular metrics in specialized literature and by running other automated evaluation procedures.
  • HarpIA Twin: the performance of LLMs embedded in sociotechnical information systems is evaluated based on the predefined functional and non-functional requirements for the functionality executed by the LLM.
  • HarpIA Resources: repository dedicated to the dissemination of common use cases of LLMs in the literature and to foster the advancement of LLM evaluation practices.

HarpIA Survey

The architecture of the HarpIA Survey module is fundamentally based on four components: a Moodle server, two plugins that extend Moodle’s native functionalities, and a custom gateway. The first of these is a Moodle 4.5 (LTS) server. Moodle is an eLearning platform used by millions of people around the world. In the HarpIA Survey, its native functionalities are concatenated to compose the use cases supported by HarpIA Survey (e.g., functionality for authorizing human evaluators to access the module). The Moodle server is accompanied by a database (MySQL 8) and a web server (Apache 2.4), both widely used in industry and academia. Thus, when recruited by the researcher to participate in the evaluation, human evaluators interact with HarpIA Survey through a web interface based on Moodle’s native functionalities.

The two plugins that extend Moodle’s functionality play a key role in the communication between the HarpIA Survey and the LLM being evaluated. The HarpIA Interaction plugin creates a field type that can be used in “Database” activities. Fields of the “HarpIA Interaction” type allow the combined recording of prompts submitted to the LLM and their responses. In turn, the HarpIA Ajax plugin coordinates the submission of prompts to and the collection of responses from the custom gateway. The function of the LLM Gateway is to centralize and abstract calls to LLMs. Some additional details about these components:

  • HarpIA Interaction Plugin:  When specifying the web page with which participants will interact to evaluate an LLM, the researcher inserts a field of type “HarpIA Interaction” into the page. The configuration of this field has three important parameters, as illustrated in the figure below: (a) the type of model with which the HarpIA Survey will interact — e.g., Llama 3.1 8B or GPT 3.5 Turbo, (b) the system prompt that should be configured in the LLM, and (c) the type of experiment that the researcher wants to conduct.

  • HarpIA Ajax Plugin: This component implements an AJAX interaction between Moodle pages and the LLM Gateway. Basically, its function is to ensure the proper formatting of the request sent to the gateway, as well as the proper extraction of the response data that will be presented to the human evaluator. Both request and response follow the specification of an API created by the HarpIA project.
  • LLM Gateway: This component implements, among other functions, the translation of requests following the HarpIA project standard to the format expected by each LLM, and the opposite service for the responses replied by the LLM to the HarpIA Survey. The API created by the HarpIA project is based on services offered by ollama and seeks to be generic enough to allow communication with any LLM. To configure the LLM Gateway to operate with a new type of model, the researcher simply needs to update a configuration file with data on how to activate the model, such as the model name and type, the type of adapter that should be used to access the model (e.g., models that expose the Llama model API must inform the “OllamaAnswerProvider” class as an adapter), among others.

HarpIA Lab

The architecture of the HarpIA Lab module has three main components: the HarpIA Lab Frontend, which offers a graphical web interface to facilitate interaction with the module; (b) the HarpIA Lab Server, which allows the module to be made available as a service that can be invoked through the graphical interface or via the command line (and in this case, enabling the use of scripts to customize the evaluation process); and the HarpIA Lab Library, where the components that implement metrics or procedures for evaluating LLMs are organized.

The HarpIA Lab Server and Library components are based on technologies from the Python ecosystem that are widely used in industry and academia. For example, the metrics implemented in the HarpIA Lab Library component, when possible, are wrappers of implementations offered by packages such as NLTK, PyTorch, Gensim, and Evaluate (from Hugging Face). This reduces the effort required to perform verification tests of new contributions. The HarpIA Lab Frontend, on the other hand, was built on frameworks from the Javascript ecosystem (ReactJS and NextJS).

Together, the three components form a pipeline that accepts files in a format that allows specifying an LLM evaluation task with great precision. This file follows the JSON standard and contains, in its simplest form, two main keys: “instances” and “metrics”. The first specifies a list of tuples (ID, prompt, list of expected responses, response obtained) that were collected by the user during their investigation of the LLM behavior being evaluated. The second key specifies a list of tuples (metric name, parameters required to execute the metric). Once received, the pipeline parses the input JSON file and each of the metrics specified under the “metrics” key is applied to each instance specified under the “instances” key. At the end of the processing, the pipeline produces two files, also in JSON format: one of them contains the results of the metrics chosen by the user at the instance level and the other file contains similar results, but at the aggregated level.


HarpIA Twin

Coming soon!


Harpia Resources

Coming soon!