OPENDATA4ICTS Project
Reference: RED2022-134332-I
Area: Physical Sciences
Subarea: Particle and Nuclear Physics
Title: Datos en Abierto para Instalaciones Científico Técnicas Singulares Basadas en Aceleradores (Open Data for Accelerator-Based Singular Scientific and Technical Facilities)
Type: Redes ICTS (ICTS Networks)
The optimal use of research infrastructures requires that the wider scientific community has adequate access to the data they produce, which in turn requires that data be produced and stored in a way that makes them easy to access. For this purpose, the FAIR principles (Findability, Accessibility, Interoperability, Reusability) have been formulated and developed.
The accelerator-based ICTS, CNA and CMAM, which form the distributed network IABA, produce a substantial amount of data in every accelerator experiment they carry out. Typically, these experiments generate a set of counts in the different detectors of a detector array, which should be complemented with data from the accelerator diagnostics. This wealth of data is often lost after each analysis: users typically report only a count rate obtained after some background subtraction and fitting. In many other cases, data are stored in specific formats that make any reuse initiative impractical. Proper storage of the raw data obtained from accelerator-based experiments, along with the relevant metadata, would be highly beneficial. It would allow systematic analysis of data from different experiments, detector arrays, and facilities. This, in turn, would improve data analysis procedures by leveraging historical data from previous experiments.
To address these challenges, the project has been structured into several objectives, as outlined below:
Objective 1: Define a Common Data Management Plan
The document "Data Management Policy IABA-ICTS" jointly developed for CNA and CMAM, has been drafted and reviewed by researchers from CNA, CMAM, and external users. It outlines the guidelines for data collection, storage, and sharing within the IABA-ICTS network. The document can be accessed at the following link:
Preliminary IABA-ICTS Data Policy
Objective 2: Define a Standard Format Containing the Relevant Metadata
This objective required collaboration with researchers to define all the necessary parameters to contextualize experiments and interpret the measurement files of the selected techniques for the pilot implementation. Templates were used in which researchers specified the parameter name, an example value (used to determine the data type: integer, string, float, etc.), measurement units, whether the parameter is mandatory or optional, the instrument associated with the parameter, and a clear definition.
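To illustrate, the sketch below models one entry of such a template as a TypeScript interface. The field names (name, exampleValue, units, mandatory, instrument, definition) and the sample values are illustrative assumptions, not the exact headers used in the project templates.

```typescript
// Sketch of one row of the parameter-definition template described above.
// Field names and sample values are illustrative assumptions only.
interface ParameterDefinition {
  name: string;            // parameter name, e.g. "beamEnergy"
  exampleValue: string;    // example value used to infer the data type
  dataType: "integer" | "float" | "string" | "boolean";
  units?: string;          // measurement units, if applicable
  mandatory: boolean;      // whether the parameter must be provided
  instrument?: string;     // instrument the parameter is associated with
  definition: string;      // clear, human-readable definition
}

// Hypothetical entry for an ion-beam technique.
const beamEnergy: ParameterDefinition = {
  name: "beamEnergy",
  exampleValue: "3045.0",
  dataType: "float",
  units: "keV",
  mandatory: true,
  instrument: "accelerator",
  definition: "Kinetic energy of the incident ion beam at the sample.",
};
```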
As part of this process, a distinction was made between two categories of metadata: general metadata and technical metadata.
- General metadata refers to all the information that improves the discoverability, accessibility, and reuse of experimental datasets. This includes elements such as the proposal title, authors, keywords, experiment dates, and other descriptive fields. These metadata are collected through a centralized proposal management platform, which every researcher must complete to request beamtime at the facilities of the ICTS-IABA node, made up of the Centro Nacional de Aceleradores (CNA) and the Centro de Micro-Análisis de Materiales (CMAM). The proposal submission portal can be accessed at: https://beamtime.cmam.uam.es/
- Technical metadata, in contrast, are essential for the correct interpretation and reusability of measurement data. These include parameters such as beam characteristics, detector configurations, and experimental geometry. They are critical for understanding the experimental conditions under which the data were acquired.
Furthermore, both CNA and CMAM are actively contributing to the NAPMIX project (Nuclear, Astro, and Particle Metadata Integration for eXperiments), which seeks to develop a common metadata schema for a wide range of experimental techniques used across international research infrastructures. NAPMIX also aims to create the digital tools needed to implement and manage these standards effectively. The work conducted within the OPENDATA4ICTS project is directly supporting this effort, providing a foundation of structured metadata and practical implementation insights.
More information about the NAPMIX project is available at:
https://oscars-project.eu/projects/napmix-nuclear-astro-and-particle-metadata-integration-experiments
The metadata description for each of the pilot techniques can be accessed at the following links:
Objective 3: Develop Specific Software
Once the necessary parameters were defined, the next objective was to develop software tools for creating the metadata files. In our case, we opted for a desktop application for each technique, using React as the front-end framework and Node.js for the back-end. These applications are available in the following GitHub repository, where users will also find all the instructions needed to run them in developer mode: https://github.com/rauvarfer/Metadata_Apps/tree/main.
This GitHub repository is private, so only users who have been granted access will be able to view it. To request access, please contact us by email.
Additionally, the installers can be downloaded from the following link to test the applications directly on a computer:
Since the installer is an executable file (.exe) that is not signed by a recognized publisher, you will likely see warnings when downloading it ("Google Drive cannot scan this file for viruses") and during installation ("Windows protected your PC – Microsoft Defender SmartScreen prevented an unrecognized app from starting. Running this app might put your PC at risk.").
You can safely trust and install the application.
The JSON format was chosen for the metadata files because it is both human- and machine-readable and easily converted to other formats.
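As a rough illustration, the snippet below builds a hypothetical metadata object with the general and technical sections described in Objective 2 and serializes it to a .json file. All section names, field names, and values are assumptions for the sake of the example, not the real schema.

```typescript
import { writeFileSync } from "node:fs";

// Minimal sketch of a generated metadata file; the actual schema is defined
// by the templates of Objective 2, so every name here is a placeholder.
const metadata = {
  general: {
    proposalTitle: "Example RBS measurement",
    authors: ["A. Researcher"],
    facility: "CMAM",
    experimentDate: "2024-05-14",
    keywords: ["RBS", "thin film"],
  },
  technical: {
    beam: { ion: "He+", energy: { value: 3045.0, units: "keV" } },
    detector: { type: "silicon surface barrier", angle: { value: 165, units: "deg" } },
    geometry: { incidentAngle: { value: 0, units: "deg" } },
  },
  generatedAt: new Date().toISOString(), // date and time added when the file is generated
};

// Serialize with indentation so the file stays human-readable.
writeFileSync("metadata_example.json", JSON.stringify(metadata, null, 2));
```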
The application operates as follows:
- Each tab features a green "Save" button that stores the entered data before the .json file is generated. Even if fields are filled in, the button must be pressed for them to be included in the generated file; otherwise, the values are lost when switching tabs.
- If mandatory fields are left blank, an error message appears, highlighting the missing fields in red. The red border will not disappear until the "Save" button is pressed again.
Additionally, three utility buttons are available:
- Save JSON: Generates the .json file with all saved data, including date and time. All forms must be completed and saved before generating the file; otherwise, a message will indicate the missing fields, highlighting them in red.
- Load JSON: Loads data from an existing .json file of the same structure.
- Clean JSON: Clears all saved or loaded data.
The application accepts both dot (.) and comma (,) as decimal separators but saves data using a dot (.). It is crucial NOT to use thousand separators for proper functionality. Additionally, all numerical fields with associated units will display their values and units in the generated .json file.
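The sketch below shows one way such numeric input could be normalized before being written to the .json file; it is a hypothetical helper for illustration, not the application's actual code.

```typescript
// Hypothetical helper: normalize a numeric text field before it is stored.
// Accepts "3,5" or "3.5", rejects inputs with thousand separators such as
// "1,234.5" or "1.234,5", and always returns a value that serializes with a dot.
function parseDecimalInput(raw: string): number | null {
  const trimmed = raw.trim();
  // More than one separator indicates a thousand separator was used.
  const separators = trimmed.match(/[.,]/g) ?? [];
  if (separators.length > 1) return null;
  const normalized = trimmed.replace(",", ".");
  const value = Number(normalized);
  return Number.isFinite(value) ? value : null;
}

// Example: both forms map to the same stored value.
console.log(parseDecimalInput("3,5"));     // 3.5
console.log(parseDecimalInput("3.5"));     // 3.5
console.log(parseDecimalInput("1,234.5")); // null (thousand separator rejected)
```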
The application's backend, built with Node.js, connects to the internal MySQL database of the IABA-ICTS node's proposal management system. This connection allows the application to automatically retrieve all required general metadata associated with the experiment proposals submitted to the Centro Nacional de Aceleradores (CNA) and the Centro de Micro-Análisis de Materiales (CMAM). For security reasons, this connection is allowed only from CNA and CMAM public IPs.
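As an illustration of this connection, the sketch below queries a proposal database with the widely used mysql2 Node.js client. The host, database, table, and column names are placeholders rather than the real schema, and such a query would only succeed from within the allowed CNA/CMAM networks.

```typescript
import mysql from "mysql2/promise";

// Hypothetical query against the proposal-management database.
// Connection details and column names are illustrative assumptions.
async function fetchGeneralMetadata(proposalId: string) {
  const connection = await mysql.createConnection({
    host: "proposals.example.internal", // placeholder host
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    database: "beamtime",
  });
  try {
    const [rows] = await connection.execute(
      "SELECT title, authors, facility, start_date, keywords FROM proposals WHERE id = ?",
      [proposalId],
    );
    return rows; // general metadata to be merged into the generated .json file
  } finally {
    await connection.end();
  }
}
```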
Examples of the structure of the measurement files can be found at the following links:
All datasets will be accompanied by a README file, which describes both the measurement files and all the metadata file parameters that can be found within it. These files can be viewed at the following links:
Objective 4: Determine the Optimal Storage Policy
Once the datasets, necessary metadata, and tools for generating these files were defined, the next step was to determine the most suitable repository for storing them. Three repositories have been evaluated:
- idUS (https://idus.us.es/), the institutional repository of the University of Seville, enables researchers to publish their datasets with DOI assignment and optional embargo periods.
- Madroño (https://edatos.consorciomadrono.es/), developed by a consortium of Madrid-based universities, provides similar functionalities for controlled access and persistent identification.
- Zenodo (https://zenodo.org/), maintained by CERN, also allows users to deposit datasets with free DOI generation and support for embargoed access.
All three repositories align with the developed data policy. Additionally, Zenodo's API (https://developers.zenodo.org/) is being explored to develop an automated tool for uploading datasets to the repository. This tool will use the pre-generated metadata files to fill out the upload form automatically.
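A minimal sketch of such a tool is shown below, following the publicly documented Zenodo REST API (create a deposition, set its metadata, upload a file to the deposition bucket). The mapping from our metadata fields to Zenodo's form fields, the environment-variable token handling, and the helper name are assumptions for illustration only; the snippet assumes Node.js 18+ with its built-in fetch.

```typescript
import { readFile } from "node:fs/promises";

const ZENODO_API = "https://zenodo.org/api";
const token = process.env.ZENODO_TOKEN; // personal access token (assumed to be set in the environment)

// Hypothetical upload flow: 1) create an empty deposition, 2) set its metadata
// from the pre-generated .json file, 3) upload the dataset file to the bucket.
async function uploadToZenodo(metadataPath: string, datasetPath: string) {
  const meta = JSON.parse(await readFile(metadataPath, "utf8"));

  // 1) Create a new, empty deposition.
  const created = await fetch(`${ZENODO_API}/deposit/depositions?access_token=${token}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: "{}",
  }).then((r) => r.json());

  // 2) Fill the Zenodo upload form from the generated metadata file
  //    (field mapping below is an assumption about our metadata schema).
  await fetch(`${ZENODO_API}/deposit/depositions/${created.id}?access_token=${token}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      metadata: {
        upload_type: "dataset",
        title: meta.general?.proposalTitle,
        creators: (meta.general?.authors ?? []).map((name: string) => ({ name })),
        description: meta.general?.abstract ?? "Accelerator-based experiment dataset.",
      },
    }),
  });

  // 3) Upload the dataset file to the deposition's bucket.
  const fileName = datasetPath.split("/").pop()!;
  await fetch(`${created.links.bucket}/${fileName}?access_token=${token}`, {
    method: "PUT",
    body: await readFile(datasetPath),
  });
}
```

Publishing the deposition (a separate POST to its actions/publish endpoint) would remain a deliberate final step, since it makes the record and its DOI permanent.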