DATAKOM Project

Efficient Compression Systems for Next-Generation Scientific Datasets

ECS4NSD

ECS4NSD investigates new compression systems for high-volume scientific data (synchrotron, DNA, and space data), aiming to improve throughput, compression performance, and information reuse at the same time.

Project key facts

  • Duration: 01/09/2025 - 30/08/2028
  • Funding body: Ministry of Science and Innovation, Spanish Government, FEDER.
  • Financial support: €288,125
  • Reference: PID2024-156292OB-I00
  • Principal investigators: Joan Serra-Sagristà, Joan Bartrina-Rapesta

Project description

The project starts from a clear challenge: scientific data production is growing much faster than current storage and transmission capabilities. Synchrotron facilities, DNA sequencing workflows, and space missions generate massive data volumes that require a revision of current standards.

ECS4NSD advances the state of the art through compression techniques tailored to each domain and computing platform, combining highly parallel approaches, machine learning-based coding, and compact data structures to enable efficient information extraction from long-term archives.

The project is led by Joan Serra-Sagristà (IP1) and Joan Bartrina-Rapesta (IP2).

Project outcome summary

  • Journals: 2
  • Proceedings: 3
  • Software published: 1

Project contributions

Proceedings · 2026

Analysis of Lossless Compression for Synchrotron Crystallography Data

Pau Quintas-Torra, Xavier Fernández-Mellado, Joan Bartrina-Rapesta, Albert Castellví, Gabriel Jover-Mañas, Armando J. Pinho, Joan Serra-Sagristà

IEEE Data Compression Conference (DCC)


Software · 2026

BLADE: GPU-Oriented Lossless Compression of Bayer CFA Images (Code, Scripts, and Reproducibility Package)

J. Bartrina-Rapesta, J. Serra-Sagristà, S. Mijares i Verdú

(Zenodo, v1)


Journal article · 2026

Lossless Compression of Modern Astronomical Data Using a Novel Learned Predictor

Pau Quintas-Torra, Xavier Fernández-Mellado, Joan Bartrina-Rapesta, Sebastià Mijares, Armando J. Pinho, Joan Serra-Sagristà

Publications of the Astronomical Society of the Pacific


Proceedings · 2025

Analysis of Lossless Coding Techniques for Uncalibrated Spectral Data From James Webb Space Telescope

Xavier Fernández-Mellado, Pau Quintas-Torra, Joan Bartrina-Rapesta, Jordi Portell, Armando J. Pinho, Joan Serra-Sagristà

2025 15th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS)


Journal article · 2025

Robustness of Lossy Multispectral Compression to Simulated Instrumental Noise: A Comparative Study

Jordi González-de-Regàs, Sebastià Mijares, Joan Bartrina-Rapesta, Joan Serra-Sagristà

IEEE Geoscience and Remote Sensing Letters


Proceedings · 2025

SGST: Bit-Depth Recovery-Based Lossless Compression for Hyperspectral Images

Zuoxin Xi, Sebastià Mijares, Joan Serra-Sagristà

2025 15th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS)


Objectives

General objectives

  • O1. On-the-fly compression: new algorithms for scientific acquisition with high throughput and low complexity.
  • O2. Long-term compression and information extraction: improved compression performance, quality, and efficient data access.
  • O3. Internationalization and transfer: contributions to standards (CCSDS/HDF5), European collaborations, and industrial transfer.

Specific objectives

  • O1.1 Hardware-tailored parallel algorithms (SIMD/GPU) for compression during data acquisition.
  • O1.2 Multi-platform deployment (GPU, TPU, and low-power accelerators).
  • O2.1 High-compression techniques using scalable entropy coding, RoI, and learned models.
  • O2.2 New methodologies and compact structures for fast information extraction (genomics and crystallography).
  • O3.1 Participation in international standardization (CCSDS and HDF5).
  • O3.2 International collaborations and transfer of results to industrial partners.

Work packages

WP1 - Macromolecular crystallography data

Coordination: Albert Castellví, Joan Bartrina Rapesta (IP2)

Design and integration of compression techniques for crystallography workflows, including on-the-fly coding, long-term archiving, and HDF5 integration.

WP2 - Computed tomography data

Coordination: Gabriel Jover Mañas, Joan Serra-Sagristà (IP1)

Study of tomography workflows, development and validation of compression techniques for long-term archiving, and deployment in ALBA workflows.

WP3 - Space data

Coordination: Ian Blanes García, Joan Bartrina Rapesta (IP2)

Compression of space data using machine learning models, with low computational complexity and advanced smart coding features.

WP4 - Deoxyribonucleic acid data

Coordination: Ivan Erill, Joan Serra-Sagristà (IP1)

Compact data structures and new algorithms for efficient representation, search, and analysis of genomic data.

WP5 - Internationalization and technology transfer

Coordination: Joan Bartrina Rapesta (IP2), Joan Serra-Sagristà (IP1)

Contributions to CCSDS and HDF5, participation in European proposals, and transfer of results to industrial products.

WP6 - Coordination and management

Coordination: Joan Bartrina Rapesta (IP2), Joan Serra-Sagristà (IP1)

Technical and financial management of the project, milestone tracking, and overall execution coordination.

Participating people

Participating and collaborating institutions

  • Universitat Autònoma de Barcelona (UAB) - main coordinating institution of the project.
  • ALBA Synchrotron - direct participation in crystallography/tomography data and demonstrator integration.
  • University of Maryland, Baltimore County (USA) - participation in the bioinformatics/genomics line.
  • University of Warwick (UK) - participation in computing and applied machine learning.
  • University of Arizona (USA), Philips Medizin Systeme Böblingen GmbH (Germany), and University of Aveiro (Portugal) - international collaboration in the work team.
  • Transfer and data ecosystem entities: ESA, CNES, NASA, OpenCosmos, Satellogic, Airbus, NCBI, and IEEC (depending on project lines and data sources).

Contact

For collaborations or technical inquiries about ECS4NSD, contact the DATAKOM Data Compression Lab.
