Publication
FAIR privacy-preserving operation of large genomic variant calling format (VCF) data without download or installation
Yasmmin C Martins; Praphulla MS Bhawsar; Jeya B Balasubramanian; Daniel Russ; Wendy SW Wong; Wolfgang Maaß; Jonas S Almeida
In: AMIA Summits on Translational Science Proceedings 2024. AMIA Joint Summits on Translational Science, Pages 65-74, American Medical Informatics Association, 5/2024.
Abstract
Motivation: The proliferation of genetic testing and consumer genomics poses a logistical challenge to the personalized use of GWAS data in VCF format: target genetic variants must be retrieved from large compressed files filled with unrelated variation information. Compounding this data traversal challenge, privacy-sensitive VCF files are typically managed as large stand-alone single files (with no companion index file) composed of variable-sized compressed chunks, and are hosted in consumer-facing environments with no native support for hosted execution.
Results: A portable JavaScript module was developed to support in-browser fetching of partial content using byte-range requests. It decompresses irregularly positioned compressed chunks on the fly and couples this with a binary search algorithm that iteratively identifies chromosome-position ranges. The in-browser, zero-footprint solution (no downloads, no installations) enables the interoperability, reusability, and user-facing governance advanced by the FAIR principles for stewardship of scientific data.
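The combination of byte-range reads and binary search described above can be illustrated with a minimal sketch. It is not the authors' module: the helper names (makeRangeReader, readLineFrom, findFirstAtOrAfter) are hypothetical, the range reader is an in-memory stand-in for an HTTP byte-range request, and the sketch assumes an uncompressed, ASCII, newline-terminated VCF body for a single chromosome sorted by POS. The decompression of compressed chunks is deliberately left out.

```javascript
// Hypothetical sketch, not the authors' module. Assumes an UNCOMPRESSED, ASCII,
// newline-terminated VCF body for one chromosome, sorted by POS.

// In-memory stand-in for a byte-range request. In a browser this role could be
// played by fetch(url, { headers: { Range: `bytes=${start}-${end - 1}` } }).
function makeRangeReader(text) {
  const bytes = new TextEncoder().encode(text);
  return async (start, end) => new TextDecoder().decode(bytes.slice(start, end));
}

// Read from `offset` up to the next newline, fetching small windows as needed.
// Returns the text before the newline and the byte offset just after it.
async function readLineFrom(readRange, offset, size, windowSize = 64) {
  let text = '';
  let pos = offset;
  while (pos < size) {
    const chunk = await readRange(pos, Math.min(pos + windowSize, size));
    const nl = chunk.indexOf('\n');
    if (nl >= 0) return { line: text + chunk.slice(0, nl), next: pos + nl + 1 };
    text += chunk;
    pos += chunk.length; // ASCII assumption: one character per byte
  }
  return { line: text, next: size };
}

// Binary search over byte offsets. Returns the offset of the first record whose
// POS is >= targetPos, or `size` when no such record exists.
// `bodyStart` must be the offset of the first data line (just after the header).
async function findFirstAtOrAfter(readRange, size, bodyStart, targetPos) {
  let lo = bodyStart; // always the start of a line
  let hi = size;      // every line starting at or after hi has POS >= targetPos
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    // Align mid to the first line start at or after it.
    const s = mid === lo ? lo : (await readLineFrom(readRange, mid - 1, size)).next;
    if (s >= hi) { hi = mid; continue; } // no line starts inside [mid, hi)
    const { line, next } = await readLineFrom(readRange, s, size);
    const pos = Number(line.split('\t')[1]); // VCF column 2 is POS
    if (pos >= targetPos) hi = mid; else lo = next;
  }
  return lo;
}

// Tiny illustrative dataset (made up for this sketch).
const header = '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n';
const rows = [100, 250, 250, 900, 1500, 4000].map((p, i) => `1\t${p}\trs${i}\tA\tG\n`);
const vcf = header + rows.join('');
const readRange = makeRangeReader(vcf);

findFirstAtOrAfter(readRange, vcf.length, header.length, 250)
  .then((offset) => console.log('first record with POS >= 250 starts at byte', offset));
```

Each iteration reads only a few small windows, so the number of range reads grows with the logarithm of the file size rather than with the file size itself. With compressed chunks, the same search would have to operate on chunk boundaries and decompress each probed chunk before parsing a record.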
Availability: https://episphere.github.io/vcf (includes supplementary material)