Phased genome assemblies and pangenome graphs of human populations of Japan and Saudi Arabia

Year: 2025

Venue: Scientific Data

Authors: Maxat Kulmanov, Saeideh Ashouri, Yang Liu, Marwa Abdelhakim, Ebtehal Alsolme, Masao Nagasaki, Yasuyuki Ohkawa, Yutaka Suzuki, Rund Tawfiq, Katsushi Tokunaga, Toshiaki Katayama, Malak S. Abedalthagafi, Robert Hoehndorf, Yosuke Kawai

DOI: 10.1038/s41597-025-05652-y

Abstract

The choice of reference sequence is critical in genome analysis, because it serves as the foundation for all downstream analyses. Recently, the pangenome graph has been proposed as a data model that incorporates haplotypes from multiple individuals. Here we present JaSaPaGe, a pangenome graph reference for the Saudi Arabian and Japanese populations, both of which have been significantly underrepresented in previous genomic studies. We constructed JaSaPaGe from high-quality phased diploid assemblies of 9 Saudi and 10 Japanese individuals, generated using PacBio high-fidelity (HiFi) long reads, Nanopore long reads, and Hi-C short reads. In a quality evaluation based on variant calling, our pangenome outperformed the existing linear reference genomes (GRCh38 and T2T-CHM13) and performed comparably to the pangenome graph provided by the Human Pangenome Reference Consortium (HPRC). More variants were found in the Japanese and Saudi samples when each was analyzed with its own population-specific pangenome. This pangenome reference will serve as a valuable resource for both the research and clinical communities in Japan and Saudi Arabia.
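To make the idea of a pangenome graph as "a data model that incorporates haplotypes from multiple individuals" concrete, the following is a minimal toy sketch. It is loosely modeled on the segment and path records of the GFA format. It is not how JaSaPaGe was constructed or stored. The class and method names (`PangenomeGraph`, `spell`, `variable_segments`) are hypothetical, and the sketch assumes forward-strand segments only, with no orientation handling. Shared sequence is stored once as segments. Each haplotype is an ordered walk over those segments, and segments visited by only some haplotypes mark variant sites, such as a SNP "bubble".

```python
from dataclasses import dataclass, field


@dataclass
class PangenomeGraph:
    """Hypothetical toy sequence graph, loosely GFA-like: segments hold sequence,
    paths record each haplotype as an ordered walk over shared segments.
    Assumption: forward strand only (no segment orientation), for illustration."""

    segments: dict[str, str] = field(default_factory=dict)
    paths: dict[str, list[str]] = field(default_factory=dict)

    def add_segment(self, seg_id: str, seq: str) -> None:
        self.segments[seg_id] = seq

    def add_path(self, name: str, seg_ids: list[str]) -> None:
        # Reject haplotype walks over segments that were never added.
        missing = [s for s in seg_ids if s not in self.segments]
        if missing:
            raise KeyError(f"unknown segments: {missing}")
        self.paths[name] = list(seg_ids)

    def spell(self, name: str) -> str:
        # Recover a haplotype's linear sequence by concatenating its segments.
        return "".join(self.segments[s] for s in self.paths[name])

    def edges(self) -> set[tuple[str, str]]:
        # Links are implied by consecutive segments along any haplotype path.
        return {(a, b) for walk in self.paths.values() for a, b in zip(walk, walk[1:])}

    def variable_segments(self) -> set[str]:
        # Segments visited by some but not all haplotypes mark variant sites.
        visited = [set(walk) for walk in self.paths.values()]
        if not visited:
            return set()
        return set.union(*visited) - set.intersection(*visited)


# Two haplotypes sharing flanking sequence and differing at one SNP (A vs G).
graph = PangenomeGraph()
for seg_id, seq in [("s1", "ACGT"), ("s2", "A"), ("s3", "G"), ("s4", "TTCA")]:
    graph.add_segment(seg_id, seq)
graph.add_path("hap1", ["s1", "s2", "s4"])
graph.add_path("hap2", ["s1", "s3", "s4"])

print(graph.spell("hap1"))                # ACGTATTCA
print(graph.spell("hap2"))                # ACGTGTTCA
print(sorted(graph.variable_segments()))  # ['s2', 's3']
```

A linear reference such as GRCh38 corresponds to a single path. Adding haplotypes from more individuals adds alternative walks, and therefore bubbles, wherever those haplotypes differ. This is the intuition behind building population-specific graphs from many phased assemblies.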