EgyptRef is a reference genome for Egyptian and North African populations to complement the Genome Reference Consortium human genome (GRCh).
The EgyptRef project was initiated by the genetics and systems biology divisions of LIED, Lübeck University, Germany and MERC, Mansoura University, Egypt.
EgyptRef was published in Nature Communications on September 18, 2020:
I. Wohlers, A. Künstner, M. Munz, M. Olbrich, A. Fähnrich, V. Calonga-Solís, C. Ma, M. Hirose, S. El-Mosallamy, M. Salama, H. Busch & S. Ibrahim.
An integrated personal and population-based Egyptian genome reference.
Nat Commun 11, 4719 (2020). https://doi.org/10.1038/s41467-020-17964-1
The human reference genome is used extensively in medical research to detect variations related to disease. It is, for the most part, a single consensus representation based on a limited number of individuals with European ancestry. Despite continuous efforts, genome-wide population-specific variation is missing in the current reference genome. Recent advances in sequencing technology and genome assembly approaches provide the unique opportunity to investigate the architectures of human genomes across different populations.
We have taken advantage of these technologies (PacBio, 10X Genomics, Illumina) to sequence and de novo assemble the genome of an Egyptian individual. We integrated the sequences of an additional 109 Egyptian individuals to generate an Egyptian Reference Human Genome. The quality of the Egyptian reference genome is comparable to the few other publicly available reference genomes, e.g. Korean and African. Comparing Egyptian genetic variant data to African and European variants showed that Egyptians form a distinct genetic group that carries specific “Egyptian” gene variants.
The generation of the first Egyptian, and also the first North African, reference genome and the identification of population-specific variation will support genetic research and “precision” medicine in Egypt and North Africa.
The next stage of the project is to extend our sequencing efforts to better characterize “Egyptian” genetic variations and evaluate their relevance for common diseases such as cancer.
The EgyptRef team is supported by the Universities of Lübeck and Mansoura, the excellence program (EXC 306) of the German Research Foundation (DFG), and the DAAD.
EgyptRef published in Nature Communications.
We performed admixture analysis with the latest variant data from 5,429 individuals of 144 worldwide populations; see Population Genome -> Population Genetics. An interactive PCA plot is available there.
Dr. Inken Wohlers' presentation "An Egyptian genome reference – A step towards precision medicine in North Africa" at PAHGC was awarded best oral presentation.
EgyptRef small variant data of 110 Egyptians can be queried online; see tab Data Access.
EgyptRef will be presented at the 8th Pan Arab Human Genetics Conference (PAHGC).
EgyptRef data-based variant effect results have been presented at the EMBL Symposium "Systems Genetics: From Genomes to Complex Traits". See the poster here.
EgyptRef and all accompanying data generated within the EgyptRef project will be publicly available upon journal publication. Summary statistics are available at this web site.
Prof. Saleh Ibrahim, Genetics Division
Prof. Hauke Busch, Medical Systems Biology Division
University of Lübeck
Lübeck Institute of Experimental Dermatology
Ratzeburger Allee 160
23562 Lübeck
E-Mail: Saleh.Ibrahim(at)uni-luebeck(dot).de
E-mail: Hauke.Busch(at)uni-luebeck(dot).de
Phone: +49 (0) 451 3101 8401
Fax: +49 (0) 451 3101 8404
Universität zu Lübeck
Lübeck Institute for Experimental Dermatology
Ratzeburger Allee 160, 23562 Lübeck
The University of Lübeck (Universität zu Lübeck) is a foundation under public law. It is represented by its President, Prof. Dr. Gabriele Gillessen-Kaesbach.
The universities perform their tasks in their own name under the legal supervision of the state (matters of self-administration).
Legal supervision:
Ministry of Social Affairs, Health, Science and Equality (Ministerium für Soziales, Gesundheit, Wissenschaft und Gleichstellung)
Düsternbrooker Weg 104
24105 Kiel
DE 202095138
Prof. Saleh Ibrahim
Prof. Hauke Busch
Universität zu Lübeck
Lübeck Institute for Experimental Dermatology
Ratzeburger Allee 160, 23562 Lübeck
Phone: +49 (0) 451 3101 8470
E-Mail: Saleh.Ibrahim(at)uni-luebeck(dot).de
E-Mail: Hauke.Busch(at)uni-luebeck(dot).de
This legal notice (Impressum) applies to the web content of the Lübeck Institute for Experimental Dermatology (LIED) of the University of Lübeck that is marked with the note "(c) Lübeck Institute of Experimental Dermatology" and is accessible via the URL http://www.egyptian-genome.org.
For all other pages on this web server, editorial responsibility lies with the respective bodies or persons who created the pages.
The texts and photos on the pages that refer to this legal notice are protected by copyright.
Copying these files, and any modification of them, is therefore not permitted without the consent of the copyright holder (Lübeck Institute of Experimental Dermatology).
This does not affect the "copying" of files to your own computer in order to view the web pages in a browser.
Press releases, if any, are also exempt.
Their content may be reused freely by anyone without special permission.
The information on this website has been compiled with the greatest possible care.
Nevertheless, no guarantee can be given that the data provided are up to date, correct, complete or of a particular quality.
Liability claims against the Lübeck Institute for Experimental Dermatology or the authors or persons responsible for this website for material or immaterial damage arising from possibly incorrect or incomplete data are excluded, unless there is intent or gross negligence.
The above also applies to information on websites that are referred to by hyperlink.
The content of these websites lies entirely outside the responsibility of the Lübeck Institute for Experimental Dermatology.
However, the websites were free of illegal content at the time of linking.
No influence can be exerted on the design of the linked websites;
the rights to these pages and the responsibility for their content lie exclusively with the third-party provider.
LIED distances itself from any content offered if the content of a link changes such that information is conveyed that can no longer be associated with the offerings of the University of Lübeck.
This applies in particular to content whose distribution is prohibited under German and foreign law and where aiding its distribution is subject to criminal prosecution.
The Lübeck Institute for Experimental Dermatology and the authors or persons responsible for this website are not liable for damage resulting from incorrect or incomplete content on the linked websites.
The Egyptian genome reference is based on a de novo assembly of an Egyptian individual. This high-quality assembly, called EGYPT, is based on different types of next-generation sequencing data: PacBio long reads, 10x linked reads and Illumina paired-end short reads. EGYPT was built from a wtdbg2 assembly, EGYPT_wtdbg2, with gaps filled by sequences from an alternative FALCON-based assembly, EGYPT_falcon. We compare its quality with a Korean de novo assembly, AK1, and a chromosome-level assembly of a Yoruba individual, YORUBA.
EGYPT_wtdbg2 is based on assembly with wtdbg2.
J. Ruan and H. Li
Fast and accurate long-read assembly with wtdbg2.
bioRxiv. doi:10.1101/530972, 2019
EGYPT_falcon is based on assembly with FALCON.
C.S. Chin et al.
Phased diploid genome assembly with single-molecule real-time sequencing.
Nat Methods, 13(12):1050-1054, 2016
The visual summaries and comparative views of the EGYPT genome assembly, its underlying two draft assemblies, EGYPT_falcon and EGYPT_wtdbg2, as well as a Korean assembly, AK1, and a YORUBA assembly, YORUBA, have been generated with Icarus. See the Icarus manual here for information on the interpretation of visualisations.
A. Mikheenko, G. Valin, A. Prjibelski, V. Saveliev, A. Gurevich
Icarus: visualizer for de novo assembly evaluation.
Bioinformatics 32(21):3321-3323, 2016
The quality assessment of the EGYPT genome assembly, its two underlying draft assemblies, EGYPT_falcon and EGYPT_wtdbg2, as well as the Korean assembly AK1 and a Yoruba assembly, YORUBA, has been generated with QUAST-LG. See the QUAST manual for information on the interpretation of the various QC statistics.
A. Mikheenko, A. Prjibelski, V. Saveliev, D. Antipov, A. Gurevich
Versatile genome assembly evaluation with QUAST-LG.
Bioinformatics 34(13):i142-i150, 2018.
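To help with reading the QC statistics, here is a minimal sketch of how N50, one of the contiguity statistics QUAST reports, can be computed from a list of contig lengths. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the EgyptRef assemblies or from QUAST's own code.

```python
def n50(contig_lengths):
    """Return N50: the length L of the contig at which the cumulative
    length of contigs, sorted from longest to shortest, first reaches
    at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy example (hypothetical lengths): total = 300, half = 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches half, so N50 = 70
```

A larger N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why QUAST-LG reports it alongside reference-based statistics.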
For different classes and types of repeats, the number of identified repeats, their combined length and the percentage of sequence they cover are listed for the reference genome GRCh38, the Yoruba assembly YORUBA, the Korean assembly AK1, and the wtdbg2- and FALCON-based Egyptian assemblies.
SNVs and SVs between assemblies can be computed for any gene or genetic region for the Egyptian assembly, EGYPT, a Korean assembly, AK1, and the assembly of a Yoruba individual, YORUBA. Assembly differences, together with various other data tracks generated within the project, can be viewed in the IGV viewer. An example is given in the following figure.
Gene-centric data extraction includes the following:
For identifying assembly differences, we used the tool NucDiff, which detects all types of assembly differences, classified by the authors here.
K. Khelik, K. Lagesen, G.K. Sandve, T. Rognes, A.J. Nederbragt
NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences.
BMC Bioinformatics, 18(1):338, 2017
Using 10x data, we performed variant phasing and structural variant calling for the EGYPT individual. Phasing results are summarised in the following table and an example of phased variants and corresponding linked-reads in the following figure.
Phasing has been performed based on 10x sequencing data from four libraries by using 10x Genomics software LongRanger wgs version 2.2.2. We provided the variants called from combined genotyping with the other Egyptian individuals as a precalled set of variants to be phased.
We obtain an Egyptian population genome by integrating small and structural variant data from a total of 110 Egyptian individuals.
The overall number of identified variants is given below.
SNVs and indels have been called in 110 Egyptian individuals and annotated. The figure displays the genetic regions in which the variants are located.
SNVs and small indels have been called with GATK 3.8 using the parameters of the best practice workflow.
A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, M.A. DePristo
The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.
Genome Res, 20(9):1297-303, 2010
Variants have been annotated with ANNOVAR and VEP.
K. Wang, M. Li, H. Hakonarson
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.
Nucleic Acids Res, 38(16):e164, 2010
W. McLaren, L. Gil, S.E. Hunt, H.S. Riat, G.R. Ritchie, A. Thormann, P. Flicek, F. Cunningham
The Ensembl Variant Effect Predictor.
Genome Biology, 17(1):122, 2016
Different types of structural variants with lengths of various orders of magnitude have been called.
Structural variants have been called with Delly2.
T. Rausch, T. Zichner, A. Schlattl, A.M. Stütz, V. Benes, J.O. Korbel
DELLY: structural variant discovery by integrated paired-end and split-read analysis.
Bioinformatics, 28(18):i333-i339, 2012
We collapsed overlapping SVs for every individual to determine the average number of SVs; the results are displayed in the following table.
Structural variant calls of type deletion, insertion, inversion and duplication were collapsed per individual by dividing the variant calls into groups according to their chromosomal region (chromosome, start position, end position). That is, each group contains only variant calls that overlap with each other and with none of the variant calls in any other group. Translocations were collapsed per individual by merging variants with the same original chromosomal position and the same new chromosomal position.
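The collapsing step described above can be sketched as interval clustering. This is an illustration under stated assumptions, not the project's actual code: calls are assumed to be `(chromosome, start, end)` tuples with closed coordinates, groups are formed by chaining overlaps (a call joins a group if it overlaps any call already in it), and whether the original analysis grouped each SV type separately is not stated, so the sketch leaves that to the caller. The function names and example coordinates are hypothetical.

```python
from collections import defaultdict


def collapse_overlapping(calls):
    """Group (chrom, start, end) calls of one individual so that calls
    in different groups never overlap. Returns a list of groups."""
    by_chrom = defaultdict(list)
    for chrom, start, end in calls:
        by_chrom[chrom].append((start, end))

    groups = []
    for chrom in sorted(by_chrom):
        current, current_end = [], None
        # Sorting by start lets us sweep left to right once per chromosome.
        for start, end in sorted(by_chrom[chrom]):
            if current and start <= current_end:
                # Overlaps the region spanned by the current group.
                current.append((chrom, start, end))
                current_end = max(current_end, end)
            else:
                if current:
                    groups.append(current)
                current, current_end = [(chrom, start, end)], end
        if current:
            groups.append(current)
    return groups


def collapse_translocations(calls):
    """Merge translocations that share the same original and new position.
    Each call is (chrom_from, pos_from, chrom_to, pos_to)."""
    return sorted(set(calls))


# Hypothetical deletion calls for one individual.
deletions = [
    ("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 290, 310),
    ("chr1", 400, 500), ("chr2", 100, 200),
]
groups = collapse_overlapping(deletions)
print(len(groups))  # number of collapsed SVs for this individual
```

Counting groups rather than raw calls avoids counting the same event several times when a caller reports it with slightly different breakpoints.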
We identified Egyptian common sequences that are not present within the reference genome GRCh38, but within the Egyptian de novo assembly.
The non-reference unique insertions (NUIs) in the table below are covered by more than 5 reads in at least 10 of the 110 Egyptian individuals. Coverage was determined by mapping to the assembly those reads that could not be mapped to the reference genome or the GATK bundle sequences. For every NUI, the table gives: sequence start, end and length; alignment block start and end in GRCh38 and in the EGYPT assembly; the number of SNVs/indels within the NUI (see the table below for a list of these variants); and, if the NUI is significantly similar to a sequence reported in PubMed IDs 30072691, 28250455 or 28104618, the corresponding sequence ID and alignment information.
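The support criterion stated above (more than 5 reads in at least 10 individuals) can be expressed as a small filter. This is a sketch only: it assumes that coverage is available as one read count per individual for each candidate insertion, and the function name and example counts are hypothetical.

```python
MIN_READS = 5         # "more than 5 reads": strictly greater than 5
MIN_INDIVIDUALS = 10  # "at least 10" of the genotyped individuals


def passes_support_filter(read_counts,
                          min_reads=MIN_READS,
                          min_individuals=MIN_INDIVIDUALS):
    """Return True if enough individuals cover the insertion.

    read_counts: one read count per individual for a single candidate.
    """
    supporting = sum(1 for count in read_counts if count > min_reads)
    return supporting >= min_individuals


# Hypothetical per-individual counts for two candidates (110 individuals each).
well_supported = [6] * 10 + [0] * 100
borderline = [6] * 9 + [5] + [0] * 100  # the tenth individual has only 5 reads

print(passes_support_filter(well_supported))  # True
print(passes_support_filter(borderline))      # False
```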
Within these sequences, we identified small variants.
We determined mitochondrial haplogroups for 327 Egyptian individuals. The pie chart displays Mt haplogroup frequencies.
Haplogroup assignment was performed using HaploGrep 2.
H. Weissensteiner, D. Pacher, A. Kloss-Brandstätter, L. Forer, G. Specht, H.J. Bandelt, F. Kronenberg, A. Salas, S. Schönherr
HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing.
Nucleic Acids Res, 44(W1):W58-63, 2016
Principal component analysis was performed using the SMARTPCA program of EIGENSOFT after variant filtering and LD pruning.
N. Patterson, A.L. Price, D. Reich.
Population structure and eigenanalysis.
PLoS Genet, 2(12):e190, 2006
Admixture analysis was performed with ADMIXTURE.
D. H. Alexander, K. Lange.
Enhancements to the ADMIXTURE algorithm for individual ancestry estimation.
BMC Bioinformatics, 12:246, 2011
Datasets used in population genetics analyses:
An interactive version of this PCA plot can be opened in a separate window.
We investigate variant effects on
We investigated the effect of exonic variants, i.e. variants that are within protein-coding sequence, see the following figure.
Variants have been filtered for exonic variants and their consequences annotated with ANNOVAR.
K. Wang, M. Li, H. Hakonarson
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.
Nucleic Acids Res, 38(16):e164, 2010
We computed allelic expression in blood using 10x-phased variant data and RNA sequencing-based expression data of the EGYPT individual. See the phASER documentation for a description of the results table.
We computed allelic expression using the tool phASER on the 10x-phased variants.
S.E. Castel, P. Mohammadi, W.K. Chung, Y. Shen, T. Lappalainen
Rare variant phasing and haplotypic expression from RNA sequencing with phASER.
Nat Commun, 7:12817, 2016.
For replicated, high quality disease-associated tag SNPs from the GWAS catalog, we compared allele frequencies and proxy variants between Europeans (from 1000 Genomes, n=503) and Egyptians (n=110).
Data integration was performed using GWAS data from the GWAS Catalog.
A. Buniello, et al.
The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.
Nucleic Acids Research, Vol. 47 (Database issue): D1005-D1012, 2019.
Top: Comparison of allele frequencies and number of proxy SNPs for 3,698 GWAS tag SNPs called in a minimum of 100 Egyptians.
Top: Illustration of the proxy SNP comparison. A European GWAS tag SNP (center) and variants in Europeans (top) and Egyptians (bottom). Lines denote variants in high LD. The tag SNP has 7 proxy variants in Europeans and 3 in Egyptians. Light blue/red variants are not proxy variants in Europeans/Egyptians. Two proxy variants are shared. Thus 2 of 7 European (~29%) and 2 of 3 Egyptian (~67%) proxy variants are shared. Furthermore, 5 of 7 European proxies are European-only (~71%) and 1 of 3 Egyptian proxies is Egyptian-only (~33%).
Top: Proxy SNP comparisons for 3,698 GWAS tag SNPs. European shared: percentage of European proxy SNPs shared with Egyptian proxy SNPs. European only: percentage of European proxy SNPs not shared with Egyptian proxies. Egyptian shared and Egyptian only are defined analogously.
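The shared and population-only percentages defined above reduce to set operations on the two proxy sets of a tag SNP. The sketch below reproduces the illustrated case (7 European proxies, 3 Egyptian proxies, 2 shared); the function name and the variant IDs are hypothetical placeholders, not real rsIDs.

```python
def proxy_sharing(european_proxies, egyptian_proxies):
    """Return shared and population-only percentages for one tag SNP."""
    eur, egy = set(european_proxies), set(egyptian_proxies)
    if not eur or not egy:
        raise ValueError("both proxy sets must be non-empty")
    shared = eur & egy
    return {
        "european_shared": 100 * len(shared) / len(eur),
        "european_only": 100 * len(eur - egy) / len(eur),
        "egyptian_shared": 100 * len(shared) / len(egy),
        "egyptian_only": 100 * len(egy - eur) / len(egy),
    }


# Hypothetical IDs: v6 and v7 are proxies in both populations.
european = {"v1", "v2", "v3", "v4", "v5", "v6", "v7"}
egyptian = {"v6", "v7", "v8"}

for name, value in proxy_sharing(european, egyptian).items():
    print(f"{name}: {value:.0f}%")
# european_shared: 29%
# european_only: 71%
# egyptian_shared: 67%
# egyptian_only: 33%
```

Because each percentage is taken relative to its own population's proxy count, the same two shared variants yield different "shared" percentages for Europeans and Egyptians.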
The small variant data (SNV and indels) in the Egyptian cohort can be queried using RSID or genomic ranges.
When using variant allele frequencies for downstream analyses, please consider the overall number of individuals who have been genotyped for a particular variant (column "AN").
AN is the number of genotyped chromosomes, which is twice the number of genotyped individuals.
AN is usually more than 200, meaning that more than 100 individuals have been genotyped. For some SNVs it can be lower, in which case allele frequencies will not be reliable.
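As a minimal sketch of how AN can be taken into account, the helper below derives the allele frequency and the number of genotyped individuals from AC and AN, and flags variants with few genotyped individuals. It assumes diploid genotypes (AN is twice the number of individuals, as stated above). The cutoff of 100 individuals mirrors the "more than 200" remark but is an assumption for illustration, not a threshold defined by the project; the function name is hypothetical.

```python
def summarize_variant(ac, an, min_individuals=100):
    """Summarize one variant from its allele count (AC) and the number
    of genotyped chromosomes (AN)."""
    if an <= 0:
        raise ValueError("AN must be positive")
    return {
        "af": ac / an,                         # allele frequency
        "individuals": an // 2,                # diploid: two chromosomes each
        "reliable": an > 2 * min_individuals,  # assumed cutoff, see lead-in
    }


# Hypothetical values for two variants.
print(summarize_variant(ac=30, an=220))  # 110 individuals genotyped
print(summarize_variant(ac=3, an=20))    # only 10 individuals genotyped
```

Both example variants have a similar allele frequency, but only the first is based on enough genotyped individuals for the frequency to be meaningful.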
Note: If a range is specified, a maximum of 100 variants is displayed. If you would like to extract a larger number of variants, we suggest using the VCF file available at EGA after journal publication.
Query results report the number of variants found and list them with the columns POS, RSID, REF, ALT, AF, AC and AN.
Data is available at the European Genome-phenome Archive (EGA) under EGA study ID EGAS00001004303.
Final meta assembly: EGYPT
FALCON-based assembly version: EGYPT_falcon
WTDBG2-based assembly version: EGYPT_wtdbg2
Assemblies will be available at NCBI soon.