PCAWG - Pan-Cancer Analysis of Whole Genomes

[Data summary table; values not rendered in this copy. Totals: donors, files, total file size. Columns: Data Type, # Donors, # Files, Format, Size.]

The Pan-Cancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutation in more than 2,600 cancer whole genomes from the International Cancer Genome Consortium. Building upon previous work which examined cancer coding regions (Cancer Genome Atlas Research Network, The Cancer Genome Atlas Pan-Cancer analysis project, Nat. Genet. 2013 45:1113, Cell. 2018 Apr 5;173(2):283-285), this project explored the nature and consequences of somatic and germline variations in both coding and non-coding regions, with specific emphasis on cis-regulatory sites, non-coding RNAs, and large-scale structural alterations.

To facilitate comparison among diverse tumor types, all tumor and matched normal genomes were processed with a uniform set of alignment and variant-calling algorithms and had to pass a rigorous set of quality control tests. The research activities have been coordinated by a series of working groups comprising more than 700 scientists.

VCF-format files representing somatic variant calls for single-nucleotide variants, small indels, structural variants and copy number variants can be downloaded using the links in the data summary table. In addition, aligned BAM files are available for download or cloud-based access via the Cancer Genome Collaboratory and Amazon Web Services. Note that data sets that may contain germline SNPs are in the controlled tier and require credentials provided by the ICGC Data Access Committee and/or the TCGA dbGaP Data Access Committee. Open tier variants are also available for browsing and querying. Detailed instructions for obtaining access to the controlled tier PCAWG data can be found in the DCC PCAWG documentation pages.
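As an illustration of working with a downloaded VCF file, the following is a minimal sketch in Python using only the standard library. It is not part of the PCAWG tooling: the function names, the `Variant` type and the length-based classification rule are assumptions made for this example, and symbolic or breakend alleles (as used for structural variants) are simply counted as "other".

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal record: one ALT allele of one VCF data line."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per ALT allele, skipping blank lines and '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # The first five tab-separated VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):
            yield Variant(chrom, int(pos), ref, alt)


def classify(v: Variant) -> str:
    """Rough classification by allele length (an assumption of this sketch)."""
    if v.alt.startswith("<") or "[" in v.alt or "]" in v.alt or v.alt in {".", "*"}:
        return "other"  # symbolic alleles, breakends, missing or spanning alleles
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "snv"
    if len(v.ref) != len(v.alt):
        return "indel"
    return "mnv"  # same-length substitution longer than one base


def count_by_type(lines: Iterable[str]) -> Counter:
    """Count ALT alleles per rough variant class."""
    return Counter(classify(v) for v in parse_vcf_lines(lines))


def count_file(path: str) -> Counter:
    """Count variant classes in a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_by_type(handle)
```

Calling `count_file` on a local path to a downloaded file (the path is whatever the user chose when saving it) returns a `Counter` keyed by "snv", "indel", "mnv" and "other". For production analyses a dedicated VCF library is likely a better choice than hand-rolled parsing.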

In addition, PCAWG working groups have generated substantial amounts of derived data, including donor clinical and histopathological data, subclonal reconstructions, purity and ploidy information, splice isoforms and mutational signatures. These data are described in individual PCAWG publications and can be downloaded from the ICGC PCAWG Data Release pages.

Pan-Cancer Analysis of Whole Genomes Published!

Read about the ICGC/TCGA analysis of >2,600 whole cancer genomes across 38 tumour types in 23 papers published in Nature and other Nature journals (Feb. 2020). Photo credit: Nik Spencer/Nature.

Publication and Embargo Policy

The Pan-Cancer Analysis of Whole Genomes (PCAWG) project primary and derived datasets are available for use by any researcher under the standard ICGC publication policy terms, which allow unrestricted use and publication provided that the data are not used to identify donors and that controlled tier data are kept confidential. The PCAWG publication embargo expired on July 25, 2019. When using this dataset, please cite reference #1 below.

Major Publications

  1. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Network. Pan-cancer analysis of whole genomes. Nature (2020).
  2. Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,693 cancer whole genomes. Nature (2020).
  3. PCAWG Transcriptome Core Group et al. Genomic basis of RNA alterations in cancer. Nature (2020).
  4. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature (2020).
  5. Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature (2020).
  6. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature (2020).
  7. Phillips, M. et al. Of Clouds and Genomic Data Protection. Nature (2020).