File Formats: Genomes

chrom.sizes

igvtools uses chrom.sizes files to define the chromosome lengths for a given genome. The file is tab-delimited: the first column is the chromosome name and the second is its length. Additional columns may be present, but they are ignored. Files should be named as follows:

<genomeID>.chrom.sizes

For example, hg18.chrom.sizes.
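A minimal Python sketch of reading a chrom.sizes file as described above. It keeps only the first two tab-separated columns and ignores any others. Skipping blank lines is an assumption, and the helper names (`parse_chrom_sizes`, `read_chrom_sizes`, `chrom_sizes_filename`) are hypothetical, not part of igvtools.

```python
def parse_chrom_sizes(lines):
    """Map chromosome name -> length from chrom.sizes lines.

    Only the first two tab-separated columns are read; any extra
    columns are ignored.
    """
    sizes = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: tolerate blank lines
        fields = line.split("\t")
        sizes[fields[0]] = int(fields[1])  # name, length
    return sizes


def read_chrom_sizes(path):
    """Read a <genomeID>.chrom.sizes file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_chrom_sizes(handle)


def chrom_sizes_filename(genome_id):
    """Build the conventional file name for a genome ID."""
    return f"{genome_id}.chrom.sizes"
```

For example, `chrom_sizes_filename("hg18")` yields `hg18.chrom.sizes`, the name used above.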

Cytoband

The cytoband file format is used to define the chromosome ideograms displayed in IGV.

A cytoband file is a five-column tab-delimited text file. Each row of the file describes the position of a cytogenetic band. The columns in the file match the columns of the cytoBand table in the UCSC Genome Browser database. These files are downloadable from the UCSC website as "cytoBandIdeo.txt.gz" for many genome assemblies, for example https://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/cytoBandIdeo.txt.gz

Column      Example   Data Type  Description
chrom       chr1      string     Chromosome
chromStart  0         integer    Start position in chromosome sequence
chromEnd    2300000   integer    End position in chromosome sequence
name        p36.33    string     Name of cytogenetic band
gieStain    gneg      string     Giemsa stain results. Recognized stain values: gneg, gpos50, gpos75, gpos25, gpos100, acen, gvar, stalk
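The sketch below parses cytoband rows into records holding the five columns listed above, and reads either a plain file or a gzip-compressed download such as cytoBandIdeo.txt.gz. The `Cytoband` type and the helper names are hypothetical. Stain values are checked by a separate helper rather than rejected during parsing, since this is an illustration and not IGV's own parser.

```python
import gzip
from typing import NamedTuple

# Stain values listed as recognized in the gieStain row above.
RECOGNIZED_STAINS = frozenset({
    "gneg", "gpos25", "gpos50", "gpos75", "gpos100", "acen", "gvar", "stalk",
})


class Cytoband(NamedTuple):
    chrom: str        # chromosome name, e.g. "chr1"
    chrom_start: int  # start position in the chromosome sequence
    chrom_end: int    # end position in the chromosome sequence
    name: str         # band name, e.g. "p36.33"
    gie_stain: str    # Giemsa stain result, e.g. "gneg"


def parse_cytoband_line(line):
    """Parse one tab-delimited cytoband row (first five columns)."""
    chrom, start, end, name, stain = line.rstrip("\r\n").split("\t")[:5]
    return Cytoband(chrom, int(start), int(end), name, stain)


def stain_is_recognized(band):
    """True if the band's stain is one of the recognized values."""
    return band.gie_stain in RECOGNIZED_STAINS


def read_cytobands(path):
    """Read a plain-text or gzip-compressed (.gz) cytoband file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return [parse_cytoband_line(line) for line in handle if line.strip()]
```

Parsing the example row from the table, `chr1`, `0`, `2300000`, `p36.33`, `gneg`, gives a band whose stain is recognized.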

FASTA

The FASTA file format (.fasta or .fa) is used to specify the reference sequence for an imported genome. Each sequence in the FASTA file represents the sequence for a chromosome. The sequence name in the FASTA file is the chromosome name that appears in the chromosome drop-down list in the IGV tool bar. IGV orders the chromosomes based on their names, not their order in the FASTA file.

A FASTA file is a text file. Each sequence begins with a single-line description, followed by lines of sequence data. The single-line description contains a greater-than (>) symbol in the first column, followed by the sequence name.
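To make the description-line rule concrete, here is a small Python sketch that collects each sequence name and its length from FASTA text. Treating the name as ending at the first whitespace character is an assumption (the text above only says the name follows ">"), and `fasta_sequence_lengths` is a hypothetical helper. The result has the same name/length shape as a chrom.sizes file; its order follows the FASTA file, which, as noted above, is not how IGV orders chromosomes.

```python
def fasta_sequence_lengths(lines):
    """Map each FASTA sequence name to its sequence length.

    A description line starts with '>' followed by the sequence name.
    Assumptions: the name ends at the first whitespace character,
    and blank lines are ignored.
    """
    lengths = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence data: count the residues on this line.
            lengths[current] += len(line)
    return lengths
```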

FASTA files can be loaded directly from the Genome menu or can be referred to in a JSON file that contains a reference genome specification.

IGV reference genome (JSON)

As of release 2.11.0, reference genomes can be specified and loaded as JSON files; the previous ".genome" format is now considered deprecated. The format is a JSON form of the "reference" object description from igv.js, described in the igv.js documentation. For IGV use, required properties include id, name, and fastaURL. All other properties are optional. An example of a complete JSON description for the GRCh38 assembly is given below.

A key difference with respect to the ".genome" format is that fields ending with "url" can contain local file paths. These paths can be absolute or relative to the location of the genome (.json) file.

Example: Human GRCh38 with 2 annotation tracks

Required fields are id, name, fastaURL, and indexURL. All other fields are optional.

{
  "id": "hg38",
  "name": "Human (GRCh38/hg38)",
  "fastaURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa",
  "indexURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa.fai",
  "cytobandURL": "https://s3.amazonaws.com/igv.org.genomes/hg38/annotations/cytoBandIdeo.txt.gz",
  "aliasURL": "https://s3.amazonaws.com/igv.org.genomes/hg38/hg38_alias.tab",
  "chromosomeOrder": [
    "chr1",
    "chr2",
    "chr3",
    "chr4",
    "chr5",
    "chr6",
    "chr7",
    "chr8",
    "chr9",
    "chr10",
    "chr11",
    "chr12",
    "chr13",
    "chr14",
    "chr15",
    "chr16",
    "chr17",
    "chr18",
    "chr19",
    "chr20",
    "chr21",
    "chr22",
    "chrX",
    "chrY"
  ],
  "tracks": [
    {
      "name": "Refseq Genes",
      "format": "refgene",
      "url": "https://s3.amazonaws.com/igv.org.genomes/hg38/ncbiRefSeq.sorted.txt.gz",
      "indexURL": "https://s3.amazonaws.com/igv.org.genomes/hg38/ncbiRefSeq.sorted.txt.gz.tbi"
    },
    {
      "name": "Gencode v24 genes",
      "format": "gtf",
      "url": "https://s3.amazonaws.com/igv.org.genomes/hg19/gencode.v24.genes.gtf.gz"
    }
  ]
}
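A hedged sketch of checking a genome JSON file for its required top-level fields before loading it into IGV. The default field list follows the note for the GRCh38 example (id, name, fastaURL, indexURL); the introductory paragraph names only id, name, and fastaURL, so the `required` parameter lets you narrow the check. `load_genome_json` and `missing_required_fields` are hypothetical helpers, not part of IGV or igv.js.

```python
import json

# Required fields as listed in the note for the GRCh38 example.
# The introductory paragraph names only id, name and fastaURL, so
# pass a different `required` tuple if indexURL is not needed.
REQUIRED_FIELDS = ("id", "name", "fastaURL", "indexURL")


def missing_required_fields(genome, required=REQUIRED_FIELDS):
    """Return required top-level fields that are absent or empty."""
    return [key for key in required if not genome.get(key)]


def load_genome_json(text, required=REQUIRED_FIELDS):
    """Parse a genome JSON document and check its required fields."""
    genome = json.loads(text)
    missing = missing_required_fields(genome, required)
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return genome
```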

File paths

URL properties (all fields whose names end with "url") can be absolute or relative file paths. Relative paths are interpreted as relative to the location of the genome JSON file. For example, the following definition assumes an annotation file chr22.genes.gtf.gz in the same directory as the JSON file.

hg19_local_annotations.json

{
  "id": "hg19",
  "name": "Human (GRCh37/hg19)",
  "fastaURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg19/hg19.fasta",
  "indexURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg19/hg19.fasta.fai",
  "cytobandURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg19/cytoBand.txt",
  "aliasURL": "https://s3.amazonaws.com/igv.org.genomes/hg19/hg19_alias.tab",
  "tracks": [
    {
      "name": "Gencode v24 genes",
      "url": "chr22.genes.gtf.gz"
    }
  ]
}
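The relative-path rule can be sketched in Python as follows. This assumes that any value with a URL scheme such as https:// is used as-is, that absolute file paths are used as-is, and that everything else is joined to the directory containing the genome JSON file. `resolve_genome_path` is a hypothetical helper illustrating the rule, not IGV's implementation.

```python
import os
from urllib.parse import urlparse


def resolve_genome_path(value, genome_json_path):
    """Resolve a URL property from a genome JSON file.

    URLs (http, https, ...) and absolute file paths are returned
    unchanged; anything else is treated as relative to the directory
    that contains the genome JSON file.
    """
    scheme = urlparse(value).scheme
    # Assumption: a one-letter "scheme" is a Windows drive letter
    # (e.g. C:\...), not a URL, so it falls through to the path checks.
    if len(scheme) > 1:
        return value
    if os.path.isabs(value):
        return value
    base_dir = os.path.dirname(os.path.abspath(genome_json_path))
    return os.path.normpath(os.path.join(base_dir, value))
```

On a POSIX system, with hg19_local_annotations.json stored in /data/genomes, the track URL chr22.genes.gtf.gz resolves to /data/genomes/chr22.genes.gtf.gz, while the https:// URLs are left untouched.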

Genome with hidden annotation track

In the example below, an annotation file containing protein-coding genes from Gencode is loaded as a hidden track ("hidden": true). The track is not displayed, but it is loaded to support searching by Gencode gene identifiers.

{
  "id": "hg19",
  "name": "Human (GRCh37/hg19)",
  "fastaURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg19/hg19.fasta",
  "indexURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg19/hg19.fasta.fai",
  "cytobandURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg19/cytoBand.txt",
  "aliasURL": "https://s3.amazonaws.com/igv.org.genomes/hg19/hg19_alias.tab",
  "tracks": [
    {
      "name": "Refseq Genes",
      "url": "https://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ncbiRefSeq.txt.gz"
    },
    {
      "url": "https://s3.amazonaws.com/igv.org.genomes/hg19/gencode.v24.genes.gtf.gz",
      "hidden": true
    }
  ]
}
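A track marked "hidden": true is still part of the genome definition even though it is not displayed. The Python sketch below partitions the tracks of the example above by that flag. Treating a missing flag as visible is an assumption, and `split_tracks_by_visibility` is a hypothetical helper; it does not model how IGV builds its search index from the hidden track.

```python
import json


def split_tracks_by_visibility(genome):
    """Partition a genome's "tracks" list into (visible, hidden).

    A track is treated as hidden only when it sets "hidden": true;
    treating a missing flag as visible is an assumption.
    """
    visible, hidden = [], []
    for track in genome.get("tracks", []):
        if track.get("hidden") is True:
            hidden.append(track)
        else:
            visible.append(track)
    return visible, hidden


# Trimmed copy of the "tracks" section from the example above.
GENOME_JSON = """
{
  "id": "hg19",
  "tracks": [
    {"name": "Refseq Genes",
     "url": "https://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ncbiRefSeq.txt.gz"},
    {"url": "https://s3.amazonaws.com/igv.org.genomes/hg19/gencode.v24.genes.gtf.gz",
     "hidden": true}
  ]
}
"""

if __name__ == "__main__":
    visible, hidden = split_tracks_by_visibility(json.loads(GENOME_JSON))
    print([t.get("name") for t in visible])  # ['Refseq Genes']
    print([t["url"] for t in hidden])
```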