hr00343_Soil-metagenome-1

Summary statistics

Table of statistics

Link to full table
Accession | Raw reads: Seqs Bases MaxLen | Quality- and adapter-trimmed reads: Seqs Bases MinLen MeanLen MaxLen Files %seqs lost %bases lost
002002_Airport 213,901,150 21,604,016,150 101 211,842,174 21,038,140,810 30 99 101 R1 R2  0 2
002003_Wally 64,378,540 6,502,232,540 101 63,589,926 6,324,397,020 30 99 101 R1 R2  1 2
002004_Nocton-Corn 145,430,068 14,688,436,868 101 143,889,198 14,305,680,413 30 99 101 R1 R2  1 2
002005_Nocton-Soy 132,995,268 13,432,522,068 101 131,587,916 13,051,593,048 30 99 101 R1 R2  1 2
002006_Sedgewick 131,481,768 13,279,658,568 101 130,369,322 12,974,335,408 30 99 101 R1 R2  0 2
002007_ItsyBitsy 173,099,062 17,483,005,262 101 171,532,350 17,054,225,245 30 99 101 R1 R2  0 2
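
The two "lost" columns can be recomputed from the counts in the same row. The following is a minimal sketch using the 002002_Airport row; the function name `percent_lost` is made up for illustration. The whole-number percentages in the table appear to be truncated rather than rounded, which is an inference from the numbers and not something the page states.

```python
def percent_lost(raw: int, trimmed: int) -> float:
    """Percentage of the raw count that did not survive trimming."""
    return 100.0 * (raw - trimmed) / raw


# Counts copied from the 002002_Airport row of the table above.
raw_seqs, trimmed_seqs = 213_901_150, 211_842_174
raw_bases, trimmed_bases = 21_604_016_150, 21_038_140_810

seqs_lost = percent_lost(raw_seqs, trimmed_seqs)
bases_lost = percent_lost(raw_bases, trimmed_bases)

# int() truncates toward zero; this reproduces the table's 0 and 2
# (assumed rule, inferred from the data).
print(f"%seqs lost:  {seqs_lost:.2f} -> {int(seqs_lost)}")
print(f"%bases lost: {bases_lost:.2f} -> {int(bases_lost)}")
```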

Help on summary statistics

NOTE: As of 21-Aug-2012 the help needs to be rewritten, so use the following with care.

This section gives an overview of the work done for the accessions (also known as samples). In general, an accession goes through each processing step only once. Sometimes, however, the same processing step is run multiple times with different parameters. In that case all related runs appear in one large column, with sub-columns for the differences.

FastA/FastQ
For each processing step that results in a FastA or FastQ file, there are three common, untitled lines at the top, optionally followed by a fourth and a fifth line:

  1. Number of reads, contigs or scaffolds.
  2. Number of bases in the reads, contigs or scaffolds.
  3. The minimum and maximum length of the reads, contigs or scaffolds. If there is a middle number inside square brackets, it is the mean length. E.g., '30-[139]-150' indicates that the shortest read is 30 base pairs, the longest is 150 and the mean length is 139.
  4. If there is a fourth line that starts with 'cutoff', the read count and number of bases cover only sequences of that cutoff length or longer. Otherwise all reads are counted. A 'cutoff' is most often used with contig-creating programs such as 'ABySS' or 'Trinity', which tend to produce many small, less interesting contigs.
  5. There may be a fifth line with a link to the FastA/FastQ files.
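
The length-range notation in item 3 can be read mechanically. Below is a small sketch of a parser for strings such as '30-[139]-150' or '101-101'. The names `LengthRange` and `parse_length_range` are hypothetical, and the sketch assumes the mean is printed as a whole number, as in the example above.

```python
import re
from typing import NamedTuple, Optional


class LengthRange(NamedTuple):
    shortest: int
    mean: Optional[int]  # None when no bracketed middle number is present
    longest: int


# min, optional "[mean]", max; all separated by hyphens.
_RANGE = re.compile(r"^(\d+)-(?:\[(\d+)\]-)?(\d+)$")


def parse_length_range(text: str) -> LengthRange:
    match = _RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"not a length range: {text!r}")
    shortest, mean, longest = match.groups()
    return LengthRange(
        int(shortest),
        int(mean) if mean is not None else None,
        int(longest),
    )


print(parse_length_range("30-[139]-150"))
print(parse_length_range("101-101"))
```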

BAM
For processing steps that result in a BAM output file, there are four common lines.

  1. Number of reads, contigs or scaffolds [untitled].
  2. Percent of 'properly paired' [%PP] and 'singleton' [%Si] reads as given by the 'samtools flagstat' program.
  3. Percent of 'mapped' [%Map] and 'unmapped' [%Un] reads as given by the 'samtools idxstats' program.
  4. A link to the BAM file.
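
As an illustration of line 3, the sketch below derives mapped and unmapped percentages from 'samtools idxstats' output, which is tab-delimited with one line per reference: name, sequence length, mapped count, unmapped count. The function name `mapped_percentages` and the sample text are invented for this example, and whether the page computes %Map and %Un in exactly this way is an assumption.

```python
from typing import Tuple


def mapped_percentages(idxstats_text: str) -> Tuple[float, float]:
    """Return (%mapped, %unmapped) summed over all idxstats lines."""
    mapped = unmapped = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        _name, _length, n_mapped, n_unmapped = line.split("\t")
        mapped += int(n_mapped)
        unmapped += int(n_unmapped)
    total = mapped + unmapped
    if total == 0:
        raise ValueError("no reads counted")
    return 100.0 * mapped / total, 100.0 * unmapped / total


# Hypothetical idxstats output; the final '*' line holds unplaced reads.
example = (
    "contig_1\t5000\t900\t10\n"
    "contig_2\t3000\t50\t0\n"
    "*\t0\t0\t40\n"
)
pct_map, pct_un = mapped_percentages(example)
print(f"%Map {pct_map:.1f}  %Un {pct_un:.1f}")
```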

Specific sections:
Unaligned
These are the raw reads from the sequencer. Unless something went amiss in the sequencing run, the length range should have the same lower and upper limits; e.g., '100-100' for a 100-base run. Because these reads come as many 'small' (~2 GB) files separated by lane and read direction, you will have to go to the Unaligned directory in each accession to get them.

Unaligned_filtered
We run Trimmomatic and/or fastx_clipper to remove adapters and to clip poor-quality bases from both the 5' and 3' ends of the Unaligned reads. Any reads below a minimum length are discarded. The Unaligned_filtered reads are the ones most often used for further processing. The filtered reads for a sample are combined into one large file per read direction. You can access these via the given link, the 'Unaligned_filtered' directories below, or the per-sample directory.
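
The minimum-length discard can be pictured with the sketch below. It is not how Trimmomatic or fastx_clipper work internally; it only illustrates the idea on already-trimmed reads. It assumes plain four-line FastQ records and assumes that a pair is dropped when either mate is too short, which the identical R1 and R2 read counts in the Unaligned_filtered listing suggest but the page does not state. The names `read_fastq` and `filter_pairs` are hypothetical. The default of 30 comes from the MinLen column of the summary table.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield four-line FastQ records from an open text handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_pairs(
    r1: TextIO, r2: TextIO, min_len: int = 30
) -> Iterator[Tuple[Record, Record]]:
    """Keep a pair only if both mates are at least min_len bases long."""
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        if len(rec1[1]) >= min_len and len(rec2[1]) >= min_len:
            yield rec1, rec2


# Tiny made-up example; min_len=5 keeps the records short enough to read.
r1 = io.StringIO("@a/1\nACGTACGT\n+\nIIIIIIII\n@b/1\nACG\n+\nIII\n")
r2 = io.StringIO("@a/2\nTTTTTTTT\n+\nIIIIIIII\n@b/2\nGGGGGGGG\n+\nIIIIIIII\n")
kept = list(filter_pairs(r1, r2, min_len=5))
print(len(kept), "pair(s) kept:", [pair[0][0] for pair in kept])
```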

Directories and files

Directories (mostly accessions/samples)

002002_Airport
002003_Wally
002004_Nocton-Corn
002005_Nocton-Soy
002006_Sedgewick
002007_ItsyBitsy

Files (if large, you may wish to save them first)

Size Name
3K summary_stats.html
3K summary_stats_help.html

FastA, FastQ and SAM/BAM files in current directory and sub-directories

In Unaligned directory

Total Seqs/Reads Nucleotides Range Mean N50 File name
52,800,503 5,332,850,803 101-101 101 101 002002_Airport_GTGGCC_L002_R1_MANY.fastq.gz
52,800,503 5,332,850,803 101-101 101 101 002002_Airport_GTGGCC_L002_R2_MANY.fastq.gz
54,150,072 5,469,157,272 101-101 101 101 002002_Airport_GTGGCC_L003_R1_MANY.fastq.gz
54,150,072 5,469,157,272 101-101 101 101 002002_Airport_GTGGCC_L003_R2_MANY.fastq.gz
16,124,585 1,628,583,085 101-101 101 101 002003_Wally_GTTTCG_L002_R1_MANY.fastq.gz
16,124,585 1,628,583,085 101-101 101 101 002003_Wally_GTTTCG_L002_R2_MANY.fastq.gz
16,064,685 1,622,533,185 101-101 101 101 002003_Wally_GTTTCG_L003_R1_MANY.fastq.gz
16,064,685 1,622,533,185 101-101 101 101 002003_Wally_GTTTCG_L003_R2_MANY.fastq.gz
36,310,019 3,667,311,919 101-101 101 101 002004_Nocton-Corn_CGTACG_L002_R1_MANY.fastq.gz
36,310,019 3,667,311,919 101-101 101 101 002004_Nocton-Corn_CGTACG_L002_R2_MANY.fastq.gz
36,405,015 3,676,906,515 101-101 101 101 002004_Nocton-Corn_CGTACG_L003_R1_MANY.fastq.gz
36,405,015 3,676,906,515 101-101 101 101 002004_Nocton-Corn_CGTACG_L003_R2_MANY.fastq.gz
32,683,570 3,301,040,570 101-101 101 101 002005_Nocton-Soy_GAGTGG_L002_R1_MANY.fastq.gz
32,683,570 3,301,040,570 101-101 101 101 002005_Nocton-Soy_GAGTGG_L002_R2_MANY.fastq.gz
33,814,064 3,415,220,464 101-101 101 101 002005_Nocton-Soy_GAGTGG_L003_R1_MANY.fastq.gz
33,814,064 3,415,220,464 101-101 101 101 002005_Nocton-Soy_GAGTGG_L003_R2_MANY.fastq.gz
32,767,247 3,309,491,947 101-101 101 101 002006_Sedgewick_ACTGAT_L002_R1_MANY.fastq.gz
32,767,247 3,309,491,947 101-101 101 101 002006_Sedgewick_ACTGAT_L002_R2_MANY.fastq.gz
32,973,637 3,330,337,337 101-101 101 101 002006_Sedgewick_ACTGAT_L003_R1_MANY.fastq.gz
32,973,637 3,330,337,337 101-101 101 101 002006_Sedgewick_ACTGAT_L003_R2_MANY.fastq.gz
42,592,787 4,301,871,487 101-101 101 101 002007_ItsyBitsy_ATTCCT_L002_R1_MANY.fastq.gz
42,592,787 4,301,871,487 101-101 101 101 002007_ItsyBitsy_ATTCCT_L002_R2_MANY.fastq.gz
43,956,744 4,439,631,144 101-101 101 101 002007_ItsyBitsy_ATTCCT_L003_R1_MANY.fastq.gz
43,956,744 4,439,631,144 101-101 101 101 002007_ItsyBitsy_ATTCCT_L003_R2_MANY.fastq.gz
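
The per-lane files above add up to the "Raw reads" figures in the summary table at the top of the page. A quick check for 002002_Airport, using only numbers from this listing (the dictionary keys are shorthand for the four file names):

```python
# (reads, bases) for the four 002002_Airport files listed above.
airport_files = {
    "L002_R1": (52_800_503, 5_332_850_803),
    "L002_R2": (52_800_503, 5_332_850_803),
    "L003_R1": (54_150_072, 5_469_157_272),
    "L003_R2": (54_150_072, 5_469_157_272),
}

total_reads = sum(reads for reads, _ in airport_files.values())
total_bases = sum(bases for _, bases in airport_files.values())

# With a 101-101 range, every file should hold exactly 101 bases per read.
all_101 = all(bases == reads * 101 for reads, bases in airport_files.values())

print(f"{total_reads:,} reads, {total_bases:,} bases, uniform 101: {all_101}")
```

The totals match the 213,901,150 sequences and 21,604,016,150 bases reported for 002002_Airport in the summary table.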

In Unaligned_filtered directory

Total Seqs/Reads Nucleotides Range Mean N50 File name
105,921,087 10,504,144,774 30-101 99 101 002002_Airport_GTGGCC_R1_filtered.fastq
105,921,087 10,533,996,036 30-101 99 101 002002_Airport_GTGGCC_R2_filtered.fastq
31,794,963 3,160,033,315 30-101 99 101 002003_Wally_GTTTCG_R1_filtered.fastq
31,794,963 3,164,363,705 30-101 99 101 002003_Wally_GTTTCG_R2_filtered.fastq
71,944,599 7,148,182,659 30-101 99 101 002004_Nocton-Corn_CGTACG_R1_filtered.fastq
71,944,599 7,157,497,754 30-101 99 101 002004_Nocton-Corn_CGTACG_R2_filtered.fastq
65,793,958 6,516,642,991 30-101 99 101 002005_Nocton-Soy_GAGTGG_R1_filtered.fastq
65,793,958 6,534,950,057 30-101 99 101 002005_Nocton-Soy_GAGTGG_R2_filtered.fastq
65,184,661 6,482,962,609 30-101 99 101 002006_Sedgewick_ACTGAT_R1_filtered.fastq
65,184,661 6,491,372,799 30-101 99 101 002006_Sedgewick_ACTGAT_R2_filtered.fastq
85,766,175 8,520,755,309 30-101 99 101 002007_ItsyBitsy_ATTCCT_R1_filtered.fastq
85,766,175 8,533,469,936 30-101 99 101 002007_ItsyBitsy_ATTCCT_R2_filtered.fastq
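
The columns in these listings can be derived from a list of read lengths. The sketch below uses a common definition of N50 (the length L such that reads of length L or longer hold at least half of all bases); the page does not say which definition or which rounding of the mean it uses, so both are assumptions. The function name `length_stats` is made up for illustration.

```python
from typing import Dict, List


def length_stats(lengths: List[int]) -> Dict[str, int]:
    """Summarise read lengths: count, bases, range, mean and N50."""
    if not lengths:
        raise ValueError("no reads")
    total = sum(lengths)

    # Walk from the longest read down until half of all bases are covered.
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "seqs": len(lengths),
        "bases": total,
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(total / len(lengths)),  # rounding rule is an assumption
        "n50": n50,
    }


# Made-up lengths, mostly full-length reads plus two trimmed ones.
stats = length_stats([30, 99, 101, 101, 101])
print(stats)
```

Because most filtered reads keep their full length, an N50 of 101 alongside a mean of 99 is what one would expect from a listing like the one above.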

Quality control files and directories

Quality control

Accession | FastQC (Unaligned) | FastQC (Unaligned_filtered) | rRNA matches (Unaligned_filtered) | phiX matches (Unaligned_filtered)
002002_Airport Link Link 0.1% 0.1%
002003_Wally Link Link 0.1% 0.1%
002004_Nocton-Corn Link Link 0.2% 0.1%
002005_Nocton-Soy Link Link 0.1% 0.1%
002006_Sedgewick Link Link 0.1% 0.1%
002007_ItsyBitsy Link Link 0.1% 0.1%