Genomes to Natural Products Network (GNPN):
Stanford Genome Technology Center

MiSeq fastq preprocessing


Please see NextGen Sequencing of Assembled Gene Clusters for more background.

The FASTQ files are first split in the UNIX console into small packages of 100,000 reads (400,000 lines, since each FASTQ record spans four lines) and then processed with an R script called analyze_fastq.R. This script was run in R version 3.1.1 with the packages ‘ShortRead’ and ‘Biostrings’ installed.


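# set sample name: X is the file-name prefix before _S1_L001
# (the value below is a placeholder, not from the original; replace it with your own)
X=sample
# first split each fastq into chunks of 100,000 reads (400,000 lines) in the UNIX console
zcat "${X}_S1_L001_R1_001.fastq.gz" | split -l 400000 - xread-1-
zcat "${X}_S1_L001_R2_001.fastq.gz" | split -l 400000 - xread-2-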
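
As a sanity check on the chunk size, the following minimal sketch repeats the same split on a tiny synthetic FASTQ file. The file name demo.fastq.gz, the prefix demo- and the chunk size of 2 reads are made up for illustration; the real commands above use 400,000 lines for 100,000 reads.

```shell
# Build a synthetic FASTQ with 5 reads (4 lines each); all names here are hypothetical.
tmpdir=$(mktemp -d)
for i in 1 2 3 4 5; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done | gzip > "$tmpdir/demo.fastq.gz"

# A FASTQ record spans 4 lines, so lines per chunk = reads per chunk * 4.
reads_per_chunk=2
gzip -dc "$tmpdir/demo.fastq.gz" | split -l $((reads_per_chunk * 4)) - "$tmpdir/demo-"

# Print the number of reads in each chunk.
for f in "$tmpdir"/demo-*; do
  echo "$(basename "$f") $(( $(wc -l < "$f") / 4 ))"
done
# prints:
# demo-aa 2
# demo-ab 2
# demo-ac 1
```

Only the last chunk may hold fewer reads than the chunk size; every chunk should still contain a whole number of 4-line records.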

Copy the miseq_sample_key.txt file to the base directory.

Then transfer the xread files to a new folder /xreads/ in the base directory.
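
The transfer step could be scripted roughly as follows. This is a sketch under assumptions: the function names collect_xreads and check_xreads are hypothetical, the base directory is passed as an argument because the original does not name it, and the integrity check (line count divisible by 4) is an extra safeguard that is not part of the original protocol.

```shell
# Move every chunk produced by split into xreads/ under the base directory.
collect_xreads() {
  base_dir=$1
  mkdir -p "$base_dir/xreads"
  for f in "$base_dir"/xread-[12]-*; do
    [ -e "$f" ] || continue          # skip if the pattern matched nothing
    mv "$f" "$base_dir/xreads/"
  done
}

# Return non-zero if any chunk has a line count that is not a multiple of 4,
# which would indicate a truncated FASTQ record.
check_xreads() {
  rc=0
  for f in "$1"/xreads/xread-*; do
    [ -e "$f" ] || continue
    n=$(( $(wc -l < "$f") ))
    if [ $((n % 4)) -ne 0 ]; then
      echo "incomplete record in $f" >&2
      rc=1
    fi
  done
  return $rc
}
```

From the base directory, `collect_xreads . && check_xreads .` would move the chunks and report any file that does not end on a record boundary.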

Run the R script analyze_fastq.R.
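
Before running the script, it may help to confirm that R and the two packages it was run with (ShortRead and Biostrings) are available. The helper check_r_env below is a hypothetical name, and the commented invocation assumes that analyze_fastq.R takes no arguments and is started from the base directory, which the original does not state.

```shell
# Check that Rscript is on PATH and that both packages can be found.
check_r_env() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 1
  fi
  # Exit with status 1 if either package is not installed.
  Rscript -e 'for (p in c("ShortRead", "Biostrings")) if (!requireNamespace(p, quietly = TRUE)) quit(save = "no", status = 1)'
}

# Assumed invocation (no arguments, run from the base directory):
# check_r_env && Rscript analyze_fastq.R
```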


Download the analyze_fastq.R script and miseq_sample_key.txt file as a zip file here.


Inquiries can be addressed to Maureen Hillenmeyer (maureenh at and Angela Chu (amchu at