Genomes to Natural Products Network (GNPN):
Stanford Genome Technology Center


MiSeq fastq preprocessing

Overview

Please see NextGen Sequencing of Assembled Gene Clusters for more background.

The FASTQ files are first split in the UNIX console into chunks of 100,000 reads (400,000 lines, since each FASTQ record spans four lines) and then processed with an R script called analyze_fastq.R. The script was run in R version 3.1.1 with the packages ‘ShortRead’ and ‘Biostrings’ installed.

In UNIX:

# set sample name (placeholder: replace with the name of the MiSeq run)
X=_name_of_miseq_run_
# split each FASTQ file into chunks of 400,000 lines (100,000 reads each)
zcat "${X}_S1_L001_R1_001.fastq.gz" | split -l 400000 - xread-1-
zcat "${X}_S1_L001_R2_001.fastq.gz" | split -l 400000 - xread-2-
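
After splitting, it can be worth confirming that every chunk holds whole FASTQ records and that the R1 and R2 files produced the same number of chunks. The following is only a sketch of such a check and is not part of the original protocol; the function name check_xread_chunks is hypothetical, and it assumes the chunks keep the xread-1- and xread-2- prefixes used above.

```shell
# Hypothetical helper (not part of analyze_fastq.R): sanity-check the chunks
# written by split. Takes the directory holding the xread files (default: .).
check_xread_chunks() {
    dir="${1:-.}"
    n1=0
    n2=0
    status=0
    for f in "$dir"/xread-[12]-*; do
        [ -e "$f" ] || continue          # glob matched nothing
        case "${f##*/}" in
            xread-1-*) n1=$((n1 + 1)) ;;
            xread-2-*) n2=$((n2 + 1)) ;;
        esac
        # A FASTQ record is 4 lines, so every chunk should be a multiple of 4.
        lines=$(wc -l < "$f")
        if [ $((lines % 4)) -ne 0 ]; then
            echo "incomplete record in $f ($lines lines)" >&2
            status=1
        fi
    done
    # Paired reads should yield the same number of chunks for R1 and R2.
    if [ "$n1" -ne "$n2" ]; then
        echo "chunk count mismatch: R1=$n1 R2=$n2" >&2
        status=1
    fi
    return $status
}
```

Calling check_xread_chunks in the directory where split was run returns 0 when the chunks look consistent and prints a message to standard error otherwise.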

Copy the miseq_sample_key.txt file to the base directory.

Then move the xread files to a new folder, /xreads/, in the base directory.

Run the R script analyze_fastq.R.
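
The file-staging steps above can be sketched in the console as follows. This is an illustration under stated assumptions, not the authors' exact commands: the function name stage_xreads is hypothetical, the chunks are assumed to sit in the current directory, and the commented Rscript call assumes analyze_fastq.R is run from the base directory without arguments, which the script itself should confirm.

```shell
# Hypothetical helper: move the split chunks from the current directory
# into BASE/xreads/ (BASE defaults to the current directory).
stage_xreads() {
    base="${1:-.}"
    mkdir -p "$base/xreads"
    for f in xread-1-* xread-2-*; do
        [ -e "$f" ] || continue      # skip patterns that matched nothing
        mv "$f" "$base/xreads/"
    done
}

# Example use (paths are placeholders):
#   cp /path/to/miseq_sample_key.txt .
#   stage_xreads .
#   Rscript analyze_fastq.R    # assumption: the script takes no arguments
```

Only files starting with xread-1- or xread-2- are moved, so the sample key and the original fastq.gz files stay in place.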

Download

Download the analyze_fastq.R script and miseq_sample_key.txt file as a zip file here.

Inquiries can be addressed to Maureen Hillenmeyer (maureenh at stanford.edu) and Angela Chu (amchu at stanford.edu)