Web supplement to
"Computational identification and analysis of
orphan assembly line polyketide synthases"

Robert V. O'Brien, Ronald W. Davis, Chaitan Khosla, Maureen E. Hillenmeyer

Figure: C. elegans PKS


The increasing availability of DNA sequence data offers an opportunity for identifying new assembly-line polyketide synthases (PKSs) that produce biologically active natural products. We developed an automated method to extract and consolidate all multimodular PKS sequences (including hybrid PKS/non-ribosomal peptide synthetases) in the National Center for Biotechnology Information (NCBI) database, generating a non-redundant catalog of 885 distinct assembly-line PKSs, the majority of which were orphans associated with no known polyketide product. Two in silico experiments highlight the value of this search method and resulting catalog. First, we identified an orphan that could be engineered to produce an analog of albocycline, an interesting antibiotic whose gene cluster has not yet been sequenced. Second, we identified and analyzed a hitherto overlooked family of metazoan multimodular PKSs, including one from Caenorhabditis elegans. We also developed a comparative analysis method that identified sequence relationships among known and orphan PKSs. As expected, PKS sequences clustered according to structural similarities between their polyketide products. The utility of this method was illustrated by highlighting an interesting orphan from the genus Burkholderia that has no close relatives. Our search method and catalog provide a community resource for the discovery of new families of assembly-line PKSs and their antibiotic products.


Graphical interface to orphan PKSs

To view antiSMASH-generated domain annotations of an orphan PKS, enter its GenBank ID in the box on the left. To find GenBank IDs of interest, explore the graphical dendrogram and the Excel file in the Supplement, or browse the catalog.

Inquiries can be addressed to Maureen Hillenmeyer (maureenh at stanford.edu).
Stanford Genome Technology Center