Wheat genetics tool bench for next-generation sequencing

In collaboration with the wheat community, we aim to build a suite of software workflows designed for complex polyploids. We will provide users with up-to-date genome references based on the latest International Wheat Genome Sequencing Consortium (IWGSC) assemblies and exome capture designs. Users will also be able to supply their own custom polyploid genomes. Building on existing published workflows, or those used by the Hall group, we will provide optimised tools in the CyVerse (formerly iPlant) environment for SNP scoring, measuring homoeologous gene expression and mapping-by-sequencing, together with benchmarking datasets and clear user documentation. Further, with the help of the community, we will identify and assess applications that could be optimised for CyVerse, such as KASP SNP assay design, methyl-seq pipelines, and GWAS workflows optimised for wheat and linked to the BBSRC eTILLING project.

Mapping-by-sequencing

A growing number of mapping-by-sequencing algorithms exist, including SHOREmap, MAQGene, MutMap, NGM (next-generation EMS mutation mapping) and CloudMap. Benchmarking these algorithms, and those developed at Liverpool, against existing datasets will allow us to identify the most effective mapping-by-sequencing pipeline for Arabidopsis, barley and other custom diploid genomes. This pipeline will be made available in the CyVerse environment with benchmarking datasets, a simple interface, and user documentation. For Arabidopsis, we will link mapping intervals with annotation of synonymous and non-synonymous SNPs, and provide a pipeline to design CRISPR/Cas9 constructs to knock out target genes. Outputs will be linked to stock centres or DNA synthesis centres so that users can easily order T-DNA insertion mutants or generate CRISPR/Cas9 constructs.
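To illustrate the kind of calculation these tools perform, the sketch below follows the general idea behind MutMap-style analysis: each SNP is summarised by the fraction of reads carrying the mutant allele (the SNP-index), and a sliding window highlights regions where that fraction approaches 1. It is a minimal sketch, not the benchmarked pipeline described above; the function names, the input tuple layout and the window sizes are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical input layout: (position, reads supporting mutant allele, total depth)
Snp = Tuple[int, int, int]


def snp_index(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a site that carry the mutant allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def window_means(snps: List[Snp], window: int, step: int) -> List[Tuple[int, float, int]]:
    """Average the SNP-index in sliding windows along one chromosome.

    Returns (window_start, mean_snp_index, n_snps) for every window that
    contains at least one SNP. `snps` must be sorted by position.
    """
    if not snps:
        return []
    results = []
    last_pos = snps[-1][0]
    start = 0
    while start <= last_pos:
        end = start + window
        values = [snp_index(alt, depth) for pos, alt, depth in snps if start <= pos < end]
        if values:
            results.append((start, sum(values) / len(values), len(values)))
        start += step
    return results


def peak_window(results: List[Tuple[int, float, int]]) -> Tuple[int, float, int]:
    """Window with the highest mean SNP-index (a candidate mapping interval)."""
    return max(results, key=lambda r: r[1])


if __name__ == "__main__":
    toy = [(100, 5, 10), (600, 9, 10), (1200, 10, 10), (1400, 10, 10)]
    for row in window_means(toy, window=1000, step=500):
        print(row)
```

In practice the window and step sizes would be in the megabase range and low-depth sites would be filtered first; the toy values here are only chosen to keep the example readable.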

Genome assembly pipeline

We are implementing a de novo genome assembly pipeline for iPlant (now CyVerse). The aim is to start with raw reads and perform the successive stages of assembly: quality checking, adapter trimming, error correction, k-mer estimation, insert-size calculation, assembly and gap closing. The pipeline will run in an iterative, feedback-based manner to improve the quality of the assembled genome and, if necessary, produce a panel of assemblies that can be compared and visualised against each other. We will start with SOAPdenovo2 and then extend the pipeline to include ALLPATHS-LG, Velvet, Newbler and other assemblers. An important design goal is to require minimal user interaction throughout the process and to use automated mechanisms to tune the parameters that affect overall assembly quality. The pipeline is currently being developed on Stampede and will eventually be ported to CyVerse resources.
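One simple way to compare a panel of assemblies is by contig N50, the contig length at which the cumulative length of the longest contigs first reaches half of the total assembly size. The sketch below ranks assemblies by that metric. It is only an illustration: the pipeline's actual feedback criteria are not specified here, and the assembly labels (such as "k31") and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def n50(contig_lengths: List[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty assembly)."""
    if not contig_lengths:
        return 0
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    return 0


def rank_assemblies(panel: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Sort assemblies from highest to lowest N50."""
    scored = [(name, n50(lengths)) for name, lengths in panel.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical panel: the same reads assembled with two k-mer sizes.
    panel = {
        "k31": [80, 70, 50, 40, 30, 20],
        "k63": [150, 60, 40, 40],
    }
    for name, score in rank_assemblies(panel):
        print(name, score)
```

N50 alone rewards contiguity and says nothing about correctness, so an automated loop would likely combine it with other checks such as total assembly size or read mapping rates.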

RNA-Seq analysis pipeline

We have a workflow in place that runs the Tuxedo suite of programmes for reference-based RNA-Seq analysis. The workflow maps sequence reads to a reference, assembles the transcripts and creates a custom .gtf file based on both the reference transcripts and any new transcripts or isoforms. It then uses Cuffdiff to generate differential expression data showing up- and down-regulation of gene expression. The workflow also produces comparative R plots, including a heat map, scatter plots and volcano plots, along with a list of significantly differentially expressed genes. The first version of the workflow currently accepts up to four conditions, with an unlimited number of replicates for each condition. We plan to add a time-course version of the workflow in the near future.
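The list of significantly differentially expressed genes can be thought of as a filter over the differential expression table: a gene is reported when its adjusted p-value is small and its fold change is large, which is also what separates the coloured points from the grey ones on a volcano plot. The sketch below shows that filter on a tab-delimited table. It is not the workflow's own code; the column names ("gene", "log2(fold_change)", "q_value") are assumed to match the Cuffdiff gene-level output and should be checked against the real file, and the cut-offs are arbitrary example values.

```python
import csv
import io
import math
from typing import List, Tuple


def classify(log2_fc: float, q_value: float,
             fc_cutoff: float = 1.0, q_cutoff: float = 0.05) -> str:
    """Label a gene as 'up', 'down' or 'not significant'.

    The cut-offs are illustrative defaults, not values taken from the workflow.
    """
    if math.isnan(log2_fc) or q_value >= q_cutoff or abs(log2_fc) < fc_cutoff:
        return "not significant"
    return "up" if log2_fc > 0 else "down"


def significant_genes(table_text: str) -> List[Tuple[str, str]]:
    """Parse a tab-delimited table and return (gene, label) for significant genes.

    Assumed column names: 'gene', 'log2(fold_change)', 'q_value'.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    hits = []
    for row in reader:
        label = classify(float(row["log2(fold_change)"]), float(row["q_value"]))
        if label != "not significant":
            hits.append((row["gene"], label))
    return hits


if __name__ == "__main__":
    example = (
        "gene\tlog2(fold_change)\tq_value\n"
        "A\t2.5\t0.001\n"
        "B\t-1.8\t0.01\n"
        "C\t0.3\t0.0001\n"
    )
    print(significant_genes(example))
```

Infinite fold changes, which can arise when a gene has no measured expression in one condition, parse as `inf` and are classified by their sign here; whether to keep or exclude such genes is a choice left to the analyst.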