CyVerse UK Workshop 2017


This meeting is aimed at researchers who are either near the beginning of their studies or have moved into a new subject area. We will provide hands-on sessions describing the use of software tools that can interrogate RNAseq, imaging, gene expression or GWAS data. Previous CyVerse users will provide real-life examples of how the software has been used successfully. This is the Learner Track.

In addition, we will host a concurrent track for more experienced bioinformaticians who wish to learn how to use CyVerse to host their own programs. This is the Intermediate Track. The two tracks will run concurrently in separate rooms.

These software tools have been developed as part of the CyVerseUK grant. We will also highlight the opportunities that exist for the sharing of big data in a meaningful manner. This workshop is organised by GARNet with Professor Katherine Denby at the University of York.

Day 1: Monday March 20th

10.30am: Registration Opens

11.30am: Introduction to CyVerse and CyVerseUK

1.00pm: Lunch

2.00pm: Data Sharing, Management and Reuse

3.30pm: Break

Learner Track

4.00pm: CyVerseUK Tools: Earlham Institute: Tuxedo pipeline for RNAseq Analysis - Descriptive hands-on Workshop - User example

Intermediate Track

4.00pm: Working with Docker

6.00pm: Session End

7.00pm: Workshop Dinner at Walmgate Ale House

Day 2: Tuesday March 21st

Learner Track

9.00am: CyVerseUK Tools: University of Nottingham: Image Analysis - Hands-on workshop - Introducing BISQUE - Image-Based Phenotyping - Introducing RootNav and RooTrace - Successful Use Example

11.00am: Break

11.30am: CyVerseUK Tools: University of Warwick: Transcriptomic Data Mining - Descriptive hands-on Workshop - User example

Intermediate Track

9.00am: Introducing the Agave API - What Agave is - What Agave isn’t - How you can use it to empower your code with CyVerse hardware

11.00am: Break

11.30am: Working with the Agave API - “Bring your own scripts” - BASH/shell scripting on basic VM hardware

1.30pm: Lunch

Learner Track

2.30pm: CyVerseUK Tools: Earlham Institute: GWAS Analysis Pipeline - Descriptive hands-on Workshop - User example

Intermediate Track

2.30pm: Bringing Agave and Docker together - Building programmatic interfaces - Submitting user-space jobs through CyVerse UK scheduling systems - Building user interfaces

Whole Group Finale

4.00pm: The Future of CyVerse and CyVerseUK

4.30pm: Meeting End

Other Information


Accommodation is available on a first-come, first-served basis at the University of York bed & breakfast facility, Franklin House. This is bookable online through the URL: select the Accommodation tab, click 'Bed and Breakfast' and then 'Book'.

Dietary Requirements

If you have any dietary requirements, please email Geraint at with those details. We will send out information about the workshop dinner closer to the time.

Computer requirements

  1. Please bring a laptop to the workshop for the hands-on portions of the sessions. Closer to the time we will send out the data that we will analyse at the meeting.
  2. Please sign up for a CyVerse account. You can register at and we recommend that you take a brief look at the website before you attend the meeting.
  3. Please ensure your computer is registered with the eduroam network. If this is not possible, we will provide guest logins for the duration of the workshop.


See this event on Eventbrite.

Expression Data Analysis Pipeline

Updated on 14 Feb. 2017


Analysing large-scale expression data can be a daunting task, as any relevant information on the effects of the monitored treatment is diluted by the tens of thousands of profiles that are unaffected by the condition change. The Warwick team of CyVerse UK set out to create a number of apps that can take you from normalised expression time course data to concise biological hypotheses about regulatory function. Note that these tools were created with array data in mind, but you are welcome to use count data as well; if you do, remember to log-transform your data before feeding it into the apps.
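As a minimal sketch of the log-transform step mentioned above, the snippet below applies log2(count + 1) to each value. The base 2 and the pseudocount of 1 are common conventions assumed here, not settings required by the apps; the function name and the gene identifiers are hypothetical, and normalisation of the counts is assumed to have been done beforehand.

```python
import math

def log_transform_counts(counts, pseudocount=1.0):
    """Return log2(count + pseudocount) for every value in a gene -> counts mapping.

    The pseudocount (an assumed convention) avoids taking the log of zero
    for genes with no reads at a given time point.
    """
    return {
        gene: [math.log2(c + pseudocount) for c in values]
        for gene, values in counts.items()
    }

# Hypothetical normalised counts for two genes across four time points.
raw = {"GENE_A": [0, 3, 7, 15], "GENE_B": [100, 120, 90, 110]}
logged = log_transform_counts(raw)
print(logged["GENE_A"])  # [0.0, 2.0, 3.0, 4.0]
```

The transformed values can then be written out in whatever tabular layout the app you are using expects.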

The first step of an analysis is typically the identification of differentially expressed genes (DEGs), often coupled with temporal deconstruction to help establish the order of relevant events. This goes a long way towards reducing the size of the data and helps you focus on the genes that carry the signal you are trying to decipher. GP2S is an algorithm that detects differential expression between two conditions of a time course, and it also features an extension that detects the time at which a gene becomes differentially expressed. If your time course data features only one condition, the Gradient Tool is a more fitting method for identifying the timing of events, as it was designed with single-condition time course data in mind. It can also potentially be used for DEG identification, by treating the genes it flags as changing at some point in time as differentially expressed.
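To illustrate the idea of "time of change" and how it doubles as a DEG filter, here is a deliberately crude sketch using finite differences between consecutive samples and a fixed slope threshold. This is a swapped-in technique for illustration only: it is not the Gradient Tool's statistical model, and the function names, threshold and data are hypothetical.

```python
def first_change_time(times, values, slope_threshold):
    """Return the start time of the first interval whose absolute slope
    (finite difference between consecutive samples) exceeds slope_threshold,
    or None if the profile never changes that fast."""
    for i in range(len(times) - 1):
        slope = (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        if abs(slope) > slope_threshold:
            return times[i]
    return None

def changing_genes(times, profiles, slope_threshold):
    """Map each gene that changes at some point to its first change time.
    Genes missing from the result are treated as not differentially expressed."""
    result = {}
    for gene, values in profiles.items():
        t = first_change_time(times, values, slope_threshold)
        if t is not None:
            result[gene] = t
    return result

# Hypothetical log-scale single-condition time course (times in hours).
times = [0, 2, 4, 6, 8]
profiles = {
    "GENE_A": [5.0, 5.1, 5.0, 7.0, 8.5],   # rises sharply after 4 h
    "GENE_B": [3.0, 3.1, 2.9, 3.0, 3.05],  # essentially flat
}
changing = changing_genes(times, profiles, slope_threshold=0.5)
print(changing)  # {'GENE_A': 4}
```

A real method has to separate genuine change from measurement noise, which a fixed threshold on raw differences cannot do reliably; that is the gap a dedicated tool fills.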

Once you have a list of differentially expressed genes, the data can be reduced further by clustering and biclustering. Algorithms in these families aim to split the data into groups exhibiting related behaviour. BHC is a hierarchical clustering algorithm that requires minimal user input and can be used successfully on both time course and static data. TCAP offers a different clustering experience: it uses a more complex similarity measure and produces intricate clusters that can capture regulatory interactions extending beyond mere co-regulation. If you have multiple time course datasets (such as different conditions), you can try Wigwams, which mines them for modules of genes co-regulated across at least two of the provided datasets.
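To show what "splitting the data into groups exhibiting related behaviour" means in practice, the sketch below uses classic average-linkage agglomerative clustering with a 1 − Pearson correlation distance. This is a different, simpler technique than BHC (which is Bayesian) or TCAP, and the stopping distance, function names and profiles are hypothetical choices for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length profiles.
    Assumes neither profile is perfectly flat (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_profiles(profiles, max_distance):
    """Average-linkage agglomerative clustering with distance = 1 - Pearson r.
    Merging stops once the closest pair of clusters is farther apart than max_distance."""
    genes = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(genes, 2)
    }
    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster gene pairs.
            pairs = [dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j]]
            d = sum(pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical profiles: G1/G2 rise together, G3/G4 fall together.
profiles = {
    "G1": [1, 2, 3, 4],
    "G2": [2, 4, 6, 8],
    "G3": [4, 3, 2, 1],
    "G4": [8, 6, 4, 2],
}
clusters = cluster_profiles(profiles, max_distance=0.5)
print(clusters)  # [['G1', 'G2'], ['G3', 'G4']]
```

Correlation distance groups genes by the shape of their profiles rather than their absolute level, which is why G1 and G2 end up together despite different magnitudes.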

Gene groups obtained from the above methods can then be mined for relevant biological information, putting the detected expression trends into context. Two common forms of enrichment analysis examine GO terms and transcription factor binding sites. For GO term analysis, all the clustering/biclustering apps support the Cytoscape plugin BiNGO: one of their output files serves as direct input for BiNGO's overrepresentation analysis. Two CyVerse apps analyse transcription factor binding sites: MEME-LaB performs de novo mining to detect novel overrepresented motifs, while HMT screens promoters for known transcription factor binding sites. Again, input files compatible with these apps are included in the output of the clustering/biclustering apps.
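Overrepresentation analysis asks whether a gene group contains more genes annotated with a term than chance would predict. A standard way to quantify this is a one-sided hypergeometric test, sketched below; whether BiNGO's chosen settings match this exactly should be checked in its own documentation. Real analyses also correct for testing many GO terms at once, which is not shown. The function name and the counts are hypothetical.

```python
from math import comb

def hypergeometric_enrichment_p(k, n, K, N):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all genes measured)
    K: background genes annotated with the term
    n: genes in the cluster being tested
    k: cluster genes annotated with the term
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical example: 20 background genes, 5 carry the term;
# a cluster of 4 genes contains 3 of them.
p = hypergeometric_enrichment_p(k=3, n=4, K=5, N=20)
print(round(p, 4))  # 0.032
```

Here the probability of drawing 3 or more annotated genes in a random set of 4 is (10·15 + 5·1) / 4845 = 155 / 4845, so the term looks enriched in this small cluster.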

Another potential analysis is the identification of underlying regulatory networks. Such models, inferred from transcription factor expression levels, show the signalling chains between transcription factors, helping to put the downstream co-regulatory events captured by clusters/biclusters into context. CSI proposes a model based on how well the expression profiles of upstream transcription factors explain the expression profiles of downstream transcription factors at a later time point, and the resulting model can be turned into a Cytoscape-friendly network by applying a stringency threshold to the confidence in each edge. Extensions of the algorithm exist: hCSI can infer related networks across multiple datasets, while oCSI can work with data captured from different species.
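The thresholding step described above can be sketched as follows: keep only edges whose confidence reaches a chosen stringency, then write them in Cytoscape's Simple Interaction Format (SIF), one tab-separated "source, interaction, target" line per edge. The input dictionary is a hypothetical stand-in, not CSI's actual output format, and the function name, the "regulates" label and the confidence values are illustrative assumptions.

```python
def edges_to_sif(edge_confidence, threshold, interaction="regulates"):
    """Keep edges with confidence >= threshold and format them as SIF lines.

    edge_confidence: hypothetical mapping (source_tf, target_tf) -> confidence.
    Each output line is 'source<TAB>interaction<TAB>target'; edges are sorted
    so the output is deterministic.
    """
    kept = sorted(
        (src, tgt)
        for (src, tgt), conf in edge_confidence.items()
        if conf >= threshold
    )
    return "\n".join(f"{src}\t{interaction}\t{tgt}" for src, tgt in kept)

# Hypothetical edge confidences for three transcription factors.
confidence = {
    ("TF1", "TF2"): 0.82,
    ("TF1", "TF3"): 0.15,
    ("TF2", "TF3"): 0.64,
}
sif = edges_to_sif(confidence, threshold=0.5)
print(sif)
```

Raising the threshold gives a sparser, higher-confidence network; lowering it admits weaker edges, so it is worth trying a few values before interpreting the result.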


Application | Description | Run on CyVerse

Differential Expression
GP2S | A differential expression algorithm for time series data with a two-condition (e.g. control/treated) experimental design | DE
Gradient Tool | An algorithm for identifying the time of change from single-condition time course expression data | DE

Network Inference
CSI | A network inference algorithm capable of inferring causal regulatory network models from time course expression data | DE
hCSI | An expansion of CSI network inference to handle multiple time course datasets | DE
oCSI | An expansion of CSI network inference to handle data from multiple organisms | DE

Clustering / Biclustering
BHC | A clustering algorithm for expression data, originally made available in R; allows for the analysis of both time course and multiple static datasets | DE
TCAP | A clustering algorithm for time course expression data; identifies complex regulatory groups thanks to a rich information measure | DE
Wigwams | An algorithm for extracting gene groups co-regulated across subsets of multiple time course datasets | DE

Transcription Factor Motif Enrichment
HMT | A transcription factor binding site overrepresentation analysis algorithm for known motifs | DE
MEME-LaB | A transcription factor binding site overrepresentation analysis algorithm with novel motif discovery | DE

[Tip]: The quickest way to locate our applications in the CyVerse Discovery Environment is to type "uk cyverse" in the application search box. A CyVerse account is required; register for one if you do not already have one.

Below is a screenshot showing the search results for our apps:

[Screenshot: Search for CyVerse UK apps]