Tools – The Furlong Laboratory

The following bioinformatics tools have been developed in the lab:

Data Processing

Je is a suite of tools for handling barcoded FASTQ files with (or without) Unique Molecular Identifiers (UMIs) and for filtering read duplicates using these UMIs:

If you have barcodes and/or UMIs in your FASTQ files, you’ll most likely enjoy Je. Je currently offers four tools:

demultiplex: demultiplexes multi-sample FASTQ files whose reads contain barcodes, with or without UMIs
demultiplex-illu: demultiplexes FASTQ files according to associated index files, which contain the sample-encoding barcodes. Reads can additionally contain inline UMIs
clip: removes UMIs from the reads of FASTQ files that do not need sample demultiplexing
markdupes: filters BAM files for read duplicates, taking UMIs into account

In short, Je demultiplex, demultiplex-illu and clip add the extracted barcodes and UMIs to the read headers and reformat those headers to meet read mappers' requirements: most read mappers (bowtie, bwa…) expect the headers of read_1 and read_2 to be strictly identical. After mapping, markdupes identifies PCR (and optical) read duplicates based on their mapping positions and the UMIs found in the read headers.
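To make the idea behind UMI-aware duplicate filtering concrete, here is a minimal Python sketch. It is not Je's implementation. It assumes a hypothetical header layout in which the UMI is the last ':'-separated field of the read name, uses a simplified `Read` record instead of real BAM input, and requires exact UMI matches.

```python
from collections import namedtuple

# Hypothetical minimal read record; a real pipeline would read these from a BAM file.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand"])


def umi_from_header(name):
    """Return the UMI, assumed here to be the last ':'-separated header field."""
    return name.rsplit(":", 1)[-1]


def mark_duplicates(reads):
    """Return a list of (read, is_duplicate) pairs.

    Reads sharing chromosome, position, strand and UMI are treated as
    duplicates; the first read seen in each group is kept as the original.
    """
    seen = set()
    marked = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, umi_from_header(read.name))
        marked.append((read, key in seen))
        seen.add(key)
    return marked


reads = [
    Read("r1:ACGT", "chr2L", 100, "+"),
    Read("r2:ACGT", "chr2L", 100, "+"),  # same position and UMI: duplicate
    Read("r3:TTGA", "chr2L", 100, "+"),  # same position, different UMI: kept
    Read("r4:ACGT", "chr2L", 250, "+"),  # same UMI, different position: kept
]
flags = [is_dup for _, is_dup in mark_duplicates(reads)]
```

Without UMIs, r3 would be indistinguishable from r1 and r2 by position alone. Keying on the UMI as well lets reads from distinct molecules that happen to map to the same place survive filtering.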

Go to the GBCS website to get more information and see the documentation…

Data Analysis

WASP-INDEL

An implementation of the WASP pipeline that includes INDELs

WASP is a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs, described in “WASP: allele-specific software for robust discovery of molecular quantitative trait loci”. WASP has two parts, which can be used independently of each other:

Read filtering tools that correct for biases in allele-specific mapping.
A Combined Haplotype Test (CHT) that tests for genetic association with a molecular trait using counts of mapped and allele-specific reads.

The original WASP pipeline does not handle INDELs and discards reads overlapping them. WASP-INDEL includes INDELs in both WASP steps, controlling for mapping biases caused by INDELs (read filtering) and testing them for genetic associations (CHT).

Author: Adam Raboniwitz, EMBL Heidelberg
Maintainer: Adam Raboniwitz <adamrabs@hotmail.com>

R Bioconductor package: FourCSeq

Analysis of (multiplexed) 4C sequencing data:

The package provides a pipeline to detect specific interactions between DNA elements and to identify differential interactions between conditions. The statistical analysis in R takes one BAM file per sample as input. To obtain these files, the package contains a Python script (extdata/python/demultiplex.py) that demultiplexes libraries and trims off primer sequences. The required BAM files can then be generated with standard alignment software.
Author: Felix A. Klein, EMBL Heidelberg
Maintainer: Felix A. Klein <felix.klein@embl.de>

R Bioconductor package: easyRNASeq

Count summarization and normalization for RNA-Seq data

Calculates the coverage of high-throughput short reads against a reference genome and summarizes it per feature of interest (e.g. exon, gene, transcript). The data can be normalized as ‘RPKM’ or by the ‘DESeq’ or ‘edgeR’ package.
Author: Nicolas Delhomme, Ismael Padioleau, Bastian Schiffthaler, Niklas Maehler
Maintainer: Nicolas Delhomme
Citation (from within R, enter citation("easyRNASeq")):

Delhomme N, Padioleau I, Furlong EE and Steinmetz LM (2012). “easyRNASeq: a Bioconductor package for processing RNA-Seq data.” Bioinformatics, in press, pp. in press.
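As an illustration of the RPKM normalization mentioned in the package description, the following Python sketch applies the standard definition (reads per kilobase of feature per million mapped reads). It is not easyRNASeq code. The function name, gene names, counts and lengths are all hypothetical.

```python
def rpkm(count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    # count / (length in kb) / (library size in millions)
    return count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical example: two genes with equal read counts but different lengths.
counts = {"geneA": 500, "geneB": 500}
lengths = {"geneA": 1000, "geneB": 2000}
library_size = 10_000_000

values = {gene: rpkm(counts[gene], lengths[gene], library_size) for gene in counts}
```

Both genes have the same raw count, but geneB is twice as long, so its RPKM is half that of geneA. Raw counts need this length and library-size correction before features or samples can be compared.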

Data Visualisation

JBrowse Plugin: Dynamix

A plugin for JBrowse to browse genomes dynamically

The Furlong lab offers Dynamix, a tool that makes visual analysis easier when browsing a genome by dynamically modifying the set of visualised tracks, restricting it to those relevant to the genomic region currently displayed.

Check out more on this page …