The following bioinformatics tools have been developed in the lab:
Data Processing
Je is a suite to handle barcoded fastq files with (or without) Unique Molecular Identifiers (UMIs) and to filter read duplicates using these UMIs.
If you have barcodes and/or UMIs in your fastq files, you’ll most likely enjoy Je. Je currently offers 4 tools:
- demultiplex to demultiplex multi-sample fastq files whose reads contain barcodes and, optionally, UMIs
- demultiplex-illu to demultiplex fastq files according to associated index files (which contain the sample-encoding barcodes). Reads can additionally contain inline UMIs
- clip to remove UMIs contained in reads of fastq files that do not need sample demultiplexing
- markdupes to filter BAM files for read duplicates taking UMIs into account
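To make the three operations above concrete, here is a minimal conceptual sketch in Python, not Je's implementation or command-line interface. It assumes a hypothetical read layout of barcode followed by UMI at the 5' end, equal-length barcodes, a tolerance of one mismatch, a convention of appending the UMI to the read name, and simplified alignment tuples in place of BAM records; the function names `assign_sample`, `clip_read` and `dedup_by_umi` are illustrative only.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical simplified alignment record: (read_name, chrom, pos, strand, umi, mapq)
Alignment = Tuple[str, str, int, str, str, int]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(seq: str, barcodes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode best matches the start of the read.

    Returns None when no barcode is within `max_mismatches`, or when the
    best match is tied between two samples (ambiguous read).
    """
    hits = []
    for sample, bc in barcodes.items():
        if len(seq) < len(bc):
            continue  # read too short to contain this barcode
        d = hamming(seq[:len(bc)], bc)
        if d <= max_mismatches:
            hits.append((d, sample))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # two samples match equally well
    return hits[0][1]


def clip_read(header: str, seq: str, qual: str,
              bc_len: int, umi_len: int) -> Tuple[str, str, str]:
    """Strip barcode + UMI from sequence and qualities; append the UMI to the read name."""
    umi = seq[bc_len:bc_len + umi_len]
    start = bc_len + umi_len
    name, _, comment = header.partition(" ")
    new_header = f"{name}:{umi}" + (f" {comment}" if comment else "")
    return new_header, seq[start:], qual[start:]


def dedup_by_umi(alignments: Iterable[Alignment]) -> List[str]:
    """Keep one read per (chrom, pos, strand, UMI), preferring the highest MAPQ."""
    best: Dict[Tuple[str, int, str, str], Alignment] = {}
    for aln in alignments:
        _, chrom, pos, strand, umi, mapq = aln
        key = (chrom, pos, strand, umi)
        if key not in best or mapq > best[key][5]:
            best[key] = aln
    return sorted(aln[0] for aln in best.values())
```

The key point for duplicate filtering is that two reads at the same position are only treated as duplicates when they also share a UMI; reads with different UMIs are kept as independent molecules.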
Data Analysis
Implementation of the WASP pipeline with support for INDELs
WASP is a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs, described in “WASP: allele-specific software for robust discovery of molecular quantitative trait loci”. WASP has two parts, which can be used independently of each other:
- Read filtering tools that correct for biases in allele-specific mapping.
- A Combined Haplotype Test (CHT) that tests for genetic association with a molecular trait using counts of mapped and allele-specific reads.
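The idea behind WASP's read filtering, as described in the WASP paper, is to re-map each read that overlaps a SNP after swapping in the other allele(s), and to discard the read if any swapped version fails to map to the same location. The Python sketch below illustrates that idea for SNPs only; it does not model the INDEL extension mentioned above, and `mapper` is a hypothetical stand-in for re-running the aligner, not part of WASP's code.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# (offset within the read, reference allele, alternative allele)
Snp = Tuple[int, str, str]


def allele_swapped_reads(seq: str, snps: Sequence[Snp]) -> List[str]:
    """All versions of `seq` for every combination of alleles at the SNP offsets,
    excluding the observed read itself."""
    variants = set()
    for combo in product(*[(ref, alt) for _, ref, alt in snps]):
        chars = list(seq)
        for (offset, _, _), allele in zip(snps, combo):
            chars[offset] = allele
        variant = "".join(chars)
        if variant != seq:
            variants.add(variant)
    return sorted(variants)


def passes_mapping_filter(seq: str, orig_pos: int, snps: Sequence[Snp],
                          mapper: Callable[[str], Optional[int]]) -> bool:
    """Keep the read only if every allele-swapped version maps back to `orig_pos`.

    `mapper` is a hypothetical callable returning the mapped position of a
    sequence (or None if it does not map uniquely)."""
    return all(mapper(v) == orig_pos for v in allele_swapped_reads(seq, snps))
```

A read that maps to position 100 with the reference allele but to a different locus with the alternative allele is removed, because keeping it would bias allele-specific counts toward the reference.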
Analysis of (multiplexed) 4C sequencing data:
The package provides a pipeline to detect specific interactions between DNA elements and to identify differential interactions between conditions. The statistical analysis in R starts with individual BAM files for each sample as input. To obtain these files, the package contains a Python script (extdata/python/demultiplex.py) to demultiplex libraries and trim off primer sequences. The required BAM files can then be generated with standard alignment software. Author: Felix A. Klein, EMBL Heidelberg. Maintainer: Felix A. Klein <felix.klein@embl.de>
Count summarization and normalization for RNA-Seq data
Calculates the coverage of high-throughput short reads against a reference genome and summarizes it per feature of interest (e.g. exon, gene, transcript). The data can be normalized as RPKM or with the DESeq or edgeR packages. Authors: Nicolas Delhomme, Ismael Padioleau, Bastian Schiffthaler, Niklas Maehler. Maintainer: Nicolas Delhomme
Delhomme N, Padioleau I, Furlong EE and Steinmetz LM (2012). “easyRNASeq: a Bioconductor package for processing RNA-Seq data.” Bioinformatics, in press.
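As an illustration of the two kinds of normalization named above, the Python sketch below computes RPKM (reads per kilobase of feature per million mapped reads) and DESeq-style median-of-ratios size factors. It is a standalone re-implementation of the textbook formulas, not easyRNASeq's, DESeq's or edgeR's code; the function names and the input layout (`counts[i][j]` = reads for feature i in sample j) are assumptions for the example.

```python
import math
from statistics import median
from typing import List, Sequence


def rpkm(count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (feature_length_bp * total_mapped_reads)


def size_factors(counts: Sequence[Sequence[int]]) -> List[float]:
    """Median-of-ratios size factors, one per sample.

    Each sample's factor is the median, over features with non-zero counts in
    every sample, of count / geometric mean of that feature across samples.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no feature has non-zero counts in all samples")
    # Geometric means computed in log space to avoid overflow
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        ratios = [math.exp(math.log(row[j]) - lg)
                  for row, lg in zip(usable, log_geo_means)]
        factors.append(median(ratios))
    return factors


print(rpkm(500, 2000, 10_000_000))  # → 25.0
```

Dividing each sample's counts by its size factor puts the libraries on a common scale, whereas RPKM additionally corrects for feature length, which matters when comparing expression between features rather than between samples.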
Data Visualisation