The following bioinformatics tools have been developed in the lab:
Data Processing
Je is a suite of tools to handle barcoded fastq files with (or without) Unique Molecular Identifiers (UMIs) and to filter read duplicates using these UMIs.
If you have barcodes and/or UMIs in your fastq files, you’ll most likely enjoy Je. Je currently offers 4 tools:
- demultiplex to demultiplex multi-sample fastq files whose reads contain barcodes and, optionally, UMIs
- demultiplex-illu to demultiplex fastq files according to associated index files (which contain the sample-encoding barcodes). Reads can additionally contain inline UMIs
- clip to remove UMIs contained in reads of fastq files that do not need sample demultiplexing
- markdupes to filter BAM files for read duplicates taking UMIs into account
In short, Je demultiplex, demultiplex-illu and clip add the extracted barcodes and UMIs to the read headers and reformat the headers to meet the requirements of read mappers. Indeed, most read mappers (bowtie, bwa…) expect the headers of read_1 and read_2 to be strictly identical. After mapping, markdupes identifies PCR (and optical) read duplicates based on their mapping positions and the UMIs found in the read headers.
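The idea behind UMI-aware duplicate marking can be illustrated with a minimal Python sketch. It is not Je's implementation: the `MappedRead` record, the `umi_from_header` helper and the assumption that the UMI is the last ':'-separated field of the read header are all hypothetical and chosen only for illustration. Two reads are treated as duplicates only when they share both the mapping position and the UMI.

```python
from typing import Iterable, NamedTuple, Set


class MappedRead(NamedTuple):
    """Hypothetical, simplified view of an aligned read."""
    name: str    # read header; the UMI is ASSUMED to be the last ':'-separated field
    chrom: str   # reference sequence name
    pos: int     # mapping start position
    strand: str  # '+' or '-'


def umi_from_header(name: str) -> str:
    """Return the UMI, assuming it was appended as the last ':'-separated field."""
    return name.rsplit(":", 1)[-1]


def mark_duplicates(reads: Iterable[MappedRead]) -> Set[str]:
    """Return the names of reads that repeat an already seen (position, UMI) pair."""
    seen = set()
    duplicates = set()
    for read in reads:
        key = (read.chrom, read.pos, read.strand, umi_from_header(read.name))
        if key in seen:
            # Same position and same UMI: most likely a PCR duplicate.
            duplicates.add(read.name)
        else:
            # First read with this position/UMI combination is kept.
            seen.add(key)
    return duplicates


example_reads = [
    MappedRead("read1:ACGT", "chr1", 100, "+"),
    MappedRead("read2:ACGT", "chr1", 100, "+"),  # same position, same UMI
    MappedRead("read3:TTGA", "chr1", 100, "+"),  # same position, different UMI
]
flagged = mark_duplicates(example_reads)
```

In this example only `read2:ACGT` is flagged: `read3:TTGA` maps to the same position but carries a different UMI, so it is kept as an independent molecule. A position-only approach would have discarded it.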
Data Analysis
Analysis of (multiplexed) 4C sequencing data:
The package provides a pipeline to detect specific interactions between DNA elements and identify differential interactions between conditions. The statistical analysis in R starts with individual bam files for each sample as inputs. To obtain these files, the package contains a python script (extdata/python/demultiplex.py) to demultiplex libraries and trim off primer sequences. The required bam files can then be generated with standard alignment software.
Author: Felix A. Klein, EMBL Heidelberg
Maintainer: Felix A. Klein <felix.klein@embl.de>
Count summarization and normalization for RNA-Seq data:
Calculates the coverage of high-throughput short reads against a reference genome and summarizes it per feature of interest (e.g. exon, gene, transcript). The data can be normalized as ‘RPKM’ or by the ‘DESeq’ or ‘edgeR’ package.
Author: Nicolas Delhomme, Ismael Padioleau, Bastian Schiffthaler, Niklas Maehler
Maintainer: Nicolas Delhomme
Delhomme N, Padioleau I, Furlong EE and Steinmetz LM (2012). “easyRNASeq: a Bioconductor package for processing RNA-Seq data.” Bioinformatics, in press.
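For readers unfamiliar with RPKM (reads per kilobase of feature per million mapped reads), the normalization can be sketched in a few lines of Python. This is the standard formula only, not the package's own code or API; the function name `rpkm` and its parameter names are hypothetical.

```python
def rpkm(read_count: float, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads.

    RPKM = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million-reads (1e6) scaling factors.
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# 500 reads on a 2,000 bp gene in a library of 10 million mapped reads
example_value = rpkm(500, 2000, 10_000_000)  # 25.0
```

Dividing by both the feature length and the library size makes the values comparable between features of different lengths and between samples sequenced to different depths.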
Data Visualisation