The following bioinformatics tools have been developed in the lab:
Je is a suite of tools for handling barcoded fastq files, with or without Unique Molecular Identifiers (UMIs), and for filtering read duplicates using these UMIs.
If you have barcodes and/or UMIs in your fastq files, you’ll most likely enjoy Je. Je currently offers four tools:
- demultiplex: demultiplexes multi-sample fastq files whose reads contain barcodes and, optionally, UMIs
- demultiplex-illu: demultiplexes fastq files according to their associated index files, which contain the sample-encoding barcodes. Reads can additionally contain inline UMIs
- clip: removes UMIs from the reads of fastq files that do not need sample demultiplexing
- markdupes: filters BAM files for read duplicates, taking UMIs into account
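As a rough illustration of the clipping idea in the list above, the sketch below moves a fixed-length inline UMI from the start of a read into its header. It is not Je's implementation: the `FastqRecord` type, the `clip_umi` function, the `:` separator and the choice to drop the header comment field are all assumptions made for this example.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    """Hypothetical minimal fastq record (header stored without the leading '@')."""
    header: str
    sequence: str
    quality: str


def clip_umi(record: FastqRecord, umi_len: int, sep: str = ":") -> FastqRecord:
    """Move the first `umi_len` bases of the read into its header.

    Assumption: the UMI sits at the 5' end of the read and has a fixed length.
    """
    if umi_len < 0 or umi_len > len(record.sequence):
        raise ValueError("umi_len must be between 0 and the read length")

    umi = record.sequence[:umi_len]

    # Keep only the read name (text before the first whitespace). The trailing
    # comment field is dropped here because it is often where read_1 and
    # read_2 headers differ; this is a simplification for the example.
    name = record.header.split(maxsplit=1)[0]

    return FastqRecord(
        header=name + sep + umi,
        sequence=record.sequence[umi_len:],
        quality=record.quality[umi_len:],  # trim qualities in step with bases
    )
```

For example, clipping a 4-base UMI from a read named `read1` with sequence `ACGTTTGGCC` yields the header `read1:ACGT` and the sequence `TTGGCC`.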
In short, demultiplex, demultiplex-illu and clip add the extracted barcodes and UMIs to the read headers and reformat those headers to meet the requirements of read mappers: most read mappers (bowtie, bwa…) expect the headers of read_1 and read_2 to be strictly identical. After mapping, markdupes identifies PCR (and optical) read duplicates based on their mapping positions and the UMIs found in the read headers.
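The duplicate-marking idea described above can be sketched as follows: reads that share both a mapping position and a UMI are treated as copies of the same molecule, and all but one of them are flagged. This is a simplified sketch, not the markdupes algorithm itself. The `MappedRead` type and `mark_duplicates` function are hypothetical, UMIs are compared by exact match only, and keeping the read with the highest mean quality is an assumption made for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class MappedRead(NamedTuple):
    """Hypothetical summary of one aligned read."""
    name: str
    chrom: str
    pos: int            # mapping position
    strand: str         # '+' or '-'
    umi: str            # UMI, e.g. parsed from the read header
    mean_quality: float


def mark_duplicates(reads: List[MappedRead]) -> List[bool]:
    """Return one flag per read: True if the read is marked as a duplicate."""
    # Group read indices by mapping position plus UMI (exact match only).
    groups: Dict[Tuple[str, int, str, str], List[int]] = defaultdict(list)
    for idx, read in enumerate(reads):
        groups[(read.chrom, read.pos, read.strand, read.umi)].append(idx)

    flags = [False] * len(reads)
    for indices in groups.values():
        # Keep the read with the highest mean quality in each group
        # (the first one wins on ties) and flag the others.
        best = max(indices, key=lambda i: reads[i].mean_quality)
        for i in indices:
            if i != best:
                flags[i] = True
    return flags
```

Under this scheme, two reads at the same position with different UMIs are both kept, whereas a position-only approach would flag one of them as a duplicate.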