Learn about LBFextract
======================

Introduction
------------

LBFextract is a Python package for extracting features from a Binary Alignment Map (BAM) file for all genomic intervals described in one or more Browser Extensible Data (BED) files, and for identifying condition-specific or cluster-specific differentially active transcription factors (TFs).
It focuses on liquid biopsy related features, transcription factor binding sites (TFBSs) and transcription start sites (TSSs), but can be generalized to any kind of genomic intervals with similar properties.

The package is built as a plugin interface, in which each plugin provides a feature.
It is composed of a core package, which contains the main logic, and a set of plugins, which represent the feature extraction methods.
The core package (lbfextract) describes the workflow and how the different hooks are executed to extract the features.
The plugins implement the hooks.
Default coverage-based and fragmentomics-based feature extraction methods are provided as lbfextract subpackages.

The following feature extraction methods are available:

- coverage
- coverage-in-batch
- central-60b (Peter Ulz coverage)
- sliding-window-coverage
- sliding-window-coverage-in-batch
- wps-coverage
- coverage-around-dyads
- coverage-around-dyads-in-batch
- middle-point-coverage
- middle-point-coverage-in-batch
- middle-n-points-coverage
- middle-n-points-coverage-in-batch
- entropy
- entropy-in-batch
- fragment-length-distribution (per position)
- fragment-length-distribution-in-batch (per position)
- fragment-length-ratios (per position)
- relative-entropy-to-flanking
- relative-entropy-to-flanking-in-batch
- extract-signal

.. image:: _static/LBF_structure.png
   :alt: LBF hook system and plugins architecture

These feature extraction methods are implemented as plugins that override a specific hook in the lbfextract workflow.
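
The plugin idea can be illustrated with a small, self-contained sketch.
It is not LBFextract's actual API: only the hook names ``fetch_reads`` and ``transform_single_intervals`` come from this documentation, while the function signatures, the toy read representation, ``run_workflow`` and ``middle_point_transform`` are assumptions made up for illustration.

.. code-block:: python

    from typing import Callable, Dict, List, Optional, Tuple

    # Hypothetical stand-ins: a read and an interval are (start, end) tuples.
    Read = Tuple[int, int]
    Interval = Tuple[int, int]


    def default_fetch_reads(interval: Interval) -> List[Read]:
        """Toy default hook: pretend three fragments overlap the interval."""
        start, _ = interval
        return [(start, start + 4), (start + 2, start + 6), (start + 5, start + 8)]


    def default_transform_single_intervals(reads: List[Read], interval: Interval) -> List[int]:
        """Toy default hook: per-position coverage inside the interval."""
        start, end = interval
        return [sum(1 for s, e in reads if s <= pos < e) for pos in range(start, end)]


    def middle_point_transform(reads: List[Read], interval: Interval) -> List[int]:
        """Toy plugin hook: count only the fragment midpoints per position."""
        start, end = interval
        mids = [(s + e) // 2 for s, e in reads]
        return [mids.count(pos) for pos in range(start, end)]


    # Default implementation for each hook used by the toy workflow.
    DEFAULT_HOOKS: Dict[str, Callable] = {
        "fetch_reads": default_fetch_reads,
        "transform_single_intervals": default_transform_single_intervals,
    }


    def run_workflow(interval: Interval,
                     plugin_hooks: Optional[Dict[str, Callable]] = None) -> List[int]:
        """Run the hooks in a fixed order; plugin hooks override the defaults."""
        hooks = {**DEFAULT_HOOKS, **(plugin_hooks or {})}
        reads = hooks["fetch_reads"](interval)
        return hooks["transform_single_intervals"](reads, interval)


    if __name__ == "__main__":
        print(run_workflow((0, 8)))  # default coverage signal
        print(run_workflow((0, 8), {"transform_single_intervals": middle_point_transform}))

In this sketch the hypothetical plugin replaces a single hook, so the rest of the toy workflow is reused unchanged.
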
The hooks that can currently be implemented by plugins are:

* **fetch_reads**: extract the feature from a BAM file
* **load_reads**: load reads in case they were already extracted
* **save_fetched_reads**: save the fetched reads specific to the regions of interest
* **transform_reads**: apply a transformation to each extracted read
* **transform_single_intervals**: extract the signal of one region
* **transform_all_intervals**: apply a transformation which requires all the regions
* **plot_signal**: plot the final signal
* **save_signal**: save the final signal

LBFextract also provides CLIhooks which, if implemented, allow the automatic integration of all the plugins with the LBFextract Command Line Interface (CLI) and Terminal User Interface (TUI).

Installation
------------

For the installation of LBFextract, the following is required:

- python>=3.10
- conda
- setuptools~=62.0.0

LBFextract uses conda to create a separate environment for dependencies that are not Python related (samtools).

To be able to run the tests, the following Python package is also required:

- pytest~=8.1.1

LBFextract can be installed as follows:

.. code-block:: bash

    git clone https://github.com/Isy89/LBF.git && cd LBF
    python -m pip install .

After the installation, the command line interface `lbfextract` should be available.
Using it, a conda environment containing samtools, isolated from the current one, needs to be created.
This conda environment can be installed as follows:

.. code-block:: bash

    lbfextract setup create-conda-envs # creates a separate conda env used for filtering the bam files and other steps

Singularity Image Installation
------------------------------

To install LBFextract using the Singularity image, the following steps are required:

.. code-block:: bash

    singularity pull lbfextract_v0.1.0a1.sif library://lbfextract/lbfextract/lbfextract_v0.1.0a1.sif:0.1.0a1
    singularity run lbfextract_v0.1.0a1.sif --help

The run command gives access to the lbfextract command line interface.
When using the Singularity image, it may be necessary to bind the directories containing the BAM files and the BED files, as well as the output directory, to the Singularity container.
This can be done using the following command:

.. code-block:: bash

    singularity run --bind /path/to/data_bam:/data_bam --bind /path/to/data_bed:/data_bed --bind /path/to/output_dir:/output_dir lbfextract_v0.1.0a1.sif --help

Example:

.. code-block:: bash

    singularity run --bind /path/to/data_bam:/data_bam --bind /path/to/data_bed:/data_bed --bind /path/to/output_dir:/output_dir lbfextract_v0.1.0a1.sif feature_extraction_commands extract-coverage --path_to_bam /data_bam/example.bam --path_to_bed /data_bed/example.bed --output_path /output_dir

Coming Soon: Installation via pip (PyPI)
----------------------------------------

We are currently working on making LBFextract installable directly from the Python Package Index (PyPI) using pip.
This will allow for easier installation and distribution across different platforms.
Stay tuned for updates on when this feature will be available.
In the meantime, please refer to the installation instructions provided above.

Computational requirements
--------------------------

- **Operating System:** Linux, macOS
- **Memory:** 8 GB RAM or more, depending on the number of BED files used, the number of genomic intervals per BED file and the length of the genomic intervals used, including the flanking regions

In the following tables, we provide a reference for the peak memory usage and the time required for the analysis of a 20x sample using different feature extraction methods and varying the number of BED files and the number of genomic intervals used.
For this analysis, 8 cores were used and the length of a genomic interval was kept equal to 4000 bp.

.. image:: _static/computational_requirements.png
   :alt: computational requirements for different feature extraction methods varying the number of BED files or the number of genomic intervals per BED file

Usage
-----

LBFextract can be used through the command line interface (CLI), through the terminal user interface (TUI) or through the Python API.

The CLI offers four major sets of commands:

1. feature_extraction_commands
2. post_extraction_analysis_commands
3. setup
4. start-tui

The first set of commands is used to extract the features from the BAM file.
The second set of commands is used to analyze the extracted features.
The third set of commands is used to set up the conda environments required for the features present in LBFextract to work.
The fourth command is used to start the TUI.

Paper/Citation
--------------

If you want to have a look at the paper, you can find it at the URL given in the citation below.

If you use LBFextract in your research, please cite the following paper:

::

    @article{LAZZERI20243163,
        title = {LBFextract: Unveiling transcription factor dynamics from liquid biopsy data},
        journal = {Computational and Structural Biotechnology Journal},
        volume = {23},
        pages = {3163-3174},
        year = {2024},
        issn = {2001-0370},
        doi = {https://doi.org/10.1016/j.csbj.2024.08.007},
        url = {https://www.sciencedirect.com/science/article/pii/S200103702400268X},
        author = {Isaac Lazzeri and Benjamin Gernot Spiegl and Samantha O. Hasenleithner and Michael R. Speicher and Martin Kircher},
        keywords = {Cell-free DNA, Bioinformatics, Whole-genome sequencing, Transcription factors, Fragmentomics},
        abstract = {Motivation The analysis of circulating cell-free DNA (cfDNA) holds immense promise as a non-invasive diagnostic tool across various human conditions. However, extracting biological insights from cfDNA fragments entails navigating complex and diverse bioinformatics methods, encompassing not only DNA sequence variation, but also epigenetic characteristics like nucleosome footprints, fragment length, and methylation patterns. Results We introduce Liquid Biopsy Feature extract (LBFextract), a comprehensive package designed to streamline feature extraction from cfDNA sequencing data, with the aim of enhancing the reproducibility and comparability of liquid biopsy studies. LBFextract facilitates the integration of preprocessing and postprocessing steps through alignment fragment tags and a hook mechanism. It incorporates various methods, including coverage-based and fragment length-based approaches, alongside two novel feature extraction methods: an entropy-based method to infer TF activity from fragmentomics data and a technique to amplify signals from nucleosome dyads. Additionally, it implements a method to extract condition-specific differentially active TFs based on these features for biomarker discovery. We demonstrate the use of LBFextract for the subtype classification of advanced prostate cancer patients using coverage signals at transcription factor binding sites from cfDNA. We show that LBFextract can generate robust and interpretable features that can discriminate between different clinical groups. LBFextract is a versatile and user-friendly package that can facilitate the analysis and interpretation of liquid biopsy data. Data and Code Availability and Implementation LBFextract is freely accessible at https://github.com/Isy89/LBF. It is implemented in Python and compatible with Linux and Mac operating systems. Code and data to reproduce these analyses have been uploaded to 10.5281/zenodo.10964406.}
    }