fextract package
Module contents
Fextract module
In the fextract module, the package implements default coverage-based features.
These are the default coverage features implemented:
*fragment coverage*. It takes into account all positions of a fragment, from its leftmost to its rightmost position.
*midpoint coverage*. It takes into account only the central position of the fragment.
*middle-n points coverage*. It takes into account n positions from the left and right of the midpoint of a fragment.
*coverage around dyads*. It uses only the inferred positions of the fragment at which the dyads may be located.
*sliding coverage*. It uses all positions of each fragment, from its leftmost to its rightmost position, and averages the resulting coverage over a sliding window.
*Peter-Ulz (central-60bp) coverage*. It uses only the positions [53, 113) and [-113, -53) of the reads in a pair.
*windowed protection score (WPS)*. The score at each position is derived from the fragments that completely overlap a window around that position. In lbfextract we further normalize it based on the coverage at each position.
More about this topic can be found in the article "LBFextract: unveiling transcription factor dynamics from liquid biopsy data", available here: <link_to_article>.
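The fragment, midpoint, and middle-n definitions listed above can be sketched in a few lines of plain Python. This is only an illustration, not the package's implementation: the function names, the half-open `(start, end)` fragment representation, and the rounding of the midpoint are assumptions made for the example.

```python
from typing import List, Tuple

Fragment = Tuple[int, int]  # half-open [start, end) in interval coordinates (assumption)


def fragment_coverage(fragments: List[Fragment], length: int) -> List[int]:
    """Count every position from the leftmost to the rightmost base of each fragment."""
    cov = [0] * length
    for start, end in fragments:
        for pos in range(max(start, 0), min(end, length)):
            cov[pos] += 1
    return cov


def middle_n_points_coverage(fragments: List[Fragment], length: int, n: int = 0) -> List[int]:
    """Count the midpoint plus n positions on each side; n=0 gives midpoint coverage."""
    cov = [0] * length
    for start, end in fragments:
        mid = (start + end) // 2  # assumption: integer midpoint, rounded down
        for pos in range(max(mid - n, 0), min(mid + n + 1, length)):
            cov[pos] += 1
    return cov


fragments = [(0, 6), (2, 8)]
print(fragment_coverage(fragments, 10))            # [1, 1, 2, 2, 2, 2, 1, 1, 0, 0]
print(middle_n_points_coverage(fragments, 10))     # [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
print(middle_n_points_coverage(fragments, 10, 1))  # [0, 0, 1, 1, 2, 1, 1, 0, 0, 0]
```

The two fragments have midpoints 3 and 5, so the midpoint signal is much sparser than the fragment coverage computed from the same input.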
Subpackages
Submodules
fextract.autocompletion module
fextract.cli module
fextract.cli_lib module
cli_lib module
In this module the commands for the coverage-based feature extraction methods are described. The following commands are included:
- *extract_signal*:
allows the combination of different packages, but requires the specification of the classes and configurations required by the different hooks.
- *extract_coverage*:
extracts the coverage from all the genomic intervals contained in a BED file and reports it summarized at each position with a summarization method such as mean or median.
- *extract_wps_coverage*:
extracts the windowed protection score over multiple genomic intervals, summarizes the signal, and returns the normalized aggregated WPS per position.
- *extract_middle_point_coverage*:
extracts the midpoint coverage at each position of multiple genomic intervals contained in a BED file and summarizes the results per position.
- *extract_middle_n_points_coverage*:
extracts the n points to the left and right of the middle point of each fragment and calculates the coverage for each genomic interval in a BED file, taking only these positions of each fragment into account.
- *extract_sliding_window_coverage*:
extracts a windowed average of the coverage, calculating at each position the average of the coverage values of the following n positions.
- *extract_peter_ulz_coverage (central-60 coverage)*:
calculates the central 60 base read coverage using the same coordinates used by Peter Ulz et al. in their Nature Communications article.
- *extract_coverage_dyads*:
extracts the coverage considering only the positions of each fragment where the dyads are probably located.
fextract.core module
fextract.feature_extractor module
fextract.generate_plugin_structure module
fextract.hookspecs module
fextract.lib module
- class lbfextract.fextract.lib.FextractHooks
Bases:
object
- fetch_reads(path_to_bam: Path, path_to_bed: Path, config: ReadFetcherConfig, extra_config: AppExtraConfig) -> DataFrame
- load_fetched_reads(config: Config, extra_config: AppExtraConfig) -> DataFrame
- plot_signal(signal: Signal, config: Any, extra_config: AppExtraConfig) -> Figure
- save_fetched_reads(reads_per_interval_container: DataFrame, config: Config, extra_config: AppExtraConfig) -> Path
Hook implementing the strategy to save the reads fetched for the intervals.
- Parameters:
reads_per_interval_container – ReadsPerIntervalContainer containing information about the genomic region and the reads mapping to it
extra_config – AppExtraConfig containing the output path
- Returns:
None
- save_signal(signal: Signal, config: Any, extra_config: AppExtraConfig) -> Path
- transform_all_intervals(single_intervals_transformed_reads: Signal, config: SignalSummarizer, extra_config: AppExtraConfig) -> Signal
- transform_reads(reads_per_interval_container: DataFrame, config: Config, extra_config: AppExtraConfig) -> DataFrame
- transform_single_intervals(transformed_reads: DataFrame, config: SingleSignalTransformerConfig, extra_config: AppExtraConfig) -> Signal
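The argument and return types of the hooks above suggest how data may flow from one hook to the next. The toy pipeline below illustrates that flow with plain dicts and lists in place of the DataFrame and Signal objects. The call order is inferred from the signatures and is an assumption, the function bodies are invented for the example, and the saving, loading, and plotting hooks are left out.

```python
from statistics import mean

# Hypothetical stand-ins: none of these bodies come from lbfextract.


def fetch_reads(path_to_bam, path_to_bed):
    # A real hook would read the BAM and BED files; here the fragments per
    # interval are hard-coded as half-open (start, end) pairs.
    return {"interval_1": [(0, 4), (2, 6)], "interval_2": [(1, 5)]}


def transform_reads(reads_per_interval):
    # Example read-level step: drop fragments shorter than 4 bp.
    return {k: [f for f in v if f[1] - f[0] >= 4] for k, v in reads_per_interval.items()}


def transform_single_intervals(transformed_reads, length=6):
    # One coverage vector per interval.
    signal = {}
    for name, fragments in transformed_reads.items():
        cov = [0] * length
        for start, end in fragments:
            for pos in range(start, min(end, length)):
                cov[pos] += 1
        signal[name] = cov
    return signal


def transform_all_intervals(signal):
    # Summarize across intervals, position by position.
    return [mean(column) for column in zip(*signal.values())]


reads = fetch_reads("sample.bam", "sites.bed")  # hypothetical paths, never opened
summary = transform_all_intervals(transform_single_intervals(transform_reads(reads)))
print(summary)
```

Each step only depends on the output of the previous one, which is what would let a plugin replace a single hook without touching the others.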
fextract.pluggin_manager module
fextract.schemas module
- class lbfextract.fextract.schemas.AppExtraConfig(config_dict: dict | None = None)
Bases:
Config
- cores: int = None
- ctx: dict = None
- class lbfextract.fextract.schemas.Config(config_dict: dict | None = None)
Bases:
object
- schema = <Schema({}, extra=ALLOW_EXTRA, required=False) object>
- exception lbfextract.fextract.schemas.LbfextractInvalidConfigError(class_name: str, x: dict, schema: Schema)
Bases:
ValueError
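The schema reprs in this module combine per-key coercion with a policy for unknown keys (ALLOW_EXTRA or PREVENT_EXTRA). The standard-library sketch below mimics that behaviour to show what a failed validation might look like. It does not use the validation library behind the real schemas, and `InvalidConfigError` and `validate` are hypothetical names.

```python
class InvalidConfigError(ValueError):
    """Hypothetical stand-in for LbfextractInvalidConfigError."""


def validate(config: dict, schema: dict, allow_extra: bool = False) -> dict:
    """Coerce each value with the callable registered for its key."""
    validated = {}
    for key, value in config.items():
        if key not in schema:
            if allow_extra:
                validated[key] = value  # unknown keys pass through unchanged
                continue
            raise InvalidConfigError(f"extra key not allowed: {key!r}")
        try:
            validated[key] = schema[key](value)
        except (TypeError, ValueError) as err:
            raise InvalidConfigError(f"{key} could not be coerced: {err}") from err
    return validated


schema = {"window": int, "cores": int}
print(validate({"window": "1000", "cores": 4}, schema))  # {'window': 1000, 'cores': 4}
```

Because the error derives from ValueError, callers that already catch ValueError would also catch a configuration failure.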
- class lbfextract.fextract.schemas.ReadFetcherConfig(config_dict: dict | None = None)
Bases:
Config
- F: int = None
- cores: int = None
- extra_bases: int = None
- f: int = None
- flanking_region_window: int = None
- n_binding_sites: int = None
- schema = <Schema({'window': Coerce(int, msg='window should be a integer'), 'flanking_region_window': Coerce(int, msg='flanking_region_window should be a integer'), 'extra_bases': Coerce(int, msg='extra_bases should be a integer'), 'n_binding_sites': Coerce(int, msg='n_binding_sites should be a integer'), 'cores': Coerce(int, msg='cores should be a integer'), 'f': Coerce(int, msg='f should be an integer representing the samtools flag to be used to include reads.'), 'F': Coerce(int, msg='F should be an integer representing the samtools flag to be used to exclude reads.')}, extra=PREVENT_EXTRA, required=False) object>
- window: int = None
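The schema messages describe `f` and `F` as samtools flags used to include and exclude reads. Assuming they follow the semantics of `samtools view -f` and `-F` (all bits of `f` must be set, no bit of `F` may be set), the filter can be sketched as a pair of bitwise tests. The helper name is hypothetical.

```python
def passes_flag_filter(flag: int, f: int = 0, F: int = 0) -> bool:
    """Keep a read only if all bits in f are set and no bit in F is set."""
    return (flag & f) == f and (flag & F) == 0


# Bit values from the SAM flag field.
PROPER_PAIR = 0x2
UNMAPPED = 0x4
DUPLICATE = 0x400

# Read flagged as paired (0x1), properly paired (0x2), and first in pair (0x40).
print(passes_flag_filter(0x1 | 0x2 | 0x40, f=PROPER_PAIR, F=UNMAPPED | DUPLICATE))  # True
# Same read, additionally marked as a duplicate.
print(passes_flag_filter(0x1 | 0x2 | 0x40 | 0x400, f=PROPER_PAIR, F=UNMAPPED | DUPLICATE))  # False
```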
- class lbfextract.fextract.schemas.SignalSummarizer(config_dict: dict | None = None)
Bases:
Config
- bedfile: Path = None
- schema = <Schema({'bed_file': Coerce(Path, msg='bedfile should be a pathlib.Path'), 'summarization_method': In(['mean', 'median', 'max', 'min', 'skip'])}, extra=PREVENT_EXTRA, required=False) object>
- summarization_method: str = None
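The `summarization_method` values accepted by the schema can be read as column-wise reductions over the per-interval signals. The sketch below shows that reading with the standard library only. The `summarize` helper is hypothetical, and treating `skip` as "return the signals unsummarized" is an assumption.

```python
from statistics import mean, median

SUMMARIZERS = {"mean": mean, "median": median, "max": max, "min": min}


def summarize(signal_per_interval, method="mean"):
    """Collapse a list of equal-length per-interval vectors into one vector."""
    if method == "skip":
        # Assumption: "skip" leaves the per-interval signals unsummarized.
        return signal_per_interval
    reducer = SUMMARIZERS[method]
    # zip(*rows) iterates over positions, yielding one column per position.
    return [reducer(column) for column in zip(*signal_per_interval)]


signal = [[1, 4, 2], [3, 0, 2], [5, 2, 8]]
print(summarize(signal, "mean"))    # [3, 2, 4]
print(summarize(signal, "median"))  # [3, 2, 2]
print(summarize(signal, "max"))     # [5, 4, 8]
```

The last column shows why the choice matters: a single interval with a value of 8 moves the mean to 4 while the median stays at 2.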
- class lbfextract.fextract.schemas.SingleSignalTransformerConfig(config_dict: dict | None = None)
Bases:
Config
- flip_based_on_strand = None
- gc_correction = None
- max_fragment_length = None
- min_fragment_length = None
- n = None
- peaks = None
- possible_signal_transformers = {'coverage', 'coverage_dyads', 'middle_n_points_coverage', 'middle_point_coverage', 'peter_ulz_coverage', 'sliding_window_coverage', 'wps_coverage'}
- read_end = None
- read_start = None
- schema = <Schema({'n': Coerce(int, msg='n should be a integer'), 'window_size': Coerce(int, msg='flanking_window should be a integer'), 'signal_transformer': In({'coverage_dyads', 'peter_ulz_coverage', 'coverage', 'sliding_window_coverage', 'middle_n_points_coverage', 'wps_coverage', 'middle_point_coverage'}), 'flip_based_on_strand': Coerce(bool, msg='flip_based_on_strand should be a boolean'), 'gc_correction': Coerce(bool, msg='whether gc correction should be performed or not'), 'tag': Coerce(str, msg='the bam file tag to be used to extract the gc coefficient from each read'), 'read_start': Coerce(int, msg='the start of the region to used of a read'), 'read_end': Coerce(int, msg='the end of the region to used of a read'), 'peaks': Coerce(list, msg='peaks should be a boolean'), 'max_fragment_length': Coerce(int, msg='max_fragment_length should be a integer'), 'min_fragment_length': Coerce(int, msg='min_fragment_length should be a integer')}, extra=PREVENT_EXTRA, required=False) object>
- signal_transformer = None
- tag = None
- window_size = None
fextract.setup_conda_env module
fextract.signal_transformer module
- class lbfextract.fextract.signal_transformer.FragmentLengthDistribution(min_fragment_length=100, max_fragment_length=400, gc_correction: bool = False, tag: str = None)
Bases:
object
- class lbfextract.fextract.signal_transformer.GenomiIntervalDataFrameRow(Start, End, Chromosome, reads_per_interval)
Bases:
NamedTuple
- Chromosome: str
Alias for field number 2
- End: int
Alias for field number 1
- Start: int
Alias for field number 0
- reads_per_interval: Iterator[AlignedSegment]
Alias for field number 3
- class lbfextract.fextract.signal_transformer.PeterUlzCoverage(gc_correction: bool, tag: str, read_start: int = 53, read_end: int = 113)
Bases:
object
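The defaults `read_start=53` and `read_end=113` match the [53, 113) and [-113, -53) ranges given for the central-60bp coverage. One possible reading, sketched below, is that the first range is counted from the left end of the fragment and the second from its right end. The helper name, the half-open fragment representation, the clipping, and the double counting of bases covered by both ranges are assumptions.

```python
def central_60_coverage(fragments, length, read_start=53, read_end=113):
    """Count, per fragment, the bases [read_start, read_end) measured from each end."""
    cov = [0] * length
    for start, end in fragments:  # half-open [start, end)
        # Forward read: offsets are counted from the left end of the fragment.
        forward = range(start + read_start, start + read_end)
        # Reverse read: offsets are counted from the right end, i.e. [-113, -53).
        reverse = range(end - read_end, end - read_start)
        for window in (forward, reverse):
            for pos in window:
                # Clip to both the fragment and the interval (an assumption).
                if start <= pos < end and 0 <= pos < length:
                    cov[pos] += 1
    return cov


cov = central_60_coverage([(0, 166)], 166)
print(cov[52], cov[53], cov[112], cov[113])  # 0 2 2 0
```

For a 166 bp fragment both ranges select the same 60 central bases, which is why every position in [53, 113) is counted twice in this example.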
- class lbfextract.fextract.signal_transformer.TFBSCoverage(gc_correction: bool = False, tag: str = None)
Bases:
object
This class calculates the fragment coverage for a genomic interval and allows correcting for GC bias when a GC-bias-specific tag has been added to each read in the BAM file.
- class lbfextract.fextract.signal_transformer.TFBSCoverageAroundDyads(n=1, gc_correction: bool = False, tag: str = None, peaks: list = None)
Bases:
object
- class lbfextract.fextract.signal_transformer.TFBSMiddlePointCoverage(gc_correction: bool = False, tag: str = None)
Bases:
object
- class lbfextract.fextract.signal_transformer.TFBSNmiddlePointCoverage(n=1, gc_correction: bool = False, tag: str = None)
Bases:
object
- class lbfextract.fextract.signal_transformer.TFBSSlidingWindowCoverage(window_size: int, gc_correction: bool = False, tag: str = None)
Bases:
object
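The sliding window coverage averages, at each position, the coverage of the positions that follow it. The sketch below shows one way to do this on an already computed coverage vector. The helper name is hypothetical, and shrinking the window at the right edge is an assumption about edge handling.

```python
def sliding_window_coverage(coverage, window_size):
    """Average each position with the positions that follow it inside the window."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    smoothed = []
    for i in range(len(coverage)):
        window = coverage[i:i + window_size]  # shorter near the right edge
        smoothed.append(sum(window) / len(window))
    return smoothed


print(sliding_window_coverage([2, 4, 6, 8], 2))  # [3.0, 5.0, 7.0, 8.0]
```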
- class lbfextract.fextract.signal_transformer.WPSCoverage(gc_correction: bool = False, tag: str = None, window_size: int = None, min_fragment_length: int = None, max_fragment_length: int = None)
Bases:
TFBSCoverage
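As an illustration of the windowed protection score, the sketch below follows the commonly cited definition: the number of fragments spanning a window centred on a position minus the number of fragments with an endpoint inside that window. It is not the WPSCoverage implementation. The helper name, the default window size, and the treatment of fragments that end exactly on the window boundary are assumptions, and the normalization by coverage and the fragment length filter are not shown.

```python
def wps(fragments, length, window_size=120):
    """Fragments spanning the window minus fragments ending inside it."""
    half = window_size // 2
    scores = [0] * length
    for pos in range(length):
        w_start, w_end = pos - half, pos + half  # inclusive window bounds
        for start, end in fragments:  # half-open [start, end)
            last = end - 1  # last base of the fragment
            if start <= w_start and last >= w_end:
                scores[pos] += 1  # fragment covers the whole window
            elif w_start <= start <= w_end or w_start <= last <= w_end:
                scores[pos] -= 1  # a fragment endpoint falls in the window
    return scores


print(wps([(0, 11)], 11, window_size=4))
# [-1, -1, 1, 1, 1, 1, 1, 1, 1, -1, -1]
```

The score is positive where the fragment protects the whole window and negative near the fragment ends.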