fextract package

Module contents

Fextract module

In the fextract module, the package implements default coverage-based features.

These are the default coverage features implemented:

*fragment coverage*. It takes into account all positions of a fragment, from its leftmost to its rightmost position.

*midpoint coverage*. It takes into account only the central position of the fragment.

*middle-n points coverage*. It takes into account the n positions to the left and to the right of the midpoint of a fragment.

*coverage around dyads*. It uses only the inferred positions of the fragment at which the dyads may be located.

*sliding coverage*. It uses all positions of each fragment, from its leftmost to its rightmost position, and averages the resulting coverage over a sliding window.

*Peter-Ulz (central-60bp) coverage*. It uses only positions [53, 113) and [-113, -53) of the reads in a pair.

*windowed protection score (WPS)*. Score derived from the number of fragments completely overlapping a window around a given position. In lbfextract we further normalize it based on the coverage at each position.
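The first three coverage types differ only in which positions of a fragment are counted. The following is a minimal sketch of that idea, not the lbfextract implementation: the function names, the half-open `(start, end)` fragment representation, and the choice to include the midpoint itself in the middle-n window are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a fragment is a half-open interval [start, end)
# in coordinates relative to the start of the genomic interval.
Fragment = Tuple[int, int]


def fragment_coverage(fragments: List[Fragment], length: int) -> List[int]:
    """Count every position covered by each fragment."""
    cov = [0] * length
    for start, end in fragments:
        for pos in range(max(start, 0), min(end, length)):
            cov[pos] += 1
    return cov


def midpoint_coverage(fragments: List[Fragment], length: int) -> List[int]:
    """Count only the central position of each fragment."""
    cov = [0] * length
    for start, end in fragments:
        mid = (start + end) // 2
        if 0 <= mid < length:
            cov[mid] += 1
    return cov


def middle_n_points_coverage(fragments: List[Fragment], length: int, n: int) -> List[int]:
    """Count the midpoint plus n positions on each side of it (assumed convention)."""
    cov = [0] * length
    for start, end in fragments:
        mid = (start + end) // 2
        for pos in range(max(mid - n, 0), min(mid + n + 1, length)):
            cov[pos] += 1
    return cov


fragments = [(0, 6), (2, 8)]
print(fragment_coverage(fragments, 10))            # [1, 1, 2, 2, 2, 2, 1, 1, 0, 0]
print(midpoint_coverage(fragments, 10))            # [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
print(middle_n_points_coverage(fragments, 10, 1))  # [0, 0, 1, 1, 2, 1, 1, 0, 0, 0]
```

Narrowing the counted positions towards the midpoint gives a sharper signal at the cost of fewer counts per position.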

More details can be found in the article "LBFextract: unveiling transcription factor dynamics from liquid biopsy data", available here: <link_to_article>.

Subpackages

Submodules

fextract.autocompletion module

fextract.cli module

fextract.cli_lib module

cli_lib module

This module provides the commands for the coverage-based feature extraction methods. The following commands are included:

*extract_signal*:
allows the combination of different packages, but requires the specification of the classes and configurations required by the different hooks.
*extract_coverage*:
extracts the coverage from all the genomic intervals contained in a BED file and reports it summarized at each position using a summarization method such as mean or median.
*extract_wps_coverage*:
extracts the windowed protection score over multiple genomic intervals, summarizes the signal and returns the normalized aggregated WPS per position.
*extract_middle_point_coverage*:
extracts the midpoint coverage at each position of multiple genomic intervals contained in a BED file and summarizes the results per position.
*extract_middle_n_points_coverage*:
extracts the n points to the left and right of the middle point of each fragment and calculates the coverage for each genomic interval in a BED file, taking only these positions of each fragment into account.
*extract_sliding_window_coverage*:
extracts a windowed average of the coverage, calculating at each position the average of the coverage values of the following n positions.
*extract_peter_ulz_coverage (central-60 coverage)*:
calculates the central 60 base read coverage using the same coordinates used by Peter Ulz et al. in their Nature Communications article.
*extract_coverage_dyads*:
extracts the coverage considering only the positions of each fragment where the dyads are probably located.
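The windowed average described for extract_sliding_window_coverage can be sketched as a running mean over a per-position coverage vector. This is an illustrative sketch only: the function name is hypothetical, and the choice to emit a value only where a full window fits (so the output is shorter than the input) is an assumption, not a statement about how lbfextract handles interval edges.

```python
from typing import List


def sliding_window_mean(coverage: List[float], window_size: int) -> List[float]:
    """Average of the window_size values starting at each position.

    Only positions with a complete window are reported (assumed edge handling).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if window_size > len(coverage):
        return []
    # Keep a running sum so each step costs O(1) instead of O(window_size).
    window_sum = sum(coverage[:window_size])
    means = [window_sum / window_size]
    for i in range(window_size, len(coverage)):
        window_sum += coverage[i] - coverage[i - window_size]
        means.append(window_sum / window_size)
    return means


coverage = [1, 1, 2, 2, 2, 2, 1, 1, 0, 0]
print(sliding_window_mean(coverage, 2))
# [1.0, 1.5, 2.0, 2.0, 2.0, 1.5, 1.0, 0.5, 0.0]
```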

class lbfextract.fextract.cli_lib.CliHook[source]

Bases: object

get_command() → Command[source]

class lbfextract.fextract.cli_lib.CliHookExtractCoverage[source]

Bases: object

get_command() → Command | List[Command][source]

lbfextract.fextract.cli_lib.calculate_reference_distribution(path_to_sample, min_length, max_length, chr, start, end)[source]

lbfextract.fextract.cli_lib.get_peaks(distribution, height=0.2, distance=100)[source]

lbfextract.fextract.cli_lib.load_checks(ctx, param, value: tuple)[source]

lbfextract.fextract.cli_lib.open_yml(ctx, param, value) → dict[source]

fextract.core module

fextract.feature_extractor module

fextract.generate_plugin_structure module

fextract.hookspecs module

fextract.lib module

class lbfextract.fextract.lib.FextractHooks[source]

Bases: object

fetch_reads(path_to_bam: Path, path_to_bed: Path, config: ReadFetcherConfig, extra_config: AppExtraConfig) → DataFrame[source]

load_fetched_reads(config: Config, extra_config: AppExtraConfig) → DataFrame[source]

plot_signal(signal: Signal, config: Any, extra_config: AppExtraConfig) → Figure[source]

save_fetched_reads(reads_per_interval_container: DataFrame, config: Config, extra_config: AppExtraConfig) → Path[source]

Hook implementing the strategy to save the reads fetched for the intervals.

Parameters:
reads_per_interval_container – ReadsPerIntervalContainer containing information about the genomic region and the reads mapping to it
extra_config – AppExtraConfig containing the output path

Returns: None

save_signal(signal: Signal, config: Any, extra_config: AppExtraConfig) → Path[source]

transform_all_intervals(single_intervals_transformed_reads: Signal, config: SignalSummarizer, extra_config: AppExtraConfig) → Signal[source]

transform_reads(reads_per_interval_container: DataFrame, config: Config, extra_config: AppExtraConfig) → DataFrame[source]

transform_single_intervals(transformed_reads: DataFrame, config: SingleSignalTransformerConfig, extra_config: AppExtraConfig) → Signal[source]

fextract.pluggin_manager module

fextract.schemas module

class lbfextract.fextract.schemas.AppExtraConfig(config_dict: dict | None = None)[source]

Bases: Config

cores: int = None

ctx: dict = None

class lbfextract.fextract.schemas.Config(config_dict: dict | None = None)[source]

Bases: object

schema = <Schema({}, extra=ALLOW_EXTRA, required=False) object>

to_dict()[source]

exception lbfextract.fextract.schemas.LbfextractInvalidConfigError(class_name: str, x: dict, schema: Schema)[source]

Bases: ValueError

class lbfextract.fextract.schemas.ReadFetcherConfig(config_dict: dict | None = None)[source]

Bases: Config

F: int = None

cores: int = None

extra_bases: int = None

f: int = None

flanking_region_window: int = None

n_binding_sites: int = None

schema = <Schema({'window': Coerce(int, msg='window should be a integer'), 'flanking_region_window': Coerce(int, msg='flanking_region_window should be a integer'), 'extra_bases': Coerce(int, msg='extra_bases should be a integer'), 'n_binding_sites': Coerce(int, msg='n_binding_sites should be a integer'), 'cores': Coerce(int, msg='cores should be a integer'), 'f': Coerce(int, msg='f should be an integer representing the samtools flag to be used to include reads.'), 'F': Coerce(int, msg='F should be an integer representing the samtools flag to be used to exclude reads.')}, extra=PREVENT_EXTRA, required=False) object>

window: int = None
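The `f` and `F` fields are described in the schema as samtools flags used to include and exclude reads. In samtools, `-f` keeps a read only if all of the given bits are set, and `-F` drops a read if any of the given bits is set. The sketch below shows that bitwise logic on SAM flag integers; the function name is hypothetical and the flag values in the example are illustrative, not lbfextract defaults.

```python
def passes_flag_filter(flag: int, f: int = 0, F: int = 0) -> bool:
    """Mimic samtools -f/-F semantics on a SAM flag.

    f: every bit in f must be set in the read flag (include filter).
    F: no bit in F may be set in the read flag (exclude filter).
    """
    return (flag & f) == f and (flag & F) == 0


PROPER_PAIR = 2    # SAM flag bit 0x2
DUPLICATE = 1024   # SAM flag bit 0x400

# 99 = paired (1) + proper pair (2) + mate reverse (32) + first in pair (64)
print(passes_flag_filter(99, f=PROPER_PAIR, F=DUPLICATE))         # True
print(passes_flag_filter(99 + 1024, f=PROPER_PAIR, F=DUPLICATE))  # False, marked duplicate
print(passes_flag_filter(65, f=PROPER_PAIR, F=DUPLICATE))         # False, not a proper pair
```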

class lbfextract.fextract.schemas.SignalSummarizer(config_dict: dict | None = None)[source]

Bases: Config

bedfile: Path = None

schema = <Schema({'bed_file': Coerce(Path, msg='bedfile should be a pathlib.Path'), 'summarization_method': In(['mean', 'median', 'max', 'min', 'skip'])}, extra=PREVENT_EXTRA, required=False) object>

summarization_method: str = None
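The `summarization_method` values in the schema suggest a per-position reduction across genomic intervals. A minimal sketch of that reduction follows, under two assumptions: each interval yields a signal of equal length, and `skip` means the per-interval signals are returned without summarization. The function name and data layout are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# One reducer per summarization method named in the schema.
SUMMARIZERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "max": max,
    "min": min,
}


def summarize(signals: List[List[float]], method: str):
    """Reduce a list of equal-length per-interval signals position by position."""
    if method == "skip":
        # Assumed meaning of 'skip': keep one signal per interval.
        return signals
    reducer = SUMMARIZERS[method]
    # zip(*signals) iterates over positions (columns) across all intervals.
    return [reducer(column) for column in zip(*signals)]


signals = [[1, 2, 3], [3, 2, 1], [2, 5, 2]]
print(summarize(signals, "mean"))    # [2.0, 3.0, 2.0]
print(summarize(signals, "median"))  # [2, 2, 2]
print(summarize(signals, "max"))     # [3, 5, 3]
```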

class lbfextract.fextract.schemas.SingleSignalTransformerConfig(config_dict: dict | None = None)[source]

Bases: Config

flip_based_on_strand = None

gc_correction = None

max_fragment_length = None

min_fragment_length = None

n = None

peaks = None

possible_signal_transformers = {'coverage', 'coverage_dyads', 'middle_n_points_coverage', 'middle_point_coverage', 'peter_ulz_coverage', 'sliding_window_coverage', 'wps_coverage'}

read_end = None

read_start = None

schema = <Schema({'n': Coerce(int, msg='n should be a integer'), 'window_size': Coerce(int, msg='flanking_window should be a integer'), 'signal_transformer': In({'coverage_dyads', 'peter_ulz_coverage', 'coverage', 'sliding_window_coverage', 'middle_n_points_coverage', 'wps_coverage', 'middle_point_coverage'}), 'flip_based_on_strand': Coerce(bool, msg='flip_based_on_strand should be a boolean'), 'gc_correction': Coerce(bool, msg='whether gc correction should be performed or not'), 'tag': Coerce(str, msg='the bam file tag to be used to extract the gc coefficient from each read'), 'read_start': Coerce(int, msg='the start of the region to used of a read'), 'read_end': Coerce(int, msg='the end of the region to used of a read'), 'peaks': Coerce(list, msg='peaks should be a boolean'), 'max_fragment_length': Coerce(int, msg='max_fragment_length should be a integer'), 'min_fragment_length': Coerce(int, msg='min_fragment_length should be a integer')}, extra=PREVENT_EXTRA, required=False) object>

signal_transformer = None

tag = None

window_size = None
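Because the schema above is declared with `extra=PREVENT_EXTRA`, a configuration dictionary containing a key outside the listed ones would be rejected. The sketch below shows what a dictionary for this class might look like and checks its keys with plain Python; it does not call lbfextract or the schema library, the helper name is hypothetical, and the chosen values are illustrative.

```python
# Keys and signal transformer names copied from the schema shown above.
ALLOWED_KEYS = {
    "n", "window_size", "signal_transformer", "flip_based_on_strand",
    "gc_correction", "tag", "read_start", "read_end", "peaks",
    "max_fragment_length", "min_fragment_length",
}
SIGNAL_TRANSFORMERS = {
    "coverage", "coverage_dyads", "middle_n_points_coverage",
    "middle_point_coverage", "peter_ulz_coverage",
    "sliding_window_coverage", "wps_coverage",
}


def unknown_keys(config_dict: dict) -> set:
    """Return the keys that the schema would not accept."""
    return set(config_dict) - ALLOWED_KEYS


# Illustrative configuration for middle-n points coverage.
config_dict = {
    "signal_transformer": "middle_n_points_coverage",
    "n": 2,
    "flip_based_on_strand": True,
    "gc_correction": False,
}

print(unknown_keys(config_dict))                                  # set()
print(config_dict["signal_transformer"] in SIGNAL_TRANSFORMERS)   # True
print(unknown_keys({"windowsize": 5}))                            # {'windowsize'}
```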

fextract.setup_conda_env module

fextract.signal_transformer module

class lbfextract.fextract.signal_transformer.FragmentLengthDistribution(min_fragment_length=100, max_fragment_length=400, gc_correction: bool = False, tag: str = None)[source]

Bases: object

class lbfextract.fextract.signal_transformer.GenomiIntervalDataFrameRow(Start, End, Chromosome, reads_per_interval)[source]

Bases: NamedTuple

Chromosome: str: Alias for field number 2

End: int: Alias for field number 1

Start: int: Alias for field number 0

reads_per_interval: Iterator[AlignedSegment]: Alias for field number 3

class lbfextract.fextract.signal_transformer.PeterUlzCoverage(gc_correction: bool, tag: str, read_start: int = 53, read_end: int = 113)[source]

Bases: object
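The defaults `read_start=53` and `read_end=113` match the positions [53, 113) and [-113, -53) quoted for the central-60bp coverage. One way to read them, sketched below, is as offsets from the two ends of a fragment. This interpretation, the function name, and the absence of any clipping for short fragments are assumptions made for illustration and are not taken from the lbfextract source.

```python
from typing import Tuple

Window = Tuple[int, int]  # half-open [start, end) in genomic coordinates


def central_windows(frag_start: int, frag_end: int,
                    read_start: int = 53, read_end: int = 113) -> Tuple[Window, Window]:
    """Return the two 60 bp windows implied by the read offsets.

    The first window is measured from the fragment start ([53, 113)),
    the second from the fragment end ([-113, -53)).
    """
    from_start = (frag_start + read_start, frag_start + read_end)
    from_end = (frag_end - read_end, frag_end - read_start)
    return from_start, from_end


# For a 166 bp fragment both windows coincide on the central 60 bp.
print(central_windows(1000, 1166))  # ((1053, 1113), (1053, 1113))
# For a longer fragment the two windows are offset from each other.
print(central_windows(0, 200))      # ((53, 113), (87, 147))
```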

class lbfextract.fextract.signal_transformer.TFBSCoverage(gc_correction: bool = False, tag: str = None)[source]

Bases: object

This class calculates the fragment coverage for a genomic interval and can correct for GC bias when a GC-bias-specific tag has been added to each read in the BAM file.

class lbfextract.fextract.signal_transformer.TFBSCoverageAroundDyads(n=1, gc_correction: bool = False, tag: str = None, peaks: list = None)[source]

Bases: object

get_relative_start_end(read, start) → list[source]

class lbfextract.fextract.signal_transformer.TFBSMiddlePointCoverage(gc_correction: bool = False, tag: str = None)[source]

Bases: object

class lbfextract.fextract.signal_transformer.TFBSNmiddlePointCoverage(n=1, gc_correction: bool = False, tag: str = None)[source]

Bases: object

class lbfextract.fextract.signal_transformer.TFBSSlidingWindowCoverage(window_size: int, gc_correction: bool = False, tag: str = None)[source]

Bases: object

class lbfextract.fextract.signal_transformer.WPSCoverage(gc_correction: bool = False, tag: str = None, window_size: int = None, min_fragment_length: int = None, max_fragment_length: int = None)[source]

Bases: TFBSCoverage

get_minus_one_indices(relative_start, relative_end, region_length)[source]
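As a rough illustration of a windowed protection score, the sketch below follows the commonly used definition: at each position, fragments that completely span a window centred there add 1, and fragments with an endpoint inside the window subtract 1. Whether WPSCoverage applies exactly this rule is an assumption (the method name get_minus_one_indices only hints at the -1 contribution), and the additional coverage-based normalization is not shown. The function name and the half-open fragment representation are hypothetical.

```python
from typing import List, Tuple

Fragment = Tuple[int, int]  # half-open [start, end), relative coordinates


def wps(fragments: List[Fragment], length: int, window_size: int) -> List[int]:
    """Unnormalized windowed protection score; window_size is assumed to be odd."""
    half = window_size // 2
    scores = [0] * length
    for pos in range(length):
        w_start, w_end = pos - half, pos + half + 1  # half-open window around pos
        for start, end in fragments:
            if start <= w_start and end >= w_end:
                # Fragment spans the whole window: protection.
                scores[pos] += 1
            elif w_start <= start < w_end or w_start <= end - 1 < w_end:
                # Fragment starts or ends inside the window: cut site nearby.
                scores[pos] -= 1
    return scores


print(wps([(2, 9)], length=10, window_size=3))
# [0, -1, -1, 1, 1, 1, 1, 1, -1, -1]
```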

fextract.utils module

fextract.utils_classes module