fextract package

Module contents

Fextract module

The fextract module implements the following default coverage-based features:

  • *fragment coverage*. It takes into account all positions of a fragment, from its leftmost to its rightmost position.

  • *midpoint coverage*. It takes into account only the central position of the fragment.

  • *middle-n points coverage*. It takes into account n positions to the left and n positions to the right of the midpoint of a fragment.

  • *coverage around dyads*. It uses only the inferred positions of each fragment at which the dyads may be located.

  • *sliding coverage*. It uses all positions of each fragment, from its leftmost to its rightmost position, and averages the resulting coverage over a sliding window.

  • *Peter-Ulz (central-60bp) coverage*. It uses only positions [53, 113) and [-113, -53) of the reads in a pair.

  • *windowed protection score (WPS)*. Score derived from the fragments that completely overlap a window around a given position. In lbfextract it is further normalized by the coverage at each position.
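
As a rough illustration of how the first three features differ, the sketch below computes fragment, midpoint and middle-n points coverage for a toy region. It is not lbfextract's implementation: the function names, the half-open (start, end) fragment representation and the rounding of the midpoint are assumptions made for this example.

```python
def fragment_coverage(fragments, region_start, region_end):
    """Per-position count of fragments covering [region_start, region_end)."""
    coverage = [0] * (region_end - region_start)
    for start, end in fragments:  # fragments are half-open [start, end)
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            coverage[pos - region_start] += 1
    return coverage


def middle_n_points_coverage(fragments, region_start, region_end, n=0):
    """Coverage from the midpoint of each fragment plus n positions on each side.

    With n=0 this reduces to midpoint coverage.
    """
    centers = []
    for start, end in fragments:
        mid = (start + end) // 2  # integer midpoint (rounding is an assumption)
        centers.append((mid - n, mid + n + 1))
    return fragment_coverage(centers, region_start, region_end)


fragments = [(2, 8), (4, 10)]
print(fragment_coverage(fragments, 0, 12))            # [0, 0, 1, 1, 2, 2, 2, 2, 1, 1, 0, 0]
print(middle_n_points_coverage(fragments, 0, 12))     # [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
print(middle_n_points_coverage(fragments, 0, 12, 1))  # [0, 0, 0, 0, 1, 1, 2, 1, 1, 0, 0, 0]
```

Restricting each fragment to a few central positions sharpens the signal around the fragment centre, at the price of fewer counts per position.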

More about this topic can be found in the article "LBFextract: unveiling transcription factor dynamics from liquid biopsy data", available here: <link_to_article>.

Subpackages

Submodules

fextract.autocompletion module

fextract.cli module

fextract.cli_lib module

cli_lib module

This module describes the commands for the coverage-based feature extraction methods. The following commands are included:

  • *extract_signal*:

    allows the combination of different packages, but requires the specification of the classes and configurations required by the different hooks.

  • *extract_coverage*:

    extracts the coverage from all the genomic intervals contained in a BED file and reports it summarized per position, using a summarization method such as mean or median.

  • *extract_wps_coverage*:

    extracts the windowed protection score over multiple genomic intervals, summarizes the signal and returns the normalized, aggregated WPS per position.

  • *extract_middle_point_coverage*:

    extracts the midpoint coverage at each position of multiple genomic intervals contained in a BED file and summarizes the results per position.

  • *extract_middle_n_points_coverage*:

    extracts the n points to the left and right of the middle point of each fragment and calculates the coverage for each genomic interval in a BED file, taking only these positions of each fragment into account.

  • *extract_sliding_window_coverage*:

    extracts a windowed average of the coverage, calculating at each position the average of the coverage values of the following n positions.

  • *extract_peter_ulz_coverage (central-60 coverage)*:

    calculates the central 60-base read coverage using the same coordinates used by Ulz et al. in their Nature Communications article.

  • *extract_coverage_dyads*:

    extract the coverage considering only the position of each fragment coming from where the dyads are probably located.
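
The windowed average behind extract_sliding_window_coverage can be sketched as follows. This is a minimal illustration, not the package's code: the function name is hypothetical, and both the inclusion of the current position in the window and the truncation of the window at the right edge are assumptions.

```python
def sliding_window_mean(coverage, window_size):
    """Average each position with the positions that follow it."""
    averaged = []
    for i in range(len(coverage)):
        # The window includes position i itself and is truncated at the
        # right edge of the region (assumed edge handling).
        window = coverage[i:i + window_size]
        averaged.append(sum(window) / len(window))
    return averaged


print(sliding_window_mean([0, 2, 4, 6], 2))  # [1.0, 3.0, 5.0, 6.0]
```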

class lbfextract.fextract.cli_lib.CliHook

Bases: object

get_command() -> Command

class lbfextract.fextract.cli_lib.CliHookExtractCoverage

Bases: object

get_command() -> Command | List[Command]

lbfextract.fextract.cli_lib.calculate_reference_distribution(path_to_sample, min_length, max_length, chr, start, end)

lbfextract.fextract.cli_lib.get_peaks(distribution, height=0.2, distance=100)

lbfextract.fextract.cli_lib.load_checks(ctx, param, value: tuple)

lbfextract.fextract.cli_lib.open_yml(ctx, param, value) -> dict

fextract.core module

fextract.feature_extractor module

fextract.generate_plugin_structure module

fextract.hookspecs module

fextract.lib module

class lbfextract.fextract.lib.FextractHooks

Bases: object

fetch_reads(path_to_bam: Path, path_to_bed: Path, config: ReadFetcherConfig, extra_config: AppExtraConfig) -> DataFrame

load_fetched_reads(config: Config, extra_config: AppExtraConfig) -> DataFrame

plot_signal(signal: Signal, config: Any, extra_config: AppExtraConfig) -> Figure

save_fetched_reads(reads_per_interval_container: DataFrame, config: Config, extra_config: AppExtraConfig) -> Path

Hook implementing the strategy to save the reads fetched for the intervals.

Parameters:

reads_per_interval_container – ReadsPerIntervalContainer containing information about the genomic region and the reads mapping to it

extra_config – AppExtraConfig containing the output path

Returns:

None

save_signal(signal: Signal, config: Any, extra_config: AppExtraConfig) -> Path

transform_all_intervals(single_intervals_transformed_reads: Signal, config: SignalSummarizer, extra_config: AppExtraConfig) -> Signal

transform_reads(reads_per_interval_container: DataFrame, config: Config, extra_config: AppExtraConfig) -> DataFrame

transform_single_intervals(transformed_reads: DataFrame, config: SingleSignalTransformerConfig, extra_config: AppExtraConfig) -> Signal

fextract.pluggin_manager module

fextract.schemas module

class lbfextract.fextract.schemas.AppExtraConfig(config_dict: dict | None = None)

Bases: Config

cores: int = None
ctx: dict = None

class lbfextract.fextract.schemas.Config(config_dict: dict | None = None)

Bases: object

schema = <Schema({}, extra=ALLOW_EXTRA, required=False) object>
to_dict()

exception lbfextract.fextract.schemas.LbfextractInvalidConfigError(class_name: str, x: dict, schema: Schema)

Bases: ValueError

class lbfextract.fextract.schemas.ReadFetcherConfig(config_dict: dict | None = None)

Bases: Config

F: int = None
cores: int = None
extra_bases: int = None
f: int = None
flanking_region_window: int = None
n_binding_sites: int = None
schema = <Schema({'window': Coerce(int, msg='window should be a integer'), 'flanking_region_window': Coerce(int, msg='flanking_region_window should be a integer'), 'extra_bases': Coerce(int, msg='extra_bases should be a integer'), 'n_binding_sites': Coerce(int, msg='n_binding_sites should be a integer'), 'cores': Coerce(int, msg='cores should be a integer'), 'f': Coerce(int, msg='f should be an integer representing the samtools flag to be used to include reads.'), 'F': Coerce(int, msg='F should be an integer representing the samtools flag to be used to exclude reads.')}, extra=PREVENT_EXTRA, required=False) object>
window: int = None
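
The f and F options are described in the schema as samtools flags used to include and exclude reads. The sketch below shows the usual samtools semantics for these two filters (a read is kept if all bits of f are set and no bit of F is set). The helper name keep_read and the flag values chosen in the example are illustrative assumptions, not defaults of lbfextract.

```python
# Selected SAM flag bits (from the SAM specification).
PROPER_PAIR = 0x2
UNMAPPED = 0x4
DUPLICATE = 0x400


def keep_read(flag: int, f: int = 0, F: int = 0) -> bool:
    """Mimic `samtools view -f f -F F` on a single SAM flag value."""
    has_required_bits = (flag & f) == f   # every bit in f must be set
    has_excluded_bits = (flag & F) != 0   # any bit in F rejects the read
    return has_required_bits and not has_excluded_bits


# Hypothetical filter: require proper pairs, drop unmapped reads and duplicates.
include, exclude = PROPER_PAIR, UNMAPPED | DUPLICATE
print(keep_read(99, include, exclude))    # True: paired, proper pair, first in pair
print(keep_read(1123, include, exclude))  # False: same read marked as duplicate
print(keep_read(77, include, exclude))    # False: unmapped, not a proper pair
```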

class lbfextract.fextract.schemas.SignalSummarizer(config_dict: dict | None = None)

Bases: Config

bedfile: Path = None
schema = <Schema({'bed_file': Coerce(Path, msg='bedfile should be a pathlib.Path'), 'summarization_method': In(['mean', 'median', 'max', 'min', 'skip'])}, extra=PREVENT_EXTRA, required=False) object>
summarization_method: str = None
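
The summarization_method values accepted by the schema suggest a per-position reduction across genomic intervals. The following sketch shows one way such a reduction could look. The function name is hypothetical, and treating 'skip' as "return the per-interval signals unchanged" is an assumption rather than documented behaviour.

```python
from statistics import mean, median

SUMMARIZERS = {"mean": mean, "median": median, "max": max, "min": min}


def summarize(signals, method="mean"):
    """Reduce a list of equally long per-interval signals to one value per position."""
    if method == "skip":
        return signals  # assumed meaning: no summarization
    reducer = SUMMARIZERS[method]
    # zip(*signals) yields, for each position, the values of all intervals.
    return [reducer(column) for column in zip(*signals)]


signals = [[0, 2, 4], [2, 2, 2], [4, 8, 0]]
print(summarize(signals, "mean"))    # [2, 4, 2]
print(summarize(signals, "median"))  # [2, 2, 2]
print(summarize(signals, "max"))     # [4, 8, 4]
```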

class lbfextract.fextract.schemas.SingleSignalTransformerConfig(config_dict: dict | None = None)

Bases: Config

flip_based_on_strand = None
gc_correction = None
max_fragment_length = None
min_fragment_length = None
n = None
peaks = None
possible_signal_transformers = {'coverage', 'coverage_dyads', 'middle_n_points_coverage', 'middle_point_coverage', 'peter_ulz_coverage', 'sliding_window_coverage', 'wps_coverage'}
read_end = None
read_start = None
schema = <Schema({'n': Coerce(int, msg='n should be a integer'), 'window_size': Coerce(int, msg='flanking_window should be a integer'), 'signal_transformer': In({'coverage_dyads', 'peter_ulz_coverage', 'coverage', 'sliding_window_coverage', 'middle_n_points_coverage', 'wps_coverage', 'middle_point_coverage'}), 'flip_based_on_strand': Coerce(bool, msg='flip_based_on_strand should be a boolean'), 'gc_correction': Coerce(bool, msg='whether gc correction should be performed or not'), 'tag': Coerce(str, msg='the bam file tag to be used to extract the gc coefficient from each read'), 'read_start': Coerce(int, msg='the start of the region to used of a read'), 'read_end': Coerce(int, msg='the end of the region to used of a read'), 'peaks': Coerce(list, msg='peaks should be a boolean'), 'max_fragment_length': Coerce(int, msg='max_fragment_length should be a integer'), 'min_fragment_length': Coerce(int, msg='min_fragment_length should be a integer')}, extra=PREVENT_EXTRA, required=False) object>
signal_transformer = None
tag = None
window_size = None

fextract.setup_conda_env module

fextract.signal_transformer module

class lbfextract.fextract.signal_transformer.FragmentLengthDistribution(min_fragment_length=100, max_fragment_length=400, gc_correction: bool = False, tag: str = None)

Bases: object

class lbfextract.fextract.signal_transformer.GenomiIntervalDataFrameRow(Start, End, Chromosome, reads_per_interval)

Bases: NamedTuple

Chromosome: str

Alias for field number 2

End: int

Alias for field number 1

Start: int

Alias for field number 0

reads_per_interval: Iterator[AlignedSegment]

Alias for field number 3
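
Because the row type is a NamedTuple, its fields can be read by name or by the positional index listed above. The stand-in class below mirrors the documented field order for illustration only. The name IntervalRow is hypothetical, and the reads iterator is typed loosely so that the example does not depend on pysam.

```python
from typing import Iterator, NamedTuple


class IntervalRow(NamedTuple):  # hypothetical stand-in, same field order as documented
    Start: int
    End: int
    Chromosome: str
    reads_per_interval: Iterator[object]


row = IntervalRow(100, 200, "chr1", iter([]))
start, end, chromosome, reads = row  # positional unpacking follows the field numbers
print(row.Chromosome, row[2])        # chr1 chr1
print(row.End - row.Start)           # 100
```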

class lbfextract.fextract.signal_transformer.PeterUlzCoverage(gc_correction: bool, tag: str, read_start: int = 53, read_end: int = 113)

Bases: object
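
The defaults read_start=53 and read_end=113 match the [53, 113) and [-113, -53) positions mentioned for the central-60bp coverage. One possible reading, sketched below, is that the forward read contributes a 60 bp interval measured from the fragment start and the reverse read a 60 bp interval measured from the fragment end. Anchoring on the fragment ends and the helper name are assumptions, not a description of the class internals.

```python
def central_60_intervals(frag_start, frag_end, read_start=53, read_end=113):
    """Return the two half-open intervals counted for one fragment."""
    forward = (frag_start + read_start, frag_start + read_end)  # [53, 113) from the start
    reverse = (frag_end - read_end, frag_end - read_start)      # [-113, -53) from the end
    return forward, reverse


# For a 167 bp fragment the two intervals almost coincide around the centre.
print(central_60_intervals(1000, 1167))  # ((1053, 1113), (1054, 1114))
```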

class lbfextract.fextract.signal_transformer.TFBSCoverage(gc_correction: bool = False, tag: str = None)

Bases: object

This class calculates the fragment coverage for a genomic interval and can correct for GC bias when a GC-bias-specific tag has been added to each read in the BAM file.

class lbfextract.fextract.signal_transformer.TFBSCoverageAroundDyads(n=1, gc_correction: bool = False, tag: str = None, peaks: list = None)

Bases: object

get_relative_start_end(read, start) -> list

class lbfextract.fextract.signal_transformer.TFBSMiddlePointCoverage(gc_correction: bool = False, tag: str = None)

Bases: object

class lbfextract.fextract.signal_transformer.TFBSNmiddlePointCoverage(n=1, gc_correction: bool = False, tag: str = None)

Bases: object

class lbfextract.fextract.signal_transformer.TFBSSlidingWindowCoverage(window_size: int, gc_correction: bool = False, tag: str = None)

Bases: object

class lbfextract.fextract.signal_transformer.WPSCoverage(gc_correction: bool = False, tag: str = None, window_size: int = None, min_fragment_length: int = None, max_fragment_length: int = None)

Bases: TFBSCoverage

get_minus_one_indices(relative_start, relative_end, region_length)
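
For orientation, a windowed protection score can be sketched as below. The module description only mentions fragments that completely overlap the window. Subtracting one for fragments with an endpoint inside the window follows the commonly used WPS definition and is an assumption here, as are the function name and the half-open coordinates. The coverage-based normalization applied by lbfextract is not shown.

```python
def windowed_protection_score(fragments, region_start, region_end, window_size):
    """Unnormalized WPS per position for half-open fragments (start, end)."""
    half = window_size // 2
    scores = []
    for pos in range(region_start, region_end):
        w_start, w_end = pos - half, pos + half  # half-open window around pos
        score = 0
        for start, end in fragments:
            if start <= w_start and end >= w_end:
                score += 1  # fragment spans the whole window
            elif w_start <= start < w_end or w_start < end <= w_end:
                score -= 1  # fragment starts or ends inside the window (assumed penalty)
        scores.append(score)
    return scores


print(windowed_protection_score([(0, 10)], 0, 10, 4))
# [-1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
```

Positions protected by a long fragment score positively, while positions near fragment ends score negatively.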

fextract.utils module

fextract.utils_classes module