wespipeline package¶
Submodules¶
wespipeline.align module¶
-
class
wespipeline.align.
BwaAlignFastq
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for aligning fastq files against the reference genome.
It requires the output of both the wespipeline.reference.ReferenceGenome and wespipeline.fastq.GetFastq higher level tasks in order to proceed with the alignment.
If
wespipeline.utils.GlobalParams.exp_name
is set, it will be used to name the Sam file produced.
- Parameters
none –
- Output:
A luigi.LocalTarget instance for the aligned sam file.
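As an illustration of the kind of argument list such a task could hand to the external program, the sketch below builds a `bwa mem` command line. The helper name `bwa_mem_args` and the file names are hypothetical, and the arguments the package actually uses are not shown in this reference. `bwa mem` writes the alignment to standard output, so the caller is assumed to redirect it into the Sam file.

```python
def bwa_mem_args(reference_fa, fastq1, fastq2=None, cpus=1):
    """Build an argument list for `bwa mem` (hypothetical helper).

    -t sets the number of threads; the second fastq file is only
    appended for paired end reads.
    """
    args = ["bwa", "mem", "-t", str(cpus), reference_fa, fastq1]
    if fastq2 is not None:
        args.append(fastq2)
    return args


if __name__ == "__main__":
    # Example file names are placeholders, not package defaults.
    print(bwa_mem_args("reference.fa", "exp_1.fastq", "exp_2.fastq", cpus=4))
```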
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.align.
FastqAlign
(*args, **kwargs)¶ Bases:
wespipeline.utils.MetaOutputHandler
,luigi.task.Task
Higher level task for the alignment of fastq files.
Local files are preferred over running the alignment, in order to reduce computational overhead.
Alignment is done with the Bwa mem utility.
- Parameters
sam_local_file (str) – String indicating the location of a local Sam file with the alignment.
cpus (int) – Integer indicating the number of cpus that can be used for the alignment.
- Output:
A dict mapping keys to luigi.LocalTarget instances for each of the processed files. The following keys are available:
‘sam’ : Local file with the alignment.
-
cpus
= <luigi.parameter.Parameter object>¶
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
sam_local_file
= <luigi.parameter.Parameter object>¶
wespipeline.fastq module¶
-
class
wespipeline.fastq.
FastqcQualityCheck
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for creating a quality report on fastq files.
The report is created using the Fastqc utility, resulting in an html report and a zip folder containing more detailed information about the quality of the reads.
- Parameters
fastq_file (str) – Path for the fastq file to be analyzed.
- Output:
html (luigi.LocalTarget) : File containing the report for fastqc quality.
-
fastq_file
= <luigi.parameter.Parameter object>¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
class
wespipeline.fastq.
GetFastq
(*args, **kwargs)¶ Bases:
wespipeline.utils.MetaOutputHandler
,luigi.task.Task
Higher level task for the retrieval of the experiment fastq files.
Three different sources for the fastq files are accepted: an existing local file, an NCBI accession number for the reads, and an external url indicating the location of the resources. The sources are searched in that order: local files are preferred over external resources in order to reduce computational overhead, and an NCBI accession number is preferred over an external url for reproducibility reasons.
- Parameters
fastq1_local_file (str) – String indicating the location of a local compressed fastq file.
fastq2_local_file (str) – String indicating the location of a local compressed fastq file.
fastq1_url (str) – Url indicating the location of the resource for the compressed fastq file.
fastq2_url (str) – Url indicating the location of the resource for the compressed fastq file.
paired_end (bool) – Case-insensitive boolean indicating whether the reads are paired end.
compressed (bool) – Case-insensitive boolean indicating whether the reads are compressed.
create_report (bool) – Case-insensitive boolean indicating whether to create a quality check report.
- Output:
A dict mapping keys to luigi.LocalTarget instances for each of the processed files. The following keys are available:
‘fastq1’ : Local file with the fastq file with the experiment’s reads.
‘fastq2’ : In case of paired end experiments, a local file with the fastq file with the experiment’s reads.
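The search order described for this task (local file, then NCBI accession number, then external url) can be sketched as a small selection function. This is a minimal illustration independent of luigi; the function name `pick_fastq_source` and the returned labels are hypothetical, not part of the package.

```python
def pick_fastq_source(local_file=None, accession_number=None, url=None):
    """Return (kind, value) for the first available source.

    The order follows the documented preference: local file first,
    then NCBI accession number, then external url.
    """
    if local_file:
        return ("local", local_file)
    if accession_number:
        return ("ncbi", accession_number)
    if url:
        return ("url", url)
    raise ValueError("no fastq source was provided")


if __name__ == "__main__":
    # The accession number and url below are placeholders.
    print(pick_fastq_source(accession_number="SRR000001",
                            url="http://example.org/reads.fastq.gz"))
```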
-
accession_number
= <luigi.parameter.Parameter object>¶
-
compressed
= <luigi.parameter.BoolParameter object>¶
-
create_report
= <luigi.parameter.BoolParameter object>¶
-
fastq1_local_file
= <luigi.parameter.Parameter object>¶
-
fastq1_url
= <luigi.parameter.Parameter object>¶
-
fastq2_local_file
= <luigi.parameter.Parameter object>¶
-
fastq2_url
= <luigi.parameter.Parameter object>¶
-
paired_end
= <luigi.parameter.BoolParameter object>¶
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
class
wespipeline.fastq.
SraToolkitFastq
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for downloading fastq files from the NCBI archive.
If the reads are paired end, the output will consist of two separate fastq files.
The output file(s) will be named after the accession number and, in the case of paired end reads, a suffix identifying each of the two fastq files.
- Parameters
accession_number (str) – NCBI accession number for the experiment.
paired_end (bool) – Case-insensitive boolean indicating whether the reads are paired end.
- Output:
A dict mapping keys to luigi.LocalTarget instances for each of the processed files. The following keys are available:
‘fastq1’ : Local file with the fastq file with the experiment’s reads.
‘fastq2’ : In case of paired end experiments, a local file with the fastq file with the experiment’s reads.
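The naming rule for the downloaded files can be sketched as follows. The `_1` and `_2` suffixes follow the convention of `fastq-dump --split-files` from the SRA Toolkit; whether the package uses exactly these suffixes is an assumption, and `expected_fastq_names` is a hypothetical helper.

```python
def expected_fastq_names(accession_number, paired_end):
    """Return the fastq file names expected for an accession number.

    Assumption: paired end reads are split into `<accession>_1.fastq`
    and `<accession>_2.fastq`; single end reads give `<accession>.fastq`.
    """
    if paired_end:
        return [accession_number + "_1.fastq", accession_number + "_2.fastq"]
    return [accession_number + ".fastq"]


if __name__ == "__main__":
    # "SRR000001" is only a placeholder accession number.
    print(expected_fastq_names("SRR000001", paired_end=True))
```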
-
accession_number
= <luigi.parameter.Parameter object>¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
paired_end
= <luigi.parameter.BoolParameter object>¶
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
class
wespipeline.fastq.
UncompressFastqgz
(*args, **kwargs)¶ Bases:
luigi.task.Task
Task for uncompressing fastq files.
The task uses utils.UncompressFile for uncompressing into fastq. If both fastq_local_file and fastq_url are set, the local file will have preference; thus reducing the overhead in the process.
- Parameters
fastq_local_file (str) – String indicating the location of a local compressed fastq file.
fastq_url (str) – Url indicating the location of the resource for the compressed fastq file.
output_file (str) – String indicating the desired location and name of the output uncompressed fastq file.
- Output:
A luigi.LocalTarget instance for the uncompressed fastq file.
-
fastq_local_file
= <luigi.parameter.Parameter object>¶
-
fastq_url
= <luigi.parameter.Parameter object>¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
output_file
= <luigi.parameter.Parameter object>¶
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
wespipeline.processalign module¶
-
class
wespipeline.processalign.
AlignProcessing
(*args, **kwargs)¶ Bases:
wespipeline.utils.MetaOutputHandler
,luigi.task.Task
Higher level task for the alignment of fastq files.
Local files are preferred over processing the alignment, in order to reduce computational overhead.
If the bam and bai local files are set, they will be used instead of the
Alignment is done with the Bwa mem utility.
- Parameters
bam_local_file (str) – String indicating the location of a local bam file with the sorted alignment. If set, this file will not be created.
bai_local_file (str) – String indicating the location of a local bai file with the index for the alignment. If set, this file will not be created.
no_dup_bam_local_file (str) – String indicating the location of a local bam file without the duplicates. If set, this file will not be created.
no_dup_bai_local_file (str) – String indicating the location of a local file with the index for the bam file without duplicates. If set, this file will not be created.
cpus (int) – Integer indicating the number of cpus that can be used for the alignment.
- Output:
A dict mapping keys to luigi.LocalTarget instances for each of the processed files. The following keys are available:
‘bam’ : Local file with the sorted alignment.
‘bai’ : Local file with the alignment index.
‘bamNoDup’ : Local sorted file with duplicates removed.
‘indexNoDup’ : Local file with the index for sorted alignment without duplicates.
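Since each local file parameter, when set, means the corresponding file will not be created, the set of outputs that still has to be produced can be sketched as below. The helper `files_to_create` is hypothetical and independent of luigi; only the parameter names and output keys come from the description above.

```python
def files_to_create(bam_local_file="", bai_local_file="",
                    no_dup_bam_local_file="", no_dup_bai_local_file=""):
    """List the output keys whose files still need to be created.

    A key is skipped when the matching local file parameter is set.
    """
    provided = {
        "bam": bam_local_file,
        "bai": bai_local_file,
        "bamNoDup": no_dup_bam_local_file,
        "indexNoDup": no_dup_bai_local_file,
    }
    return [key for key, path in provided.items() if not path]


if __name__ == "__main__":
    # File names are placeholders.
    print(files_to_create(bam_local_file="exp.bam", bai_local_file="exp.bai"))
```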
-
bai_local_file
= <luigi.parameter.Parameter object>¶
-
bam_local_file
= <luigi.parameter.Parameter object>¶
-
cpus
= <luigi.parameter.IntParameter object>¶
-
no_dup_bai_local_file
= <luigi.parameter.Parameter object>¶
-
no_dup_bam_local_file
= <luigi.parameter.Parameter object>¶
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
class
wespipeline.processalign.
IndexBam
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for indexing the Bam file.
The
wespipeline.utils.GlobalParams.exp_name
will be used to name the Bai file produced.
- Parameters
none –
- Output:
A luigi.LocalTarget instance for the index Bai file.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.processalign.
IndexNoDup
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for indexing the Bam file without duplicates.
The
wespipeline.utils.GlobalParams.exp_name
will be used to name the Bai file produced.
- Parameters
none –
- Output:
A luigi.LocalTarget instance for the index Bai file.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.processalign.
PicardMarkDuplicates
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for removing duplicates from the Bam file.
The
wespipeline.utils.GlobalParams.exp_name
will be used to name the Bam file produced.
- Parameters
none –
- Output:
A luigi.LocalTarget instance for the Bam file without the duplicates.
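For orientation, a Picard MarkDuplicates invocation that removes duplicates could be assembled as in the sketch below. The `picard` launcher name, the metrics file and the helper `mark_duplicates_args` are assumptions made for illustration; the options the package passes are not listed in this reference.

```python
def mark_duplicates_args(input_bam, output_bam, metrics_file):
    """Build an argument list for Picard MarkDuplicates (hypothetical helper).

    REMOVE_DUPLICATES=true drops the duplicates from the output
    instead of only flagging them.
    """
    return [
        "picard", "MarkDuplicates",   # assumes a `picard` wrapper on PATH
        "I=" + input_bam,
        "O=" + output_bam,
        "M=" + metrics_file,
        "REMOVE_DUPLICATES=true",
    ]


if __name__ == "__main__":
    # File names are placeholders.
    print(mark_duplicates_args("exp.bam", "exp_nodup.bam", "exp_metrics.txt"))
```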
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.processalign.
SortSam
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for sorting the alignment sam file.
It requires the output of the
wespipeline.align.FastqAlign
step. The
wespipeline.utils.GlobalParams.exp_name
will be used to name the Bam file produced.
- Parameters
none –
- Output:
A luigi.LocalTarget instance for the sorted Bam file.
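A sorting step of this kind is commonly run with `samtools sort`. The sketch below builds such a command line as an illustration only; the helper name `samtools_sort_args` and the choice of samtools are assumptions, since this reference does not state which tool or options the task uses.

```python
def samtools_sort_args(input_sam, output_bam, cpus=1):
    """Build an argument list for `samtools sort` (hypothetical helper).

    -@ sets the number of additional threads and -o the output file.
    """
    return ["samtools", "sort", "-@", str(cpus), "-o", output_bam, input_sam]


if __name__ == "__main__":
    # File names are placeholders.
    print(samtools_sort_args("exp.sam", "exp.bam", cpus=4))
```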
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
wespipeline.reference module¶
-
class
wespipeline.reference.
BwaIndex
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for indexing the reference genome .fa file with the bwa index utility.
Indexing the reference genome drastically reduces access time.
- Parameters
None –
- Output:
A set of five files results from indexing the reference genome. The extensions of the files are ‘.amb’, ‘.ann’, ‘.bwt’, ‘.pac’, ‘.sa’.
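Because the task is only complete when all five index files exist, a completeness check can be sketched as below. `bwa index` appends each extension to the full reference file name; the helper names are hypothetical and the check is independent of luigi.

```python
import os

BWA_INDEX_EXTENSIONS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def bwa_index_files(reference_fa):
    """Return the five index file paths expected next to the reference."""
    return [reference_fa + ext for ext in BWA_INDEX_EXTENSIONS]


def bwa_index_is_complete(reference_fa):
    """True only when every one of the five index files exists."""
    return all(os.path.exists(path) for path in bwa_index_files(reference_fa))


if __name__ == "__main__":
    # "reference.fa" is a placeholder file name.
    print(bwa_index_files("reference.fa"))
```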
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.reference.
FaidxIndex
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for indexing the reference genome .fa file with the samtools faidx utility.
Indexing the reference genome drastically reduces access time.
- Parameters
None –
- Output:
A luigi.LocalTarget for the .fai index file for the reference genome.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.reference.
GetProgram
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for downloading and giving execution permissions to the 2bit program.
The task gives execute permissions to the utility that converts 2bit files to fa files, which can then be used for aligning the sequences.
The source for the program is ftp://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa.
- Parameters
none –
- Output:
A luigi.LocalTarget for the executable.
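The permission step can be sketched in plain Python as follows. This is only an illustration of adding execute bits to a downloaded file; the helper `make_executable` is hypothetical, and the way the task itself performs the download and permission change is not shown in this reference.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others to `path`.

    Returns True when the file is executable afterwards.
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)
```

A call such as `make_executable("twoBitToFa")` would be made once the program has been downloaded to the working directory.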
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.reference.
GetReferenceFa
(*args, **kwargs)¶ Bases:
wespipeline.utils.MetaOutputHandler
,luigi.task.WrapperTask
Task used for obtaining the reference genome .fa file.
This task will retrieve an external genome or use a provided local one, and convert it from 2bit format to .fa if necessary.
- Parameters
ref_url (str) – Url for the resource with the reference genome.
reference_local_file (str) – Path for the reference genome 2bit file. If given, the
ref_url
parameter will be ignored.
from2bit (bool) – Case-insensitive boolean indicating whether the reference genome is in 2bit format. Defaults to false.
- Output:
A luigi.LocalTarget for the reference genome fa file.
-
from2bit
= <luigi.parameter.BoolParameter object>¶
-
ref_url
= <luigi.parameter.Parameter object>¶
-
reference_local_file
= <luigi.parameter.Parameter object>¶
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.reference.
PicardDict
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for creating a dict file from the reference genome .fa file with the picard utility.
- Parameters
None –
- Output:
A luigi.LocalTarget for the .dict file for the reference genome.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.reference.
ReferenceGenome
(*args, **kwargs)¶ Bases:
wespipeline.utils.MetaOutputHandler
,luigi.task.Task
Higher level task for retrieving the reference genome.
Local files are preferred over downloading the reference. However, the indexing of the reference genome is always done using
GlobalParams.exp_name
and
GlobalParams.base_dir
for determining filenames and location for newer files respectively. The indexing is done using both Samtools and Bwa toolkits.
- Parameters
reference_local_file (str) – Optional string indicating the location for the reference genome. If set, it will not be downloaded.
ref_url (str) – Url for the download of the reference genome.
from2bit (bool) – A boolean [True, False] indicating whether the reference genome must be converted from 2bit.
- Output:
A dict mapping keys to luigi.LocalTarget instances for each of the processed files. The following keys are available:
‘faidx’ : Local file with the index, result of indexing with Samtools.
‘bwa’ : Set of five files, result of indexing the reference genome with Bwa.
‘fa’ : Local file with the reference genome.
-
from2bit
= <luigi.parameter.BoolParameter object>¶
-
ref_url
= <luigi.parameter.Parameter object>¶
-
reference_local_file
= <luigi.parameter.Parameter object>¶
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
class
wespipeline.reference.
TwoBitToFa
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for converting 2bit files to the fa format.
The task will use a local executable, or require the task for obtaining it, and use it with the reference genome.
- Parameters
ref_url (str) – Url for the resource with the reference genome.
reference_local_file (str) – Path for the reference genome 2bit file. If given, the
ref_url
parameter will be ignored.
- Output:
A luigi.LocalTarget for the reference genome fa file.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
ref_url
= <luigi.parameter.Parameter object>¶
-
reference_local_file
= <luigi.parameter.Parameter object>¶
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
wespipeline.utils module¶
-
class
wespipeline.utils.
GlobalParams
(*args, **kwargs)¶ Bases:
luigi.task.Config
Task used for specifying globally accessible parameters.
Parameters defined in this class are task independent and should be kept to a minimum.
- Parameters
exp_name (str) – Name for the experiment. Useful for defining file names.
log_dir (str) – Absolute path for the logs of the application.
base_dir (str) – Absolute path to the directory where files are expected to appear if not specified otherwise.
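Several tasks in this package use the experiment name and the base directory to decide where a produced file goes. One way such a path could be derived is sketched below; the helper `experiment_path` and the exact naming scheme (experiment name plus extension) are assumptions for illustration, not the package's documented behaviour.

```python
import os


def experiment_path(base_dir, exp_name, extension):
    """Join the base directory with a file named after the experiment.

    Assumption: files are named `<exp_name><extension>` inside `base_dir`.
    """
    return os.path.join(base_dir, exp_name + extension)


if __name__ == "__main__":
    # Directory and experiment name are placeholders.
    print(experiment_path("/data/run1", "sample_a", ".sam"))
```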
-
base_dir
= <luigi.parameter.Parameter object>¶
-
exp_name
= <luigi.parameter.Parameter object>¶
-
log_dir
= <luigi.parameter.Parameter object>¶
-
class
wespipeline.utils.
GunzipFile
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task for unzipping compressed files.
Gunzip will always do the process in place, removing the extension.
- Parameters
input_file (str) – Absolute path to the compressed file.
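Because gunzip works in place and drops the extension, the resulting file name can be derived from the input path. The sketch below shows that derivation for the default `.gz` suffix; `gunzip_output_path` is a hypothetical helper, not a function of the package.

```python
def gunzip_output_path(input_file):
    """Return the path gunzip leaves behind for a `.gz` input file."""
    suffix = ".gz"
    if not input_file.endswith(suffix):
        raise ValueError("expected a file name ending in .gz")
    return input_file[:-len(suffix)]


if __name__ == "__main__":
    # The path is a placeholder.
    print(gunzip_output_path("/data/reads_1.fastq.gz"))
```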
-
input_file
= <luigi.parameter.Parameter object>¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.utils.
LocalFile
(*args, **kwargs)¶ Bases:
luigi.task.Task
Helper task for making.
No extra processing is done in the task. It allows tasks to depend on files using the same strategy as with other tasks.
- Parameters
file (str) – Absolute path to the file to be tested.
-
file
= <luigi.parameter.Parameter object>¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
class
wespipeline.utils.
MetaOutputHandler
¶ Bases:
object
Helper class for propagating inputs in WrapperTasks
-
output
()¶
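The idea of propagating inputs can be illustrated without luigi: a wrapper's output is simply whatever its required tasks produced. The sketch below uses stand-in classes (`FakeTask`, `ProducesFile`, `PropagateInputs`, `Wrapper`), all hypothetical; it is a guess at the pattern, not the package's implementation of MetaOutputHandler.

```python
class FakeTask:
    """Stand-in for a luigi-style task: input() collects dependency outputs."""

    def requires(self):
        return {}

    def input(self):
        return {key: dep.output() for key, dep in self.requires().items()}


class ProducesFile(FakeTask):
    """A task whose output is a single file path."""

    def __init__(self, path):
        self.path = path

    def output(self):
        return self.path


class PropagateInputs:
    """Mixin: expose the inputs of the task as its own output."""

    def output(self):
        return self.input()


class Wrapper(PropagateInputs, FakeTask):
    """Wrapper that only groups other tasks and forwards their outputs."""

    def requires(self):
        return {"fa": ProducesFile("genome.fa"),
                "faidx": ProducesFile("genome.fa.fai")}


if __name__ == "__main__":
    print(Wrapper().output())
```

Placing the mixin before the task base class in the list of bases lets its `output` take precedence while `input` still comes from the task.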
-
-
class
wespipeline.utils.
UncompressFile
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task for unzipping compressed files to a desired location.
Gunzip will always do the process in place, removing the extension. This task allows the destination to be selected.
This operation
- Parameters
input_file (str) – Absolute path to the compressed file.
output_file (str) – Absolute path to the desired final location.
copy (bool) – Case-insensitive boolean indicating whether to copy or to move the file. Defaults to false.
-
copy
= <luigi.parameter.BoolParameter object>¶
-
input_file
= <luigi.parameter.Parameter object>¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
output_file
= <luigi.parameter.Parameter object>¶
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
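The behaviour described for UncompressFile (decompress to a chosen destination, optionally keeping the original) can be approximated in pure Python. This is a sketch only: the task itself runs an external program, the function name uncompress_to is hypothetical, and the reading of copy (true keeps the compressed file, false removes it) is an assumption.

```python
import gzip
import os
import shutil
import tempfile


def uncompress_to(input_file, output_file, copy=False):
    """Decompress a gzip file to output_file.

    Assumption: copy=True keeps the compressed input, copy=False removes it
    after decompression has succeeded (the file is effectively "moved").
    """
    with gzip.open(input_file, "rb") as src, open(output_file, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if not copy:
        os.remove(input_file)
    return output_file


# Usage with a throwaway compressed file.
workdir = tempfile.mkdtemp()
compressed = os.path.join(workdir, "reads.fastq.gz")
with gzip.open(compressed, "wb") as handle:
    handle.write(b"@read1\nACGT\n")
result = uncompress_to(compressed, os.path.join(workdir, "reads.fastq"), copy=True)
```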
-
class
wespipeline.utils.
Wget
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task for downloading files using the tool wget.
- Parameters
url (str) – URL indicating the location of the resource to be retrieved.
output_file (str) – Absolute path of the destination for the retrieved resource.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
output_file
= <luigi.parameter.Parameter object>¶
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
url
= <luigi.parameter.Parameter object>¶
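To make the mapping from parameters to program arguments concrete, the following sketch builds an argument list of the kind program_args returns. The function name wget_args is hypothetical and the exact arguments used by the task are not documented here; wget's -O option, which writes the download to a given path, is assumed to be the relevant one.

```python
def wget_args(url, output_file):
    """Build an argument list suitable for subprocess.Popen."""
    # -O writes the downloaded document to the given path.
    return ["wget", url, "-O", output_file]


args = wget_args("https://example.org/reference.fa.gz", "/data/reference.fa.gz")
```

Passing a list rather than a single shell string avoids quoting problems with URLs that contain characters such as & or ?.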
wespipeline.vcf module¶
-
class
wespipeline.vcf.
DeepvariantCallVariants
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for identifying variants in the bam file provided using DeepVariant.
- Parameters
model_type (str) – A string defining the model to use for the variant calling. Valid options are [WGS,WES,PACBIO].
- Dependencies:
ReferenceGenome AlignProcessing
- Output:
A luigi.LocalTarget instance for the index vcf file.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
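Since model_type accepts only a fixed set of values, a caller may want to validate it before scheduling the task. The helper below is a hypothetical sketch; whether the task itself validates the value, and whether matching is case sensitive, is not stated in this documentation, so an exact match is assumed.

```python
VALID_MODEL_TYPES = ("WGS", "WES", "PACBIO")


def check_model_type(model_type):
    """Return model_type unchanged if it is one of the documented options."""
    # Assumption: the value must match one of the options exactly.
    if model_type not in VALID_MODEL_TYPES:
        raise ValueError(
            "model_type must be one of {}, got {!r}".format(
                ", ".join(VALID_MODEL_TYPES), model_type
            )
        )
    return model_type


chosen = check_model_type("WES")
```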
-
class
wespipeline.vcf.
DeepvariantDockerTask
(*args, **kwargs)¶ Bases:
luigi.contrib.docker_runner.DockerTask
Task used for identifying variants in the bam file provided using DeepVariant.
- Parameters
model_type (str) – A string defining the model to use for the variant calling. Valid options are [WGS,WES,PACBIO].
- Dependencies:
ReferenceGenome AlignProcessing
- Output:
A luigi.LocalTarget instance for the index vcf file.
-
BIN_VERSION
= '0.8.0'¶
-
property
binds
¶ Override this to mount local volumes, in addition to the /tmp/luigi which gets defined by default. This should return a list of strings. e.g. [‘/hostpath1:/containerpath1’, ‘/hostpath2:/containerpath2’]
-
property
command
¶
-
create_gvcf
= <luigi.parameter.BoolParameter object>¶
-
property
image
¶
-
model_type
= <luigi.parameter.Parameter object>¶
-
property
mount_tmp
¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
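The binds property is expected to return strings of the form hostpath:containerpath. A small helper such as the one below can build that list from a mapping; make_binds and the example directories are hypothetical and only illustrate the expected format.

```python
def make_binds(mounts):
    """Turn a {host_path: container_path} mapping into 'host:container' strings."""
    return ["{}:{}".format(host, container) for host, container in mounts.items()]


binds = make_binds({"/data/reference": "/reference", "/data/align": "/input"})
```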
-
class
wespipeline.vcf.
DockerGatkCallVariants
(*args, **kwargs)¶ Bases:
luigi.contrib.docker_runner.DockerTask
Task used for identifying variants in the bam file provided using DeepVariant.
- Parameters
model_type (str) – A string defining the model to use for the variant calling. Valid options are [WGS,WES,PACBIO].
- Dependencies:
ReferenceGenome AlignProcessing
- Output:
A luigi.LocalTarget instance for the index vcf file.
-
BIN_VERSION
= '0.8.0'¶
-
property
binds
¶ Override this to mount local volumes, in addition to the /tmp/luigi which gets defined by default. This should return a list of strings. e.g. [‘/hostpath1:/containerpath1’, ‘/hostpath2:/containerpath2’]
-
property
command
¶
-
property
image
¶
-
model_type
= <luigi.parameter.Parameter object>¶
-
property
mount_tmp
¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.vcf.
FreebayesCallVariants
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for identifying variants in the bam file provided using Freebayes.
The
wespipeline.utils.GlobalParams.exp_name
will be used for naming the vcf produced.
- Parameters
none –
- Dependencies:
ReferenceGenome AlignProcessing
- Output:
A luigi.LocalTarget instance for the index vcf file.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
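As an illustration of how exp_name could feed into the output name, the sketch below derives a vcf path from the experiment name and builds a Freebayes command line. The function name, the "_freebayes.vcf" suffix and the out_dir argument are assumptions; the options shown (-f for the reference, -v for the output vcf) are standard Freebayes options, but the arguments the task actually passes are not documented here.

```python
import os


def freebayes_args(reference_fa, bam, exp_name, out_dir="."):
    """Return (argument list, vcf path) for a Freebayes run."""
    # Hypothetical naming scheme: <exp_name>_freebayes.vcf inside out_dir.
    vcf = os.path.join(out_dir, exp_name + "_freebayes.vcf")
    # -f gives the reference fasta, -v the file the vcf is written to.
    return ["freebayes", "-f", reference_fa, "-v", vcf, bam], vcf


args, vcf_path = freebayes_args("/ref/hg19.fa", "/align/hg19.bam", "hg19", "/results")
```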
-
class
wespipeline.vcf.
GatkCallVariants
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for identifying variants in the bam file provided using GatkCallVariants.
The
wespipeline.utils.GlobalParams.exp_name
will be used for naming the vcf produced.
- Parameters
none –
- Dependencies:
ReferenceGenome AlignProcessing
- Output:
A luigi.LocalTarget instance for the index vcf file.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.vcf.
PlatypusCallVariants
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for identifying variants in the bam file provided using Platypus.
The
wespipeline.utils.GlobalParams.exp_name
will be used for naming the vcf produced.
- Parameters
none –
- Dependencies:
ReferenceGenome AlignProcessing
- Output:
A luigi.LocalTarget instance for the index vcf file.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
class
wespipeline.vcf.
VariantCalling
(*args, **kwargs)¶ Bases:
wespipeline.utils.MetaOutputHandler
,luigi.task.Task
Higher level task for the alignment of fastq files.
Local files are given preference over processing the alignment in order to reduce computational overhead.
If the bam and bai local files are set, they will be used instead of the
Alignment is done with the Bwa mem utility.
- Parameters
use_platypus (bool) – A case-insensitive boolean indicating whether to use Platypus for variant calling.
use_freebayes (bool) – A case-insensitive boolean indicating whether to use Freebayes for variant calling.
use_samtools (bool) – A case-insensitive boolean indicating whether to use Samtools for variant calling.
use_gatk (bool) – A case-insensitive boolean indicating whether to use Gatk for variant calling.
use_deepvariant (bool) – A case-insensitive boolean indicating whether to use DeepVariant for variant calling.
vcf_local_files (string) – A comma-delimited list of vcf files to be used instead of running the variant calling tools.
cpus (int) – Number of cpus that are available for each of the methods selected.
- Output:
A dict mapping keys to luigi.LocalTarget instances for each of the processed files. The following keys are available:
‘platypus’ : Local file with the variant calls obtained using Platypus.
‘freebayes’ : Local file with the variant calls obtained using Freebayes.
‘Varscan’ : Local sorted file with variant calls obtained using Varscan.
‘gatk’ : Local file with the variant calls obtained using GATK.
‘deepvariant’ : Local file with the variant calls obtained using DeepVariant.
-
cpus
= <luigi.parameter.IntParameter object>¶
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
use_deepvariant
= <luigi.parameter.BoolParameter object>¶
-
use_freebayes
= <luigi.parameter.BoolParameter object>¶
-
use_gatk
= <luigi.parameter.BoolParameter object>¶
-
use_platypus
= <luigi.parameter.BoolParameter object>¶
-
use_varscan
= <luigi.parameter.BoolParameter object>¶
-
vcf_local_files
= <luigi.parameter.Parameter object>¶
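The parameters of VariantCalling combine two ideas: a comma-delimited string of local vcf files and a set of boolean switches, one per caller. The sketch below shows one way to interpret both; the helper names and the lowercase caller names are hypothetical, and the task's own parsing may differ.

```python
def parse_vcf_local_files(value):
    """Split a comma-delimited string into a list of non-empty paths."""
    return [path.strip() for path in value.split(",") if path.strip()]


def select_callers(**flags):
    """Return the callers whose use_<name> flag is true, in a fixed order."""
    order = ("platypus", "freebayes", "varscan", "gatk", "deepvariant")
    return [name for name in order if flags.get("use_" + name, False)]


local_files = parse_vcf_local_files("/data/a.vcf, /data/b.vcf")
callers = select_callers(use_freebayes=True, use_gatk=True)
```

With this split, a caller is only scheduled when its switch is set, and an empty vcf_local_files string yields an empty list rather than a list containing one empty path.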
-
class
wespipeline.vcf.
VarscanCallVariants
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for identifying variants in the bam file provided using Varscan.
The
wespipeline.utils.GlobalParams.exp_name
will be used for naming the vcf produced.
- Parameters
none –
- Dependencies:
ReferenceGenome AlignProcessing
- Output:
A luigi.LocalTarget instance for the index vcf file.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
wespipeline.vcfanalysis module¶
-
class
wespipeline.vcfanalysis.
DockerVTnormalizeVCF
(*args, **kwargs)¶ Bases:
luigi.contrib.docker_runner.DockerTask
-
VERSION
= '0.57721--hdf88d34_2'¶
-
biallelic_block_substitutions
= <luigi.parameter.BoolParameter object>¶
-
biallelic_clumped_variant
= <luigi.parameter.BoolParameter object>¶
-
property
binds
¶ Override this to mount local volumes, in addition to the /tmp/luigi which gets defined by default. This should return a list of strings. e.g. [‘/hostpath1:/containerpath1’, ‘/hostpath2:/containerpath2’]
-
property
command
¶
-
decomposes_multiallelic_variants
= <luigi.parameter.BoolParameter object>¶
-
property
image
¶
-
property
mount_tmp
¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
vcf
= <luigi.parameter.Parameter object>¶
-
-
class
wespipeline.vcfanalysis.
NormalizeVcfFiles
(*args, **kwargs)¶ Bases:
wespipeline.utils.MetaOutputHandler
,luigi.task.Task
docstring for NormalizeVcfFiles
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
-
-
class
wespipeline.vcfanalysis.
VTnormalizeVCF
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
-
out
= <luigi.parameter.Parameter object>¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
vcf
= <luigi.parameter.Parameter object>¶
-
-
class
wespipeline.vcfanalysis.
VariantCallingAnalysis
(*args, **kwargs)¶ Bases:
luigi.task.Task
Higher level task for comparing variant calls.
Comparing variant calls is a delicate task that grows in complexity when dealing with diploid sequences (such as the human genome), where different variants can appear at the same position on each chromosome of a pair.
The normalization is done with vt, and the comparison with VcfTools.
- Parameters
None –
- Output:
None. The resulting files are not provided as task output. Each of the n vcf files is analyzed and compared in pairs. It is a total of 2n-1 files.
-
normalize
= <luigi.parameter.BoolParameter object>¶
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
run
()¶ The task run method, to be overridden in a subclass.
See Task.run
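Comparison by pairs can be enumerated with itertools.combinations. The sketch assumes that every unordered pair of vcf files is compared, which the description above does not spell out; comparison_pairs and the file names are hypothetical.

```python
from itertools import combinations


def comparison_pairs(vcf_files):
    """Return every unordered pair of vcf files, preserving input order."""
    return list(combinations(vcf_files, 2))


pairs = comparison_pairs(["platypus.vcf", "freebayes.vcf", "gatk.vcf"])
```

Each pair could then be handed to a comparison task such as VcftoolsCompare as its vcf1 and vcf2 parameters.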
-
class
wespipeline.vcfanalysis.
VcftoolsCompare
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for comparing a pair of vcf files using VcfTools.
- Parameters
vcf1 (str) – Absolute path to the first file to be compared.
vcf2 (str) – Absolute path to the second file to be compared.
- Dependencies:
None
- Output:
A luigi.LocalTarget instance for the result of comparing the files.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
vcf1
= <luigi.parameter.Parameter object>¶
-
vcf2
= <luigi.parameter.Parameter object>¶
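A comparison of two vcf files with VcfTools can be expressed as an argument list like the one below. This is a sketch under assumptions: the function name and out_prefix are hypothetical, and although --vcf, --diff, --diff-site and --out are standard VcfTools options, the options the task really uses are not listed in this documentation.

```python
def vcftools_compare_args(vcf1, vcf2, out_prefix):
    """Build a VcfTools command comparing the sites of two vcf files."""
    return [
        "vcftools",
        "--vcf", vcf1,        # first file
        "--diff", vcf2,       # second file, compared against the first
        "--diff-site",        # report differences site by site
        "--out", out_prefix,  # prefix for the files VcfTools writes
    ]


args = vcftools_compare_args("/vcf/gatk.vcf", "/vcf/freebayes.vcf", "/results/gatk_vs_freebayes")
```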
-
class
wespipeline.vcfanalysis.
VcftoolsDepthAnalysis
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for extracting basic statistics for the variant calls using VcfTools.
- Parameters
vcf (str) – Absolute path to the file with the variant annotations.
- Dependencies:
None
- Output:
A luigi.LocalTarget instance for the file with the vcf statistics.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
vcf
= <luigi.parameter.Parameter object>¶
-
class
wespipeline.vcfanalysis.
VcftoolsFreqAnalysis
(*args, **kwargs)¶ Bases:
luigi.contrib.external_program.ExternalProgramTask
Task used for extracting basic statistics for the variant calls using VcfTools.
- Parameters
vcf (str) – Absolute path to the file with the variant annotations.
- Dependencies:
None
- Output:
A luigi.LocalTarget instance for the file with the vcf statistics.
-
output
()¶ The output that this Task produces.
The output of the Task determines if the Task needs to be run–the task is considered finished iff the outputs all exist. Subclasses should override this method to return a single
Target
or a list of Target
instances.
- Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.
See Task.output
-
program_args
()¶ Override this method to map your task parameters to the program arguments
- Returns
list to pass as
args
to subprocess.Popen
-
requires
()¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-
vcf
= <luigi.parameter.Parameter object>¶
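The depth and frequency statistics tasks differ only in which VcfTools statistic they request, so both can be sketched with one helper. The function name, the stat argument and out_prefix are hypothetical; --depth and --freq are standard VcfTools options, but the exact options used by the tasks are an assumption.

```python
def vcftools_stat_args(vcf, stat, out_prefix):
    """Build a VcfTools command for a depth or allele-frequency summary."""
    if stat not in ("depth", "freq"):
        raise ValueError("stat must be 'depth' or 'freq', got {!r}".format(stat))
    # --depth reports mean depth per individual, --freq allele frequencies per site.
    return ["vcftools", "--vcf", vcf, "--" + stat, "--out", out_prefix]


depth_args = vcftools_stat_args("/vcf/gatk.vcf", "depth", "/results/gatk")
freq_args = vcftools_stat_args("/vcf/gatk.vcf", "freq", "/results/gatk")
```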