Snakemake to python package
Project structure
The first step is to organize the Git repository as a Python package. To help with this, you can run the command:
generate_template -p git_repository -n PKGNAME
which produces the following tree
The sections below explain what you have to modify in each file to create your package. Adapt the highlighted lines of the example files to your project.
Mandatory files
pyproject.toml file
This file is used to build the Python package, as described in the official Python packaging documentation.
[build-system]
build-backend = "setuptools.build_meta"
requires = [
    "setuptools>=42",
    "setuptools_scm[toml]>=6.2"
]

[tool.setuptools_scm]
write_to = "PKGNAME/_version.py"
version_scheme="release-branch-semver"
tag_regex="^(\\d.\\d.\\d)-*\\w*\\d*$"
local_scheme = "no-local-version"

[project]
name = "PKGNAME"
dynamic = ["version"]
description = "TODO !!!!!"
authors = [
    {name = "Ravel Sebastien (CIRAD)",email = "sebastien.ravel@cirad.fr"},
]
dependencies = ["PyYAML", "click>=8.0.3", "cookiecutter", "docutils", "python-gitlab", "snakemake", "tqdm"]
requires-python = ">=3.8"
readme = "README.rst"
license = {file = "LICENSE"}
keywords = ["snakemake", "wrapper", "installation"]
classifiers = [
    "Development Status :: 5 - Production/Stable",
    "Intended Audience :: Developers",
    "Intended Audience :: End Users/Desktop",
    "License :: CeCILL-C Free Software License Agreement (CECILL-C)",
    "License :: Free for non-commercial use",
    "License :: OSI Approved :: MIT License",
    "Natural Language :: English",
    "Operating System :: POSIX :: Linux",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Topic :: Scientific/Engineering",
    "Topic :: Scientific/Engineering :: Bio-Informatics",
]

[project.urls]
Homepage = "https://forge.ird.fr/phim/sravel/snakecdysis"
Downloads = "https://forge.ird.fr/phim/sravel/snakecdysis/archive/"
"Bug Tracker" = "https://forge.ird.fr/phim/sravel/snakecdysis/issues"
Documentation = "https://snakecdysis.readthedocs.io/en/latest/"
"Source Code" = "https://forge.ird.fr/phim/sravel/snakecdysis"

[project.optional-dependencies]
dev = [
    "sphinx_click",
    "sphinx_copybutton",
    "sphinx_rtd_theme",
    "tox",
]

[project.scripts]
generate_template = "snakecdysis.scripts.generate_template:main"

[project.entry-points.PKGNAME]
PKGNAME = "__init__"
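To see which Git tags the tag_regex setting would accept, the pattern can be tried directly in Python. This is a minimal sketch, assuming setuptools_scm applies the pattern to the tag name and keeps the first capture group as the version; the helper version_from_tag and the sample tags are hypothetical.

```python
import re

# Pattern copied from tag_regex, after TOML unescaping ("\\d" becomes \d).
TAG_REGEX = re.compile(r"^(\d.\d.\d)-*\w*\d*$")


def version_from_tag(tag):
    """Return the captured version part of a tag, or None if the tag does not match."""
    match = TAG_REGEX.match(tag)
    return match.group(1) if match else None


# Hypothetical tag names used only to exercise the pattern.
for tag in ["1.2.3", "1.2.3-rc1", "v1.2.3", "10.2.3", "1x2y3"]:
    print(f"{tag!r:12} -> {version_from_tag(tag)}")
```

With this pattern, tags such as 1.2.3 and 1.2.3-rc1 are accepted, while v1.2.3 and 10.2.3 are rejected because each version component must be a single digit. The unescaped dots match any character, so a tag like 1x2y3 is also accepted; escaping them (\\. in the TOML string) would be stricter if that matters for your project.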
The __init__.py file
This file is the entry point of the python package.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from PKGNAME.global_variables import *
from pathlib import Path
from .global_variables import GIT_URL, DOCS, DATATEST_URL_FILES, SINGULARITY_URL_FILES

logo = Path(__file__).parent.resolve().joinpath('PKGNAME_logo.png').as_posix()

__version__ = Path(__file__).parent.resolve().joinpath("VERSION").open("r").readline().strip()


__doc__ = """BLABLA"""

description_tools = f"""
    Welcome to PKGNAME version: {__version__} ! Created on XXXX 20XX
    @author: Sebastien Ravel (CIRAD)
    @email: Sebastien.ravel@cirad.fr

    Please cite our github: GIT_URL
    and GPLv3 Intellectual property belongs to CIRAD and authors.
    Documentation avail at: DOCS"""

dico_tool = {
    "soft_path": Path(__file__).resolve().parent.as_posix(),
    "url": GIT_URL,
    "docs": DOCS,
    "description_tool": description_tools,
    "singularity_url_files": SINGULARITY_URL_FILES,
    "datatest_url_files": DATATEST_URL_FILES
}
global_variables.py file
This file groups the variables used by the wrapper.
from pathlib import Path

DOCS = "https://PKGNAME.readthedocs.io/en/latest/"
GIT_URL = "https://github.com/SouthGreenPlatform/PKGNAME"

INSTALL_PATH = Path(__file__).resolve().parent
SINGULARITY_URL_FILES = [('https://itrop.ird.fr/culebront_utilities/singularity_build/Singularity.culebront_tools.sif',
                          'INSTALL_PATH/containers/Singularity.culebront_tools.sif'),
                         ('https://itrop.ird.fr/culebront_utilities/singularity_build/Singularity.report.sif',
                          'INSTALL_PATH/containers/Singularity.report.sif')
                         ]

DATATEST_URL_FILES = ("https://itrop.ird.fr/culebront_utilities/Data-Xoo-sub.zip", "Data-Xoo-sub.zip")
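Note that the destination paths in SINGULARITY_URL_FILES begin with the literal text INSTALL_PATH, not with the value of the INSTALL_PATH variable. A minimal sketch of how such a placeholder could be expanded is shown here, assuming (this is not confirmed by the text above) that the wrapper substitutes it with the installation directory. The function resolve_install_path, the example URL and the /opt/PKGNAME directory are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_install_path(url_files, install_path):
    """Replace the leading 'INSTALL_PATH' placeholder of each destination with a real directory."""
    base = PurePosixPath(install_path).as_posix()
    resolved = []
    for url, destination in url_files:
        # Only the first occurrence is replaced, so the rest of the path is left untouched.
        resolved.append((url, destination.replace("INSTALL_PATH", base, 1)))
    return resolved


# Hypothetical entry, shaped like the (url, destination) pairs in SINGULARITY_URL_FILES.
example_files = [
    ("https://example.org/tools.sif", "INSTALL_PATH/containers/tools.sif"),
]
print(resolve_install_path(example_files, "/opt/PKGNAME"))
```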
main.py file
This is the main script of the workflow. Check that its path is declared in the setup.py file. It is normally already included by the instruction on line 75:
entry_points={
    'console_scripts': [f"{NAME} = {NAME}.main:main"],
},
#!/usr/bin/env python3

from snakecdysis import main_wrapper
from PKGNAME import dico_tool

main = main_wrapper(**dico_tool)

if __name__ == '__main__':
    main()
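The call main_wrapper(**dico_tool) passes every key of dico_tool as a keyword argument. The sketch below illustrates that mechanism only: fake_main_wrapper is a hypothetical stand-in, not the snakecdysis function, and the dictionary values are example data.

```python
def fake_main_wrapper(soft_path, url, docs, description_tool,
                      singularity_url_files, datatest_url_files):
    """Hypothetical stand-in for main_wrapper: builds and returns an entry-point function."""
    def main():
        return f"tool installed in {soft_path}, docs at {docs}"
    return main


# Example values using the same keys as dico_tool in __init__.py.
example_dico_tool = {
    "soft_path": "/opt/PKGNAME",
    "url": "https://github.com/SouthGreenPlatform/PKGNAME",
    "docs": "https://PKGNAME.readthedocs.io/en/latest/",
    "description_tool": "Welcome to PKGNAME",
    "singularity_url_files": [],
    "datatest_url_files": (),
}

# ** unpacks the dictionary: each key must match a parameter name of the function.
main = fake_main_wrapper(**example_dico_tool)
print(main())
```

Because the keys are matched by name, renaming or dropping a key in dico_tool would make a call of this kind fail with a TypeError, which is why the dictionary in __init__.py should keep its key names unchanged.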
module.py file
This file is used by the snakefile to add finer control over the configuration file and to check the values supplied by the user. The goal is to create a new class that inherits from SnakEcdysis, so that its attributes give access to, for example, the paths of the scripts, the default and user configuration files, …
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from pathlib import Path

from snakemake.logging import logger
from snakemake.utils import validate
import re
from .global_variables import *
from snakecdysis import *


class PKGNAME(SnakEcdysis):
    """
    to read file config
    """

    def __init__(self, dico_tool, workflow, config):
        super().__init__(**dico_tool, workflow=workflow, config=config)
        # workflow is available only in __init__
        # print("\n".join(list(workflow.__dict__.keys())))
        # print(workflow.__dict__)

        # Initialisation of PKGNAME attributes
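As an illustration of the kind of check such a subclass can add, here is a minimal sketch that validates one configuration value. BaseWorkflow is a hypothetical stand-in for SnakEcdysis so that the example runs without snakecdysis installed; ExampleWorkflow, _check_output_dir and the error messages are likewise assumptions made for illustration.

```python
class BaseWorkflow:
    """Hypothetical stand-in for SnakEcdysis: it only stores the configuration."""

    def __init__(self, config):
        self.config = config


class ExampleWorkflow(BaseWorkflow):
    """Adds a check on a user-supplied configuration value."""

    def __init__(self, config):
        super().__init__(config)
        # Attribute initialised from the checked configuration.
        self.output_dir = self._check_output_dir()

    def _check_output_dir(self):
        """Return config['DATA']['OUTPUT'] with a trailing slash, or raise if it is missing."""
        try:
            output = self.config["DATA"]["OUTPUT"]
        except KeyError as error:
            raise ValueError("CONFIG ERROR: key DATA/OUTPUT is missing") from error
        if not output:
            raise ValueError("CONFIG ERROR: DATA/OUTPUT is empty")
        return output if output.endswith("/") else f"{output}/"


workflow_obj = ExampleWorkflow({"DATA": {"OUTPUT": "results"}})
print(workflow_obj.output_dir)
```

Normalising the trailing slash here is a design choice for this sketch: the snakefile builds paths such as f"{output_dir}LOGS/", which only works when output_dir ends with a slash.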
snakefile file
#!/usr/bin/env snakemake
# -*- coding: utf-8 -*-

from pathlib import Path
from pprint import pprint as pp
# load own functions: the tool description from __init__.py
# and the workflow class defined in module.py
from PKGNAME import dico_tool
from PKGNAME.module import PKGNAME


PKGNAME_obj = PKGNAME(dico_tool, workflow=workflow, config=config)
tools_config = PKGNAME_obj.tools_config
cluster_config = PKGNAME_obj.cluster_config

# print(PKGNAME_obj.export_use_yaml)
# print for debug:
# pp(PKGNAME_obj)
# exit()
# print(tools_config)
# exit()

###############################################################################
# dir and suffix
output_dir = config["DATA"]["OUTPUT"]
log_dir = f"{output_dir}LOGS/"

# Change workdir to output path (slurm logs append on outdir)
workdir: output_dir

#############################################
# use threads define in cluster_config rule or rule default or default in snakefile
#############################################
def get_threads(rule, default):
    """
    give threads or 'cpus-per-task from cluster_config rule: threads to SGE and cpus-per-task to SLURM
    """
    if cluster_config:
        if rule in cluster_config and 'threads' in cluster_config[rule]:
            return int(cluster_config[rule]['threads'])
        elif rule in cluster_config and 'cpus-per-task' in cluster_config[rule]:
            return int(cluster_config[rule]['cpus-per-task'])
        elif '__default__' in cluster_config and 'cpus-per-task' in cluster_config['__default__']:
            return int(cluster_config['__default__']['cpus-per-task'])
        elif '__default__' in cluster_config and 'threads' in cluster_config['__default__']:
            return int(cluster_config['__default__']['threads'])
    # if local
    elif workflow.global_resources["_cores"]:
        return workflow.global_resources["_cores"]
    # if cluster not rule and not default or local not _cores return value from call
    return default

################################ ASSEMBLY ####################################
include: f"{PKGNAME_obj.install_path}/snakefiles/assemblers.snake"


rule some_exemple:
    threads: get_threads("some_exemple",1)
    input:
        toto=rules.start.output.machin,
        truc=ref
    output:
        csv_var_per_contig=f"{output_dir}4_STRUCTURAL_VAR/csv_variants/{{samples}}_variants_per_contig.csv"
    log:
        error=f'{log_dir}some_exemple/{{samples}}.e',
        output=f'{log_dir}some_exemple/{{samples}}.o'
    message:
        f"""
        Running {{rule}}
            Input:
                - toto: {{input.toto}}
            Output:
                - csv_file: {{output.csv_var_per_contig}}
            Others
                - Threads: {{threads}}
                - LOG error: {{log.error}}
                - LOG output: {{log.output}}
        """
    singularity:
        tools_config['SINGULARITY']['TOOLS']
    envmodules:
        tools_config["MODULES"]["MINIMAP2"],
        tools_config["MODULES"]["SAMTOOLS"]
    # the shell command must only reference inputs declared above (toto, truc)
    shell:
        f"(python {PKGNAME_obj.snakemake_scripts}/vcf_contigs.py -v {{input.toto}} -r {{input.truc}} -o {{output.csv_var_per_contig}}) 1>{{log.output}} 2>{{log.error}}"
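The fallback order used by get_threads in the snakefile can be exercised outside Snakemake. The sketch below is a standalone variant written for illustration: cluster_config and the local core count are passed as parameters (cluster_config and cores are assumptions of this sketch, not arguments of the function above) instead of being read from the workflow, and the example cluster configuration is hypothetical.

```python
def get_threads(rule, default, cluster_config=None, cores=None):
    """Pick a thread count: rule setting, then '__default__', then local cores, then default."""
    if cluster_config:
        # Same priority as the snakefile version.
        lookup_order = [
            (rule, "threads"),
            (rule, "cpus-per-task"),
            ("__default__", "cpus-per-task"),
            ("__default__", "threads"),
        ]
        for section, key in lookup_order:
            if key in cluster_config.get(section, {}):
                return int(cluster_config[section][key])
    # Local run: no cluster configuration, use the number of cores if known.
    elif cores:
        return cores
    # Nothing found: fall back to the value given in the rule.
    return default


# Hypothetical cluster configuration.
cluster = {"__default__": {"cpus-per-task": 4}, "some_exemple": {"threads": "8"}}

print(get_threads("some_exemple", 1, cluster))   # value set for the rule
print(get_threads("other_rule", 1, cluster))     # value from __default__
print(get_threads("other_rule", 2, cores=16))    # local run, uses cores
print(get_threads("other_rule", 2))              # falls back to the default
```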