Snakemake to Python package

Project structure

The first step is to organize the Git repository as a Python package. To help with this, you can run the command:

generate_template -p git_repository -n PKGNAME

which produces the following tree:

[Figure: directory tree]

The sections below explain what to modify in each file to create your package. Adapt the highlighted lines of the example files to your project.

Mandatory files

pyproject.toml file

This file is used to build the Python package, as described in the official Python packaging documentation.

[build-system]
build-backend = "setuptools.build_meta"
requires = [
  "setuptools>=42",
  "setuptools_scm[toml]>=6.2"
]


[tool.setuptools_scm]
write_to = "PKGNAME/_version.py"
version_scheme="release-branch-semver"
tag_regex="^(\\d.\\d.\\d)-*\\w*\\d*$"
local_scheme = "no-local-version"

[project]
name = "PKGNAME"
dynamic = ["version"]
description = "TODO !!!!!"
authors = [
    {name = "Ravel Sebastien (CIRAD)",email = "sebastien.ravel@cirad.fr"},
]
dependencies = ["PyYAML", "click>=8.0.3", "cookiecutter", "docutils", "python-gitlab", "snakemake", "tqdm"]
requires-python = ">=3.8"
readme = "README.rst"
license = {file = "LICENSE"}
keywords = ["snakemake", "wrapper", "installation"]
classifiers = [
    "Development Status :: 5 - Production/Stable",
    "Intended Audience :: Developers",
    "Intended Audience :: End Users/Desktop",
    "License :: CeCILL-C Free Software License Agreement (CECILL-C)",
    "License :: Free for non-commercial use",
    "License :: OSI Approved :: MIT License",
    "Natural Language :: English",
    "Operating System :: POSIX :: Linux",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Topic :: Scientific/Engineering",
    "Topic :: Scientific/Engineering :: Bio-Informatics",
]

[project.urls]
Homepage = "https://forge.ird.fr/phim/sravel/snakecdysis"
Downloads = "https://forge.ird.fr/phim/sravel/snakecdysis/archive/"
"Bug Tracker" = "https://forge.ird.fr/phim/sravel/snakecdysis/issues"
Documentation = "https://snakecdysis.readthedocs.io/en/latest/"
"Source Code" = "https://forge.ird.fr/phim/sravel/snakecdysis"

[project.optional-dependencies]
dev = [
    "sphinx_click",
    "sphinx_copybutton",
    "sphinx_rtd_theme",
    "tox",
]

[project.scripts]
generate_template = "snakecdysis.scripts.generate_template:main"

[project.entry-points.PKGNAME]
PKGNAME = "__init__"
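
The tag_regex setting decides which Git tags are read as versions. A quick way to see what the pattern accepts is to try it with Python's re module. This is only a sketch: the sample tags are made up, and it assumes the pattern is matched from the start of the tag. Note that the dots are not escaped and each \d matches exactly one digit, so a tag such as 10.0.2 is rejected by this pattern.

```python
import re

# Pattern copied from [tool.setuptools_scm]; the TOML string "\\d" is the regex \d.
TAG_REGEX = re.compile(r"^(\d.\d.\d)-*\w*\d*$")


def version_from_tag(tag):
    """Return the captured version part of a tag, or None if the tag is rejected."""
    match = TAG_REGEX.match(tag)
    return match.group(1) if match else None


# Hypothetical tags, for illustration only.
for tag in ["1.0.2", "1.0.2-rc1", "v1.0.2", "10.0.2"]:
    print(f"{tag!r:12} -> {version_from_tag(tag)}")
```

If your project needs multi-digit components or a leading "v", the pattern has to be adapted before tagging a release.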

The __init__.py file

This file is the entry point of the Python package.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from PKGNAME.global_variables import *
from pathlib import Path
from .global_variables import GIT_URL, DOCS, DATATEST_URL_FILES, SINGULARITY_URL_FILES

# Absolute path of the logo shipped inside the package
logo = Path(__file__).parent.resolve().joinpath('PKGNAME_logo.png').as_posix()

# The version is the first line of the VERSION file stored next to this file
__version__ = Path(__file__).parent.resolve().joinpath("VERSION").open("r").readline().strip()


__doc__ = """BLABLA"""

description_tools = f"""
    Welcome to PKGNAME version: {__version__} ! Created on XXXX 20XX
    @author: Sebastien Ravel (CIRAD)
    @email: Sebastien.ravel@cirad.fr

    Please cite our github: {GIT_URL}
    and GPLv3 Intellectual property belongs to CIRAD and authors.
    Documentation available at: {DOCS}"""

# Keyword arguments passed to the snakecdysis wrapper
dico_tool = {
    "soft_path": Path(__file__).resolve().parent.as_posix(),
    "url": GIT_URL,
    "docs": DOCS,
    "description_tool": description_tools,
    "singularity_url_files": SINGULARITY_URL_FILES,
    "datatest_url_files": DATATEST_URL_FILES
}
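
The __version__ line opens the VERSION file without closing it. A minimal alternative, assuming the same one-line VERSION file sits next to __init__.py, is Path.read_text, which closes the file for you. The helper name read_version and its fallback value are hypothetical, not part of snakecdysis.

```python
from pathlib import Path


def read_version(package_dir, default="0.0.0"):
    """Return the first line of package_dir/VERSION, or default if it is missing or empty."""
    version_file = Path(package_dir) / "VERSION"
    if not version_file.is_file():
        return default
    # read_text opens and closes the file in one call
    lines = version_file.read_text(encoding="utf-8").splitlines()
    return lines[0].strip() if lines else default
```

In __init__.py this could be called as read_version(Path(__file__).parent).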

global_variables.py file

This file groups the variables used by the wrapper.

from pathlib import Path

DOCS = "https://PKGNAME.readthedocs.io/en/latest/"
GIT_URL = "https://github.com/SouthGreenPlatform/PKGNAME"

INSTALL_PATH = Path(__file__).resolve().parent
SINGULARITY_URL_FILES = [('https://itrop.ird.fr/culebront_utilities/singularity_build/Singularity.culebront_tools.sif',
                          'INSTALL_PATH/containers/Singularity.culebront_tools.sif'),
                         ('https://itrop.ird.fr/culebront_utilities/singularity_build/Singularity.report.sif',
                          'INSTALL_PATH/containers/Singularity.report.sif')
                         ]

DATATEST_URL_FILES = ("https://itrop.ird.fr/culebront_utilities/Data-Xoo-sub.zip", "Data-Xoo-sub.zip")
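
Each entry of SINGULARITY_URL_FILES pairs a download URL with a destination path. How snakecdysis consumes these pairs is not shown here. As an illustration only, the sketch below assumes that the literal INSTALL_PATH prefix stands for the package directory and lists the destinations that are not yet on disk. The function name missing_files, the example.org URLs, and the /opt/PKGNAME directory are hypothetical.

```python
from pathlib import Path


def missing_files(url_files, install_path):
    """Return (url, destination) pairs whose destination does not exist yet."""
    missing = []
    for url, target in url_files:
        # Assumption: the "INSTALL_PATH" placeholder means the package directory.
        destination = Path(target.replace("INSTALL_PATH", str(install_path), 1))
        if not destination.exists():
            missing.append((url, destination))
    return missing


# Hypothetical entries in the same (url, destination) shape as above.
example_files = [
    ("https://example.org/tools.sif", "INSTALL_PATH/containers/tools.sif"),
    ("https://example.org/report.sif", "INSTALL_PATH/containers/report.sif"),
]
for url, destination in missing_files(example_files, Path("/opt/PKGNAME")):
    print(f"to download: {url} -> {destination}")
```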

main.py file

This is the main script of the workflow. Check that its path is declared in the setup.py file. Normally it is already included by the instruction on line 75:

entry_points={
    'console_scripts': [f"{NAME} = {NAME}.main:main"],
},
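
An entry point value of the form "package.main:main" means: import the module package.main and take its attribute main. The standard library can resolve such a string, which is a handy way to check a declaration before installing. The sketch uses json:dumps as a stand-in target because PKGNAME is only a placeholder; the helper name resolve_entry_point and the entry point name "demo" are arbitrary.

```python
from importlib.metadata import EntryPoint


def resolve_entry_point(value, name="demo", group="console_scripts"):
    """Import and return the object designated by a 'module:attribute' value."""
    return EntryPoint(name=name, value=value, group=group).load()


# Stand-in target from the standard library, for illustration only.
func = resolve_entry_point("json:dumps")
print(func({"ok": True}))
```

For your own package the value to test would be the one written in the console_scripts entry, once the package is importable.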

#!/usr/bin/env python3

from snakecdysis import main_wrapper
from PKGNAME import dico_tool

main = main_wrapper(**dico_tool)

if __name__ == '__main__':
    main()

module.py file

This file is used by the snakefile to add finer control over the configuration file and to check the values supplied by the user. The goal is to create a new class that inherits from SnakEcdysis, so that its attributes give access to, for example, the paths of the scripts, the default/user configuration files, …

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from pathlib import Path

from snakemake.logging import logger
from snakemake.utils import validate
import re
from .global_variables import *
from snakecdysis import *


class PKGNAME(SnakEcdysis):
    """
    to read file config
    """

    def __init__(self, dico_tool, workflow, config):
        super().__init__(**dico_tool, workflow=workflow, config=config)
        # workflow is available only in __init__
        # print("\n".join(list(workflow.__dict__.keys())))
        # print(workflow.__dict__)

        # Initialisation of PKGNAME attributes
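
As an illustration of the kind of check such a class can hold, the sketch below validates the output directory. It uses a stand-in base class so that it runs without snakecdysis installed; in a real package the class would inherit from SnakEcdysis as in the listing above. The names StandInBase, ConfigChecker and check_output_dir are hypothetical, and the trailing-slash rule is an assumption based on the snakefile building paths as f"{output_dir}LOGS/".

```python
class StandInBase:
    """Stand-in for SnakEcdysis so the sketch is self-contained (hypothetical)."""

    def __init__(self, config):
        self.config = config


class ConfigChecker(StandInBase):
    """Example subclass that checks user values from the configuration."""

    def __init__(self, config):
        super().__init__(config)
        self.check_output_dir()

    def check_output_dir(self):
        """Require DATA/OUTPUT and make sure it ends with a slash."""
        try:
            output = self.config["DATA"]["OUTPUT"]
        except (KeyError, TypeError):
            raise ValueError("CONFIG ERROR: 'DATA: OUTPUT' is required") from None
        if not output:
            raise ValueError("CONFIG ERROR: 'DATA: OUTPUT' must not be empty")
        # Paths are later built by plain concatenation, so keep a trailing slash
        if not output.endswith("/"):
            output += "/"
        self.config["DATA"]["OUTPUT"] = output


# Made-up configuration, for illustration only.
config = {"DATA": {"OUTPUT": "/tmp/results"}}
ConfigChecker(config)
print(config["DATA"]["OUTPUT"])
```

Failing early in the class constructor gives the user a readable configuration error instead of a broken path in the middle of a run.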

snakefile file

#!/usr/bin/env snakemake
# -*- coding: utf-8 -*-

from pathlib import Path
from pprint import pprint as pp
# load own functions
from PKGNAME import dico_tool
from PKGNAME.module import PKGNAME


PKGNAME_obj = PKGNAME(dico_tool, workflow=workflow, config=config)
tools_config = PKGNAME_obj.tools_config
cluster_config = PKGNAME_obj.cluster_config

# print(PKGNAME_obj.export_use_yaml)
# print for debug:
# pp(PKGNAME_obj)
# exit()
# print(tools_config)
# exit()

###############################################################################
# dir and suffix
output_dir = config["DATA"]["OUTPUT"]
log_dir = f"{output_dir}LOGS/"

# Change workdir to output path (SLURM logs are written to the output directory)
workdir: output_dir



#############################################
# use threads defined in the cluster_config rule, the cluster_config default, or the default given in the snakefile
#############################################
def get_threads(rule, default):
    """
    Return 'threads' or 'cpus-per-task' from the cluster_config rule: 'threads' for SGE and 'cpus-per-task' for SLURM
    """
    if cluster_config:
        if rule in cluster_config and 'threads' in cluster_config[rule]:
            return int(cluster_config[rule]['threads'])
        elif rule in cluster_config and 'cpus-per-task' in cluster_config[rule]:
            return int(cluster_config[rule]['cpus-per-task'])
        elif '__default__' in cluster_config and 'cpus-per-task' in cluster_config['__default__']:
            return int(cluster_config['__default__']['cpus-per-task'])
        elif '__default__' in cluster_config and 'threads' in cluster_config['__default__']:
            return int(cluster_config['__default__']['threads'])
    # if local
    elif workflow.global_resources["_cores"]:
        return workflow.global_resources["_cores"]
    # cluster without rule or default entry, or local without _cores: return the value given in the call
    return default



################################ ASSEMBLY ####################################
include: f"{PKGNAME_obj.install_path}/snakefiles/assemblers.snake"



rule some_exemple:
    threads: get_threads("some_exemple",1)
    input:
        toto=rules.start.output.machin,
        truc=ref
    output:
        csv_var_per_contig=f"{output_dir}4_STRUCTURAL_VAR/csv_variants/{{samples}}_variants_per_contig.csv"
    log:
        error=f'{log_dir}some_exemple/{{samples}}.e',
        output=f'{log_dir}some_exemple/{{samples}}.o'
    message:
        f"""
        Running {{rule}}
        Input:
            - toto: {{input.toto}}
        Output:
            - csv_file: {{output.csv_var_per_contig}}
        Others
            - Threads: {{threads}}
            - LOG error: {{log.error}}
            - LOG output: {{log.output}}
        """
    singularity:
        tools_config['SINGULARITY']['TOOLS']
    envmodules:
        tools_config["MODULES"]["MINIMAP2"],
        tools_config["MODULES"]["SAMTOOLS"]
    shell:
        f"(python {PKGNAME_obj.snakemake_scripts}/vcf_contigs.py -v {{input.toto}} -r {{input.truc}} -o {{output.csv_var_per_contig}}) 1>{{log.output}} 2>{{log.error}}"
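
The lookup order of get_threads is easier to see on plain dictionaries. The sketch below restates the same logic as a pure function: the extra parameters cluster_config and local_cores replace the snakefile globals, and the function name resolve_threads, the rule names and the values are made up for illustration.

```python
def resolve_threads(rule, default, cluster_config=None, local_cores=None):
    """Mirror get_threads: rule entry first, then '__default__', then the fallback."""
    if cluster_config:
        rule_conf = cluster_config.get(rule, {})
        default_conf = cluster_config.get("__default__", {})
        # Same priority as the snakefile: rule threads, rule cpus-per-task,
        # default cpus-per-task, default threads.
        lookups = (
            (rule_conf, ("threads", "cpus-per-task")),
            (default_conf, ("cpus-per-task", "threads")),
        )
        for conf, keys in lookups:
            for key in keys:
                if key in conf:
                    return int(conf[key])
    elif local_cores:
        # Local run: use the number of cores given to Snakemake
        return local_cores
    # Nothing found in the cluster configuration, or no core count available
    return default


# Made-up cluster configuration.
cluster = {"__default__": {"cpus-per-task": 4}, "minimap2": {"threads": "8"}}
print(resolve_threads("minimap2", 1, cluster))
print(resolve_threads("samtools", 1, cluster))
print(resolve_threads("samtools", 1, None, local_cores=16))
```

One consequence of this order is that, as soon as a cluster configuration is present, the local core count is never consulted: a rule with no matching entry falls back to the default passed in the call.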