
Centralizing Project Management Tasks with doit

At Fenris, we like to focus on fast and clean deployments. To achieve this, we use a plethora of tools to cover our needs for styling, testing, building, and publishing our code. For years, we’ve been using doit, a task management and automation tool for Python, to organize our use of these tools.

We had a problem, though: practically the same doit tasks were copy-pasted into all of our projects, resulting in duplicated, and sometimes inconsistent, task definition code across dozens of repositories.

Last week, we came up with a solution. We decided to centralize our project management tasks and use that shared task library across our python projects.

 

Overview

doit is a tool that allows us to very simply define and execute all the tasks that we want to run upon pushing new code. As mentioned above, this includes linting, testing, security checks, packaging, publishing, and more.

Using doit gives us the benefits of optimizing processing by skipping already completed tasks, simplifying complicated command line calls, and most importantly, performing all of these tasks identically with our CI/CD and locally. We were already experiencing the benefits of automation, but wanted the benefits of task standardization that come from using a centralized library.

 

Context

There are a few things to know about the doit system before we jump into the code:

  1. By default, the list of tasks available within a given project is stored in a dodo.py file.
  2. The tasks in the dodo.py file are simple, often just containing an action, file / task dependencies, a task name, and targets.
  3. The DOIT_CONFIG constant specifies the default tasks to be run when $ doit is run from the command line. So, each individual project can specify its own DOIT_CONFIG based on its task needs.
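To make this concrete, here is a minimal, hypothetical dodo.py; the task and file names are illustrative, not taken from our actual projects:

```python
# A minimal, hypothetical dodo.py; task and file names are illustrative.
DOIT_CONFIG = {"default_tasks": ["hello"]}

def task_hello() -> dict:
    """Write a greeting to a target file."""
    return {
        "actions": ["echo hello > hello.txt"],
        "targets": ["hello.txt"],
    }
```

Running $ doit in a project containing this file would execute task_hello by default.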

 

The Solutions

Problem 1: Duplicated Code

The first problem that we needed to tackle was the fact that all of our doit tasks were copy-pasted across projects with some common elements changed, leading to a lot of duplication of code. To solve this issue, we refactored the shared tasks and moved them to a common package.

We defined each of our tasks within our shared fenris_doit repository, in a tasks folder. Here’s what an example task looks like:

from typing import Generator, Optional

# remove_none and list_files are small helpers defined elsewhere in fenris_doit.
def task_black(dodo: str, repo: str, tests: Optional[str] = None) -> Generator:
    """Check standardized code formatting."""
    for location in remove_none([dodo, repo, tests]):
        yield {
            "name": location,
            "actions": [f"black -l100 --check {location}"],
            "file_dep": list_files(location),
            "task_dep": ["python_dependencies"],
        }
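The remove_none and list_files helpers referenced above live elsewhere in the shared package. Here is a hypothetical sketch of what they might look like; the real fenris_doit implementations may differ:

```python
from pathlib import Path
from typing import List, Optional

def remove_none(items: List[Optional[str]]) -> List[str]:
    """Drop None entries so optional locations (e.g. a missing tests dir) are skipped."""
    return [x for x in items if x is not None]

def list_files(location: str) -> List[str]:
    """Collect Python files under a path, or the path itself if it is a file."""
    path = Path(location)
    if path.is_file():
        return [str(path)]
    return sorted(str(p) for p in path.rglob("*.py"))
```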

Problem 2: Project-Specific Task Configuration

The second problem on our list was that there are a variety of things that each project might need to customize for its list of tasks. We’ve broken down the modifications that need to be accounted for below. After outlining these issues, we share the code used to solve them.

1. Setting Task Configuration Parameters

Each of the task functions needs a few arguments supplied, either through default values or through CLI arguments. We didn’t want boilerplate repeated across repos, so manually applying partial to every function wasn’t an option.

For example, this black task takes a few arguments – the name of the file with the doit specifications (dodo), and the name of the repository and the tests directory to run the black command on (repo and tests). We needed to enable each project to import specified tasks while also autofilling said parameters so that each task runs with the proper configuration for a given project.

Continuing with this example — if I’m working in a project called my_library, I want to make sure that when I run $ doit black, black operates on the my_library folder, as well as my tests folder and my dodo.py file.

So, we developed a solution allowing for the import of various tasks, autofilling parameters with those specified in the import_tasks function call.
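As a toy illustration of the autofill idea (not the real import_tasks, which we share below), the trick is to pre-apply only the keyword arguments a function actually accepts:

```python
import inspect
from functools import partial

def autofill(func, **params):
    """Pre-apply only the keyword arguments the function actually accepts."""
    accepted = inspect.getfullargspec(func)[0]
    return partial(func, **{k: v for k, v in params.items() if k in accepted})

def task_black(dodo, repo, tests=None):
    return f"black runs on {dodo}, {repo}, {tests}"

# 'unused' is silently ignored, because task_black does not accept it.
filled = autofill(task_black, dodo="dodo.py", repo="my_library", unused=1)
```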

2. Defining Which Tasks to Import / Use

Another challenge we tackled was that each project might want to use only a specific subset of the available shared tasks from our tasks repository. We only want to import the tasks specified in the project’s dodo.py file.

3. Overriding Existing Tasks

Additionally, we wanted to let a project override the logic of a task already defined within the shared tasks repository. For example, the shared black task might enforce a line length of 100, whereas a given project might need to be stricter at 88. Thus, we wanted to ensure that any of our shared tasks could be overridden within a given project.

4. Defining Custom Tasks

Finally, we wanted to ensure that within a project’s dodo.py file, custom tasks could be defined. Projects may have unique needs that are best tackled by a custom task defined only within said project’s scope.

 

The Code

First, we share the import_tasks logic that we defined to support the above needs.

"""Tools for importing shared doit tasks."""
import importlib
import inspect
from functools import partial
from typing import Any, Iterable, List, Optional

DOIT_MODULE_NAME = "fenris_doit.tasks"

def import_tasks(
    globals_: dict,
    tasks: List[str],
    repo: str,
    tests: Optional[str] = "tests",
    dodo: Optional[str] = "dodo.py",
    requirements_file: Optional[str] = "requirements-dev.txt",
    internal_deps: Iterable[str] = tuple(),
    **kwargs: Any,
) -> None:
    """Import doit tasks and update task args."""
    _import_and_apply_params(
        globals_=globals_,
        tasks=tasks,
        repo=repo,
        tests=tests,
        dodo=dodo,
        requirements_file=requirements_file,
        internal_deps=internal_deps,
        **kwargs,
    )


def _import_and_apply_params(globals_: dict, tasks: List[str], **kwargs: Any) -> None:
    """Import doit tasks and apply replacement params."""
    module = importlib.import_module(DOIT_MODULE_NAME)
    imported_task_names = [x for x in list(module.__dict__) if x.replace("task_", "") in tasks]

    # additional tasks defined in dodo.py file
    custom_task_names = [x for x in globals_ if x.replace("task_", "") in tasks]

    globals_.update({k: _update_if_callable(getattr(module, k), **kwargs) for k in imported_task_names})
    globals_.update({k: _update_if_callable(globals_[k], **kwargs) for k in custom_task_names})


def _update_if_callable(maybe_func: Any, **kwargs: Any) -> Any:
    if callable(maybe_func) and hasattr(maybe_func, "__name__"):
        all_args = inspect.getfullargspec(maybe_func)[0]
        to_apply = {k: v for k, v in kwargs.items() if k in all_args}

        # Cannot return partial directly, as it doesn't have a __name__ that doit picks up
        def new_func(*args: Any, **new_func_kwargs: Any) -> Any:
            try:
                return partial(maybe_func, **to_apply)(*args, **new_func_kwargs)
            except TypeError as err:
                print(f"Doit config error: {err}")
                exit(1)

        # have to keep the __name__ the same as the original task name for doit
        new_func.__name__ = maybe_func.__name__
        return new_func
    else:
        return maybe_func

A couple of key notes:

  • We include the wrapper function import_tasks in order to support default values for parameters such as the test directory name and the dodo file name.
  • In _import_and_apply_params, we build custom_task_names, which includes any tasks that may have been defined in a project’s dodo file. This allows a project to not only import from our shared repository of tasks, but also to define any custom tasks relevant to the project in question. An example of this is shown in the dodo.py code below.
  • The two globals_.update calls are responsible for replacing the task parameters with the values provided in the import_tasks call.
  • We utilize the _update_if_callable helper function to make sure that we’re only changing the signature of functions (in our case, tasks are defined as functions).

Next, we’re sharing an example of a dodo.py file found in one of our projects:

"""Doit logic."""
from typing import Generator, Optional

from fenris_doit import import_tasks

DOIT_CONFIG = {
    "default_tasks": [
        "black",
        "pytest",
        "custom_job",
        ...
    ],
    "cleanforget": True,
    "verbosity": 0,
}

def task_black(dodo: str, repo: str, tests: Optional[str] = None) -> Generator:
    """Check standardized code formatting, overriding existing task and using -l88."""
    for location in remove_none([dodo, repo, tests]):
        yield {
            "name": location,
            "actions": [f"black -l88 --check {location}"],
            "file_dep": list_files(location),
            "task_dep": ["python_dependencies"],
        }

def task_custom_job(repo: str) -> Generator:
    """Run a custom job specific to this project."""
    yield {
        "name": f"custom_job: {repo}",
        "actions": ['echo "custom task commencing..."'],
        "file_dep": list_files(repo),
    }

import_tasks(
    globals_=globals(),
    tasks=DOIT_CONFIG["default_tasks"],
    repo="<< project_name >>",
)

A few more notes:

  • The task_black defined above overrides the one from the shared repository. In this case, we’re using a different line length setting within this project.
  • You’ll also notice that we define a custom_job, a task that’s not defined in our centralized tasks repository. Because of the custom task support we implemented in the import_tasks logic, this task will also be configured with the provided repo argument.

Finally, applying the correct values upon import of a given function is a bit challenging, so we’ll touch on a few of the specifics here.

  1. In the import_tasks logic seen above, we use inspect.getfullargspec to figure out which arguments of a given function need to be specified.
  2. We use partial to create a new function (called task_black, for example), with arguments to match what doit requires. Using partial here spares us from writing tons of partial calls within each project’s dodo.py file.
  3. In the example dodo.py file, we pass in the globals() dictionary into import_tasks as an argument so that it can be updated with the new tasks created via partial. Thus, these new tasks, with the desired arguments applied / specified, can be used by an individual project.
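For instance, here is what getfullargspec reports for a task function with the signature used throughout this post:

```python
import inspect

def task_black(dodo, repo, tests=None):
    """Stand-in task signature for illustration."""

# The first element of the spec is the list of positional argument names.
inspect.getfullargspec(task_black)[0]  # ['dodo', 'repo', 'tests']
```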

You can see here that we’ve met our various needs. We’re able to specify project-specific task configuration parameters, add our own custom tasks, limit which tasks we are importing from the shared repository, and override existing tasks while still using other tasks from the shared repository. =)

 

A Sample Run

Then, when we run doit from the terminal within said project, we see the following:

(<< project_name >>) $ doit
. python_dependencies:requirements-dev.txt
-- flake8:dodo.py
. flake8:<< project_name >>
-- flake8:tests
-- black:dodo.py
. black:<< project_name >>
-- black:tests
-- pytest:pytest

Tasks marked with -- have already been executed (and thus were skipped), whereas tasks marked with . were executed this go around.

 

Final Words

Our centralizing of these project management tasks has helped us clean up our python projects and standardize our deployments even further. Hopefully, the above guidelines can help you and your team to do the same!

 
