Parallelization
For parallelization in this framework, we need to define two things:
A function that receives exactly one parameter and returns an output
A generator that yields inputs
[1]:
# FUNCTION
def f(x):
    return sum([x + k for k in range(1000)])

# GENERATOR
def g():
    for i in range(10):
        yield i
Then, we can use a parallelization implementation to run the function for each parameter yielded by the generator.
Serial
This does not parallelize anything but simply runs a for loop. It is often used as a default placeholder.
[2]:
from azcausal.core.parallelize import Serial
parallelize = Serial()
parallelize(g(), func=f)
[2]:
[499500,
500500,
501500,
502500,
503500,
504500,
505500,
506500,
507500,
508500]
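Conceptually, the serial run amounts to a plain for loop over the generator. The sketch below is a hypothetical stand-in for illustration, not azcausal's actual implementation:

```python
# a minimal sketch of a serial "parallelize": just a for loop
# (hypothetical helper, not azcausal's actual code)
def run_serial(generator, func):
    return [func(x) for x in generator]

def f(x):
    return sum(x + k for k in range(1000))

results = run_serial(range(10), f)
```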
Pool
The Pool
uses the Python implementation of threads or processes to run tasks in parallel.
[3]:
from azcausal.core.parallelize import Pool
# choose either mode 'thread' or 'process'
mode = 'thread'
# the number of workers (by default #cores-1)
max_workers = None
parallelize = Pool(mode=mode, max_workers=max_workers)
parallelize(g(), func=f)
[3]:
[499500,
500500,
501500,
502500,
503500,
504500,
505500,
506500,
507500,
508500]
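Under the hood, thread- or process-based pooling of this kind can be expressed with Python's standard `concurrent.futures` module. The following is a rough sketch of the same computation using a `ThreadPoolExecutor` (an illustration, not azcausal's internals):

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    return sum(x + k for k in range(1000))

def g():
    for i in range(10):
        yield i

# executor.map preserves input order, mirroring the ordered results above
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(f, g()))
```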
Joblib
Uses the well-known Joblib library for parallelization.
[4]:
from azcausal.core.parallelize import Joblib
# the number of parallel jobs
n_jobs = None
parallelize = Joblib(n_jobs=n_jobs)
parallelize(g(), func=f)
[4]:
[499500,
500500,
501500,
502500,
503500,
504500,
505500,
506500,
507500,
508500]
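For comparison, the same computation written directly against Joblib's own `Parallel`/`delayed` API looks roughly like this (a sketch of plain Joblib usage, not azcausal's wrapper):

```python
from joblib import Parallel, delayed

def f(x):
    return sum(x + k for k in range(1000))

def g():
    for i in range(10):
        yield i

# delayed(f)(x) captures the call; Parallel executes them across workers
results = Parallel(n_jobs=2)(delayed(f)(x) for x in g())
```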
In some cases, you might want to pass functions directly. The partial
function from functools comes in handy there to bind positional and keyword arguments.
[5]:
from functools import partial
funcs = [partial(f, arg) for arg in g()]
parallelize(funcs)
[5]:
[499500,
500500,
501500,
502500,
503500,
504500,
505500,
506500,
507500,
508500]
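The idea behind this pattern is that each `partial` freezes one input, so every resulting function takes no arguments and can be invoked as-is. A serial evaluation of such a list (a sketch of the idea, not azcausal's implementation) looks like:

```python
from functools import partial

def f(x):
    return sum(x + k for k in range(1000))

# each partial freezes one argument of f; calling it with no
# arguments evaluates f on that frozen input
funcs = [partial(f, i) for i in range(10)]
results = [fn() for fn in funcs]
```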