Parallelization

To parallelize with this framework, we need to define two things:

  • A function that receives exactly one parameter and returns an output

  • A generator that yields inputs

[1]:
# FUNCTION
def f(x):
    return sum([x + k for k in range(1000)])

# GENERATOR
def g():
    for i in range(10):
        yield i

Then, we can use a parallelization implementation to run the function on each parameter yielded by the generator.

Serial

This implementation does not parallelize at all but simply runs a for loop. It is often used as the default (dummy) backend.

[2]:
from azcausal.core.parallelize import Serial

parallelize = Serial()

parallelize(g(), func=f)
[2]:
[499500,
 500500,
 501500,
 502500,
 503500,
 504500,
 505500,
 506500,
 507500,
 508500]
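For intuition, Serial is conceptually equivalent to a plain list comprehension over the generator (a sketch, not azcausal's actual implementation):

```python
# The function and generator from above.
def f(x):
    return sum([x + k for k in range(1000)])

def g():
    for i in range(10):
        yield i

# Conceptually what Serial does: evaluate f for every yielded input, in order.
results = [f(x) for x in g()]
print(results[:3])  # [499500, 500500, 501500]
```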

Pool

The Pool uses Python's built-in thread or process pools to run tasks in parallel.

[3]:
from azcausal.core.parallelize import Pool

# choose either 'thread' or 'process' as the mode
mode = 'thread'

# the number of workers (by default #cores-1)
max_workers = None

parallelize = Pool(mode=mode, max_workers=max_workers)

parallelize(g(), func=f)
[3]:
[499500,
 500500,
 501500,
 502500,
 503500,
 504500,
 505500,
 506500,
 507500,
 508500]

Joblib

Uses the well-known Joblib implementation for parallelization.

[4]:
from azcausal.core.parallelize import Joblib

# the number of jobs passed through to joblib (None uses its default)
n_jobs = None

parallelize = Joblib(n_jobs=n_jobs)

parallelize(g(), func=f)
[4]:
[499500,
 500500,
 501500,
 502500,
 503500,
 504500,
 505500,
 506500,
 507500,
 508500]

In some cases, you might want to pass functions directly instead of inputs. `functools.partial` comes in handy here to bind arguments and keywords to the function up front.

[5]:
from functools import partial

funcs = [partial(f, arg) for arg in g()]

parallelize(funcs)
[5]:
[499500,
 500500,
 501500,
 502500,
 503500,
 504500,
 505500,
 506500,
 507500,
 508500]
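Since `partial` can bind keyword arguments as well as positional ones, the same pattern works for functions taking several parameters. A minimal standalone sketch (the two-argument function `h` is hypothetical, introduced only for illustration; with the framework you would pass `funcs` to `parallelize` as above):

```python
from functools import partial

# Hypothetical function with a positional and a keyword argument.
def h(x, scale=1):
    return scale * sum(x + k for k in range(1000))

# Bind both the positional argument and the keyword argument,
# producing callables that take no arguments at all.
funcs = [partial(h, x, scale=2) for x in range(3)]

# Calling them in a plain loop here stands in for the Serial backend.
results = [func() for func in funcs]
print(results)  # [999000, 1001000, 1003000]
```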