Parallelization
For parallelization in this framework, we need to define two things:
A function that receives exactly one parameter and returns an output
A generator that yields inputs
[1]:
# FUNCTION
def f(x):
    return sum([x + k for k in range(1000)])

# GENERATOR
def g():
    for i in range(10):
        yield i
Then, we can use a parallelization implementation to run the function for each parameter yielded by the generator.
Serial
This does not parallelize anything but simply runs a for loop. It is often used as a default placeholder.
[2]:
from azcausal.core.parallelize import Serial
parallelize = Serial()
parallelize(g(), func=f)
[2]:
[499500,
500500,
501500,
502500,
503500,
504500,
505500,
506500,
507500,
508500]
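Conceptually, the serial run amounts to a plain for loop over the generator. The sketch below is a hypothetical stand-in for illustration, not azcausal's actual implementation:

```python
# a minimal sketch of a serial "parallelize": just a for loop
# (hypothetical helper, not azcausal's actual code)
def run_serial(generator, func):
    return [func(x) for x in generator]

def f(x):
    return sum(x + k for k in range(1000))

results = run_serial(range(10), f)
```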
Pool
The Pool
uses the Python implementation of threads or processes to run tasks in parallel.
[3]:
from azcausal.core.parallelize import Pool
# choose either mode 'thread' or 'process'
mode = 'thread'
# the number of workers (by default #cores-1)
max_workers = None
parallelize = Pool(mode=mode, max_workers=max_workers)
parallelize(g(), func=f)
[3]:
[499500,
500500,
501500,
502500,
503500,
504500,
505500,
506500,
507500,
508500]
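Under the hood, thread- or process-based pooling of this kind can be expressed with Python's standard `concurrent.futures` module. The following is a rough sketch of the same computation using a `ThreadPoolExecutor` (an illustration, not azcausal's internals):

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    return sum(x + k for k in range(1000))

def g():
    for i in range(10):
        yield i

# executor.map preserves input order, mirroring the ordered results above
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(f, g()))
```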
Joblib
Uses the well-known Joblib library for parallelization.
[4]:
from azcausal.core.parallelize import Joblib
# the number of parallel jobs
n_jobs = None
parallelize = Joblib(n_jobs=n_jobs)
parallelize(g(), func=f)
[4]:
[499500,
500500,
501500,
502500,
503500,
504500,
505500,
506500,
507500,
508500]
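For comparison, the same computation written directly against Joblib's own `Parallel`/`delayed` API looks roughly like this (a sketch of plain Joblib usage, not azcausal's wrapper):

```python
from joblib import Parallel, delayed

def f(x):
    return sum(x + k for k in range(1000))

def g():
    for i in range(10):
        yield i

# delayed(f)(x) captures the call; Parallel executes them across workers
results = Parallel(n_jobs=2)(delayed(f)(x) for x in g())
```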
In some cases, you might want to pass functions directly. The partial
function from functools comes in handy there to bind positional and keyword arguments.
[5]:
from functools import partial
funcs = [partial(f, arg) for arg in g()]
parallelize(funcs)
[5]:
[499500,
500500,
501500,
502500,
503500,
504500,
505500,
506500,
507500,
508500]
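The idea behind this pattern is that each `partial` freezes one input, so every resulting function takes no arguments and can be invoked as-is. A serial evaluation of such a list (a sketch of the idea, not azcausal's implementation) looks like:

```python
from functools import partial

def f(x):
    return sum(x + k for k in range(1000))

# each partial freezes one argument of f; calling it with no
# arguments evaluates f on that frozen input
funcs = [partial(f, i) for i in range(10)]
results = [fn() for fn in funcs]
```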