ocl.transforms
Module with data pipe transforms.
Transforms are callables that turn an input torchdata datapipe into a new datapipe. For further information see ocl.transforms.Transform.
Transform
Bases: ABC
Abstract Base Class representing a transformation of the input data pipe.
A transform is a callable which, when called with a torchdata.datapipes.iter.IterDataPipe, applies a transformation and returns a new torchdata.datapipes.iter.IterDataPipe.
Attributes:

| Name | Type | Description |
|---|---|---|
| is_batch_transform | bool | True if the transform should be applied to a batch of examples instead of individual examples, False otherwise. |
| fields | Tuple[str] | Tuple of strings indicating which elements of the input are needed for this transform. This allows skipping the decoding of input parts that are not needed for training or evaluating a particular model. |
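To make the interface concrete, here is a minimal sketch of a custom transform. The class name AddConstant, its constructor arguments, and the use of IterDataPipe.map are illustrative assumptions, not part of the library:

```python
from typing import Tuple

from torchdata.datapipes.iter import IterDataPipe

from ocl.transforms import Transform


class AddConstant(Transform):
    """Hypothetical transform adding a constant to a single field."""

    def __init__(self, field: str, constant: int):
        self.field = field
        self.constant = constant
        self.is_batch_transform = False  # operate on individual examples

    @property
    def fields(self) -> Tuple[str]:
        # Only this field needs to be decoded for the transform to work.
        return (self.field,)

    def __call__(self, input_pipe: IterDataPipe) -> IterDataPipe:
        def _apply(input_dict):
            output = dict(input_dict)
            output[self.field] = output[self.field] + self.constant
            return output

        return input_pipe.map(_apply)
```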
fields: Tuple[str] (abstract property)
Fields that will be transformed by this transform.
__call__ (abstractmethod)
Apply the transform to the input pipe.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| input_pipe | IterDataPipe | Input data pipe. | required |

Returns:

| Type | Description |
|---|---|
| IterDataPipe | Transformed data pipe. |
SimpleTransform
Bases: Transform
Transform individual keys of the input dict using different callables.
Example

```python
from torchdata.datapipes.iter import IterableWrapper
from ocl.transforms import SimpleTransform

input_dicts = [{"object_a": 1, "object_b": 2}]
transform = SimpleTransform(
    transforms={
        "object_a": lambda a: a * 2,
        "object_b": lambda b: b * 3,
    },
    batch_transform=False,
)
input_pipe = IterableWrapper(input_dicts)
transformed_pipe = transform(input_pipe)
for transformed_dict in transformed_pipe:
    assert transformed_dict["object_a"] == 1 * 2
    assert transformed_dict["object_b"] == 2 * 3
```
__init__
Initialize SimpleTransform.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| transforms | Dict[str, Callable] | Mapping of dict keys to callables that should be used to transform them. | required |
| batch_transform | bool | Set to True if the transform should be applied after the data has been batched. | required |
__call__
Transform input data pipe using transforms.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| input_pipe | IterDataPipe | Input data pipe. | required |

Returns:

| Type | Description |
|---|---|
| IterDataPipe | Transformed data pipe. |
DuplicateFields
Bases: Transform
Transform to duplicate keys of a dictionary.
This is useful if your pipeline requires the same input to be transformed in different ways.
Example

```python
from torchdata.datapipes.iter import IterableWrapper
from ocl.transforms import DuplicateFields

input_dicts = [{"object_a": 1, "object_b": 2}]
transform = DuplicateFields(
    mapping={
        "object_a": "copy_of_object_a",
        "object_b": "copy_of_object_b",
    },
    batch_transform=False,
)
input_pipe = IterableWrapper(input_dicts)
transformed_pipe = transform(input_pipe)
for transformed_dict in transformed_pipe:
    assert transformed_dict["object_a"] == 1
    assert transformed_dict["copy_of_object_a"] == 1
    assert transformed_dict["object_b"] == 2
    assert transformed_dict["copy_of_object_b"] == 2
```
__init__
Initialize DuplicateFields.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mapping | Dict[str, str] | Source-to-target mapping for duplicated fields. Keys are the source field names; values are the names of the duplicated fields. | required |
| batch_transform | bool | Apply to batched input. | required |
Map
Bases: Transform
Apply a function to the whole input dict to create a new output dict.
This transform requires the input fields to be defined explicitly, as they cannot be determined from the provided callable alone.
Example

```python
from torchdata.datapipes.iter import IterableWrapper
from ocl.transforms import Map

input_dicts = [{"object_a": 1, "object_b": 2}]

def combine_a_and_b(input_dict):
    output_dict = input_dict.copy()
    output_dict["combined"] = input_dict["object_a"] + input_dict["object_b"]
    return output_dict

transform = Map(
    transform=combine_a_and_b,
    fields=("object_a", "object_b"),
    batch_transform=False,
)
input_pipe = IterableWrapper(input_dicts)
transformed_pipe = transform(input_pipe)
for transformed_dict in transformed_pipe:
    a = transformed_dict["object_a"]
    b = transformed_dict["object_b"]
    assert transformed_dict["combined"] == a + b
```
__init__
Initialize Map transform.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| transform | Callable[[Dict[str, Any]], Dict[str, Any]] | Callable which is applied to the individual input dictionaries. | required |
| fields | Tuple[str] | The fields the transform requires to operate. | required |
| batch_transform | bool | Apply to batched input. | required |
Filter
Bases: Transform
Filter samples according to a predicate.
Remove samples from the input data pipe by evaluating a predicate.
Example

```python
from torchdata.datapipes.iter import IterableWrapper
from ocl.transforms import Filter

input_dicts = [{"myvalue": 5}, {"myvalue": 10}]
transform = Filter(
    predicate=lambda a: a > 5,
    fields=("myvalue",),
)
input_pipe = IterableWrapper(input_dicts)
transformed_pipe = transform(input_pipe)
for transformed_dict in transformed_pipe:
    assert transformed_dict["myvalue"] > 5
```
__init__
Transform to create a subset of a dataset by discarding samples.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predicate | Callable[..., bool] | Function which determines if elements should be kept (return value is True) or discarded (return value is False). The function is only provided with the fields specified in the fields argument. | required |
| fields | Sequence[str] | The fields from the input which should be passed on to the predicate for evaluation. | required |
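Building on the single-field example above, a sketch of filtering on multiple fields. It assumes the predicate receives the field values positionally, in the order given by fields; this is an extrapolation from the example, not documented behavior:

```python
from torchdata.datapipes.iter import IterableWrapper
from ocl.transforms import Filter

# Assumption: with multiple fields, the predicate receives the field
# values positionally, in the order given by `fields`.
input_dicts = [{"a": 1, "b": 2}, {"a": 3, "b": 2}]
transform = Filter(
    predicate=lambda a, b: a > b,
    fields=("a", "b"),
)
for kept in transform(IterableWrapper(input_dicts)):
    assert kept["a"] > kept["b"]
```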
SampleSlices
Bases: Transform
Transform to sample slices from input tensors / numpy arrays.
If multiple fields are provided, the input tensors are assumed to have the same length along the slicing axis, and the same slices will be returned for each field.
Example

```python
import numpy as np
from torchdata.datapipes.iter import IterableWrapper
from ocl.transforms import SampleSlices

my_array = np.random.randn(100, 10)
input_dicts = [{"array1": my_array, "array2": my_array.copy()}]
transform = SampleSlices(
    n_slices_per_input=5,
    fields=("array1", "array2"),
    dim=0,
    shuffle_buffer_size=1,
)
input_pipe = IterableWrapper(input_dicts)
transformed_pipe = transform(input_pipe)
for transformed_dict in transformed_pipe:
    assert transformed_dict["array1"].shape == (5, 10)
    assert transformed_dict["array2"].shape == (5, 10)
    assert np.allclose(transformed_dict["array1"], transformed_dict["array2"])
```
__init__
Initialize SampleSlices.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| n_slices_per_input | int | Number of slices to sample per input. -1 indicates that all possible slices should be sampled. | required |
| fields | Sequence[str] | The fields that should be considered video data and thus sliced according to the frame sampling during training. | required |
| dim | int | The dimension along which to slice the tensors. | 0 |
| seed | int | Random number generator seed for deterministic sampling during evaluation. | 39480234 |
| per_epoch | bool | Sample frames over epochs; this ensures that after n_frames / n_frames_per_video epochs all frames have been seen at least once. If the division is uneven, some frames will be seen more than once. | False |
| shuffle_buffer_size | int | Size of the shuffle buffer used during training. An additional shuffling step ensures each batch contains a diverse set of images, not only images from the same video. | 1000 |
slice_data
Small utility method to slice a numpy array along a specified axis.
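The method body is not shown here; as a rough illustration of what slicing along a specified axis means, a sketch using numpy's np.take (an assumption for illustration, not the actual implementation):

```python
import numpy as np

# Rough illustration only; the actual helper in ocl/transforms.py may differ.
data = np.arange(12).reshape(3, 4)
sliced = np.take(data, [0, 2], axis=0)  # keep rows 0 and 2
assert sliced.shape == (2, 4)
```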
sample_frames_using_key
Sample frames deterministically from a generator of videos using the key field.
SplitConsecutive
Bases: Transform
split_to_consecutive_frames
Sample frames deterministically from a generator of videos using the key field.
SampleConsecutive
Bases: Transform
Select a random consecutive subsequence of frames in a strided manner.
Given the sequence [1, 2, 3, 4, 5, 6, 7, 8, 9] and a subsequence length of 3, this will return one of [1, 2, 3], [4, 5, 6], or [7, 8, 9].
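A short plain-Python illustration of this strided selection (the segment length of 3 matches the example above; this sketches the idea, not the ocl API):

```python
import random

# Split the sequence into consecutive, non-overlapping segments of equal
# length and pick one at random (illustration only, not the ocl API).
sequence = [1, 2, 3, 4, 5, 6, 7, 8, 9]
segment_length = 3
segments = [
    sequence[i:i + segment_length]
    for i in range(0, len(sequence), segment_length)
]
assert segments == [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
chosen = random.choice(segments)
```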
split_to_consecutive_frames
Sample frames deterministically from a generator of videos using the key field.
VideoDecoder
Bases: Transform
Video decoder based on Decord.
Video decoding is implemented as a preprocessing transform, instead of as part of the decoding mechanics, as this allows sparse decoding when only parts of the input video are required.
__init__
Video decoder based on decord.
It will decode the whole video into a single tensor and can be used with other downstream processing plugins.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| fields | Sequence[str] | The fields of the input dictionary containing the video bytes. | required |
| stride | int | Downsample frames by striding. | 1 |
| split_extension | bool | Split the extension off the field name. | True |
| video_reader_kwargs | Dict[str, Any] | Arguments passed to decord.VideoReader. | None |
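A hedged usage sketch based on the documented parameters. The field name "video.mp4", the byte-loading step, and the file example.mp4 are assumptions for illustration:

```python
from torchdata.datapipes.iter import IterableWrapper
from ocl.transforms import VideoDecoder

# Assumed input layout: each sample stores raw video bytes under a field
# named after the file, e.g. "video.mp4".
with open("example.mp4", "rb") as f:  # hypothetical video file
    input_dicts = [{"video.mp4": f.read()}]

decoder = VideoDecoder(
    fields=("video.mp4",),
    stride=2,              # keep every second frame
    split_extension=True,  # per the docs, strips the extension off the field name
)
decoded_pipe = decoder(IterableWrapper(input_dicts))
```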
DecodeRandomWindow
Bases: VideoDecoder
Decode a random window of the video.
DecodeRandomStridedWindow
Bases: DecodeRandomWindow
Decode a random strided segment of the input video.
SpatialSlidingWindow
Bases: Transform
Split image data spatially by sliding a window across the image.
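The class parameters are not documented here; as a plain numpy illustration of the sliding-window idea (the window size, stride, and absence of padding are assumptions, not the ocl API):

```python
import numpy as np

# Slide a window over the spatial dimensions of an image and collect
# the resulting patches (illustration of the idea, not the ocl API).
image = np.zeros((64, 64, 3))
window, stride = 32, 16
patches = [
    image[y:y + window, x:x + window]
    for y in range(0, image.shape[0] - window + 1, stride)
    for x in range(0, image.shape[1] - window + 1, stride)
]
assert len(patches) == 9  # 3 window positions per spatial axis
```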