ocl.utils.dataset_patches
Patches torchdata so that its behavior is consistent with the webdataset package.
ChainedGenerator
Bases: IterDataPipe
Simple interface to allow chaining via a generator function.
This mirrors functionality from the webdataset package.
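The chaining pattern described above can be sketched as follows. This is a minimal, plain-Python illustration and not the code in ocl/utils/dataset_patches.py: the class name `ChainedGeneratorSketch`, its constructor arguments, and the `double_even` helper are hypothetical, and the sketch deliberately does not subclass `IterDataPipe` so that it runs without torch or torchdata installed.

```python
from typing import Any, Callable, Iterable, Iterator


class ChainedGeneratorSketch:
    """Hypothetical stand-in: wraps a source iterable with a generator function."""

    def __init__(
        self,
        source: Iterable[Any],
        generator: Callable[[Iterable[Any]], Iterator[Any]],
    ) -> None:
        self.source = source
        self.generator = generator

    def __iter__(self) -> Iterator[Any]:
        # The generator function receives the whole upstream iterable and
        # decides what to yield, so it may filter, map, or regroup items.
        yield from self.generator(self.source)


def double_even(source: Iterable[int]) -> Iterator[int]:
    """Example generator function (made up): keep even numbers and double them."""
    for item in source:
        if item % 2 == 0:
            yield 2 * item


pipe = ChainedGeneratorSketch(range(6), double_even)
result = list(pipe)
```

Because the generator function is called anew on each `__iter__`, the wrapped pipeline can be iterated more than once as long as the source itself is re-iterable.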
Source code in ocl/utils/dataset_patches.py
patched_pathsplit
Split a path into a WebDataset prefix and suffix.
The version of pathsplit in torchdata differs from the one in WebDataset in that it keeps the leading "." in the suffix. This patch excludes the separating dot from the regex match, so the suffix no longer begins with ".".
The prefix is used to group files into samples; the suffix is used as the key in the output dictionary. The suffix consists of all components after the first "." in the filename.
In torchdata, the prefix consists of the .tar file path followed by the file name inside the archive.
Any backslash in the prefix is replaced by a forward slash to make Windows prefixes consistent with POSIX paths.
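The splitting rules above can be illustrated with a short sketch. It is written from the description only and is not the project's implementation: the function name `split_path_sketch`, the regular expression, the empty-suffix fallback, and the example file names are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Tuple


def split_path_sketch(path: str) -> Tuple[str, str]:
    """Hypothetical stand-in for patched_pathsplit; not the project's code."""
    # Normalize Windows separators so prefixes compare equal across platforms.
    path = path.replace("\\", "/")
    # The lazy first group stops at the first "." of the final path component.
    # The dot itself lies outside both groups, so the suffix has no leading dot.
    match = re.search(r"^(.*?)\.([^/]*)$", path)
    if match is None:
        # Assumption: a file name without any "." yields an empty suffix.
        return path, ""
    return match.group(1), match.group(2)


# Group archive members into samples keyed by prefix (file names are made up).
members = [
    "shards/000.tar/sample_0001.image.png",
    "shards/000.tar/sample_0001.json",
    "shards\\000.tar\\sample_0002.json",
]
samples: Dict[str, Dict[str, str]] = defaultdict(dict)
for member in members:
    prefix, suffix = split_path_sketch(member)
    samples[prefix][suffix] = member
```

In this sketch the two `sample_0001` files share the prefix `shards/000.tar/sample_0001` and land in one sample under the keys `image.png` and `json`, while the backslash-separated path is normalized to the prefix `shards/000.tar/sample_0002`.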