
ocl.typing

Types used in the object-centric learning framework.

ImageData = TensorType['batch_size', 'channels', 'height', 'width'] module-attribute

VideoData = TensorType['batch_size', 'frames', 'channels', 'height', 'width'] module-attribute

ImageOrVideoData = Union[VideoData, ImageData] module-attribute

TextData = TensorType['batch_size', 'max_tokens'] module-attribute

CNNImageFeatures = ImageData module-attribute

TransformerImageFeatures = TensorType['batch_size', 'n_spatial_features', 'feature_dim'] module-attribute

ImageFeatures = TransformerImageFeatures module-attribute

VideoFeatures = TensorType['batch_size', 'frames', 'n_spatial_features', 'feature_dim'] module-attribute

ImageOrVideoFeatures = Union[ImageFeatures, VideoFeatures] module-attribute

Positions = TensorType['n_spatial_features', 'spatial_dims'] module-attribute

PooledFeatures = TensorType['batch_size', 'feature_dim'] module-attribute

ObjectFeatures = TensorType['batch_size', 'n_objects', 'object_dim'] module-attribute

EmptyIndicator = TensorType['batch_size', 'n_objects'] module-attribute

ObjectFeatureAttributions = TensorType['batch_size', 'n_objects', 'n_spatial_features'] module-attribute

ConditioningOutput = TensorType['batch_size', 'n_objects', 'object_dim'] module-attribute

Output of conditioning modules.
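
The aliases above are shape annotations in the style of torchtyping's TensorType. The sketch below, with purely illustrative sizes (a 14x14 = 196-token backbone is an assumption used only for the numbers), shows plain torch tensors that match them:

import torch

# Illustrative sizes only; any consistent values work.
batch_size, n_objects, object_dim = 8, 7, 64
n_spatial_features, feature_dim = 196, 768

# ImageData: (batch_size, channels, height, width).
image = torch.randn(batch_size, 3, 224, 224)

# TransformerImageFeatures / ImageFeatures: (batch_size, n_spatial_features, feature_dim).
features = torch.randn(batch_size, n_spatial_features, feature_dim)

# ConditioningOutput: (batch_size, n_objects, object_dim), e.g. initial slot states.
conditioning = torch.randn(batch_size, n_objects, object_dim)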

FrameFeatures dataclass

Features associated with a single frame.

Source code in ocl/typing.py
@dataclasses.dataclass
class FrameFeatures:
    """Features associated with a single frame."""

    features: ImageFeatures
    positions: Positions

features: ImageFeatures class-attribute

positions: Positions class-attribute
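
A minimal construction sketch; the sizes are illustrative, and positions here use two spatial dimensions, matching the Positions annotation:

import torch

from ocl.typing import FrameFeatures

batch_size, n_spatial_features, feature_dim = 4, 196, 768

frame = FrameFeatures(
    features=torch.randn(batch_size, n_spatial_features, feature_dim),
    positions=torch.rand(n_spatial_features, 2),  # (n_spatial_features, spatial_dims)
)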

FeatureExtractorOutput dataclass

Output of feature extractor.

Source code in ocl/typing.py
@dataclasses.dataclass
class FeatureExtractorOutput:
    """Output of feature extractor."""

    features: ImageOrVideoFeatures
    positions: Positions
    aux_features: Optional[Dict[str, torch.Tensor]] = None

    def __iter__(self) -> Iterable[FrameFeatures]:
        """Iterate over features and positions per frame."""
        if self.features.ndim == 4:
            # Video features: split along the frame dimension.
            for frame_features in torch.split(self.features, 1, dim=1):
                yield FrameFeatures(frame_features.squeeze(1), self.positions)
        else:
            # Image features already describe a single frame.
            yield FrameFeatures(self.features, self.positions)

features: ImageOrVideoFeatures class-attribute

positions: Positions class-attribute

aux_features: Optional[Dict[str, torch.Tensor]] = None class-attribute

__iter__

Iterate over features and positions per frame.

Source code in ocl/typing.py
def __iter__(self) -> Iterable[FrameFeatures]:
    """Iterate over features and positions per frame."""
    if self.features.ndim == 4:
        # Video features: split along the frame dimension.
        for frame_features in torch.split(self.features, 1, dim=1):
            yield FrameFeatures(frame_features.squeeze(1), self.positions)
    else:
        # Image features already describe a single frame.
        yield FrameFeatures(self.features, self.positions)
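
A usage sketch with illustrative sizes: for 4-dimensional video features, iteration yields one FrameFeatures per frame, with the positions tensor shared across frames. The aux_features key is hypothetical:

import torch

from ocl.typing import FeatureExtractorOutput

batch_size, frames, n_spatial_features, feature_dim = 2, 5, 196, 768

output = FeatureExtractorOutput(
    features=torch.randn(batch_size, frames, n_spatial_features, feature_dim),
    positions=torch.rand(n_spatial_features, 2),
    aux_features={"cls_token": torch.randn(batch_size, frames, feature_dim)},
)

for frame in output:
    # Each frame has image-shaped features: (batch_size, n_spatial_features, feature_dim).
    assert frame.features.shape == (batch_size, n_spatial_features, feature_dim)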

PerceptualGroupingOutput dataclass

Output of a perceptual grouping algorithm.

Source code in ocl/typing.py
@dataclasses.dataclass
class PerceptualGroupingOutput:
    """Output of a perceptual grouping algorithm."""

    objects: ObjectFeatures
    is_empty: Optional[EmptyIndicator] = None  # noqa: F821
    feature_attributions: Optional[ObjectFeatureAttributions] = None  # noqa: F821

objects: ObjectFeatures class-attribute

is_empty: Optional[EmptyIndicator] = None class-attribute

feature_attributions: Optional[ObjectFeatureAttributions] = None class-attribute
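
A minimal construction sketch; the sizes are illustrative, and the semantics suggested in the comments for the optional fields (empty-slot mask, attention-style attributions) are assumptions based on the field names, not part of the type definitions:

import torch

from ocl.typing import PerceptualGroupingOutput

batch_size, n_objects, object_dim, n_spatial_features = 2, 7, 64, 196

grouping = PerceptualGroupingOutput(
    objects=torch.randn(batch_size, n_objects, object_dim),
    # Assumed semantics: marks slots that did not bind to any object.
    is_empty=torch.zeros(batch_size, n_objects, dtype=torch.bool),
    # Assumed semantics: per-object weights over the input spatial features.
    feature_attributions=torch.rand(batch_size, n_objects, n_spatial_features),
)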