macro.FieldDescriptor¶
- class macro.FieldDescriptor(schema_entries)¶
Struct that describes a piece of data.
It is an immutable container of schema entries. Order of schema entries does not matter.
- get_entries_of_schema(data_schema)¶
Return all entries of a given schema, if any.
Convenience function to more easily access the schema entries associated with this descriptor.
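For illustration, a hedged usage sketch; image_schema and class_schema stand in for previously-defined DataSchema objects, and passing the entries as a list is an assumption about the schema_entries argument:

# Describe a field by the schema entries it involves (order does not matter).
descriptor = FieldDescriptor([
    image_schema.entry('rgb'),
    class_schema.entry('cat'),
])
# Convenience accessor: all entries of a particular schema, if any.
class_entries = descriptor.get_entries_of_schema(class_schema)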
macro.FieldSet¶
- class macro.FieldSet¶
Struct that represents a set of fields.
It is a mutable collection of fields, where a field is some data that is described by a descriptor.
- add_field(field_descriptor, data)¶
Add data for a field to this field set by its descriptor.
Raises an AssertionError if the field set already contains this field descriptor.
- copy()¶
Make a shallow copy of this field set.
- classmethod create()¶
Create an empty field set.
- get_field(field_descriptor)¶
Get field data for the specified field descriptor.
Raises an AssertionError if the field descriptor does not exist in the field set.
- has_field(field_descriptor)¶
Returns true if this field set has the specified field descriptor.
- iter_fields()¶
Returns an iterable of (field descriptor, field data) tuples over the fields of this field set.
- remove_field(field_descriptor)¶
Remove data for a field from this field set by its descriptor.
Raises an AssertionError if the field set does not contain this field descriptor.
- union_with(*field_sets)¶
Take union of this field set with other field set(s).
If field descriptors overlap, the later field sets take precedence.
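A minimal usage sketch (descriptor_a and descriptor_b stand in for FieldDescriptor objects defined elsewhere; whether union_with returns a new field set or modifies in place is not specified here, so a returned value is assumed):

record = FieldSet.create()
record.add_field(descriptor_a, 0.5)      # asserts if descriptor_a is already present
other = FieldSet.create()
other.add_field(descriptor_b, 'label')

merged = record.union_with(other)        # later field sets take precedence on overlap
if merged.has_field(descriptor_b):
    value = merged.get_field(descriptor_b)
for descriptor, data in merged.iter_fields():
    print(descriptor, data)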
macro.DataBlob¶
- class macro.DataBlob(name, *args, **kwargs)¶
Class used to represent a data blob.
A data blob encapsulates and mediates access to large binary data objects used as data input to a machine learning pipeline. Once retrieved from the cloud or local disk, it is preserved as a snapshot to ensure reproducible and download-efficient runs of downstream compute tasks.
To create a data blob, define a function that performs the necessary data fetches and instantiate it with a DataBlob.Definition:
def download_fn(handle):
    with handle.write_opener(tarfile.open, "data.tar.gz", "w:gz") as f:
        f.write(requests.get('https://macro.ai').text)
    return dict(metadata_key='metadata_value')

data_blob = DataBlob.create("my_data_blob", DataBlob.Definition(download_fn))
To use a previously-defined data blob, load it in the following way:
data_blob = DataBlob.load("my_data_blob")
with data_blob.read_opener(tarfile.open, "data.tar.gz", "r:gz") as f:
    print(f.read())  # retrieve data that was stored
print(data_blob.get_metadata())  # {'metadata_key': 'metadata_value'}
To leverage parallel processing for data blob operations, use the following API:
def download_fn(handle, *args):
    for data in datapoints:
        yield (handle, data,) + args  # yield tuple of data

def process_input(handle, data, arg1, arg2):
    with handle.write_opener(...) as f:
        ...  # download and write based on input
    return dict(metadata_key='metadata_value')  # gets combined into a single dict

data_blob = DataBlob.create(
    "my_data_blob",
    DataBlob.Definition(download_fn, parallelized_iter=process_input))
Under the hood, this automatically spins up a pool of worker processes that call process_input for each item yielded by the download function's generator.
- class Definition(download_fn, *args, parallelized_iter=None, **kwargs)¶
Definition object used to define a DataBlob.
By default, this is constructed with a download function (function that performs the necessary download of data for a blob). If additional functionality is needed, consider subclassing this class and overriding the get_download_function method.
If parallelized_iter is passed in, we expect download_fn to be a generator for tasks, and we will apply the parallelized_iter function to each task in a multiprocess pool.
- get_download_function()¶
Accessor for the download function.
Subclasses may override this with custom behavior.
- class Handle(data_blob)¶
Handle to encapsulate the data blob, primarily the data dir and all writes.
- add_metadata(key, value)¶
Add metadata to the data blob.
- copy_file(src_path, dest_path)¶
Copy file from other location into this data blob.
- download_url(url, dest_path, use_cache=True)¶
Convenience function to download contents of a url into the data blob.
- from_relative_path(other_path)¶
Convert a relative path within the data blob to an absolute path.
- get_data_dir()¶
Get the absolute data directory used to store data for this resource.
- read_opener(opener, dest_path, *args, **kwargs)¶
Convenience wrapper function to apply a file read opener to a relative path within the data blob.
Unlike DataBlob.read_opener, this does not ensure the resource is computed. This is meant to be used if the download function of a DataBlob needs to read some of the data that has been written to it.
- remove_file(dest_path)¶
Remove a file from the data blob.
- to_relative_path(other_path)¶
Convert an absolute path to a relative path within the data blob.
- write_opener(opener, dest_path, *args, **kwargs)¶
Convenience wrapper function to apply a file write opener to a relative path within the data blob.
Ensures the data directory is created before attempting to open the file.
- get_metadata()¶
Get a copy of the metadata that was stored within the data blob.
- handle()¶
Return a handle object that can be used to store data into the data blob.
- hydrate(obj)¶
Hydrate the state of a data blob from a serializable representation of it.
- read_opener(opener, dest_path, *args, **kwargs)¶
Convenience function to apply a file read opener to a relative path within the data blob.
Ensures the data blob is computed before attempting to open the file.
- serialize()¶
Return the serializable representation of a data blob’s state.
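As a hedged illustration of the Handle helpers above, a download function might combine several of them (the URL and file names are placeholders):

def download_fn(handle):
    # Fetch a remote file into the blob's data directory (cached by default).
    handle.download_url('https://example.com/archive.tar.gz', 'archive.tar.gz')
    # Resolve the absolute location of the downloaded file.
    archive_path = handle.from_relative_path('archive.tar.gz')
    # Attach metadata describing the download.
    handle.add_metadata('source', 'example.com')
    # Write a derived file through a file opener.
    with handle.write_opener(open, 'notes.txt', 'w') as f:
        f.write('downloaded from example.com')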
macro.DataSchema¶
- class macro.DataSchema(name, *args, **kwargs)¶
Class used to represent a data schema.
A data schema is used to represent a class ontology or a collection of attributes. It is defined as an enumeration over a set of entries, with optional attributes for each entry.
It enables data sets and model trainers to be parameterized to handle different predictive tasks without needing to customize them repeatedly. The resulting resource can also be easily identified and used appropriately with the correct output schema.
To create a data schema, define a function that specifies the entries within the schema and instantiate it with a DataSchema.Definition:
def definition_fn(handle):
    handle.add_primary_key('class_name')
    handle.add_field('class_description')  # optional attribute(s)
    handle.add_entry(
        class_name='cat',
        class_description='an animal also known as a feline',
    )
    handle.add_entry(
        class_name='dog',
        class_description='an animal also known as a canine',
    )

data_schema = DataSchema.create(
    "my_data_schema", DataSchema.Definition(definition_fn))
To use a previously-defined data schema, load it in the following way:
data_schema = DataSchema.load("my_data_schema")
for entry in data_schema.entries():
    print(entry.primary_key)  # 'cat' or 'dog'
    print(entry.as_dict())  # {'class_name': ..., 'class_description': ...}
- class Definition(definition_fn, *args, **kwargs)¶
Definition object used to define a DataSchema.
By default, this is constructed with a definition function (function that defines the fields and records of a data schema). If additional functionality is needed, consider subclassing this class and overriding the get_definition_function method.
- get_definition_function()¶
Accessor for the definition function.
Subclasses may override this with custom behavior.
- class Entry(data_schema, primary_key)¶
Object representing a data entry in a data schema.
This is typically used as an identifying key for the data it represents.
- as_dict()¶
Return all fields of the data schema entry as a dictionary.
- as_tuple()¶
Return the info uniquely identifying this data schema entry as a tuple.
- classmethod deserialize(obj)¶
Reconstruct a data schema entry from the serializable representation of its state.
- classmethod from_tuple(tup)¶
Reconstruct a data schema entry from the tuple representation of its state.
- serialize()¶
Return a serializable representation of an entry that can be used to reconstruct it.
- class Handle(data_schema)¶
Handle to encapsulate the data schema object, for writes.
- add_entry(**kwargs)¶
Add an entry to the data schema via its handle.
- add_field(field_name)¶
Add a field to the data schema via its handle.
- add_primary_key(primary_key)¶
Add the primary key field to the data schema via its handle.
- add_entry(**kwargs)¶
Add an entry to this data schema.
The primary key and all the field names must be specified for each entry.
- add_field(field_name)¶
Add a field to this data schema.
Each field name must be unique.
- add_primary_key(field_name)¶
Add the primary key field for this data schema.
There must be exactly one primary key field set.
- entries()¶
Returns an iterable over the entries of this data schema.
Ensures the data schema is computed before doing so.
- entry(primary_key)¶
Returns a DataSchema.Entry object for a given primary key.
Ensures the data schema is computed and that primary key is defined before returning it.
- field_names()¶
Returns a list of field names for this data schema, starting with the primary key.
Ensures the data schema is computed before doing so.
- handle()¶
Return a handle object that can be used to add fields and entries to the data schema.
- has_entry(primary_key)¶
Determine if this data schema has an entry with the specified primary key.
Ensures the data schema is computed before checking.
- hydrate(obj)¶
Hydrate the state of a data schema from a serializable representation of it.
- serialize()¶
Return the serializable representation of a data schema’s state.
- tables()¶
Returns the entries and their fields for this data schema in table form.
Ensures the data schema is computed before doing so.
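A hedged sketch of the accessor methods, continuing the "my_data_schema" example above:

data_schema = DataSchema.load("my_data_schema")
print(data_schema.field_names())   # ['class_name', 'class_description']
if data_schema.has_entry('cat'):
    cat = data_schema.entry('cat')
    print(cat.as_dict())           # {'class_name': 'cat', 'class_description': ...}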
macro.DataSchemaMapper¶
- class macro.DataSchemaMapper(name, *args, **kwargs)¶
Class used to represent a data schema mapper.
A data schema mapper defines a mapping between two data schemas. Given a mapping between an input and output schema, the mapper allows the caller to use a set of convenience functions to transform resources from one data schema to the other.
To create a data schema mapper, subclass the DataSchemaMapper.Definition class and override the get_mapping method to return a dictionary that maps between an input and output data schema, and instantiate a DataSchemaMapper with it:
class InputToOutput(DataSchemaMapper.Definition):
    def get_mapping(self):
        return {
            self.input_schema.entry('cat'): self.output_schema.entry('feline'),
            self.input_schema.entry('dog'): self.output_schema.entry('canine'),
        }

data_schema_mapper = DataSchemaMapper.create(
    "my_data_schema_mapper", InputToOutput(input_schema, output_schema))
To use a previously-defined data schema mapper, load it in the following way:
data_schema_mapper = DataSchemaMapper.load("my_data_schema_mapper")
output_entry = data_schema_mapper.transform_data_schema_entry(input_entry)
output_field_set = data_schema_mapper.transform_field_set(input_field_set)
output_data_set = data_schema_mapper.transform_data_set(input_data_set)
output_model_wrapper = data_schema_mapper.transform_model_wrapper(input_model_wrapper)
- class Definition(input_schema, output_schema)¶
Definition object used to define a DataSchemaMapper.
- get_mapping()¶
Return a dictionary mapping entries from the input schema to the output schema.
- hydrate(obj)¶
Hydrate the state of a data schema mapper from a serializable representation of it.
- serialize()¶
Return the serializable representation of a data schema mapper’s state.
- transform_data_schema_entry(data_schema_entry)¶
Return the output data schema entry associated with the input entry in the mapping.
If the input entry is not found in the mapping, return None.
- transform_data_set(data_set, skip_empty=True, output_name=None)¶
Return a data set transformed with this data schema mapper.
Each field set in the data set will be transformed by the data schema mapper.
If skip_empty is True, we skip any field sets that do not have the output schema after the transform step. If skip_empty is numeric, we skip these field sets with that probability.
- transform_field_set(field_set)¶
Return a field set transformed with this data schema mapper.
If the field set has a field descriptor which involves the input schema, we map it to the output schema and add the mapped descriptor to the field set.
- transform_model_wrapper(model_wrapper, output_name=None)¶
Return a model wrapper transformed with this data schema mapper.
Each field set returned by the model wrapper will be transformed.
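For example, a hedged sketch of transform_data_set with the skip_empty behavior described above:

# Drop field sets that lack the output schema after mapping (the default).
output_data_set = data_schema_mapper.transform_data_set(input_data_set)

# Or drop such field sets only with probability 0.5.
sampled_data_set = data_schema_mapper.transform_data_set(
    input_data_set, skip_empty=0.5, output_name="my_mapped_data_set")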
macro.DataSet¶
- class macro.DataSet(name, *args, **kwargs)¶
Class used to represent a data set.
A data set is used to manage ordered collections of FieldSets, which represent rows or records in a data set.
The data set object handles efficient reads and writes under the hood in a compressed data format, and exposes an iterator interface to access records. There are also accessor methods for querying metadata about the data set, such as the histogram of data schema entries represented.
To create a data set, define a generator function that yields FieldSets and instantiate it with a DataSet.Definition:
def generator_fn():
    for data in datapoints:
        record = FieldSet.create()
        ...  # add fields from data to field set
        yield record

data_set = DataSet.create("my_data_set", DataSet.Definition(generator_fn))
To use a previously-defined data set, load it in the following way:
data_set = DataSet.load("my_data_set")
for record in data_set.get_records():
    ...  # process record
To leverage parallel processing for data set operations, use the following API:
def generator_fn(*args):
    for data in datapoints:
        yield (data,) + args  # yield tuple of data

def process_input(data, arg1, arg2):
    record = FieldSet.create()
    ...  # add fields from data to field set
    return [record]  # return 0 or more records for each

data_set = DataSet.create(
    "my_data_set",
    DataSet.Definition(generator_fn, parallelized_iter=process_input))
Under the hood, this automatically spins up a pool of worker processes that call process_input for each record to be processed.
- static Concat(*data_sets, output_name=None, should_precompute=True)¶
Concatenate 2 or more data sets together, in order of input.
- class Definition(generator_fn, *args, parallelized_iter=None, **kwargs)¶
Definition object used to define a DataSet.
By default, this is constructed with a generator function (function that returns a generator of data points). If additional functionality is needed, consider subclassing this class and overriding the get_generator_function method.
Specify args or kwargs if there are parameters to pass into the generator function.
If should_precompute=True is passed in, we assume this data set cannot be streamed on the fly (e.g. involves GPU-based computation, or feeds into multiple consumers), and we will first iterate to compute and then cache the results for subsequent use.
If parallelized_iter is passed in, we will apply this function to each data point in a multiprocess pool. This function should return an iterable of data points. This will set should_precompute to True if it was not specified.
- get_generator_function()¶
Accessor for the generator function.
Subclasses may override this with custom behavior.
The generator function has to return a generator that yields field sets.
- static Take(data_set, num_rows, output_name=None, should_precompute=True)¶
Take num_rows worth of records from data set.
- static Zip(*data_sets, output_name=None, should_precompute=True)¶
Zip 2 or more data sets together. The later data sets take precedence.
- hydrate(obj)¶
Hydrate the state of a data set from a serializable representation of it.
- serialize()¶
Return the serializable representation of a data set’s state.
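A hedged sketch of the static combinators, assuming data_set_a and data_set_b were created as above:

# Concatenate records of both data sets, in order of input.
combined = DataSet.Concat(data_set_a, data_set_b, output_name="combined")

# Keep only the first 100 records.
subset = DataSet.Take(combined, 100, output_name="first_100")

# Zip records together; fields from later data sets take precedence.
zipped = DataSet.Zip(data_set_a, data_set_b, output_name="zipped")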
macro.DataSetMapper¶
- class macro.DataSetMapper(name, *, _allowed=False, **kwargs)¶
Class used to represent a data set mapper.
A data set mapper defines a mapping between two data sets. It can do more than just a one-to-one mapping between records of a data set. You could combine multiple input records into one output record (e.g. combine individual images into a video), or you could convert one input record into multiple output records (e.g. the reverse operation of converting a video into individual images).
To create a data set mapper, subclass the DataSetMapper.Definition class and provide implementations for its interface methods. Then pass that definition instance to the DataSetMapper.create method:
class ConvertDataSet(DataSetMapper.Definition):
    def initialize(self):
        # Returns one or more field sets prior to iterating through data set.
        return []

    def finalize(self):
        # Returns one or more field sets after iterating through data set.
        return []

    def ingest(self, field_set):
        # Takes a field set as input and returns one or more output field sets.
        return [field_set]

    def should_continue(self, num_iterations, num_ingested, num_output):
        # Returns true if we should continue processing the input data set.
        return num_iterations == 0

data_set_mapper = DataSetMapper.create(
    "my_data_set_mapper", ConvertDataSet())
To use a previously-defined data set mapper, load it in the following way:
data_set_mapper = DataSetMapper.load("my_data_set_mapper")
output_data_set = data_set_mapper.transform_data_set(input_data_set)
- class Definition(resource_dependencies, should_precompute=True)¶
Definition object used to define a DataSetMapper.
In particular, it takes input field sets and returns optional output field set(s).
- finalize()¶
Returns one or more field sets after iterating through data set.
- ingest(field_set)¶
Takes a field set as input and returns one or more output field sets.
- initialize()¶
Returns one or more field sets prior to iterating through data set.
- should_continue(num_iterations, num_ingested, num_output)¶
Returns true if we should continue processing the input data set.
The number of iterations, number of ingested records, and number of output records so far are given as input to this method.
- transform_data_set(data_set, should_precompute=True, output_name=None)¶
Transform an input data set using this data set mapper, and return the output data set.
macro.DataVisualizer¶
- class macro.DataVisualizer(name, *, _allowed=False, **kwargs)¶
Class used to represent a data visualizer.
A data visualizer takes a data set as input, and produces a data visualization as its output.
To create a data visualizer, subclass the DataVisualizer.Definition class and provide implementations for its interface methods. Then pass that definition instance to the DataVisualizer.create() method:

class CustomVisualizer(DataVisualizer.Definition):
    @classmethod
    def create_render_set_from_field_set(cls, field_set, fixed_schemas, variable_schemas):
        # Return a render set from a given field set.
        render_set = Render.Set.create()
        ...
        return render_set

data_visualizer = DataVisualizer.create(
    "my_data_visualizer", CustomVisualizer(fixed_schemas))
To use a previously-defined data visualizer, load it in the following way:
data_visualizer = DataVisualizer.load("my_data_visualizer")
data_visualization = data_visualizer.create_data_visualization_from_data_set(
    data_set, variable_schemas)
See macro.Render() for the rendering primitives available for data visualizers to use.
- class Definition(*fixed_schemas)¶
Definition object used to define a DataVisualizer.
- classmethod create_render_set_from_field_set(field_set, fixed_schemas, variable_schemas)¶
Return a render set from a given field set.
See macro.Render.Set() for how to create one.
- create_data_visualization_from_data_set(data_set, variable_schemas=None, num_rows=20, output_name=None)¶
Create data visualization from data set using this data visualizer’s logic.
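A slightly fuller, still hedged, sketch of create_render_set_from_field_set using the Render primitives described below; rendering every field as text is purely illustrative:

class CustomVisualizer(DataVisualizer.Definition):
    @classmethod
    def create_render_set_from_field_set(cls, field_set, fixed_schemas, variable_schemas):
        render_set = Render.Set.create()
        # Render each field as a simple text thumbnail.
        for descriptor, data in field_set.iter_fields():
            render_set.add_thumbnail(Render.Text(str(data), label='field'))
        return render_set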
macro.DataVisualization¶
- class macro.DataVisualization(name, *args, **kwargs)¶
Class used to represent a data visualization.
A data visualization is a resource that allows interactive exploration of a sequence of data. It is created by a data visualizer acting on a sequence of field sets from a data set.
To create one, you would typically use a data visualizer in the following way:
data_visualizer = DataVisualizer.load("my_data_visualizer")
data_visualization = data_visualizer.create_data_visualization_from_data_set(
    data_set, variable_schemas)
- class Definition(data_set, data_visualizer, variable_schemas, num_rows)¶
Definition object used to define a DataVisualization.
- class Handle(data_visualization)¶
Handle to encapsulate the data visualization, primarily the data dir and all writes.
- get_data_dir()¶
Get the absolute data directory used to store data for this resource.
- get_num_render_sets()¶
Return the number of render sets within this data visualization.
Ensures the data visualization is computed first.
- get_render_set(i)¶
Return the i-th render set read from files within this data visualization.
Ensures the data visualization is computed first.
- get_render_sets()¶
Return an iterable of render sets read from files within this data visualization.
Ensures the data visualization is computed first.
- handle()¶
Return a handle object that can be used to store data into the data visualization.
- hydrate(obj)¶
Hydrate the state of a data visualization from a serializable representation of it.
- serialize()¶
Return the serializable representation of a data visualization’s state.
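A hedged sketch of reading back the rendered output:

data_visualization = data_visualizer.create_data_visualization_from_data_set(
    data_set, variable_schemas)
print(data_visualization.get_num_render_sets())
for render_set in data_visualization.get_render_sets():
    html = render_set.get_rendered_html()  # see macro.Render.Set below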
macro.ModelTrainer¶
- class macro.ModelTrainer(name, *, _allowed=False, **kwargs)¶
Class used to represent a model trainer.
A model trainer takes a data set as input, runs training, and produces a trained model in the form of a model wrapper.
To create a model trainer, subclass the ModelTrainer.Definition class and provide implementations for its interface methods. Then pass that definition instance to the ModelTrainer.create() method:

class CustomTrainer(ModelTrainer.Definition):
    @classmethod
    def build_local_data_set(cls, handle, data_set, input_descriptors, output_schemas):
        # Write data set to disk in a format used by the training script
        ...

    @classmethod
    def run_training(cls, handle, model_params, data_set_params, **kwargs):
        # Run training and store model artifacts via the handle
        ...

    @classmethod
    def get_model_evaluator(cls, handle, training_params):
        # Load model and return a model evaluator function that fulfills
        # the following interface:
        def model_evaluator(inputs_dict, *output_schemas):
            output = FieldSet.create()
            ...
            return output
        return model_evaluator

model_trainer = ModelTrainer.create(
    "my_model_trainer", CustomTrainer(model_params))
To use a previously-defined model trainer, load it in the following way:
model_trainer = ModelTrainer.load("my_model_trainer")
_, model_wrapper = model_trainer.create_model_wrapper(
    data_set, input_descriptors, output_schemas)
- class Definition(model_params)¶
Definition object used to define a ModelTrainer.
- classmethod build_local_data_set(handle, data_set, input_descriptors, output_schemas)¶
Build a representation of the input data set on local disk for training.
Subclasses should implement this method.
- get_generate_compute_function()¶
Returns a function that computes the model.
This would run training using the input data blob.
- get_generate_download_function()¶
Returns a function that produces a download function for a training data blob.
This is the step that takes a data set and builds a representation of it on local disk for training, as a data blob.
- get_generate_load_function()¶
Returns a function that loads a model and returns a model evaluator function.
- classmethod get_model_evaluator(handle, training_params)¶
Load model and return a model evaluator function that fulfills the following interface:
def model_evaluator(inputs_dict, *output_schemas):
    output = FieldSet.create()
    return output
return model_evaluator
Subclasses should override this method to specify custom functionality.
- classmethod initialize_with_model_wrapper(handle, initialization_model_wrapper)¶
Initialize training with an existing model wrapper.
Subclasses should implement this method.
- classmethod run_training(handle, model_params, data_set_params, **kwargs)¶
Run training of model given a set of input data set and model parameters.
Subclasses should implement this method.
- create_data_blob(data_set, input_descriptors, output_schemas, output_name=None)¶
Create data blob containing data set that has been processed into a format suitable for training.
- create_model_wrapper(data_set, input_descriptors, output_schemas, initialization_model_wrapper=None, validation_data_set=None, validation_output_name=None, replace_invalid_with_none=False, output_name=None)¶
Create model wrapper from data set, creating an intermediary data blob containing processed training data.
- create_model_wrapper_from_data_blob(data_blob, input_descriptors, output_schemas, initialization_model_wrapper=None, validation_data_blob=None, replace_invalid_with_none=False, output_name=None)¶
Create model wrapper from training on data blob containing processed data set.
If initialization_model_wrapper is specified, use it to initialize training.
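As a hedged sketch, the one-step create_model_wrapper call is roughly equivalent to building the training data blob explicitly and then training from it (the return value of create_model_wrapper_from_data_blob is assumed to be the model wrapper):

# Step 1: process the data set into a training-ready data blob.
data_blob = model_trainer.create_data_blob(
    data_set, input_descriptors, output_schemas)

# Step 2: train on the processed blob to produce a model wrapper.
model_wrapper = model_trainer.create_model_wrapper_from_data_blob(
    data_blob, input_descriptors, output_schemas)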
macro.ModelWrapper¶
- class macro.ModelWrapper(name, *args, **kwargs)¶
Class used to represent a model wrapper.
A model wrapper represents a standalone algorithm or the trained model output of a model trainer.
It provides an initialization context to load the model into memory, and an API that takes an input field set or data set and returns the corresponding output field set or data set.
To create a model wrapper from a model trainer, see ModelTrainer.create_model_wrapper().
To create a model wrapper for a standalone function, subclass the ModelWrapper.Definition class and provide implementations for its interface methods. Then pass that definition instance to the ModelWrapper.create() method:

class CustomModel(ModelWrapper.Definition):
    @classmethod
    def get_model_evaluator(cls, handle, model_params):
        # Load model and return a model evaluator function that fulfills
        # the following interface:
        def model_evaluator(inputs_dict, *output_schemas):
            output = FieldSet.create()
            ...
            return output
        return model_evaluator

model_wrapper = ModelWrapper.create(
    "my_model_wrapper", CustomModel(model_params, input_descriptors, output_schemas))
To use a previously-defined model wrapper, load it in the following way:
model_wrapper = ModelWrapper.load("my_model_wrapper")
output_field_set = model_wrapper.create_field_set_from_eval(input_field_set)
output_data_set = model_wrapper.create_data_set_from_eval(input_data_set)
- class Definition(model_params, input_descriptors, output_schemas, replace_invalid_with_none=False)¶
Definition object used to define a ModelWrapper.
If replace_invalid_with_none is True, then invalid input field sets to the model (i.e. has missing input descriptors) will be replaced entirely with a None value instead of asserting. The model evaluation function has to be able to handle None values as input accordingly.
- classmethod compute_model(handle, model_params)¶
Method to compute the model if it is required for inference.
Subclasses may override this with custom behavior.
- get_load_function()¶
Accessor for the load function.
Subclasses may override this with custom behavior.
- classmethod get_model_evaluator(handle, model_params)¶
Load model and return a model evaluator function that fulfills the following interface:
def model_evaluator(inputs_dict, *output_schemas):
    output = FieldSet.create()
    return output
return model_evaluator
Subclasses should override this method to specify custom functionality.
- class Handle(model_wrapper)¶
Handle to encapsulate the model wrapper, primarily for writes to its data dir.
- copy_file(src_path, dest_path)¶
Copy file from other location into this model wrapper.
- download_url(url, dest_path, use_cache=True)¶
Convenience function to download contents of a url into the model wrapper.
- from_relative_path(other_path)¶
Convert a relative path within the model wrapper to an absolute path.
- get_data_dir()¶
Get the absolute data directory used to store data for this resource.
- read_opener(opener, dest_path, *args, **kwargs)¶
Convenience wrapper function to apply a file read opener to a relative path within the model wrapper.
- to_relative_path(other_path)¶
Convert an absolute path to a relative path within the model wrapper.
- write_opener(opener, dest_path, *args, **kwargs)¶
Convenience wrapper function to apply a file write opener to a relative path within the model wrapper.
Ensures the data directory is created before attempting to open the file.
- create_data_set_from_eval(data_set, include_original_fields=False, num_rows=None, num_processes=1, should_precompute=True, output_name=None)¶
Create an output data set by evaluating this model wrapper on an input data set.
If include_original_fields is True, the original input field set is unioned with the output field set. If should_precompute is False, we don’t precompute the data set. If num_processes is None, use all available cores (default is 1).
- create_field_set_from_eval(field_set, include_original_fields=False)¶
Create an output field set by evaluating this model wrapper on an input field set.
If include_original_fields is True, the original input field set is unioned with the output field set.
- handle()¶
Return a handle object that can be used to read and write data for the model wrapper.
- hydrate(obj)¶
Hydrate the state of a model wrapper from a serializable representation of it.
- load_model()¶
Context manager to load a model so that it is ready for repeated inference.
Ensures the model wrapper is computed prior to yielding.
- serialize()¶
Return the serializable representation of a model wrapper’s state.
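A hedged sketch of repeated inference with the model held in memory, and of batch evaluation with the documented options:

model_wrapper = ModelWrapper.load("my_model_wrapper")

# Keep the model loaded across several single-record evaluations.
with model_wrapper.load_model():
    for input_field_set in input_field_sets:
        output_field_set = model_wrapper.create_field_set_from_eval(
            input_field_set, include_original_fields=True)

# Evaluate a whole data set, using all available cores.
output_data_set = model_wrapper.create_data_set_from_eval(
    input_data_set, include_original_fields=True, num_processes=None)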
macro.ModelServer¶
- class macro.ModelServer(name, *, _allowed=False, **kwargs)¶
Class used to represent a model server.
A model server takes some resources (typically a model wrapper) as input and defines the request endpoint (a string) and associated handler logic for serving requests.
To create a model server, choose an endpoint string and define a factory function that returns an async request handler, and use those to instantiate a ModelServer.Definition instance. Then pass that definition instance to the ModelServer.create() method:

def request_handler_maker(*args):
    async def request_handler(request):
        data = await request.post()
        ...  # process data
        resp = web.StreamResponse(
            status=200,
            reason='OK',
            headers={'Content-Type': 'text/json'})
        await resp.prepare(request)
        await resp.write(json.dumps(output_json).encode())
        return resp
    return request_handler

model_server = ModelServer.create(
    "my_model_server",
    ModelServer.Definition(
        "my_endpoint", request_handler_maker, *args))
To use a previously-defined model server, load it in the following way:
model_server = ModelServer.load("my_model_server")

# add route to an app server like one using aiohttp.web
app.router.add_route(
    'POST',
    "/" + model_server.get_request_endpoint(),
    model_server.get_request_handler())
- class Definition(request_endpoint, request_handler_fn, *args, **kwargs)¶
Definition object used to define a ModelServer.
- get_request_endpoint()¶
Accessor function for request endpoint string.
- get_request_handler()¶
Factory function for a request handler.
Request handler is an async function that takes a request and returns a response.
- get_request_endpoint()¶
Return request endpoint.
- get_request_handler()¶
Return an async request handler.
macro.EvaluationMetric¶
- class macro.EvaluationMetric(name, *, _allowed=False, **kwargs)¶
Class used to represent an evaluation metric.
An evaluation metric contains the logic to score a model prediction with its ground truth labels. Given a data set with labels and a data set with predictions, this resource computes one or more custom metrics for each data point and returns a summarized set of metrics across the whole data set.
To create an evaluation metric, subclass the EvaluationMetric.Definition class and provide implementations for its interface methods. Then pass that definition instance to the EvaluationMetric.create() method:

class CustomMetric(EvaluationMetric.Definition):
    @classmethod
    def compute_metrics(cls, class_schema, label_field_set, prediction_field_set):
        # Invoke metric computation logic for a given pair of label and prediction field sets.
        ...

    @classmethod
    def summarize_metrics(cls, metrics):
        # Summarize the metrics computed for field sets across the whole data set.
        ...

    @classmethod
    def generate_artifacts(cls, metrics):
        # Generate artifacts of the metrics computed for field sets across the whole data set.
        ...

evaluation_metric = EvaluationMetric.create(
    "my_evaluation_metric", CustomMetric(label_entry, prediction_entry))
To use a previously-defined evaluation metric, load it in the following way:
evaluation_metric = EvaluationMetric.load("my_evaluation_metric")
computed_metrics = evaluation_metric.compute_metrics(
    class_schema, label_field_set, prediction_field_set)
evaluation_data_set, evaluation_result = evaluation_metric.create_evaluation_result(
    model_wrapper, data_set, class_schema)
- class Definition(label_entry, prediction_entry)¶
Definition object used to define an EvaluationMetric.
- compute_metrics(class_schema, label_field_set, prediction_field_set)¶
Invoke metric computation logic for a given pair of label and prediction field sets.
Subclasses should implement this method.
- classmethod generate_artifacts(metrics)¶
Generate artifacts of the metrics computed for field sets across the whole data set.
Returns a mapping of filename to data to be stored in the evaluation result resource.
Subclasses may optionally override this method.
- classmethod summarize_metrics(metrics)¶
Summarize the metrics computed for field sets across the whole data set.
Subclasses should implement this method.
- compute_metrics(class_schema, label_field_set, prediction_field_set)¶
Invoke metric computation logic for a given pair of label and prediction field sets.
- create_evaluation_result(model_wrapper, data_set, class_schema, label_data_set=None, include_original_fields=False, num_processes=1, num_rows=20, output_name=None, sorting_key=None, ascending=True)¶
Create an evaluation result by applying the specified model wrapper to a given input data set.
If label_data_set is specified, use that as the data set with labels; otherwise, it defaults to the input data set. If include_original_fields is True, the original input field set is unioned with the output field set during eval.
Returns a tuple of the evaluation data set and the evaluation result.
- create_sorted_evaluation_data_set(eval_result, eval_data_set, sorting_key, ascending=True, num_rows=20, output_name=None)¶
Create an evaluation data set where field sets are returned in the order given by the sorting key.
- generate_artifacts(metrics)¶
Generate artifacts of the metrics computed for field sets across the whole data set.
- summarize_metrics(metrics)¶
Summarize the metrics computed for field sets across the whole data set.
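A hedged sketch of a concrete Definition that computes a per-record correctness flag and averages it; label_descriptor and prediction_descriptor are hypothetical field descriptors, and the shape of the metrics collection is an assumption:

class AccuracyMetric(EvaluationMetric.Definition):
    @classmethod
    def compute_metrics(cls, class_schema, label_field_set, prediction_field_set):
        # One metrics dict per (label, prediction) pair; field access is illustrative.
        label = label_field_set.get_field(label_descriptor)
        prediction = prediction_field_set.get_field(prediction_descriptor)
        return {'correct': float(label == prediction)}

    @classmethod
    def summarize_metrics(cls, metrics):
        # 'metrics' is assumed to be the collection of per-record dicts.
        values = [m['correct'] for m in metrics]
        return {'accuracy': sum(values) / max(len(values), 1)}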
macro.EvaluationResult¶
- class macro.EvaluationResult(name, *args, **kwargs)¶
Class used to represent an evaluation result.
An evaluation result is the output of an evaluation metric on a pair of data sets containing the ground truth labels and model predictions respectively.
To create one, you would typically use an evaluation metric in the following way:
evaluation_metric = EvaluationMetric.load("my_evaluation_metric")
evaluation_data_set, evaluation_result = evaluation_metric.create_evaluation_result(
    model_wrapper, data_set, class_schema)
- class Definition(evaluation_metric, model_wrapper, label_data_set, prediction_data_set, class_schema, num_rows)¶
Definition object used to define an EvaluationResult.
- get_generator()¶
Return a generator function, args and kwargs, which creates a generator that yields evaluation outputs.
- get_all_metrics()¶
Get all computed metrics for this evaluation result, one for each field set.
- get_summary_metrics()¶
Get summary metrics for this evaluation result.
- hydrate(obj)¶
Hydrate the state of an evaluation result from a serializable representation of it.
- serialize()¶
Return the serializable representation of an evaluation result’s state.
- tables()¶
Get summary metrics for this evaluation result, in table form.
macro.EvaluationRanking¶
- class macro.EvaluationRanking(name, *args, **kwargs)¶
Class used to represent an evaluation ranking.
An evaluation ranking is a list of evaluation results (scored model predictions), sorted by a specified key.
To create an evaluation ranking, first collect all evaluation results that make sense to rank together, choose a sorting key, and use these to instantiate an EvaluationRanking.Definition instance. Then pass that definition instance to the EvaluationRanking.create() method:

evaluation_metric = EvaluationMetric.load("my_evaluation_metric")
data_set = DataSet.load('my_data_set')
evaluation_results = [
    evaluation_result
    for evaluation_result in get_resources_by_type('evaluation_result')
    if evaluation_result.definition.evaluation_metric is evaluation_metric
    and evaluation_result.definition.label_data_set is data_set
]
evaluation_ranking = EvaluationRanking.create(
    "my_evaluation_ranking",
    EvaluationRanking.Definition(
        evaluation_results,
        sorting_key=lambda summary_metrics: summary_metrics['score'],
        ascending=False,
    ),
)
To use a previously-defined evaluation ranking, load it in the following way:
evaluation_ranking = EvaluationRanking.load("my_evaluation_ranking")
rankings = evaluation_ranking.get_rankings()
- class Definition(evaluation_results, sorting_key=None, ascending=True)¶
Definition object used to define an EvaluationRanking.
- get_rankings()¶
Return list of evaluation results, ranked based on sorting key in definition.
Each item in this list is a tuple of the form (summary_metrics, evaluation_result.name, evaluation_result.package).
- hydrate(obj)¶
Hydrate the state of an evaluation ranking from a serializable representation of it.
- serialize()¶
Return the serializable representation of an evaluation ranking’s state.
- tables()¶
Returns the rankings for this evaluation ranking in table form.
Ensures the evaluation ranking is computed before doing so.
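A hedged sketch of consuming the rankings, given the (summary_metrics, name, package) tuples described above; the 'score' key matches the sorting key used in the earlier example:

evaluation_ranking = EvaluationRanking.load("my_evaluation_ranking")
for summary_metrics, result_name, result_package in evaluation_ranking.get_rankings():
    print(result_name, summary_metrics['score'])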
macro.Render.Set¶
- class macro.Render.Set(lazy_load_filename=None)¶
Object to encapsulate set of data to be passed to the view for rendering.
We use lazy_load_filename if we want to defer loading of its contents to later.
To create one, do the following:
render_set = Render.Set.create()
thumbnail = ...  # pick one of the render primitives
render_set.add_thumbnail(thumbnail)
To add additional details for a given datapoint, do the following:
render_set.set_details(dict(
    component=...,
    title=...,
    path=...,
))
- add_thumbnail(thumbnail)¶
Add a thumbnail to this render set.
- classmethod create()¶
Create an empty render set.
- ensure_loaded()¶
Ensure that this render set is loaded, if we are doing lazy loading.
- get_details()¶
Get the details for the detailed view of this render set.
- get_rendered_html()¶
Get the rendered html for displaying this render set.
- get_thumbnails()¶
Get all thumbnails of this render set.
- classmethod read_from_file(filename)¶
Read the contents of this render set from the given filename.
- set_details(details)¶
Set the details for the detailed view of this render set.
- write_to_file(filename)¶
Write the contents of this render set to the given filename.
macro.Render.Text¶
- class macro.Render.Text(text, label)¶
Representation of a thumbnail text, a rendering primitive.
To create one, do the following:
# text can be any string
render_text = Render.Text(text, label='text')
render_set.add_thumbnail(render_text)
macro.Render.Image¶
- class macro.Render.Image(image, label=None, max_height=160, max_width=240)¶
Representation of a thumbnail image, a rendering primitive.
To create one, do the following:
# image is a (H, W, 3) numpy array.
render_image = Render.Image(image, label='text')
render_set.add_thumbnail(render_image)
macro.Render.Animation¶
- class macro.Render.Image(image, label=None, max_height=160, max_width=240)¶
Representation of a thumbnail image, a rendering primitive.
To create one, do the following:
# image is a (H, W, 3) numpy array.
render_image = Render.Image(image, label='text')
render_set.add_thumbnail(render_image)
macro.Render.TagList¶
- class macro.Render.TagList(texts, text_colors=None, background_colors=None)¶
Representation of a thumbnail tag list, a rendering primitive.
To create one, do the following:
# text_colors and background_colors are either lists of colors
# with same size as texts, or a dictionary mapping text to a color string.
render_tag_list = Render.TagList(texts, text_colors, background_colors)
render_set.add_thumbnail(render_tag_list)
macro.Render.Table¶
- class macro.Render.Table(data, label=None, column_labels=None, row_labels=None)¶
Representation of a thumbnail table, a rendering primitive.
To create one, do the following:
# data is a list of rows, each containing a list of column data.
render_table = Render.Table(data, label='text')
render_set.add_thumbnail(render_table)