ScoutBot API

ScoutBot is the machine learning interface for the Wild Me Scout project. This page specifies the Python API to interact with all of the algorithms and machine learning models that have been pretrained for inference in a production environment.

Overview

In general, this API exposes four main processing components for the Scout project. In order, these components are TILE, WIC, LOC, and AGG.

TILE: A module to convert images to tiles

WIC: A module to classify tiles as relevant for further processing (i.e., does it likely have an elephant?)

LOC: A module to detect elephants in tiles

AGG: A module to aggregate the tile-level detections back onto the original image

The TILE and AGG steps are heuristic-based algorithms and do not use any machine learning (ML) models or GPU offload. In contrast, the WIC and LOC steps each require their own ML model and can be computed on the CPU or, if available, a GPU.

The non-ML components (TILE and AGG) each expose a compute() function, which is the developer's single point of interaction:

scoutbot.tile.compute()

scoutbot.agg.compute()

The ML components (WIC and LOC), in contrast, are a bit more complex and each expose three functions:

pre() (preprocessing)

predict() (inference)

post() (postprocessing)

For the WIC, these functions are:

scoutbot.wic.pre()

scoutbot.wic.predict()

scoutbot.wic.post()

and for the LOC, these functions are:

scoutbot.loc.pre()

scoutbot.loc.predict()

scoutbot.loc.post()

Environment Variables

The ScoutBot API and CLI read the following environment variables (envars), which configure global settings:

CONFIG (default: mvp)
The configuration setting for which machine learning models to use. Must be one of phase1 or mvp, or their respective aliases old or new.

WIC_CONFIG (default: not set)
The configuration setting for which machine learning model to use with the WIC. Must be one of phase1 or mvp, or their respective aliases old or new. Defaults to the value of the CONFIG environment variable.

LOC_CONFIG (default: not set)
The configuration setting for which machine learning model to use with the LOC. Must be one of phase1 or mvp, or their respective aliases old or new. Defaults to the value of the CONFIG environment variable.

AGG_CONFIG (default: not set)
The configuration setting to use with the AGG (the AGG itself does not use an ML model). Must be one of phase1 or mvp, or their respective aliases old or new. Defaults to the value of the CONFIG environment variable.

WIC_BATCH_SIZE (default: 256)
The number of tiles to send to the GPU in a single batch during WIC prediction (forward inference). The LOC model has a fixed batch size (16 for phase1 and 32 for mvp) that cannot be adjusted. This setting controls how fast the pipeline runs, trading higher memory usage for faster compute. It is strongly recommended to set this value as high as your GPU memory allows.

FAST (default: not set)
A flag that turns off extraction of the second grid of tiles. Defaults to “not set”, which means the standard process of extracting all tiles for grid1 and grid2. Setting this variable to any value (e.g., FAST=1) turns off grid2, which results in faster (but less accurate) detections.

VERBOSE (default: not set)
A verbosity flag that can be set to turn on debug logging. Defaults to “not set”, which translates to no debug logging. Setting this value to anything will turn on debug logging (e.g., VERBOSE=1).
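The precedence described above (a per-module variable such as WIC_CONFIG, falling back to CONFIG, falling back to mvp), the old/new aliases, and the "set to anything" flags can be sketched in Python. This is a minimal illustration under assumptions: resolve_config, flag_is_set, and wic_batch_size are hypothetical helpers, not part of the ScoutBot API, and ScoutBot's internal resolution logic may differ in detail.

```python
import os

# Aliases documented above: "old" -> phase1, "new" -> mvp.
ALIASES = {'old': 'phase1', 'new': 'mvp'}
VALID_CONFIGS = {'phase1', 'mvp'}


def resolve_config(module=None, environ=None):
    """Hypothetical helper: pick the config for a module such as 'WIC'.

    Assumes <MODULE>_CONFIG overrides CONFIG, which itself defaults to 'mvp'.
    """
    environ = os.environ if environ is None else environ
    value = environ.get('CONFIG', 'mvp')
    if module is not None:
        value = environ.get(f'{module.upper()}_CONFIG', value)
    value = ALIASES.get(value, value)
    if value not in VALID_CONFIGS:
        raise ValueError(f'Unsupported config: {value!r}')
    return value


def flag_is_set(name, environ=None):
    """Hypothetical helper: FAST and VERBOSE count as on when set to any value."""
    environ = os.environ if environ is None else environ
    return name in environ


def wic_batch_size(environ=None):
    """Hypothetical helper: read WIC_BATCH_SIZE, defaulting to 256."""
    environ = os.environ if environ is None else environ
    return int(environ.get('WIC_BATCH_SIZE', 256))
```

For example, with CONFIG=old and LOC_CONFIG=new, this sketch resolves the WIC to phase1 and the LOC to mvp. Passing an explicit dictionary as environ keeps the helpers easy to test without modifying the real process environment.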

CDN Model Download (ONNX)

All of the machine learning models are hosted on GitHub as LFS files. However, the two ML modules (WIC and LOC) need those files downloaded to the local machine before running inference. The models are also hosted on a separate CDN for convenient access and can be fetched by running the following functions:

scoutbot.wic.fetch()

scoutbot.loc.fetch()

To pre-download the models for a specific config (e.g., mvp), you can specify an optional config:

scoutbot.wic.fetch(config="mvp")

scoutbot.loc.fetch(config="mvp")

These functions download the following files and store them in your operating system’s default cache folder:

Phase 1: phase1

WIC: https://wildbookiarepository.azureedge.net/models/scout.wic.5fbfff26.3.0.onnx (81MB)
SHA256 checksum: cbc7f381fa58504e03b6510245b6b2742d63049429337465d95663a6468df4c1

LOC: https://wildbookiarepository.azureedge.net/models/scout.loc.5fbfff26.0.onnx (194MB)
SHA256 checksum: 85a9378311d42b5143f74570136f32f50bf97c548135921b178b46ba7612b216

MVP: mvp

WIC: https://wildbookiarepository.azureedge.net/models/scout.wic.mvp.2.0.onnx (97MB)
SHA256 checksum: 3ff3a192803e53758af5e112526ba9622f1dedc55e2fa88850db6f32af160f32

LOC: https://wildbookiarepository.azureedge.net/models/scout.loc.mvp.0.onnx (194MB)
SHA256 checksum: f5bd22fbacc91ba4cf5abaef5197d1645ae5bc4e63e88839e6848c48b3710c58
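The published SHA256 checksums can be used to confirm that a cached model file downloaded intact. A minimal sketch using Python's standard hashlib module; sha256sum, verify_model, and the EXPECTED_SHA256 table are hypothetical names introduced here, with the hashes copied from the list above, and the lookup assumes the cached file keeps the CDN filename.

```python
import hashlib
from pathlib import Path

# Published checksums, copied from the list above and keyed by CDN filename.
EXPECTED_SHA256 = {
    'scout.wic.5fbfff26.3.0.onnx': 'cbc7f381fa58504e03b6510245b6b2742d63049429337465d95663a6468df4c1',
    'scout.loc.5fbfff26.0.onnx': '85a9378311d42b5143f74570136f32f50bf97c548135921b178b46ba7612b216',
    'scout.wic.mvp.2.0.onnx': '3ff3a192803e53758af5e112526ba9622f1dedc55e2fa88850db6f32af160f32',
    'scout.loc.mvp.0.onnx': 'f5bd22fbacc91ba4cf5abaef5197d1645ae5bc4e63e88839e6848c48b3710c58',
}


def sha256sum(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large models never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path):
    """Return True only if the file's name is known and its hash matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256sum(path) == expected
```

The path passed to verify_model could be the local file path returned by scoutbot.wic.fetch() or scoutbot.loc.fetch().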

Supported Objects of Interest

The ONNX models are pre-configured to support a specific batch size and predict specific classes in the final detection results. Batch sizes are stated explicitly where they cannot be changed (the LOC model), while the WIC model’s batch size can be adjusted with the WIC_BATCH_SIZE environment variable. The output of the pipeline is a collection of bounding boxes, confidence values, and class labels. Some of the raw labels are not clean and are mapped, for convenience, when the final detection labels are created. Below are the supported classes for each model:

Phase 1: phase1

elephant_savanna

mapped to: elephant

MVP: mvp

buffalo

camel

canoe

car

cow

crocodile

dead_animalwhite_bones

mapped to: white_bones

deadbones

mapped to: white_bones

eland

elecarcass_old

mapped to: white_bones

elephant

gazelle_gr

mapped to: gazelle_grants

gazelle_grants

gazelle_th

mapped to: gazelle_thomsons

gazelle_thomsons

gerenuk

giant_forest_hog

giraffe

goat

hartebeest

hippo

impala

kob

kudu

motorcycle

oribi

oryx

ostrich

roof_grass

roof_mabati

sheep

test

topi

vehicle

warthog

waterbuck

white_bones

wildebeest

zebra

All species above that are highlighted in green have an Average Precision (AP) of at least 50%. The other species are supported in a preliminary sense and should not be heavily relied on.
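The label mapping listed above amounts to a lookup table applied to the l key of each detection. The following sketch reproduces only the mappings documented on this page; LABEL_MAP, clean_label, and clean_detections are hypothetical names. Since ScoutBot already applies its own mapping when it builds the final detections, a helper like this would only matter when post-processing raw model labels yourself.

```python
# Raw training labels and the clean labels they map to, as listed above.
LABEL_MAP = {
    'elephant_savanna': 'elephant',  # phase1
    'dead_animalwhite_bones': 'white_bones',  # mvp
    'deadbones': 'white_bones',
    'elecarcass_old': 'white_bones',
    'gazelle_gr': 'gazelle_grants',
    'gazelle_th': 'gazelle_thomsons',
}


def clean_label(label):
    """Return the mapped label, or the label unchanged if it needs no mapping."""
    return LABEL_MAP.get(label, label)


def clean_detections(detections):
    """Return copies of detection dicts with their 'l' label cleaned."""
    return [{**det, 'l': clean_label(det['l'])} for det in detections]
```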

Tiles (TILE)

scoutbot.tile.compute(img_filepath, grid1=True, grid2=True, ext=None, **kwargs)

Computes the tiles for a given input image and saves them to disk.

If a given tile has already been rendered to disk, it will not be recomputed.

Parameters:

img_filepath (str) – image filepath (relative or absolute) to compute tiles for.
grid1 (bool, optional) – If True, create a dense grid of tiles on the image. Defaults to True.
grid2 (bool, optional) – If True, create a secondary dense grid of tiles on the image with a 50% offset. Defaults to True. Can be disabled globally by setting the environment variable FAST=1.
ext (str, optional) – The file extension of the resulting tile files. If this value is not specified, it will use the same extension as img_filepath. Passed as input to scoutbot.tile.tile_filepath(). Defaults to None.
**kwargs – keyword arguments passed to scoutbot.tile.tile_grid()

Returns:

the original image’s shape as (h, w, c).
list of grid coordinates as the output of scoutbot.tile.tile_grid().
list of tile filepaths as the output of scoutbot.tile.tile_filepath().

Return type:

tuple ( tuple ( int ), list ( dict ), list ( str ) )

scoutbot.tile.tile_filepath(img_filepath, grid, ext=None)

Returns a suggested filepath for a tile given the original image filepath and the tile’s grid coordinates.

Parameters:

img_filepath (str) – image filepath (relative or absolute)
grid (dict) – a single grid coordinate dictionary, i.e., one element of the output of scoutbot.tile.tile_grid()
ext (str, optional) – The file extension of the resulting tile files. If this value is not specified, it will use the same extension as img_filepath. Defaults to None.

Returns:

the suggested absolute filepath to store the tile

Return type:

str

scoutbot.tile.tile_grid(shape, size=(256, 256), overlap=64, offset=0, borders=True)

Calculates a grid of tile coordinates for a given image.

The final output is a list of dictionaries, each representing a single tile coordinate. Each dictionary has the following keys:

{
    'x': x_top_left (int)
    'y': y_top_left (int)
    'w': width (int)
    'h': height (int)
    'b': border (bool)
}

The x, y, w, h bounding box keys are in real pixel values.

The b key is True if the grid coordinate is on the border of the image.

Parameters:

shape (tuple) – the image’s shape as (h, w, c) or (h, w)
size (tuple, optional) – the tile’s shape as (w, h). Defaults to (256, 256).
overlap (int, optional) – the amount of pixel overlap between adjacent tiles, for both the x-axis and the y-axis. Defaults to 64.
offset (int, optional) – the pixel offset applied to the entire grid. Defaults to 0.
borders (bool, optional) – If True, include a set of border-only tiles. Defaults to True.

Returns:

a list of grid coordinate dictionaries

Return type:

list ( dict )
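To make size, overlap, and offset concrete, here is a simplified sketch of laying out a dense grid of overlapping tiles: tiles advance by size minus overlap pixels (192 px with the defaults) and only tiles that fit entirely inside the image are kept. This is not ScoutBot's implementation. simple_tile_grid is a hypothetical name, it does not generate the extra border-only tiles controlled by borders, and its 'b' flag simply marks tiles touching an image edge.

```python
def axis_starts(length, size, stride, offset=0):
    """Start positions of tiles that fit entirely within one image axis."""
    starts = []
    pos = offset
    while pos + size <= length:
        starts.append(pos)
        pos += stride
    return starts


def simple_tile_grid(shape, size=(256, 256), overlap=64, offset=0):
    """Simplified tile grid sketch; omits the borders=True tile set.

    shape is (h, w, c) or (h, w); size is (w, h), matching tile_grid().
    """
    img_h, img_w = shape[:2]
    tile_w, tile_h = size
    if overlap >= min(tile_w, tile_h):
        raise ValueError('overlap must be smaller than the tile size')
    grid = []
    for y in axis_starts(img_h, tile_h, tile_h - overlap, offset):
        for x in axis_starts(img_w, tile_w, tile_w - overlap, offset):
            # Assumed meaning of 'b': the tile touches an edge of the image.
            on_border = x == 0 or y == 0 or x + tile_w == img_w or y + tile_h == img_h
            grid.append({'x': x, 'y': y, 'w': tile_w, 'h': tile_h, 'b': on_border})
    return grid
```

For a 512 x 640 image with the defaults, this sketch places tiles at x = 0, 192, 384 and y = 0, 192, giving 6 tiles, only one of which (at x = 192, y = 192) does not touch an edge.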

scoutbot.tile.tile_write(img, grid, filepath)

Write a single image’s tile to disk using its grid coordinates and an output path.

Parameters:

img (numpy.ndarray) – 3-dimensional NumPy array, the return value of cv2.imread()
grid (dict) – the grid coordinate dictionary, one of the returned dictionaries from scoutbot.tile.tile_grid()
filepath (str) – the tile’s full output filepath (relative or absolute)

Returns:

returns True if the tile’s filepath exists on disk.

Return type:

bool
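Writing a tile amounts to cropping the grid rectangle out of the image array and saving it. A minimal sketch of the crop step, assuming the image is a NumPy array as returned by cv2.imread() (height x width x channels); crop_tile is a hypothetical helper, and the crop could then be saved with cv2.imwrite(filepath, tile).

```python
import numpy as np


def crop_tile(img: np.ndarray, grid: dict) -> np.ndarray:
    """Return the pixels covered by one grid coordinate dictionary.

    NumPy images are indexed [row, column], i.e. [y, x], so the slice order
    is the reverse of the (x, y) keys in the grid dictionary.
    """
    x, y, w, h = grid['x'], grid['y'], grid['w'], grid['h']
    return img[y:y + h, x:x + w]
```

Note that slicing returns a view of the original array rather than a copy, which keeps tiling cheap when many overlapping tiles are cut from one image.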

Whole-Image Classifier (WIC)

The Whole Image Classifier (WIC) returns confidence scores for image tiles.

This module defines how WIC models are downloaded from an external CDN, how an image is loaded and prepared for inference, how the WIC ONNX model is run on this input, and how the raw CNN output is converted into usable confidence scores.

scoutbot.wic.fetch(pull=False, config='mvp')

Fetch the WIC ONNX model file from a CDN if it does not exist locally.

This function raises an AssertionError if the download fails or the file otherwise does not exist locally on disk.

Parameters:

pull (bool, optional) – If True, force using the downloaded versions stored in the local system’s cache. Defaults to False.
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to mvp.

Returns:

local ONNX model file path.

Return type:

str

Raises:

AssertionError – If the model cannot be fetched.

scoutbot.wic.post(gen)

Apply a post-processing normalization of the raw ONNX network outputs.

The final output is a list of dictionaries, one per tile, where each key is a predicted label and each value is its corresponding confidence.

Parameters:

gen (generator) – generator of batches of raw ONNX model outputs, the return of scoutbot.wic.predict()

Returns:

list of WIC predictions

Return type:

list ( dict )
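As an illustration of what "normalizing raw outputs into per-label confidences" can look like, the sketch below applies a softmax to each row of a (b, n) batch and zips the result with label names. Both the softmax and the label names are assumptions: only the positive key is confirmed by the pipeline example on this page, negative is hypothetical, and wic_post_sketch is not ScoutBot's actual post-processing.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of raw scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def wic_post_sketch(raw_batch, labels=('negative', 'positive')):
    """Turn a (b, n) batch of raw scores into one {label: confidence} dict per tile.

    The label names and the use of softmax are assumptions for illustration.
    """
    return [dict(zip(labels, softmax(row))) for row in raw_batch]
```

Subtracting the row maximum before exponentiating does not change the result but avoids overflow for large raw scores.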

scoutbot.wic.pre(inputs, batch_size=256, config='mvp')

Load a list of filepaths and return the corresponding image data as batches of 4-D float arrays. The image data is loaded from disk, transformed as needed, and normalized to the input ranges that the WIC ONNX model expects.

This function will throw an error if any of the filepaths do not exist.

Parameters:

inputs (list(str)) – list of tile image filepaths (relative or absolute)
batch_size (int, optional) – the maximum number of images to load in a single batch. Defaults to the environment variable WIC_BATCH_SIZE.
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to mvp.

Returns:

generator ->
- list of transformed image data with shape (b, c, w, h)
- model configuration

Return type:

generator ( np.ndarray<np.float32>, str )
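Because batch_size caps the number of images loaded per batch, N tiles are processed in ceil(N / batch_size) batches, with a smaller final batch. A minimal generator sketch of that chunking; batched is a hypothetical helper here and ScoutBot's internal batching may differ. (Python 3.12 also ships itertools.batched, which yields tuples instead of lists.)

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    iterator = iter(items)
    # islice pulls the next batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

With the default of 256, for example, 1000 tile filepaths would be split into four batches of 256, 256, 256, and 232.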

scoutbot.wic.predict(gen)

Run neural network inference using the WIC’s ONNX model on preprocessed data.

Parameters:

gen (generator) – generator of batches of transformed image data, the return of scoutbot.wic.pre()

Returns:

generator ->
- list of raw ONNX model outputs as shape (b, n)
- model configuration

Return type:

generator ( np.ndarray<np.float32>, str )

Localizer (LOC)

The localizer (LOC) returns bounding box detections on image tiles.

This module defines how Localizer models are downloaded from an external CDN, how an image is loaded and prepared for inference, how the Localization ONNX model is run on this input, and how the raw CNN output is converted into usable detection bounding boxes with class labels and confidence scores.

scoutbot.loc.fetch(pull=False, config='mvp')

Fetch the Localizer ONNX model file from a CDN if it does not exist locally.

This function raises an AssertionError if the download fails or the file otherwise does not exist locally on disk.

Parameters:

pull (bool, optional) – If True, force using the downloaded versions stored in the local system’s cache. Defaults to False.
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to mvp.

Returns:

local ONNX model file path.

Return type:

str

Raises:

AssertionError – If the model cannot be fetched.

scoutbot.loc.post(gen, loc_thresh=None, nms_thresh=None)

Apply a post-processing normalization of the raw ONNX network outputs.

The final output is a list of lists of dictionaries, each representing a single detection. Each dictionary has a structure with the following keys:

{
    'l': class_label (str)
    'c': confidence (float)
    'x': x_top_left (float)
    'y': y_top_left (float)
    'w': width (float)
    'h': height (float)
}

The l label is the string class as used when the original ONNX model was trained.

The c confidence value is a bounded float between 0.0 and 1.0 (inclusive), but should not be treated as a probability.

The x, y, w, h bounding box keys are in real pixel values.

Parameters:

gen (generator) – generator of batches of raw ONNX model outputs and sizes, the return of scoutbot.loc.predict()
loc_thresh (float or None, optional) – the confidence threshold for the localizer’s predictions. Defaults to None.
nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the localizer’s predictions. Defaults to None.

Returns:

nested list of Localizer predictions

Return type:

list ( list ( dict ) )
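The two thresholds above work together: detections below loc_thresh are dropped, then overlapping boxes whose IoU exceeds nms_thresh are suppressed in favor of the more confident one. The sketch below is a standard greedy, per-class non-maximum suppression over the detection dictionaries described above. It is an illustration only: box_iou and threshold_and_nms are hypothetical names, and whether ScoutBot suppresses per class and uses these exact comparisons is an assumption.

```python
def box_iou(a, b):
    """IoU of two detection dicts using their x, y, w, h keys."""
    ix = max(0.0, min(a['x'] + a['w'], b['x'] + b['w']) - max(a['x'], b['x']))
    iy = max(0.0, min(a['y'] + a['h'], b['y'] + b['h']) - max(a['y'], b['y']))
    inter = ix * iy
    union = a['w'] * a['h'] + b['w'] * b['h'] - inter
    return inter / union if union > 0 else 0.0


def threshold_and_nms(detections, conf_thresh, nms_thresh):
    """Drop low-confidence boxes, then apply greedy per-class NMS."""
    # Visit the most confident detections first.
    candidates = sorted(
        (d for d in detections if d['c'] >= conf_thresh),
        key=lambda d: d['c'],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box unless a kept box of the same class overlaps it too much.
        if all(
            other['l'] != det['l'] or box_iou(det, other) <= nms_thresh
            for other in kept
        ):
            kept.append(det)
    return kept
```

With the pipeline defaults documented below (loc_thresh=0.38, loc_nms_thresh=0.6), two same-class 10 x 10 boxes offset by one pixel (IoU = 81/119, about 0.68) would collapse to the more confident one.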

scoutbot.loc.pre(inputs, config='mvp')

Load a list of filepaths and return the corresponding image data as batches of 4-D float arrays. The image data is loaded from disk, transformed as needed, and normalized to the input ranges that the Localizer ONNX model expects.

This function will throw an error if any of the filepaths do not exist.

Parameters:

inputs (list(str)) – list of tile image filepaths (relative or absolute)
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to mvp.

Returns:

generator ->
- list of transformed image data with shape (b, c, w, h)
- list of each tile’s original size
- trim index
- model configuration

Return type:

generator ( np.ndarray<np.float32>, list ( tuple ( int ) ), int, str )

scoutbot.loc.predict(gen)

Run neural network inference using the Localizer’s ONNX model on preprocessed data.

Parameters:

gen (generator) – generator of batches of transformed image data, the return of scoutbot.loc.pre()

Returns:

generator ->
- list of raw ONNX model outputs as shape (b, n)
- list of each tile’s original size
- model configuration

Return type:

generator ( np.ndarray<np.float32>, list ( tuple ( int ) ), str )

Aggregation (AGG)

The aggregation (AGG) module returns unified detections for an image given its individual tile detections.

This module defines how the tile-based localization results are aggregated at the image level. This includes weighting the importance of detections along the border of each tile within an image and performing non-maximum suppression (NMS) on the combined results.

scoutbot.agg.compute(img_shape, tile_grids, loc_outputs, config=None, agg_thresh=None, nms_thresh=None)

Compute the aggregated image-level detection results for a given list of tile-level detections.

Parameters:

img_shape (tuple) – the image shape as (h, w, c) or (h, w)
tile_grids (list of dict) – a list of tile coordinates
loc_outputs (list of list of dict) – the output predictions from the Localizer.
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to None.
agg_thresh (float or None, optional) – the confidence threshold for the aggregated localizer predictions. Defaults to None.
nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the aggregated localizer’s predictions. Defaults to None.

Returns:

list of Localizer predictions

Return type:

list ( dict )

scoutbot.agg.demosaic(img_shape, tile_grids, loc_outputs, margin=32.0)

Demosaics a list of tiles and their respective detections back into the original image’s coordinate system.

Parameters:

img_shape (tuple) – the image shape as (h, w, c) or (h, w)
tile_grids (list of dict) – a list of tile coordinates
loc_outputs (list of list of dict) – the output predictions from the Localizer.
margin (float, optional) – the margin of the image to weight predictions. Defaults to 32.0

Returns:

list of Localizer predictions

Return type:

list ( dict )
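The core of demosaicing is a change of coordinates: a box found at (x, y) inside a tile whose top-left corner sits at (grid['x'], grid['y']) lies at (x + grid['x'], y + grid['y']) in the full image. A minimal sketch under two assumptions: the Localizer's boxes are in tile pixel coordinates, and the margin-based weighting performed by demosaic() is not modeled. to_image_coords is a hypothetical name.

```python
def to_image_coords(tile_grids, loc_outputs):
    """Shift tile-local detections into the full image's pixel coordinates.

    tile_grids[i] is the grid dict for the tile whose detections are loc_outputs[i].
    Width, height, label, and confidence are copied unchanged.
    """
    detections = []
    for grid, tile_dets in zip(tile_grids, loc_outputs):
        for det in tile_dets:
            detections.append({**det, 'x': det['x'] + grid['x'], 'y': det['y'] + grid['y']})
    return detections
```

Because neighboring tiles overlap, the same animal can appear in several tiles, which is why the aggregated results still need NMS afterwards.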

scoutbot.agg.iou(box1, box2)

Computes the areas needed for the IoU (Intersection over Union) ratio of two bounding boxes; the ratio itself is the intersection area divided by the union area.

Each box dictionary must have a structure with the following keys:

{
    'xtl': x_top_left (int)
    'ytl': y_top_left (int)
    'xbr': x_bottom_right (int)
    'ybr': y_bottom_right (int)
}

The (xtl, ytl) coordinate is the top-left corner of the box.

The (xbr, ybr) coordinate is the opposite bottom-right corner of the box.

The order of the boxes does not impact the calculation of the intersection and union values.

Parameters:

box1 (dict) – a dictionary of the first bounding box’s dimensions
box2 (dict) – a dictionary of the second bounding box’s dimensions

Returns:

the pixel area of the first box
the pixel area of the second box
the pixel area of the intersection (overlapping area) between the boxes
the pixel area of the union (combined area) between the boxes

Return type:

tuple ( int, int, int, int )
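The returned tuple can be illustrated with a small sketch that computes the same four quantities from the corner-format dictionaries above. iou_areas is a hypothetical name, and it assumes widths are xbr - xtl with no +1 inclusive-pixel adjustment, which may differ from ScoutBot's convention.

```python
def iou_areas(box1, box2):
    """Return (area1, area2, intersection, union) for two corner-format boxes."""
    area1 = (box1['xbr'] - box1['xtl']) * (box1['ybr'] - box1['ytl'])
    area2 = (box2['xbr'] - box2['xtl']) * (box2['ybr'] - box2['ytl'])
    # The overlap along each axis is clamped at 0 for disjoint boxes.
    inter_w = max(0, min(box1['xbr'], box2['xbr']) - max(box1['xtl'], box2['xtl']))
    inter_h = max(0, min(box1['ybr'], box2['ybr']) - max(box1['ytl'], box2['ytl']))
    intersection = inter_w * inter_h
    union = area1 + area2 - intersection
    return area1, area2, intersection, union
```

For two 10 x 10 boxes offset by 5 pixels on both axes, this gives (100, 100, 25, 175), so the IoU is 25 / 175, about 0.14.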

Pipeline (PIPE)

The above components must be run in the correct order, but ScoutBot also offers a single pipeline that runs them for you.

All of the ML models can be pre-downloaded in a single call to scoutbot.fetch(), and the unified pipeline, which runs the four components in the correct order, can be run with scoutbot.pipeline(). Below is example code showing how these components interact.

Furthermore, two demo applications (app.py and app2.py) show how the entire pipeline can be run on tiles or images, respectively.

from itertools import compress

from scoutbot import agg, loc, tile, wic

# Get image filepath
filepath = '/path/to/image.ext'
config = 'mvp'
wic_thresh = 0.07  # Default WIC threshold used by scoutbot.pipeline()

# Run tiling
img_shape, tile_grids, tile_filepaths = tile.compute(filepath)

# Run WIC
wic_outputs = wic.post(wic.predict(wic.pre(
    tile_filepaths,
    config=config,
    # batch_size=wic_batch_size,  # Optional override of config
)))

# Threshold for WIC: keep only tiles likely to contain an object of interest
flags = [wic_output.get('positive', 0.0) >= wic_thresh for wic_output in wic_outputs]
loc_tile_grids = list(compress(tile_grids, flags))
loc_tile_filepaths = list(compress(tile_filepaths, flags))

# Run localizer on the surviving tiles
loc_outputs = loc.post(
    loc.predict(
        loc.pre(loc_tile_filepaths, config=config)
    ),
    # loc_thresh=loc_thresh,  # Optional override of config
    # nms_thresh=loc_nms_thresh,  # Optional override of config
)

# Run aggregation and get final image-level detections
detects = agg.compute(
    img_shape,
    loc_tile_grids,
    loc_outputs,
    config=config,
    # agg_thresh=agg_thresh,  # Optional override of config
    # nms_thresh=agg_nms_thresh,  # Optional override of config
)

scoutbot.__init__.batch(filepaths, config=None, wic_thresh=0.07, loc_thresh=0.38, loc_nms_thresh=0.6, agg_thresh=0.0, agg_nms_thresh=0.8, clean=True)

Run the ML pipeline on a given batch of image filepaths and return the detections in a corresponding list. Each element of the output matches the output of scoutbot.pipeline(), except that the processing is done in batch and is much faster.

The final output is a list of lists of dictionaries, each representing a single detection. Each dictionary has a structure with the following keys:

{
    'l': class_label (str)
    'c': confidence (float)
    'x': x_top_left (float)
    'y': y_top_left (float)
    'w': width (float)
    'h': height (float)
}

Parameters:

filepaths (list) – list of image filepaths (str), relative or absolute
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to None.
wic_thresh (float or None, optional) – the confidence threshold for the WIC’s predictions. Defaults to the default configuration setting.
loc_thresh (float or None, optional) – the confidence threshold for the localizer’s predictions. Defaults to the default configuration setting.
loc_nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the localizer’s predictions. Defaults to the default configuration setting.
agg_thresh (float or None, optional) – the confidence threshold for the aggregated localizer predictions. Defaults to the default configuration setting.
agg_nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the aggregated localizer’s predictions. Defaults to the default configuration setting.
clean (bool, optional) – a flag to clean up any on-disk tiles that were generated. Defaults to True.

Returns:

corresponding list of wic scores, corresponding list of lists of predictions

Return type:

tuple ( list ( float ), list ( list ( dict ) ) )
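Since batch() returns two lists aligned with the input filepaths, a common next step is to pair them up per image. A minimal sketch, assuming the outputs were obtained with something like wic_scores, detections = scoutbot.batch(filepaths); summarize_batch and its min_conf filter are hypothetical and not part of ScoutBot.

```python
from collections import Counter


def summarize_batch(filepaths, wic_scores, detections, min_conf=0.0):
    """Map each filepath to its WIC score and a per-label detection count."""
    summary = {}
    for path, score, dets in zip(filepaths, wic_scores, detections):
        # Count only detections at or above min_conf, keyed by class label.
        counts = Counter(det['l'] for det in dets if det['c'] >= min_conf)
        summary[path] = {'wic': score, 'counts': dict(counts)}
    return summary
```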

scoutbot.__init__.example()

Run the pipeline on an example image with the default configuration.

scoutbot.__init__.fetch(pull=False, config=None)

Fetch the WIC and Localizer ONNX model files from a CDN if they do not exist locally.

This function will throw an AssertionError if either download fails or the files otherwise do not exist locally on disk.

Parameters:

pull (bool, optional) – If True, force using the downloaded versions stored in the local system’s cache. Defaults to False.
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to None.

Returns:

None

Raises:

AssertionError – If any model cannot be fetched.

scoutbot.__init__.pipeline(filepath, config=None, wic_thresh=0.07, loc_thresh=0.38, loc_nms_thresh=0.6, agg_thresh=0.0, agg_nms_thresh=0.8, clean=True)

Run the ML pipeline on a given image filepath and return the detections.

The final output is a list of dictionaries, each representing a single detection. Each dictionary has a structure with the following keys:

{
    'l': class_label (str)
    'c': confidence (float)
    'x': x_top_left (float)
    'y': y_top_left (float)
    'w': width (float)
    'h': height (float)
}

Parameters:

filepath (str) – image filepath (relative or absolute)
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to None.
wic_thresh (float or None, optional) – the confidence threshold for the WIC’s predictions. Defaults to the default configuration setting.
loc_thresh (float or None, optional) – the confidence threshold for the localizer’s predictions. Defaults to the default configuration setting.
loc_nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the localizer’s predictions. Defaults to the default configuration setting.
agg_thresh (float or None, optional) – the confidence threshold for the aggregated localizer predictions. Defaults to the default configuration setting.
agg_nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the aggregated localizer’s predictions. Defaults to the default configuration setting.
clean (bool, optional) – a flag to clean up any on-disk tiles that were generated. Defaults to True.

Returns:

wic score, list of predictions

Return type:

tuple ( float, list ( dict ) )
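Pipeline detections describe boxes by top-left corner plus size (x, y, w, h), while scoutbot.agg.iou() expects corner keys (xtl, ytl, xbr, ybr). A minimal sketch of the conversion; to_corners is a hypothetical helper, not part of ScoutBot.

```python
def to_corners(det):
    """Convert an x/y/w/h detection dict into the xtl/ytl/xbr/ybr box format."""
    return {
        'xtl': det['x'],
        'ytl': det['y'],
        'xbr': det['x'] + det['w'],
        'ybr': det['y'] + det['h'],
    }
```

The label and confidence keys are intentionally dropped, since the corner format above only describes geometry.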

Utilities

ScoutBot utilities for common, handy functions.

scoutbot.utils.init_logging()

Set up Python’s built-in logging functionality, with on-disk logging and prettier console logging through Rich.