ScoutBot API
ScoutBot is the machine learning interface for the Wild Me Scout project. This page specifies the Python API to interact with all of the algorithms and machine learning models that have been pretrained for inference in a production environment.
Overview
In general, the structure of this API is to expose four main processing components for the Scout project.
These components are, in order: TILE, WIC, LOC, and AGG.
TILE: A module to convert images to tiles
WIC: A module to classify tiles as relevant for further processing (i.e., does it likely have an elephant?)
LOC: A module to detect elephants in tiles
AGG: A module to aggregate the tile-level detections back onto the original image
The TILE and AGG steps are heuristic-based algorithms and do not need any
machine learning (ML) models or GPU offload. In contrast, the WIC and LOC steps each require
their own ML model and can be computed on CPU or GPU (if available).
The non-ML components (TILE and AGG) each expose a compute() function, which is the single
point of interaction for the developer:
scoutbot.tile.compute()
scoutbot.agg.compute()
The ML components (WIC and LOC), in contrast, are a bit more complex and each expose three functions:
pre() (preprocessing)
predict() (inference)
post() (postprocessing)
For the WIC, these functions are:
scoutbot.wic.pre()
scoutbot.wic.predict()
scoutbot.wic.post()
and for the LOC, these functions are:
scoutbot.loc.pre()
scoutbot.loc.predict()
scoutbot.loc.post()
Environment Variables
The ScoutBot API and CLI have several environment variables (envars) that allow you to configure global settings and configurations.
CONFIG (default: mvp)
The configuration setting for which machine learning models to use. Must be one of phase1 or mvp, or their respective aliases old or new.
WIC_CONFIG (default: not set)
The configuration setting for which machine learning models to use with the WIC. Must be one of phase1 or mvp, or their respective aliases old or new. Defaults to the value of the CONFIG environment variable.
LOC_CONFIG (default: not set)
The configuration setting for which machine learning models to use with the LOC. Must be one of phase1 or mvp, or their respective aliases old or new. Defaults to the value of the CONFIG environment variable.
AGG_CONFIG (default: not set)
The configuration setting to use with the AGG. Must be one of phase1 or mvp, or their respective aliases old or new. Defaults to the value of the CONFIG environment variable.
WIC_BATCH_SIZE (default: 256)
The configuration setting for how many tiles to send to the GPU in a single batch during the WIC prediction (forward inference). The LOC model has a fixed batch size (16 for phase1 and 32 for mvp) that cannot be adjusted. This setting controls how fast the pipeline runs by trading more memory usage for faster compute. It is highly suggested to set this value as high as will fit in GPU memory.
FAST (default: not set)
A flag that can be set to turn off extracting the second grid of tiles. Defaults to “not set”, which translates to the standard process of extracting all tiles for grid1 and grid2. Setting this value to anything (e.g., FAST=1) will turn off grid2 and result in faster (but less accurate) detections.
VERBOSE (default: not set)
A verbosity flag that can be set to turn on debug logging. Defaults to “not set”, which translates to no debug logging. Setting this value to anything (e.g., VERBOSE=1) will turn on debug logging.
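The fallback and alias rules described above can be sketched in a few lines. This is only an illustration of the documented behavior, not ScoutBot's own code: the helper names resolve_config and flag_is_set are hypothetical, and treating values case-insensitively is an assumption.

```python
import os

# Alias table taken from the text above: "old" -> phase1, "new" -> mvp.
ALIASES = {"old": "phase1", "new": "mvp"}
VALID = {"phase1", "mvp"}


def resolve_config(component, environ=None):
    """Return the config name for "WIC", "LOC", or "AGG" (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # The component-specific variable wins; otherwise fall back to CONFIG, then mvp.
    raw = environ.get(f"{component}_CONFIG") or environ.get("CONFIG") or "mvp"
    name = raw.strip().lower()  # assumption: values are compared case-insensitively
    name = ALIASES.get(name, name)
    if name not in VALID:
        raise ValueError(f"unsupported config: {raw!r}")
    return name


def flag_is_set(name, environ=None):
    """FAST and VERBOSE count as enabled when the variable is set to any value."""
    environ = os.environ if environ is None else environ
    return name in environ


env = {"CONFIG": "old", "LOC_CONFIG": "mvp", "FAST": "1"}
print(resolve_config("WIC", env), resolve_config("LOC", env), flag_is_set("FAST", env))
```

Passing a plain dictionary instead of os.environ keeps the sketch easy to try without touching the real environment.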
CDN Model Download (ONNX)
All of the machine learning models are hosted on GitHub as LFS files. The two ML modules (WIC and LOC),
however, need those files downloaded to the local machine prior to running inference. The models are
also hosted on a separate CDN for convenient access and can be fetched by running the following functions:
scoutbot.wic.fetch()
scoutbot.loc.fetch()
To pre-download the models for a specific config (e.g., mvp), you can specify an optional config:
scoutbot.wic.fetch(config="mvp")
scoutbot.loc.fetch(config="mvp")
These functions will download the following files and store them in your operating system’s default cache folder:
- Phase 1 (phase1):
  - WIC: https://wildbookiarepository.azureedge.net/models/scout.wic.5fbfff26.3.0.onnx (81MB)
    SHA256 checksum: cbc7f381fa58504e03b6510245b6b2742d63049429337465d95663a6468df4c1
  - LOC: https://wildbookiarepository.azureedge.net/models/scout.loc.5fbfff26.0.onnx (194MB)
    SHA256 checksum: 85a9378311d42b5143f74570136f32f50bf97c548135921b178b46ba7612b216
- MVP (mvp):
  - WIC: https://wildbookiarepository.azureedge.net/models/scout.wic.mvp.2.0.onnx (97MB)
    SHA256 checksum: 3ff3a192803e53758af5e112526ba9622f1dedc55e2fa88850db6f32af160f32
  - LOC: https://wildbookiarepository.azureedge.net/models/scout.loc.mvp.0.onnx (194MB)
    SHA256 checksum: f5bd22fbacc91ba4cf5abaef5197d1645ae5bc4e63e88839e6848c48b3710c58
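If you want to confirm that a downloaded model file matches one of the published checksums, a minimal sketch using only the standard library could look like the following. The helper names sha256_of and verify are hypothetical, and the path of the cached file is left to you, since the cache location depends on the operating system.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large ONNX files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify(model_path, "3ff3a192803e53758af5e112526ba9622f1dedc55e2fa88850db6f32af160f32") would check a local copy of the MVP WIC model, where model_path is whatever path the fetch function returned.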
Supported Objects of Interest
The ONNX models are pre-configured to support a specific batch size and will predict specific species in
the final detection results. The input sizes are defined explicitly when they cannot be changed, but the
WIC model’s batch size can be adjusted using the environment variable WIC_BATCH_SIZE. The output of
the pipeline is a collection of bounding boxes, confidence values, and class labels. Some of the labels
are not clean and are mapped, for convenience, when the final detection labels are created. Below are the
supported species for each model:
- Phase 1 (phase1):
  - elephant_savanna (mapped to: elephant)
- MVP (mvp):
  - buffalo
  - camel
  - canoe
  - car
  - cow
  - crocodile
  - dead_animalwhite_bones (mapped to: white_bones)
  - deadbones (mapped to: white_bones)
  - eland
  - elecarcass_old (mapped to: white_bones)
  - elephant
  - gazelle_gr (mapped to: gazelle_grants)
  - gazelle_grants
  - gazelle_th (mapped to: gazelle_thomsons)
  - gazelle_thomsons
  - gerenuk
  - giant_forest_hog
  - giraffe
  - goat
  - hartebeest
  - hippo
  - impala
  - kob
  - kudu
  - motorcycle
  - oribi
  - oryx
  - ostrich
  - roof_grass
  - roof_mabati
  - sheep
  - test
  - topi
  - vehicle
  - warthog
  - waterbuck
  - white_bones
  - wildebeest
  - zebra
All species above that are highlighted in green have an Average Precision (AP) of at least 50%. The other species are supported in a preliminary sense and should not be heavily relied on.
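The label mapping listed above amounts to a small lookup table. The sketch below copies the mapped labels from the list; the names LABEL_MAP and clean_label are hypothetical and are not part of the ScoutBot API.

```python
# Raw-to-clean label table copied from the species list above.
LABEL_MAP = {
    "elephant_savanna": "elephant",
    "dead_animalwhite_bones": "white_bones",
    "deadbones": "white_bones",
    "elecarcass_old": "white_bones",
    "gazelle_gr": "gazelle_grants",
    "gazelle_th": "gazelle_thomsons",
}


def clean_label(raw):
    """Map a raw model label to its final name; unmapped labels pass through."""
    return LABEL_MAP.get(raw, raw)


print(clean_label("gazelle_gr"), clean_label("zebra"))
```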
Tiles (TILE)
- scoutbot.tile.compute(img_filepath, grid1=True, grid2=True, ext=None, **kwargs)
Compute the tiles for a given input image and save them to disk.
If a given tile has already been rendered to disk, it will not be recomputed.
- Parameters:
img_filepath (str) – image filepath (relative or absolute) to compute tiles for.
grid1 (bool, optional) – If True, create a dense grid of tiles on the image. Defaults to True.
grid2 (bool, optional) – If True, create a secondary dense grid of tiles on the image with a 50% offset. Defaults to True. Can be disabled by setting the environment variable FAST=1.
ext (str, optional) – The file extension of the resulting tile files. If this value is not specified, it will use the same extension as img_filepath. Passed as input to scoutbot.tile.tile_filepath(). Defaults to None.
**kwargs – keyword arguments passed to scoutbot.tile.tile_grid()
- Returns:
the original image’s shape as (h, w, c).
list of grid coordinates as the output of scoutbot.tile.tile_grid().
list of tile filepaths as the output of scoutbot.tile.tile_filepath().
- Return type:
- scoutbot.tile.tile_filepath(img_filepath, grid, ext=None)
Returns a suggested filepath for a tile given the original image filepath and the tile’s grid coordinates.
- Parameters:
img_filepath (str) – image filepath (relative or absolute)
grid (dict) – a dictionary of one grid coordinate, one output of scoutbot.tile.tile_grid()
ext (str, optional) – The file extension of the resulting tile files. If this value is not specified, it will use the same extension as img_filepath. Defaults to None.
- Returns:
the suggested absolute filepath to store the tile
- Return type:
- scoutbot.tile.tile_grid(shape, size=(256, 256), overlap=64, offset=0, borders=True)
Calculates a grid of tile coordinates for a given image.
The final output is a list of dictionaries, each representing a single tile coordinate. Each dictionary has a structure with the following keys:
{
    'x': x_top_left (int)
    'y': y_top_left (int)
    'w': width (int)
    'h': height (int)
    'b': border (bool)
}
The x, y, w, h bounding box keys are in real pixel values.
The b key is True if the grid coordinate is on the border of the image.
- Parameters:
shape (tuple) – the image’s shape as (h, w, c) or (h, w)
size (tuple, optional) – the tile’s shape as (w, h)
overlap (int, optional) – The amount of pixel overlap between each tile, for both the x-axis and the y-axis.
offset (int, optional) – The amount of pixel offset for the entire grid
borders (bool, optional) – If True, include a set of border-only tiles. Defaults to True.
- Returns:
a list of grid coordinate dictionaries
- Return type:
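To make the roles of size, overlap, and offset concrete, here is a simplified sketch of a fixed-stride tile grid that produces dictionaries with the keys described above. It is an assumption-laden illustration, not the algorithm used by scoutbot.tile.tile_grid(): the function name sketch_tile_grid is hypothetical, only tiles that fit fully inside the image are produced, the borders option is not modeled, and the rule used for the b flag (a tile touching an image edge) is a guess.

```python
def sketch_tile_grid(shape, size=(256, 256), overlap=64, offset=0):
    """Simplified grid: fixed stride, tiles fully inside the image only."""
    h, w = shape[:2]
    tw, th = size
    # Neighboring tiles share `overlap` pixels, so the stride is size minus overlap.
    stride_x, stride_y = tw - overlap, th - overlap
    if stride_x <= 0 or stride_y <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    grid = []
    for y in range(offset, h - th + 1, stride_y):
        for x in range(offset, w - tw + 1, stride_x):
            # Assumption: a tile counts as a border tile when it touches an image edge.
            on_border = x == 0 or y == 0 or x + tw == w or y + th == h
            grid.append({"x": x, "y": y, "w": tw, "h": th, "b": on_border})
    return grid


primary = sketch_tile_grid((640, 640, 3))
secondary = sketch_tile_grid((640, 640, 3), offset=96)  # shifted by half a stride
print(len(primary), len(secondary))
```

With the default size and overlap the stride is 192 pixels, so an offset of 96 places the second grid halfway between the tiles of the first one.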
- scoutbot.tile.tile_write(img, grid, filepath)
Write a single image’s tile to disk using its grid coordinates and an output path.
- Parameters:
img (numpy.ndarray) – 3-dimensional NumPy array, the return from cv2.imread()
grid (dict) – the grid coordinate dictionary, one of the returned dictionaries from scoutbot.tile.tile_grid()
filepath (str) – the tile’s full output filepath (relative or absolute)
- Returns:
True if the tile’s filepath exists on disk.
- Return type:
Whole-Image Classifier (WIC)
The Whole-Image Classifier (WIC) returns confidence scores for image tiles.
This module defines how WIC models are downloaded from an external CDN, how to load an image and prepare it for inference, how to run the WIC ONNX model on this input, and finally how to convert the raw CNN output into usable confidence scores.
- scoutbot.wic.fetch(pull=False, config='mvp')
Fetch the WIC ONNX model file from a CDN if it does not exist locally.
This function will throw an AssertionError if the download fails or the file otherwise does not exist locally on disk.
- Parameters:
- Returns:
local ONNX model file path.
- Return type:
- Raises:
AssertionError – If the model cannot be fetched.
- scoutbot.wic.post(gen)
Apply a post-processing normalization of the raw ONNX network outputs.
The final output is a list of dictionaries, one per tile, where the keys are the predicted labels and the values are their corresponding confidence values.
- Parameters:
gen (generator) – generator of batches of raw ONNX model outputs, the return of scoutbot.wic.predict()
- Returns:
list of WIC predictions
- Return type:
- scoutbot.wic.pre(inputs, batch_size=256, config='mvp')
Load a list of filepaths and return a corresponding list of the image data as a 4-D list of floats. The image data is loaded from disk, transformed as needed, and normalized to the input ranges that the WIC ONNX model expects.
This function will throw an error if any of the filepaths do not exist.
- Parameters:
inputs (list(str)) – list of tile image filepaths (relative or absolute)
batch_size (int, optional) – the maximum number of images to load in a single batch. Defaults to the environment variable WIC_BATCH_SIZE.
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to mvp.
- Returns:
generator ->
list of transformed image data with shape
(b, c, w, h)
model configuration
- Return type:
generator ( np.ndarray<np.float32>, str )
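The batching behavior described for batch_size can be pictured with a small generator that slices a list of tile filepaths into chunks. This is a sketch of the general technique only; the names default_batch_size and batched are hypothetical, and the real function also loads and normalizes the image data, which is omitted here.

```python
import os


def default_batch_size(environ=None):
    """Read WIC_BATCH_SIZE, falling back to the documented default of 256."""
    environ = os.environ if environ is None else environ
    return int(environ.get("WIC_BATCH_SIZE", 256))


def batched(filepaths, batch_size=256):
    """Yield successive slices of at most batch_size filepaths."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(filepaths), batch_size):
        yield filepaths[start:start + batch_size]


tiles = [f"tile_{i}.jpg" for i in range(10)]
for batch in batched(tiles, batch_size=4):
    print(len(batch))
```

Because the work is produced lazily, only one batch needs to be held in memory at a time, which is the reason the pre(), predict(), and post() functions pass generators to each other.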
- scoutbot.wic.predict(gen)
Run neural network inference using the WIC’s ONNX model on preprocessed data.
- Parameters:
gen (generator) – generator of batches of transformed image data, the return of scoutbot.wic.pre()
- Returns:
generator ->
list of raw ONNX model outputs as shape
(b, n)
model configuration
- Return type:
generator ( np.ndarray<np.float32>, str )
Localizer (LOC)
The Localizer (LOC) returns bounding box detections on image tiles.
This module defines how Localizer models are downloaded from an external CDN, how to load an image and prepare it for inference, how to run the Localizer ONNX model on this input, and finally how to convert the raw CNN output into usable detection bounding boxes with class labels and confidence scores.
- scoutbot.loc.fetch(pull=False, config='mvp')
Fetch the Localizer ONNX model file from a CDN if it does not exist locally.
This function will throw an AssertionError if the download fails or the file otherwise does not exist locally on disk.
- Parameters:
- Returns:
local ONNX model file path.
- Return type:
- Raises:
AssertionError – If the model cannot be fetched.
- scoutbot.loc.post(gen, loc_thresh=None, nms_thresh=None)
Apply a post-processing normalization of the raw ONNX network outputs.
The final output is a list of lists of dictionaries, each representing a single detection. Each dictionary has a structure with the following keys:
{
    'l': class_label (str)
    'c': confidence (float)
    'x': x_top_left (float)
    'y': y_top_left (float)
    'w': width (float)
    'h': height (float)
}
The l label is the string class as used when the original ONNX model was trained.
The c confidence value is a bounded float between 0.0 and 1.0 (inclusive), but should not be treated as a probability.
The x, y, w, h bounding box keys are in real pixel values.
- Parameters:
gen (generator) – generator of batches of raw ONNX model outputs and sizes, the return of scoutbot.loc.predict()
loc_thresh (float or None, optional) – the confidence threshold for the localizer’s predictions. Defaults to None.
nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the localizer’s predictions. Defaults to None.
- Returns:
nested list of Localizer predictions
- Return type:
- scoutbot.loc.pre(inputs, config='mvp')
Load a list of filepaths and return a corresponding list of the image data as a 4-D list of floats. The image data is loaded from disk, transformed as needed, and is normalized to the input ranges that the Localizer ONNX model expects.
This function will throw an error if any of the filepaths do not exist.
- scoutbot.loc.predict(gen)
Run neural network inference using the Localizer’s ONNX model on preprocessed data.
- Parameters:
gen (generator) – generator of batches of transformed image data, the return of scoutbot.loc.pre()
- Returns:
generator ->
list of raw ONNX model outputs as shape
(b, n)
list of each tile’s original size
model configuration
- Return type:
generator ( np.ndarray<np.float32>, list ( tuple ( int ) ), str )
Aggregation (AGG)
Aggregation (AGG) returns unified detections for an image given its individual tile detections.
This module defines how the tile-based localization detection results are aggregated at the image level. This includes the ability to weight the importance of detections along the border of each tile within an image, and to perform non-maximum suppression (NMS) on the combined results.
- scoutbot.agg.compute(img_shape, tile_grids, loc_outputs, config=None, agg_thresh=None, nms_thresh=None)
Compute the aggregated image-level detection results for a given list of tile-level detections.
- Parameters:
img_shape (tuple) – a tuple of the image shape as (h, w, c) or (h, w)
loc_outputs (list of list of dict) – the output predictions from the Localizer.
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to None.
agg_thresh (float or None, optional) – the confidence threshold for the aggregated localizer predictions. Defaults to None.
nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the aggregated localizer’s predictions. Defaults to None.
- Returns:
list of Localizer predictions
- Return type:
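Non-maximum suppression, mentioned above, is commonly implemented as a greedy loop over detections sorted by confidence. The sketch below shows that general technique on dictionaries with the l, c, x, y, w, h keys; it is not the code behind scoutbot.agg.compute(), the function names are hypothetical, and whether the library suppresses across classes or per class is not stated in this page. The default thresholds are taken from the scoutbot.pipeline() signature.

```python
def box_iou(a, b):
    """IoU of two detections given as x, y (top-left corner), w, h."""
    ix = max(0.0, min(a["x"] + a["w"], b["x"] + b["w"]) - max(a["x"], b["x"]))
    iy = max(0.0, min(a["y"] + a["h"], b["y"] + b["h"]) - max(a["y"], b["y"]))
    inter = ix * iy
    union = a["w"] * a["h"] + b["w"] * b["h"] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detects, nms_thresh=0.8, agg_thresh=0.0):
    """Keep the highest-confidence boxes, dropping overlaps above nms_thresh."""
    candidates = [d for d in detects if d["c"] >= agg_thresh]
    candidates.sort(key=lambda d: d["c"], reverse=True)
    kept = []
    for det in candidates:
        # A box survives only if it does not overlap too much with any kept box.
        if all(box_iou(det, k) <= nms_thresh for k in kept):
            kept.append(det)
    return kept


detects = [
    {"l": "elephant", "c": 0.9, "x": 0.0, "y": 0.0, "w": 10.0, "h": 10.0},
    {"l": "elephant", "c": 0.8, "x": 1.0, "y": 0.0, "w": 10.0, "h": 10.0},
    {"l": "zebra", "c": 0.7, "x": 50.0, "y": 50.0, "w": 10.0, "h": 10.0},
]
print([d["c"] for d in greedy_nms(detects)])
```

In this example the second box overlaps the first with an IoU of 90/110, which is above 0.8, so it is suppressed.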
- scoutbot.agg.demosaic(img_shape, tile_grids, loc_outputs, margin=32.0)
Demosaics a list of tiles and their respective detections back into the original image’s coordinate system.
- Parameters:
- Returns:
list of Localizer predictions
- Return type:
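The core idea of moving detections from tile coordinates back into the image's coordinate system can be sketched as a simple translation by each tile's top-left corner. This assumes that the Localizer's x and y values are relative to the tile, which this page does not state explicitly. The function name to_image_coords is hypothetical, and the margin parameter of scoutbot.agg.demosaic() is not modeled here.

```python
def to_image_coords(tile_grids, loc_outputs):
    """Shift each tile's detections by that tile's top-left corner."""
    merged = []
    for grid, detections in zip(tile_grids, loc_outputs):
        for det in detections:
            shifted = dict(det)  # copy so the tile-level result is left untouched
            shifted["x"] = det["x"] + grid["x"]
            shifted["y"] = det["y"] + grid["y"]
            merged.append(shifted)
    return merged


tile_grids = [
    {"x": 0, "y": 0, "w": 256, "h": 256, "b": True},
    {"x": 192, "y": 0, "w": 256, "h": 256, "b": True},
]
loc_outputs = [
    [],
    [{"l": "elephant", "c": 0.9, "x": 10.0, "y": 20.0, "w": 30.0, "h": 40.0}],
]
print(to_image_coords(tile_grids, loc_outputs))
```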
- scoutbot.agg.iou(box1, box2)
Computes the IoU (Intersection over Union) ratio for two bounding boxes.
Each box dictionary must have a structure with the following keys:
{
    'xtl': x_top_left (int)
    'ytl': y_top_left (int)
    'xbr': x_bottom_right (int)
    'ybr': y_bottom_right (int)
}
The (xtl, ytl) coordinate is the top-left corner of the box.
The (xbr, ybr) coordinate is the opposite bottom-right corner of the box.
The order of the boxes does not impact the calculation of the intersection and union values.
- Parameters:
- Returns:
the pixel area of the first box
the pixel area of the second box
the pixel area of the intersection (overlapping area) between the boxes
the pixel area of the union (combined area) between the boxes
- Return type:
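As a worked example of the four areas listed above, the sketch below computes them for two corner-format boxes and derives the IoU ratio from the last two. The function name iou_areas is hypothetical and the code is an independent illustration of the arithmetic, not the library's implementation.

```python
def iou_areas(box1, box2):
    """Return (area1, area2, intersection, union) for xtl/ytl/xbr/ybr boxes."""
    area1 = (box1["xbr"] - box1["xtl"]) * (box1["ybr"] - box1["ytl"])
    area2 = (box2["xbr"] - box2["xtl"]) * (box2["ybr"] - box2["ytl"])
    # The overlap is clamped at zero so disjoint boxes give an empty intersection.
    iw = max(0, min(box1["xbr"], box2["xbr"]) - max(box1["xtl"], box2["xtl"]))
    ih = max(0, min(box1["ybr"], box2["ybr"]) - max(box1["ytl"], box2["ytl"]))
    inter = iw * ih
    union = area1 + area2 - inter
    return area1, area2, inter, union


a = {"xtl": 0, "ytl": 0, "xbr": 10, "ybr": 10}
b = {"xtl": 5, "ytl": 5, "xbr": 15, "ybr": 15}
area1, area2, inter, union = iou_areas(a, b)
print(inter / union)  # 25 / 175
```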
Pipeline (PIPE)
The above components must be run in the correct order, but ScoutBot also offers a single unified pipeline.
All of the ML models can be pre-downloaded and fetched in a single call to scoutbot.fetch(), and
the unified pipeline, which uses the four components in the correct order, can be run by the function
scoutbot.pipeline(). Below is example code for how these components interact.
Furthermore, there are two application demo files (app.py and app2.py) that show
how the entire pipeline can be run on tiles or images, respectively.
from scoutbot import agg, loc, tile, wic
import utool as ut  # assumed: the `ut` used below is the utool package

# Get image filepath
filepath = '/path/to/image.ext'
config = 'mvp'
wic_thresh = 0.07  # default WIC threshold from the scoutbot.pipeline() signature

# Run tiling
img_shape, tile_grids, tile_filepaths = tile.compute(filepath)

# Run WIC
wic_outputs = wic.post(wic.predict(wic.pre(
    tile_filepaths,
    config=config,
    # batch_size=wic_batch_size,  # Optional override of config
)))

# Threshold for WIC: keep only the tiles whose 'positive' score passes
flags = [wic_output.get('positive') >= wic_thresh for wic_output in wic_outputs]
loc_tile_grids = ut.compress(tile_grids, flags)
loc_tile_filepaths = ut.compress(tile_filepaths, flags)

# Run localizer on the remaining tiles
loc_outputs = loc.post(
    loc.predict(
        loc.pre(loc_tile_filepaths, config=config)
    ),
    # loc_thresh=loc_thresh,  # Optional override of config
    # nms_thresh=loc_nms_thresh,  # Optional override of config
)

# Run Aggregation and get final detections
detects = agg.compute(
    img_shape,
    loc_tile_grids,
    loc_outputs,
    config=config,
    # agg_thresh=agg_thresh,  # Optional override of config
    # nms_thresh=agg_nms_thresh,  # Optional override of config
)
- scoutbot.__init__.batch(filepaths, config=None, wic_thresh=0.07, loc_thresh=0.38, loc_nms_thresh=0.6, agg_thresh=0.0, agg_nms_thresh=0.8, clean=True)
Run the ML pipeline on a given batch of image filepaths and return the detections in a corresponding list. The output is a list of outputs matching the output of scoutbot.pipeline(), except the processing is done in batch and is much faster.
The final output is a list of lists of dictionaries, each representing a single detection. Each dictionary has a structure with the following keys:
{
    'l': class_label (str)
    'c': confidence (float)
    'x': x_top_left (float)
    'y': y_top_left (float)
    'w': width (float)
    'h': height (float)
}
- Parameters:
filepaths (list) – list of str image filepaths (relative or absolute)
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to None.
wic_thresh (float or None, optional) – the confidence threshold for the WIC’s predictions. Defaults to the default configuration setting.
loc_thresh (float or None, optional) – the confidence threshold for the localizer’s predictions. Defaults to the default configuration setting.
loc_nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the localizer’s predictions. Defaults to the default configuration setting.
agg_thresh (float or None, optional) – the confidence threshold for the aggregated localizer predictions. Defaults to the default configuration setting.
agg_nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the aggregated localizer’s predictions. Defaults to the default configuration setting.
clean (bool, optional) – a flag to clean up any on-disk tiles that were generated. Defaults to True.
- Returns:
corresponding list of wic scores, corresponding list of lists of predictions
- Return type:
- scoutbot.__init__.example()
Run the pipeline on an example image with the default configuration.
- scoutbot.__init__.fetch(pull=False, config=None)
Fetch the WIC and Localizer ONNX model files from a CDN if they do not exist locally.
This function will throw an AssertionError if either download fails or the files otherwise do not exist locally on disk.
- Parameters:
- Returns:
None
- Raises:
AssertionError – If any model cannot be fetched.
- scoutbot.__init__.pipeline(filepath, config=None, wic_thresh=0.07, loc_thresh=0.38, loc_nms_thresh=0.6, agg_thresh=0.0, agg_nms_thresh=0.8, clean=True)
Run the ML pipeline on a given image filepath and return the detections.
The final output is a list of dictionaries, each representing a single detection. Each dictionary has a structure with the following keys:
{
    'l': class_label (str)
    'c': confidence (float)
    'x': x_top_left (float)
    'y': y_top_left (float)
    'w': width (float)
    'h': height (float)
}
- Parameters:
filepath (str) – image filepath (relative or absolute)
config (str or None, optional) – the configuration to use, one of phase1 or mvp. Defaults to None.
wic_thresh (float or None, optional) – the confidence threshold for the WIC’s predictions. Defaults to the default configuration setting.
loc_thresh (float or None, optional) – the confidence threshold for the localizer’s predictions. Defaults to the default configuration setting.
loc_nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the localizer’s predictions. Defaults to the default configuration setting.
agg_thresh (float or None, optional) – the confidence threshold for the aggregated localizer predictions. Defaults to the default configuration setting.
agg_nms_thresh (float or None, optional) – the non-maximum suppression (NMS) threshold for the aggregated localizer’s predictions. Defaults to the default configuration setting.
clean (bool, optional) – a flag to clean up any on-disk tiles that were generated. Defaults to True.
- Returns:
wic score, list of predictions
- Return type:
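Once the pipeline has returned its list of detection dictionaries, a common next step is to summarize them per class. The following sketch works on any list with the l and c keys described above; the helper name count_by_label and the sample detections are made up for illustration.

```python
from collections import Counter


def count_by_label(detects, min_conf=0.0):
    """Count detections per class label, ignoring those below min_conf."""
    return Counter(d["l"] for d in detects if d["c"] >= min_conf)


# Made-up detections in the documented output format.
detects = [
    {"l": "elephant", "c": 0.91, "x": 10.0, "y": 12.0, "w": 40.0, "h": 30.0},
    {"l": "elephant", "c": 0.55, "x": 90.0, "y": 70.0, "w": 38.0, "h": 29.0},
    {"l": "zebra", "c": 0.42, "x": 200.0, "y": 15.0, "w": 25.0, "h": 20.0},
]
print(count_by_label(detects, min_conf=0.5))
```

Since the confidence value should not be treated as a probability, any cutoff such as min_conf is best chosen empirically for your own imagery.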
Utilities
Scoutbot utilities file for common and handy functions.