Model Generator prediction (batch)

Overview

This BLOCK uses a trained model from a Model Generator with input data to make predictions. It’s designed for making predictions with collections of large amounts of input data.

It supports the following four types of Model Generator trainings:

Classification
Regression
Image classification
Object detection

This BLOCK takes a comparatively longer time for predictions than the Model Generator prediction (online) BLOCK. However, it is more efficient at making predictions when using large amounts of data.

The BLOCK performs batch predictions for regression and classification models by reading input variable data from a text file stored in Google Cloud Storage (GCS). It outputs the results as text files to a GCS folder. As a general rule, it splits the results into multiple files.

For the image classification model, the BLOCK either reads image files stored in GCS or a JSON file that contains data for images. It outputs the results in the same manner as with the regression and classification models.

You must first click Apply for a training on a Model Generator beforehand you can make predictions with this BLOCK.

How to prepare the input data

Classification and regression

Prepare your input data as a JSON format text file like the following:

{"key": "1", "sepal_length": 5.9, "sepal_width": 3.0, "petal_length": 4.2, "petal_width": 1.5}
{"key": "2", "sepal_length": 6.9, "sepal_width": 3.1, "petal_length": 5.4, "petal_width": 2.1}
{"key": "3", "sepal_length": 5.1, "sepal_width": 3.3, "petal_length": 1.7, "petal_width": 0.5}

Make each line a JSON object ({...}).
Separate objects with line breaks.
Gather input data for one instance into one JSON object.
JSON objects consist of name and value pairs.
- In each pair, the name and value are separated by a :.
- The name is on the left of the : and the value is on the right (name: value).
Values can be set as the following three types:
- Numbers: 1, 23.45, and the like (numerical value, month, and day data).
- Strings: "abc", "xyz", and the like. Strings are enclosed in " (Strings (enumerated) data).
- Arrays: [1, 2, 3], [4, 5.6, 7.0], and the like. Arrays can contain multiple numerical values enclosed between [ and ] (Numerical value with multiple dimensions and sequence data).
  This section explains how to set sequence data.
  
  Sequence data, like the image below for weather data for the past 7 days, has meaning in its ordering. This section explains how to set an array for this example sequence.
  
  To use this sequence data as part of an Model Generator’s training data, you would simple write it out with commas (,) separating each number, as shown below.
  
  To use sequence data in prediction input data, prepare is as an array.
```
"weather": [21, 35, 0, 20, 34, 0, . . . , 18, 32, 20]
```
  (This assumes the item name for the sequence data is weather)
Each JSON object (an instance of input data) should include a pair with the name "key". The value should be a string to identify that instance of input data.

The results of the prediction are output as JSON format text files to a GCS folder. The files are named automatically as shown below (the XXXXX and YYYYY change based on the number of files).

prediction.results-XXXXX-of-YYYYY

XXXXX: A number starting with 0 that represents the file’s index number (00000, 00001, etc.).
YYYYY: The total number of files (00001, 00003, etc.).

The following example shows prediction results for the classification model:

{"label_index": 2, "score": [9.230815578575857e-08, 0.007054927293211222, 0.9929450154304504], "key": "2", "label": "Iris-virginica"}

Each row contains one JSON object ({...}).
Rows are separated by line breaks.

Prediction results for each instance of input data are contained within individual JSON objects.

"label_index"	Shows which element of the "score" array the value of "label" represents. The elements in the array are ordered starting from 0.
"score"	The level of certainty for predicting each class. In this example, the certainty for class 0 is 0.000009231%, class 1 is 0.705492729%, and class 2 is 99.294501543%.
"key"	The value for the "key" used in the prediction input data.
"label"	The predicted class.

The following example shows prediction results for the regression model:

 {"output": 10304.1962890625, "key": "20170103"}

Each row contains one JSON object ({...}).
Rows are separated by line breaks.

Prediction results for each instance of input data are contained within individual JSON objects.

"output"	The predicted value.
"key"	The value for the "key" used in the prediction input data.

Image classification and object detection

There are two ways to make predictions with the image classification or image object detection models.

Specify image files stored in GCS
Specify a JSON file

Keep in mind the following in regards to image files:

File size: Images must be less than approximately 1.125 MB.
For object detection predictions: When making predictions for multiple images at once, all of the prediction images must be the same size.

Each method is explained below.

Making predictions with image files stored in GCS

This method is the simplest way to make predictions.

Upload a collection of image files into GCS in a single folder.
Designate this folder into the Input GCS URL property.
Make sure to include a / at the end of the GCS URL.
Images for predictions must be JPEG, PNG, or GIF format.
The file extension for the images can be .jpg, .jpeg, .png, or .gif.
The folder containing the uploaded images can also contain folders within it (see the following image).

Making predictions with a JSON file

This method makes predictions using a JSON file that contains Base64 encoded image data.

The JSON file must contain more than one JSON object with each object separated by line breaks. See the following example:
```
{"key": "samp01", "image": {"b64": "/9j/4....../2Q=="}}
{"key": "samp02", "image": {"b64": "/9j/4....../2Q=="}}
```

The JSON objects should be formatted as follows:

{"key": "key", "image": {"b64": "Base64 encoded image data"}}

*Set the red portions with information for your prediction images.

Name	Value
`"key"`	Designate a string to act as an identifying key for the prediction image.
`"b64"`	Designate the Base64 encoded data for the prediction image.

Images for predictions must be JPEG, PNG, or GIF format.
Upload the JSON file to GCS and enter its GCS URL into the Input GCS URL property.

Prediction results

The BLOCK outputs the prediction results as JSON format text files to a GCS folder. The files are named automatically as shown below (the XXXXX and YYYYY change based on the number of files).

prediction.results-XXXXX-of-YYYYY

XXXXX: A number starting with 0 that represents the file’s index number (00000, 00001, etc.).
YYYYY: The total number of files (00001, 00003, etc.).

The following shows example prediction results for the image classification model:

{"labels": ["cat", "dog"], "score": [1.0, 2.0886015139609526e-10], "key": "gs://my-bucket/images/sample_01.jpg", "label": "cat"}
{"labels": ["cat", "dog"], "score": [3.7939051367175125e-07, 0.9999996423721313], "key": "gs://my-bucket/images/sample_02.jpg", "label": "dog"}

Each row contains one JSON object ({...}).
Rows are separated by line breaks.

Prediction results for one image is contained in one JSON object.

"labels"	A list of the classes. In this case, there are two classes: "cat" and "dog". The order of the "labels" list matches the order in the "score" list that follows.
"score"	The certainty for predicting each class. In this example, the first value in "score" refers to "cat" and the second value in "score" refers to "dog".
"key"	If you made the prediction using image files uploaded to GCS, this will be the GCS URL for the image file. If you made the prediction using a JSON file, this will be the value for "key" as designated within that file.
"label"	The predicted class. This is the class that corresponds with the highest value within "score".

Properties

Property	Explanation
BLOCK name	Configure the name displayed on this BLOCK.
GCP service account	Select the GCP service account to use with this BLOCK.
Model Generator	Select the Model Generator to use for this prediction.
Input GCS URL	Classification and regression models: Designate the GCS URL for the text file containing the input data. For a GCS bucket named `blocks-sample` and a text file containing prediction input data named `sample.json`, the Input GCS URL would be `gs://blocks-sample/sample.json`. Image classification models: Designate the GCS URL for either a folder containing the image files for the prediction, or the JSON file containing the collected data for the prediction images. Image files within a folder in GCS: For a bucket named `blocks-sample` that contains a folder named `images`, the Input GCS URL would be `gs://blocks-sample/images/`. Make sure the URL ends with a `/`. JSON file containing data for images: For a bucket named `blocks-sample` that contains a JSON file named `sample.json`, the Input GCS URL would be `gs://blocks-sample/sample.json`.
Output GCS URL	Designate a GCS URL for the folder that will contain the results of the prediction. For example, for the results to be stored into a folder named `results` within a bucket named `blocks-sample`, you would set this property to `gs://blocks-sample/results/`. The BLOCK will create a new folder automatically if the designated folder does not already exist. For existing folders, new files will overwrite older files if they have the same name.
Batch size	For batch predictions, the input feature data is saved into memory (buffered) before the prediction is performed. Designate the amount (number of records) of input feature data that will be buffered in this property. There is the possibility that increasing the batch size will improve the speed of the prediction. However, the amount of memory used also increases, so there is also the possibility that the prediction will fail due to a memory shortage. Because of this, you will need to set an appropriate batch size for your input feature data that will not cause a memory shortage.
Version used for predictions	Select whether to use the Production (current) or Testing (preview) version.

BLOCKS Reference

Machine Learning