Machine Learning
Speech recognition
Overview
This BLOCK uses the Google Cloud Speech-to-Text API to convert speech from audio files into text.
For usage details, see Basic Guide > Hints > How to use the Speech recognition BLOCK.
Note for Self-Service Plan users:
The Google Cloud Speech-to-Text API must be enabled to use this BLOCK. For details, refer to Basic Guide > Hints > Enabling Google APIs.
To use this BLOCK effectively, we suggest reading Google’s Best Practices guide for the Google Cloud Speech-to-Text API.
Properties
Property | Explanation |
---|---|
BLOCK name | Configure the name displayed on this BLOCK. |
GCP service account | Select the GCP service account to use with this BLOCK. |
Audio file URL | Designate the GCS URL where the audio file is stored. |
Variable | Designate the variable that will store the resultant text data. Refer to Output specifications > Speech recognition for details. |
Encoding | Designate the encoding type of the audio file saved to the Audio file URL. Of the supported encoding types, FLAC and LINEAR16 are recommended as the best for voice recognition. For further details, see the Basic Guide entry titled Audio encoding for Google Cloud Speech-to-Text API, which explains each encoding type and demonstrates how to convert between them. |
Sampling rate | Designate the sampling rate of the audio file saved to the Audio file URL. Sampling rates may range between 8,000 and 48,000 and are measured in Hertz (Hz). For best results, Google suggests using a 16,000 Hz sampling rate. |
Language code | Designate the language to be detected from the audio file saved in the Audio file URL. For example, to detect American English, choose en-US. See Language Support for a full list of possible language codes. |
BLOCK memos | Make notes about this BLOCK. |
Max alternatives | When the audio data is converted into text, multiple recognition alternatives can be returned. This property sets the maximum number of these alternative results within a range of 0 to 30. Setting this to 0 or 1 will return a maximum of 1 alternative result. |
Profanity filter | Activating this property turns on the profanity filter, which masks swear words in the resultant text data: the Speech-to-Text API replaces all but the first character of each filtered word with asterisks. |
Contextual word/phrase hints | Provide any words or phrases that might strengthen the Speech-to-Text API’s recognition accuracy. |
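
The properties above correspond closely to fields of a Google Cloud Speech-to-Text v1 recognition request. The following Python sketch shows one plausible mapping and checks the ranges stated in the table. It is an illustration only, not the BLOCK's actual implementation: the helper name `build_recognize_request`, its parameter names, the default values, the example bucket path, and the assumption that the Audio file URL uses the `gs://` scheme are all hypothetical. The JSON field names follow the API's REST request body.

```python
import json


def build_recognize_request(audio_url, encoding="FLAC", sample_rate_hz=16000,
                            language_code="en-US", max_alternatives=1,
                            profanity_filter=False, hints=()):
    """Hypothetical helper: map the BLOCK's properties to a request body."""
    # Assumption: the Audio file URL is a GCS URL of the form gs://bucket/object.
    if not audio_url.startswith("gs://"):
        raise ValueError("Audio file URL must be a GCS URL (gs://bucket/object)")
    # Sampling rate: 8,000 to 48,000 Hz, as stated in the properties table.
    if not 8000 <= sample_rate_hz <= 48000:
        raise ValueError("Sampling rate must be between 8,000 and 48,000 Hz")
    # Max alternatives: 0 to 30, as stated in the properties table.
    if not 0 <= max_alternatives <= 30:
        raise ValueError("Max alternatives must be between 0 and 30")

    config = {
        "encoding": encoding,                  # Encoding
        "sampleRateHertz": sample_rate_hz,     # Sampling rate
        "languageCode": language_code,         # Language code
        "maxAlternatives": max_alternatives,   # Max alternatives
        "profanityFilter": profanity_filter,   # Profanity filter
    }
    if hints:
        # Contextual word/phrase hints
        config["speechContexts"] = [{"phrases": list(hints)}]
    return {"config": config, "audio": {"uri": audio_url}}


# Example values are placeholders, not real resources.
request = build_recognize_request(
    "gs://example-bucket/sample.flac",
    hints=["BLOCK"],
)
print(json.dumps(request, indent=2))
```

The sketch only builds and validates the request body; sending it to the API would additionally require credentials for the selected GCP service account.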