whisper-tiny-en Beta
Automatic Speech Recognition • OpenAI
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. This is the English-only version of the Whisper Tiny model, which was trained on the task of speech recognition.
Usage
Workers - TypeScript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const res = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    const blob = await res.arrayBuffer();

    const input = {
      audio: [...new Uint8Array(blob)],
    };

    const response = await env.AI.run(
      "@cf/openai/whisper-tiny-en",
      input
    );

    return Response.json({ input: { audio: [] }, response });
  },
} satisfies ExportedHandler<Env>;
curl
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/openai/whisper-tiny-en \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --data-binary "@talking-llama.mp3"
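The same REST call can be made from TypeScript. The block below is a minimal sketch that mirrors the curl command, not an official client: `buildRunRequest` and `transcribeViaRest` are hypothetical helper names, and `FetchLike` is a local stand-in type so the block stays self-contained. The response is returned as parsed JSON without assuming its shape, because this page does not describe the REST response envelope.

```typescript
// Hypothetical helpers mirroring the curl command; names are illustrative only.
const MODEL = "@cf/openai/whisper-tiny-en";

interface RunRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: ArrayBuffer; // raw audio bytes, like curl's --data-binary
}

// Minimal local description of the parts of fetch() this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: ArrayBuffer }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the request without sending it, so it can be inspected or tested.
function buildRunRequest(
  accountId: string,
  apiToken: string,
  audio: ArrayBuffer
): RunRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL}`,
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
    body: audio,
  };
}

// Send the request with the supplied fetch implementation and return the parsed JSON body.
async function transcribeViaRest(
  fetchFn: FetchLike,
  accountId: string,
  apiToken: string,
  audio: ArrayBuffer
): Promise<unknown> {
  const { url, ...init } = buildRunRequest(accountId, apiToken, audio);
  const res = await fetchFn(url, init);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}
```

In a runtime that provides a global `fetch`, it can be passed directly as `fetchFn`, for example `transcribeViaRest(fetch, accountId, apiToken, audioBytes)`.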
Parameters
Input
- 0: string
- 1: object
  - audio: array. An array of integers that represent the audio data constrained to 8-bit unsigned integer values
    - items: number. A value between 0 and 255
  - source_lang: string. The language of the recorded audio
  - target_lang: string. The language to translate the transcription into. Currently only English is supported.
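As a rough illustration of the object form of the input, the sketch below converts raw audio bytes into the `audio` array of integers between 0 and 255. `WhisperInput` and `toWhisperInput` are hypothetical names introduced here for illustration; only the property names come from the parameter list above.

```typescript
// Hypothetical type describing the object form of the input listed above.
interface WhisperInput {
  audio: number[];
  source_lang?: string;
  target_lang?: string;
}

// Convert raw audio bytes into the object input; each element is an integer in 0..255.
function toWhisperInput(
  bytes: ArrayBuffer | Uint8Array,
  sourceLang?: string
): WhisperInput {
  const view = bytes instanceof Uint8Array ? bytes : new Uint8Array(bytes);
  const input: WhisperInput = { audio: Array.from(view) };
  if (sourceLang !== undefined) {
    input.source_lang = sourceLang;
  }
  return input;
}
```

Reading through a `Uint8Array` view guarantees that every element already falls in the 0 to 255 range, so no extra clamping is needed.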
Output
- text: string. The transcription
- word_count: number
- words: array
  - items: object
    - word: string
    - start: number. The second this word begins in the recording
    - end: number. The ending second when the word completes
- vtt: string
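To show how the `words` array might be consumed, here is a small sketch that turns each entry into a timestamped line. The `WhisperWord` and `WhisperOutput` types and the `formatWordTimings` helper are hypothetical names written from the field list above; every field except `text` is treated as optional, on the assumption that the model may omit it.

```typescript
// Hypothetical types written from the output fields listed above.
interface WhisperWord {
  word?: string;
  start?: number; // second at which the word begins
  end?: number; // second at which the word completes
}

interface WhisperOutput {
  text: string;
  word_count?: number;
  words?: WhisperWord[];
  vtt?: string;
}

// Produce one "[start - end] word" line per entry in `words`.
// Missing timings fall back to 0 and a missing word to an empty string.
function formatWordTimings(output: WhisperOutput): string[] {
  return (output.words ?? []).map((w) => {
    const start = (w.start ?? 0).toFixed(2);
    const end = (w.end ?? 0).toFixed(2);
    return `[${start}s - ${end}s] ${w.word ?? ""}`;
  });
}
```

When `words` is absent, the helper returns an empty array, so callers can fall back to `text` alone.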
API Schemas
The following schemas are based on JSON Schema
Input
{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "audio": {
          "type": "array",
          "description": "An array of integers that represent the audio data constrained to 8-bit unsigned integer values",
          "items": {
            "type": "number",
            "description": "A value between 0 and 255"
          }
        },
        "source_lang": {
          "type": "string",
          "description": "The language of the recorded audio"
        },
        "target_lang": {
          "type": "string",
          "description": "The language to translate the transcription into. Currently only English is supported."
        }
      },
      "required": [
        "audio"
      ]
    }
  ]
}

Output
{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "text": {
      "type": "string",
      "description": "The transcription"
    },
    "word_count": {
      "type": "number"
    },
    "words": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "word": {
            "type": "string"
          },
          "start": {
            "type": "number",
            "description": "The second this word begins in the recording"
          },
          "end": {
            "type": "number",
            "description": "The ending second when the word completes"
          }
        }
      }
    },
    "vtt": {
      "type": "string"
    }
  },
  "required": [
    "text"
  ]
}
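Because the output schema marks only `text` as required, a caller may want a lightweight runtime check before using a response. The following is a hand-written sketch of such a check, not a full JSON Schema validator: `WhisperOutputShape` and `isWhisperOutput` are hypothetical names, and the entries inside `words` are not inspected individually.

```typescript
// Hypothetical type mirroring the top-level properties of the output schema.
interface WhisperOutputShape {
  text: string;
  word_count?: number;
  words?: { word?: string; start?: number; end?: number }[];
  vtt?: string;
}

// Type guard: checks the required `text` property and the types of the
// optional top-level properties. Items inside `words` are not validated.
function isWhisperOutput(value: unknown): value is WhisperOutputShape {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.text !== "string") return false; // "text" is the only required property
  if (v.word_count !== undefined && typeof v.word_count !== "number") return false;
  if (v.words !== undefined && !Array.isArray(v.words)) return false;
  if (v.vtt !== undefined && typeof v.vtt !== "string") return false;
  return true;
}
```

A full validator would compile the schema itself with a JSON Schema library; the guard here only covers the top-level checks a quick integration typically needs.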