How to Call OpenAI Whisper API from Power Automate (HTTP Action)

This time, I tried calling OpenAI’s Whisper (Speech to Text) via Power Automate and wrote down the results.

About Whisper
1. Key Features
Preliminary Preparation: Obtaining an OpenAI API Key
Build Power Automate
Building Power Apps
1. Pass audio from the microphone control
2. Passing audio files from attachments control
Extra 1: Translation from Japanese to English often fails.
Extra 2: Brazilian Portuguese worked fine.

About Whisper

Whisper is a general-purpose automatic speech recognition (ASR) system developed by OpenAI.
Trained on 680,000 hours of multilingual and multitask supervised data, it handles speech-to-text, translation into English, and language identification across dozens of languages.

Key Features

Robustness: Copes well with accents, background noise, and technical jargon.
Versatility: Supports both Transcriptions (speech-to-text in original language) and Translations (speech-to-English).
Supported Formats: mp3, mp4, mpeg, mpga, m4a, wav, webm (File size limit: 25 MB)
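
Since the format list and the 25 MB limit are easy to trip over, here is a minimal Python sketch of a pre-flight check you could run before sending a file. It is not part of the flow in this article; the function name check_audio_file is hypothetical, and reading "25 MB" as 25 * 1024 * 1024 bytes is my assumption.

```python
from pathlib import PurePosixPath

# Formats and size limit as listed in the feature summary above.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB


def check_audio_file(file_name, size_in_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Take the text after the last dot and lower-case it ("big.MP3" -> "mp3").
    extension = PurePosixPath(file_name).suffix.lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_in_bytes > MAX_FILE_BYTES:
        problems.append(f"file is larger than {MAX_FILE_BYTES} bytes")
    return problems
```

Returning a list of problems instead of raising makes it easy to show every issue to the user at once.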

In this tutorial, we will bypass standard connectors and call the Whisper API directly via Power Automate’s HTTP action to build a powerful speech-to-text workflow.

*API Reference
https://platform.openai.com/docs/api-reference/audio

Preliminary Preparation: Obtaining an OpenAI API Key

First, go to the OpenAI home page and log in.

Then select “View API keys” from the icon in the upper right corner.

Press “Create new secret key”, create a key with an appropriate name, and make a note of the key that is displayed.

Preliminary preparations are now complete.

Build Power Automate

Next, we build the Power Automate flow.

The completed flow looks like this overall.

The trigger takes the following two arguments:

A File-type argument that receives the audio from Power Apps.
A String-type argument that selects either “transcriptions” or “translations”.

Next, a variable (ContentTypes) is created that maps each file extension to its MIME type.
*A few more file types are actually supported, but I’ll leave it at this for now.

Here are its contents.

{
  "flac": "audio/flac",
  "mp3": "audio/mpeg",
  "mp4": "video/mp4",
  "wav": "audio/wav",
  "ogg": "audio/ogg",
  "webm": "audio/webm"
}

Then the flow takes the file extension from the file name passed in the argument and looks up the matching MIME type in the variable defined above.

The expressions are as follows:

// Getting file extensions from file names
@{last(split(triggerBody()['file']['name'],'.'))}

// Get MIME type from file extension (作成 is the Japanese designer's name for the Compose action)
@{variables('ContentTypes')?[outputs('作成：extention')]}
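
To make the two expressions easier to follow, here is a rough Python equivalent. It is only an illustration of the lookup logic, not something the flow runs. The names CONTENT_TYPES, get_extension, and get_mime_type are hypothetical, and the lower-casing step is my addition (the expression above compares the extension as-is).

```python
# Same mapping as the ContentTypes variable in the flow.
CONTENT_TYPES = {
    "flac": "audio/flac",
    "mp3": "audio/mpeg",
    "mp4": "video/mp4",
    "wav": "audio/wav",
    "ogg": "audio/ogg",
    "webm": "audio/webm",
}


def get_extension(file_name):
    # Equivalent of last(split(name, '.')).
    # Lower-casing is an extra step that the flow expression does not perform.
    return file_name.split(".")[-1].lower()


def get_mime_type(file_name):
    # dict.get returns None for an unknown extension,
    # much like the ?[] operator yields null in the flow.
    return CONTENT_TYPES.get(get_extension(file_name))
```

If you expect upper-case extensions such as "VOICE.MP3", a similar normalization step in the flow might be worth considering.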

Next, call the OpenAI API in an HTTP action.
*Switch the URL being called (transcription or English translation) according to the argument.
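
The URL switching can be sketched in Python as follows. The two endpoint paths are the ones in the OpenAI API reference; the function name build_request_target and the validation step are hypothetical additions for illustration. The API key obtained earlier goes into the Authorization header as a Bearer token.

```python
API_BASE_URL = "https://api.openai.com/v1/audio/"
ALLOWED_MODES = ("transcriptions", "translations")


def build_request_target(mode, api_key):
    """Return the (url, headers) pair for the chosen mode."""
    # Reject anything other than the two values the trigger argument may carry.
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    url = API_BASE_URL + mode
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers
```

Because the argument value is identical to the last path segment, the flow can simply append it to the base URL.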

The request body is as follows.

{
  "$content-type": "multipart/form-data",
  "$multipart": [
    {
      "headers": {
        "Content-Disposition": "form-data; name=\"model\""
      },
      "body": "whisper-1"
    },
    {
      "headers": {
        "Content-Disposition": "form-data; name=\"file\"; filename=\"@{triggerBody()['file']['name']}\""
      },
      "body": {
        "$Content-type": "@{outputs('作成：content-type')}",
        "$content": @{triggerBody()['file']['contentBytes']}
      }
    }
  ]
}
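
For reference, here is a minimal Python sketch of the raw multipart/form-data payload that a body like the one above roughly corresponds to. It is an illustration under assumptions, not what Power Automate does internally: the function name build_multipart_body is hypothetical, and I am assuming the base64 "$content" value is decoded to raw bytes before it is sent.

```python
import base64
import uuid


def build_multipart_body(file_name, content_type, content_base64,
                         model="whisper-1", boundary=None):
    """Return (Content-Type header value, body bytes) for the request."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    # Assumption: the base64 text from Power Apps becomes raw bytes on the wire.
    file_bytes = base64.b64decode(content_base64)
    lines = [
        # Part 1: the "model" form field.
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        # Part 2: the audio file with its own Content-Type.
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
        # Closing delimiter, followed by a final line break.
        delimiter + b"--",
        b"",
    ]
    header = f"multipart/form-data; boundary={boundary}"
    return header, b"\r\n".join(lines)
```

Each entry of "$multipart" becomes one part between boundary lines, which is why the model name and the file are written as separate objects.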

OpenAI returns the response as the following simple JSON.

The flow parses this JSON and returns the contents to Power Apps.
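
As a rough Python sketch of the parsing step: with the default response format, the body is a JSON object with a "text" field (the same field the Power Apps formulas read as ret.text). The function name extract_text is hypothetical, and the error branch is my addition, based on the "error" object that OpenAI returns on failure.

```python
import json


def extract_text(response_body):
    """Pull the recognized text out of the API's JSON response."""
    payload = json.loads(response_body)
    # On failure the API returns {"error": {"message": ...}} instead of "text".
    if "error" in payload:
        raise RuntimeError(payload["error"].get("message", "unknown error"))
    return payload["text"]
```

In the flow, the Parse JSON action plays the role of json.loads, and only the text value is handed back to Power Apps.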

Power Automate is now complete!

Building Power Apps

In Power Apps, create the following two screens:

A screen that passes audio from the microphone control
A screen that passes an audio file from the attachments control

Pass audio from the microphone control

The formula for passing audio from the microphone control is as follows:

With({ret:SpeechToTextByOpenAI.Run(rdoType.Selected.Value,{
    file:{
        name:"audio.webm",
        contentBytes:Microphone1.Audio
        }
    })},
    UpdateContext({locTextFromMicrophone:ret.text})
)

After recording speech with the microphone control and invoking the flow, the speech-to-text conversion succeeds, as shown in the following image.

English translation was successful.

Passing audio files from attachments control

To pass an audio file from the attachments control, use the following formula:
*The attachment limit is set to 1.

With({ret:SpeechToTextByOpenAI.Run(rdoType_1.Selected.Value,{
    file:{
        name:First(DataCardValue3_1.Attachments).Name,
        contentBytes:First(DataCardValue3_1.Attachments).Value}
    })},
    UpdateContext({locTextFromFile:ret.text})
)

When I actually attach a file and press the button, it works without any problems.

English translation works fine.

*We used the audio files from this site.
https://soundeffect-lab.info/sound/voice/info-lady1.html

Extra 1: Translation from Japanese to English often fails.

I simply read a page from a picture book to a friend, but the translation failed like this. It may need clearer, more well-formed sentences.

Extra 2: Brazilian Portuguese worked fine.

My friend also speaks Brazilian Portuguese, so I asked him to record some audio for me, and the transcription succeeded!

The English translation worked as well!

It seems that “prompt”, “temperature”, “language”, etc. can be passed as arguments to this API, so there are many more things to play with!
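
As a starting point for experimenting, here is a small Python sketch that generates extra entries in the same shape as the "$multipart" array used in the HTTP action. I have not tested these in the flow itself. The function names make_form_part and optional_parts are hypothetical, and as far as I understand from the API reference, "language" applies to transcriptions only.

```python
import json


def make_form_part(name, value):
    """Build one "$multipart" entry for a plain form field."""
    return {
        "headers": {"Content-Disposition": f'form-data; name="{name}"'},
        "body": str(value),
    }


def optional_parts(language=None, prompt=None, temperature=None):
    """Return entries only for the options that were actually given."""
    candidates = {"language": language, "prompt": prompt, "temperature": temperature}
    return [make_form_part(name, value)
            for name, value in candidates.items() if value is not None]


if __name__ == "__main__":
    # Print JSON that could be pasted next to the "model" part.
    print(json.dumps(optional_parts(language="ja", temperature=0), indent=2))
```

The printed objects have the same structure as the "model" part, so they can sit alongside it in the array.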
