Power Apps: Calling OpenAI Whisper (Speech to Text) via Power Automate

This time, I tried calling OpenAI’s Whisper (Speech to Text) from Power Apps via Power Automate, so here are my notes.

About Whisper
Preliminary Preparation: Obtaining an OpenAI API Key
Building Power Automate
Building Power Apps
1. Passing audio from the microphone control
2. Passing audio files from the attachments control
Extra 1: Translation from Japanese to English often fails
Extra 2: Brazilian Portuguese worked fine

About Whisper

As the image shows, Whisper is a speech-to-text system developed by OpenAI.

OpenAI has published an API for Whisper, so I called it from Power Apps via Power Automate.

※See the API reference on the OpenAI Platform for details.

Preliminary Preparation: Obtaining an OpenAI API Key

First, go to the OpenAI home page and log in.

Then select “View API keys” from the icon in the upper right corner.

Press “Create new secret key”, create a key with an appropriate name, and make a note of the key that was created.

Preliminary preparations are now complete.
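As a rough illustration of how the key is used later, the OpenAI API expects it as a bearer token in the Authorization header of each request. The sketch below is only an assumption-laden example: the function name `build_auth_headers` and the environment variable `OPENAI_API_KEY` are names I picked for illustration, not something defined in the flow.

```python
import os


def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return the HTTP headers that carry the API key as a bearer token."""
    if not api_key:
        raise ValueError("API key is empty")
    return {"Authorization": f"Bearer {api_key}"}


# Assumption: the key noted above was saved in an environment variable
# called OPENAI_API_KEY; "sk-placeholder" is a dummy fallback value.
headers = build_auth_headers(os.environ.get("OPENAI_API_KEY", "sk-placeholder"))
```

In Power Automate, the same header would go into the Headers section of the HTTP action.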

Building Power Automate

Next, we’ll build the Power Automate flow.

The overall view of the completed Power Automate looks like this.

The trigger takes the following two arguments:

A File type argument that receives the audio from Power Apps.
A String type argument that selects either “transcriptions” or “translations”.

Next, a variable is created that maps each file extension to its MIME type.
*A few more file types are actually supported, but I’ll leave it at these for now.

Here’s what’s inside.

{
  "flac": "audio/flac",
  "mp3": "audio/mpeg",
  "mp4": "video/mp4",
  "wav": "audio/wav",
  "ogg": "audio/ogg",
  "webm": "audio/webm"
}

Then the flow takes the extension of the file passed in the argument and looks up the corresponding MIME type defined above.

The expressions are as follows. (“作成” is the Japanese display name of the Compose action.)

// Get the file extension from the file name
@{last(split(triggerBody()['file']['name'],'.'))}

// Get the MIME type from the file extension
@{variables('ContentTypes')?[outputs('作成：extention')]}
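For readers who find the expression syntax hard to follow, here is the same lookup sketched in Python. The function names are my own for illustration. Like the expressions, it takes everything after the last dot as the extension and returns nothing (None) when the extension is not in the table.

```python
from typing import Optional

# Same extension-to-MIME-type table as the ContentTypes variable.
CONTENT_TYPES = {
    "flac": "audio/flac",
    "mp3": "audio/mpeg",
    "mp4": "video/mp4",
    "wav": "audio/wav",
    "ogg": "audio/ogg",
    "webm": "audio/webm",
}


def get_extension(file_name: str) -> str:
    """Equivalent of last(split(name, '.')): the text after the last dot."""
    return file_name.split(".")[-1]


def get_mime_type(file_name: str) -> Optional[str]:
    """Equivalent of variables('ContentTypes')?[extension]; None if unknown."""
    return CONTENT_TYPES.get(get_extension(file_name))
```

Note that the lookup is case-sensitive in this sketch, so a name like "VOICE.MP3" would not match; lowercasing the extension first would be one way to handle that.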

Next, call the OpenAI API with an HTTP action.
※Switch the URL to call (transcription or English translation) depending on the argument.
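The URL switching can be sketched like this in Python. The two endpoints are the audio endpoints from the OpenAI API reference; the function name `build_endpoint` and the validation step are my own additions for illustration.

```python
BASE_URL = "https://api.openai.com/v1/audio"
ALLOWED_MODES = ("transcriptions", "translations")


def build_endpoint(mode: str) -> str:
    """Return the Whisper endpoint URL for the mode passed from Power Apps."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    return f"{BASE_URL}/{mode}"
```

Because the argument value is the same as the last path segment, the flow only needs to append the argument to the base URL.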

Here is the content of the request body.

{
  "$content-type": "multipart/form-data",
  "$multipart": [
    {
      "headers": {
        "Content-Disposition": "form-data; name=\"model\""
      },
      "body": "whisper-1"
    },
    {
      "headers": {
        "Content-Disposition": "form-data; name=\"file\"; filename=\"@{triggerBody()['file']['name']}\""
      },
      "body": {
        "$content-type": "@{outputs('作成：content-type')}",
        "$content": @{triggerBody()['file']['contentBytes']}
      }
    }
  ]
}
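To show roughly what this body becomes on the wire, here is a minimal Python sketch that assembles an equivalent multipart/form-data payload by hand. The function name `build_multipart` and its parameters are hypothetical. It also assumes the audio is already raw bytes, whereas the flow carries it as base64 in "$content".

```python
import uuid
from typing import Optional, Tuple


def build_multipart(
    file_name: str,
    content_type: str,
    audio: bytes,
    model: str = "whisper-1",
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a two-part form upload."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    lines = [
        # Part 1: the "model" field.
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        # Part 2: the "file" field with its own file name and MIME type.
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        audio,
        # Closing delimiter, followed by a final CRLF.
        delimiter + b"--",
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"
```

Each entry in "$multipart" corresponds to one part here: the "headers" object becomes the part headers, and "body" becomes the bytes after the blank line.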

The response from OpenAI comes back as simple JSON.

Parse the JSON and return its contents to Power Apps.
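With the default response format, the JSON contains a single "text" field holding the result. As a small sketch of the parsing step (the function name and the sample string are made up for illustration):

```python
import json


def extract_text(response_body: str) -> str:
    """Parse the API response and return the transcribed or translated text."""
    payload = json.loads(response_body)
    return payload["text"]


# Illustrative sample only, not an actual response from my tests.
sample_response = '{"text": "Hello, this is a test."}'
result = extract_text(sample_response)
```

In the flow, the Parse JSON action plays the role of `json.loads`, and the value of "text" is what gets returned to Power Apps.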

Power Automate is now complete!

Building Power Apps

In Power Apps, create the following two screens:

Screen to pass audio from microphone control
Screen to pass audio file from attachments control

1. Passing audio from the microphone control

The formula for passing audio from the microphone control is as follows.

When audio is recorded with the microphone control and the Power Automate flow is invoked, the speech-to-text conversion succeeds, as shown in the following image.

English translation was successful.

2. Passing audio files from the attachments control

To pass an audio file from the attachments control, use the following formula.
※The attachment limit is set to 1.

When you actually attach a file and press the button, it works without any problems.

English translation works fine.

*We used the audio files from this site.
https://soundeffect-lab.info/sound/voice/info-lady1.html

Extra 1: Translation from Japanese to English often fails

I just read a page from a picture book to a friend, but the translation failed like this. It may need clear, well-formed sentences to work.

Extra 2: Brazilian Portuguese worked fine

My friend also speaks Brazilian Portuguese, so I asked him to record some audio for me, and the transcription was a success!

English translation also worked!

It seems that “prompt”, “temperature”, “language”, etc. can be passed as arguments to this API, so there are many more things to play with!
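As a starting point for experimenting, here is a sketch of how those optional arguments could be added as extra form fields next to "model". The function name is hypothetical, and the validation reflects my reading of the API reference (temperature between 0 and 1, and "language" accepted only by the transcriptions endpoint), so please check it against the current documentation.

```python
from typing import Dict, Optional


def build_form_fields(
    mode: str,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
    language: Optional[str] = None,
    model: str = "whisper-1",
) -> Dict[str, str]:
    """Collect the text fields to send alongside the audio file."""
    fields = {"model": model}
    if prompt is not None:
        fields["prompt"] = prompt
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    if language is not None:
        # Assumption: language (an ISO-639-1 code such as "ja") applies only
        # to transcriptions, since translations always produce English.
        if mode != "transcriptions":
            raise ValueError("language is only used with transcriptions")
        fields["language"] = language
    return fields
```

Each key in the returned dictionary would become one more entry in the "$multipart" array, with the same Content-Disposition pattern as the "model" part.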
