Power Apps: Calling OpenAI Whisper (Speech to Text) via Power Automate | Speech-to-Text Conversion

This time, I tried calling OpenAI’s Whisper (Speech to Text) via Power Automate and wrote down the results.


About Whisper

As the image shows, Whisper is a speech-to-text system developed by OpenAI.

OpenAI has published an API for Whisper, so I called it from Power Apps via Power Automate.

Preliminary Preparation: Obtaining an OpenAI API Key

First, go to the OpenAI home page and log in.

Then select “View API keys” from the icon in the upper right corner.
Press “Create new secret key”,
create a key with an appropriate name,
and note down the key that is generated.

Preliminary preparations are now complete.

Building the Power Automate Flow

Next, we'll build the Power Automate flow.

The overall view of the completed flow looks like this.
The trigger takes the following two arguments:

  • A File-type argument that receives the audio from Power Apps.
  • A String-type argument that selects either “transcriptions” or “translations”.

Next, a variable is created that maps each file extension to its MIME type.
*A few more file types are actually supported, but I'll leave it at these for now.
Its contents are as follows.

{
  "flac": "audio/flac",
  "mp3": "audio/mpeg",
  "mp4": "video/mp4",
  "wav": "audio/wav",
  "ogg": "audio/ogg",
  "webm": "audio/webm"
}
Then the file extension is taken from the name of the file passed in the argument, and the MIME type defined above is looked up from that extension.
The expressions are as follows:

// Getting file extensions from file names
@{last(split(triggerBody()['file']['name'],'.'))}
// Get MIME type from file extension
@{variables('ContentTypes')?[outputs('作成:extention')]}
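
The same two-step lookup can be sketched in Python for anyone who wants to check the logic outside Power Automate. This is only an illustration: the function name mime_type_for is hypothetical, the table is copied from the ContentTypes variable above, and returning None for an unknown extension is an assumption about how a missing entry should be handled.

```python
from typing import Optional

# Extension-to-MIME-type table, copied from the ContentTypes variable above.
CONTENT_TYPES = {
    "flac": "audio/flac",
    "mp3": "audio/mpeg",
    "mp4": "video/mp4",
    "wav": "audio/wav",
    "ogg": "audio/ogg",
    "webm": "audio/webm",
}


def mime_type_for(file_name: str) -> Optional[str]:
    """Return the MIME type for file_name, or None if the extension is not in the table."""
    # Same idea as last(split(name, '.')): take whatever follows the final dot.
    extension = file_name.split(".")[-1]
    return CONTENT_TYPES.get(extension)


print(mime_type_for("info-lady1.mp3"))
print(mime_type_for("notes.txt"))
```

Like the flow expression, this sketch does not normalize case, so a file named "VOICE.MP3" would not match the "mp3" entry here.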
Next, the OpenAI API is called from an HTTP action.
※The URL being called is switched (transcription or English translation) according to the argument.
The request body is as follows.

{
  "$content-type": "multipart/form-data",
  "$multipart": [
    {
      "headers": {
        "Content-Disposition": "form-data; name=\"model\""
      },
      "body": "whisper-1"
    },
    {
      "headers": {
        "Content-Disposition": "form-data; name=\"file\"; filename=\"@{triggerBody()['file']['name']}\""
      },
      "body": {
        "$Content-type": "@{outputs('作成:content-type')}",
        "$content": @{triggerBody()['file']['contentBytes']}
      }
    }
  ]
}
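
For reference, here is roughly what the HTTP action ends up sending, sketched in Python with only the standard library. It is a minimal sketch, not the flow itself: the function names endpoint_for, build_multipart and build_request are hypothetical, and it assumes the two endpoints are https://api.openai.com/v1/audio/transcriptions and https://api.openai.com/v1/audio/translations and that the API key is sent as a Bearer token.

```python
import urllib.request
import uuid
from typing import Tuple

# Assumed base URL of the audio endpoints; the last path segment is the mode.
AUDIO_API_BASE = "https://api.openai.com/v1/audio/"


def endpoint_for(mode: str) -> str:
    """Return the endpoint URL for 'transcriptions' or 'translations'."""
    if mode not in ("transcriptions", "translations"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return AUDIO_API_BASE + mode


def build_multipart(file_name: str, content_type: str, audio: bytes,
                    boundary: str = "") -> Tuple[str, bytes]:
    """Build a multipart/form-data body with a 'model' part and a 'file' part.

    Returns the Content-Type header value and the encoded body.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        b"whisper-1",
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        audio,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


def build_request(api_key: str, mode: str, file_name: str,
                  content_type: str, audio: bytes) -> urllib.request.Request:
    """Assemble the POST request without sending it."""
    header, body = build_multipart(file_name, content_type, audio)
    return urllib.request.Request(
        endpoint_for(mode),
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": header},
        method="POST",
    )
```

Passing the result of build_request to urllib.request.urlopen would actually send the audio, so the sketch stops one step short of that.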
OpenAI returns the response as simple JSON,
so the flow parses the JSON and returns its contents to Power Apps.
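
The parsing step amounts to reading one field. A minimal Python sketch, assuming the default response format, in which the body is a JSON object with a single "text" field; the function name extract_text and the sample string are made up for illustration.

```python
import json


def extract_text(response_body: str) -> str:
    """Return the recognized text from a Whisper JSON response body."""
    payload = json.loads(response_body)
    # Assumption: default response format, i.e. {"text": "..."}.
    return payload["text"]


# Illustrative sample only, not real output from the API.
sample = '{"text": "Hello, this is a test."}'
print(extract_text(sample))
```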

The Power Automate flow is now complete!

Building Power Apps

In Power Apps, create the following two screens:

  • A screen that passes audio from the microphone control
  • A screen that passes an audio file from the attachments control

Passing audio from the microphone control

The formula for passing audio from the microphone control is as follows.
When audio is recorded with the microphone control and the flow is invoked, the speech-to-text conversion succeeds, as shown in the following image.
English translation also succeeded.

Passing audio files from the attachments control

To pass an audio file from the attachments control, do the following.
※The attachment limit is set to 1.
When you actually attach a file and press the button,
it works without any problems.
English translation works fine as well.

*I used an audio file from this site:
https://soundeffect-lab.info/sound/voice/info-lady1.html

Extra 1: Translation from Japanese to English often fails.

I simply read a page from a picture book to a friend,
and the translation failed like this. Clear, well-formed sentences may be necessary.

Extra 2: Brazilian Portuguese worked fine.

My friend also speaks Brazilian Portuguese, so I asked him to record some for me, and the transcription was a success!
English translation worked as well!

It seems that “prompt”, “temperature”, “language”, etc. can be passed as arguments to this API, so there are many more things to play with!
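
Each of these would presumably travel as one more form-data part next to "model" and "file". A small Python sketch of collecting them, with the caveats that the helper name optional_fields is hypothetical and that the value formats (for example an ISO-639-1 code such as "ja" for "language") are assumptions to check against the API reference.

```python
from typing import Dict, Optional


def optional_fields(prompt: Optional[str] = None,
                    temperature: Optional[float] = None,
                    language: Optional[str] = None) -> Dict[str, str]:
    """Collect the optional parameters that were actually set, as form field strings."""
    candidates = {"prompt": prompt, "temperature": temperature, "language": language}
    # Skip anything left as None so the API falls back to its defaults.
    return {name: str(value) for name, value in candidates.items() if value is not None}


print(optional_fields(language="ja", temperature=0))
```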
