export const meta = {
  title: `Connect to machine learning services`,
  description: `Add AI/ML capabilities such as text recognition, image labeling, text-to-speech, and translation to your GraphQL API.`,
};

Amplify allows you to identify text on an image, identify labels on an image, translate text, and synthesize speech from text with the `@predictions` directive.

> Note: The `@predictions` directive requires an S3 storage bucket configured via `amplify add storage`.

## Identify text on an image

To configure text recognition on an image, use the `identifyText` action in the `@predictions` directive.

```graphql
type Query {
  recognizeTextFromImage: String @predictions(actions: [identifyText])
}
```

In your GraphQL query, you can pass in an S3 `key` for the image. At the moment, this directive works only with objects located within the `public/` folder of your S3 bucket. The `public/` prefix is automatically added to the `key` input. For instance, in the example below, `public/myimage.jpg` will be used as the input.

```graphql
query RecognizeTextFromImage($input: RecognizeTextFromImageInput!) {
  recognizeTextFromImage(input: { identifyText: { key: "myimage.jpg" } })
}
```
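For example, assuming you have run `amplify codegen` so that a `recognizeTextFromImage` query is available in `./graphql/queries` (the same setup as the full example later on this page), a minimal client-side helper might look like this. The helper name is hypothetical:

```js
import { API, graphqlOperation } from 'aws-amplify';
import { recognizeTextFromImage } from './graphql/queries';

// Hypothetical helper: recognize text on an image that is already
// stored under the `public/` prefix of your storage bucket.
async function recognizeText(key) {
  const response = await API.graphql(
    graphqlOperation(recognizeTextFromImage, {
      input: { identifyText: { key } }
    })
  );
  return response.data.recognizeTextFromImage; // the detected text as a String
}
```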
## Identify labels on an image

To configure label recognition on an image, use the `identifyLabels` action in the `@predictions` directive.

```graphql
type Query {
  recognizeLabelsFromImage: [String] @predictions(actions: [identifyLabels])
}
```

In your GraphQL query, you can pass in an S3 `key` for the image. At the moment, this directive works only with objects located within the `public/` folder of your S3 bucket. The `public/` prefix is automatically added to the `key` input. For instance, in the example below, `public/myimage.jpg` will be used as the input.

The query below will return a list of identified labels. Review [Detecting Labels](https://docs.aws.amazon.com/rekognition/latest/dg/labels.html) in the Amazon Rekognition documentation for the full list of supported labels.

```graphql
query RecognizeLabelsFromImage($input: RecognizeLabelsFromImageInput!) {
  recognizeLabelsFromImage(input: { identifyLabels: { key: "myimage.jpg" } })
}
```
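A sketch of the corresponding client-side call, again assuming the generated `recognizeLabelsFromImage` query from `amplify codegen` and a hypothetical helper name:

```js
import { API, graphqlOperation } from 'aws-amplify';
import { recognizeLabelsFromImage } from './graphql/queries';

// Hypothetical helper: list the labels Rekognition detects on an image
// stored under the `public/` prefix.
async function recognizeLabels(key) {
  const response = await API.graphql(
    graphqlOperation(recognizeLabelsFromImage, {
      input: { identifyLabels: { key } }
    })
  );
  return response.data.recognizeLabelsFromImage; // e.g. ["Coffee", "Mug"]
}
```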
## Translate text

To configure text translation, use the `translateText` action in the `@predictions` directive.

```graphql
type Query {
  translate: String @predictions(actions: [translateText])
}
```

The query below will return the translated string. Populate the `sourceLanguage` and `targetLanguage` parameters with one of the [Supported Language Codes](https://docs.aws.amazon.com/translate/latest/dg/what-is.html#what-is-languages). Pass in the text to translate via the `text` parameter.

```graphql
query TranslateText($input: TranslateTextInput!) {
  translate(input: {
    translateText: {
      sourceLanguage: "en"
      targetLanguage: "de"
      text: "Translate me"
    }
  })
}
```
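The same pattern works from the JS library; here is a minimal sketch, assuming the generated `translate` query and a hypothetical helper name:

```js
import { API, graphqlOperation } from 'aws-amplify';
import { translate } from './graphql/queries';

// Hypothetical helper: translate English text to German.
async function translateToGerman(text) {
  const response = await API.graphql(
    graphqlOperation(translate, {
      input: {
        translateText: {
          sourceLanguage: 'en',
          targetLanguage: 'de',
          text
        }
      }
    })
  );
  return response.data.translate; // the translated String
}
```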
## Synthesize speech from text

To configure text-to-speech synthesis, use the `convertTextToSpeech` action in the `@predictions` directive.

```graphql
type Query {
  textToSpeech: String @predictions(actions: [convertTextToSpeech])
}
```

The query below will return a presigned URL with the synthesized speech. Populate the `voiceID` parameter with one of the [Supported Voice IDs](https://docs.aws.amazon.com/polly/latest/dg/voicelist.html). Pass in the text to synthesize via the `text` parameter.

```graphql
query ConvertTextToSpeech($input: ConvertTextToSpeechInput!) {
  textToSpeech(input: {
    convertTextToSpeech: {
      voiceID: "Nicole"
      text: "Hello from AWS Amplify!"
    }
  })
}
```
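Because the query resolves to a presigned URL, a browser client can feed the result straight into an `Audio` element. A minimal sketch, assuming the generated `textToSpeech` query and a hypothetical helper name:

```js
import { API, graphqlOperation } from 'aws-amplify';
import { textToSpeech } from './graphql/queries';

// Hypothetical helper: synthesize speech and play it in the browser.
async function speak(text) {
  const response = await API.graphql(
    graphqlOperation(textToSpeech, {
      input: { convertTextToSpeech: { voiceID: 'Nicole', text } }
    })
  );
  const url = response.data.textToSpeech; // presigned URL to the audio file
  await new Audio(url).play(); // browser-only; elsewhere, fetch the URL instead
}
```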
## Combining Predictions actions

You can also combine multiple Predictions actions together into a sequence. The following action sequences are supported:

- `identifyText -> translateText -> convertTextToSpeech`
- `identifyLabels -> translateText -> convertTextToSpeech`
- `translateText -> convertTextToSpeech`

In the example below, `speakTranslatedImageText` identifies text from an image, then translates it into another language, and finally converts the translated text to speech.

```graphql
type Query {
  speakTranslatedImageText: String
    @predictions(actions: [
      identifyText
      translateText
      convertTextToSpeech
    ])
}
```

An example query looks like this:

```graphql
query SpeakTranslatedImageText($input: SpeakTranslatedImageTextInput!) {
  speakTranslatedImageText(input: {
    identifyText: { key: "myimage.jpg" }
    translateText: {
      sourceLanguage: "en"
      targetLanguage: "es"
    }
    convertTextToSpeech: {
      voiceID: "Conchita"
    }
  })
}
```

A code example of this using the JS Library is shown below:

```js
import React, { useState } from 'react';
import { Amplify, Storage, API, graphqlOperation } from 'aws-amplify';
import awsconfig from './aws-exports';
import { speakTranslatedImageText } from './graphql/queries';

/* Configure Exports */
Amplify.configure(awsconfig);

function SpeakTranslatedImage() {
  const [src, setSrc] = useState('');
  const [img, setImg] = useState('');

  function putS3Image(event) {
    const file = event.target.files[0];
    Storage.put(file.name, file)
      .then(async (result) => {
        setSrc(await speakTranslatedImageTextOP(result.key));
        setImg(await Storage.get(result.key));
      })
      .catch((err) => console.log(err));
  }

  return (
    <div className="Text">
      <div>
        <h5>Upload Image</h5>
        <input
          type="file"
          accept="image/jpeg"
          onChange={(event) => {
            putS3Image(event);
          }}
        />
        <br />
        {img && <img src={img} alt="uploaded preview" />}
        {src && (
          <div>
            <audio id="audioPlayback" controls>
              <source id="audioSource" type="audio/mp3" src={src} />
            </audio>
          </div>
        )}
      </div>
    </div>
  );
}

async function speakTranslatedImageTextOP(key) {
  const inputObj = {
    translateText: {
      sourceLanguage: 'en',
      targetLanguage: 'es'
    },
    identifyText: { key },
    convertTextToSpeech: { voiceID: 'Conchita' }
  };
  const response = await API.graphql(
    graphqlOperation(speakTranslatedImageText, { input: inputObj })
  );
  return response.data.speakTranslatedImageText;
}

function App() {
  return (
    <div className="App">
      <h3>Speak Translated Image</h3>
      <SpeakTranslatedImage />
    </div>
  );
}

export default App;
```
## How it works

Definition of the `@predictions` directive:

```graphql
directive @predictions(actions: [PredictionsActions!]!) on FIELD_DEFINITION
enum PredictionsActions {
  identifyText # uses Amazon Rekognition to detect text
  identifyLabels # uses Amazon Rekognition to detect labels
  convertTextToSpeech # uses Amazon Polly in a Lambda function to output a presigned URL to synthesized speech
  translateText # uses Amazon Translate to translate text from source to target language
}
```

`@predictions` creates resources to communicate with Amazon Rekognition, Amazon Translate, and Amazon Polly. For each action, the following is created:

- An IAM policy for each service (e.g. an Amazon Rekognition `detectText` policy)
- An AppSync VTL function
- An AppSync DataSource

Finally, a pipeline resolver is created for the query or field. The pipeline resolver is composed of the AppSync functions defined by the action list provided in the directive.