Text-to-Speech in Unity WebGL Builds

For our entry to the Ludum Dare 43 game jam (Missing Parts) we wanted our protagonist to use simple text-to-speech (TTS) for the monologues. There are assets in Unity's Asset Store that can do this, but they cost upwards of 10 euros. It seemed to us that this was a rather simple feature, so we decided to implement it ourselves. It turns out it really was very simple to do! Admittedly, the assets in the Asset Store probably have more features, but as long as you only need simple TTS my advice is to do it yourself. Let me show you how.

TTS in JavaScript

Using the Web Speech API we can have the speech synthesis of any modern browser read text aloud. To get a particularly simple interface we can write a small function that creates a SpeechSynthesisUtterance object and immediately starts its synthesis.

function Speak(str) {
  var msg = new SpeechSynthesisUtterance(str);
  msg.lang = 'en-US';
  msg.volume = 1; // 0 to 1
  msg.rate = 1; // 0.1 to 10
  msg.pitch = 1.5; // 0 to 2
  // stop any TTS that may still be active
  window.speechSynthesis.cancel();
  window.speechSynthesis.speak(msg);
}

This allows us to simply call Speak("Hello World") to say the words “Hello World”, but only from JavaScript. All we need now is a way to call this from Unity's C# code.
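If you want the function to fail quietly in browsers without speech synthesis, one possible variation is to build the function from the objects it depends on. This is only a sketch: makeSpeaker and its options parameter are names made up for this example and are not part of the Web Speech API.

```javascript
// Sketch only: makeSpeaker is an illustrative helper, not a browser API.
// synth is expected to behave like window.speechSynthesis,
// Utterance like window.SpeechSynthesisUtterance.
function makeSpeaker(synth, Utterance, options) {
  var opts = options || {};
  // Without speech synthesis support, return a function that does nothing.
  if (!synth || typeof Utterance !== 'function') {
    return function () { return false; };
  }
  return function (str) {
    var msg = new Utterance(str);
    msg.lang = opts.lang || 'en-US';
    msg.volume = opts.volume !== undefined ? opts.volume : 1; // 0 to 1
    msg.rate = opts.rate !== undefined ? opts.rate : 1; // 0.1 to 10
    msg.pitch = opts.pitch !== undefined ? opts.pitch : 1; // 0 to 2
    // stop any TTS that may still be active
    synth.cancel();
    synth.speak(msg);
    return true;
  };
}

// In a browser this could be used as:
// var Speak = makeSpeaker(window.speechSynthesis, window.SpeechSynthesisUtterance, { pitch: 1.5 });
// Speak("Hello World");
```

Passing the synthesis objects in as arguments also makes it possible to try the function outside a browser with simple stand-ins.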

Calling JavaScript code from C#

Unity will parse any JavaScript code in .jslib files located in any folder called “Plugins”. To make the function we just wrote available to the C# code we have to add it to the LibraryManager.library object:

mergeInto(LibraryManager.library, {
  // ... functions we want to add ...
});

And we have to change one more small thing: to have JavaScript understand C# strings we have to explicitly construct a JavaScript string from the pointer it receives, using the function Pointer_stringify. (In more recent Unity versions Pointer_stringify has been replaced by UTF8ToString.) In total our JavaScript code thus looks like this:

mergeInto(LibraryManager.library, {
  Speak: function (strPointer) {
    var str = Pointer_stringify(strPointer);
    var msg = new SpeechSynthesisUtterance(str);
    msg.lang = 'en-US';
    msg.volume = 1; // 0 to 1
    msg.rate = 1; // 0.1 to 10
    msg.pitch = 1.5; // 0 to 2
    // stop any TTS that may still be active
    window.speechSynthesis.cancel();
    window.speechSynthesis.speak(msg);
  }
});
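To make the pointer conversion a little less opaque: the string arrives in JavaScript as a number, an offset into the memory of the compiled program, where the text is stored as zero-terminated UTF-8 bytes. The following is a simplified sketch of what such a conversion does. readCString and the small fake heap are made up for illustration; this is not Unity's or Emscripten's actual implementation.

```javascript
// Sketch only: readCString is a hypothetical helper for illustration.
// heap is a Uint8Array standing in for the program's memory,
// pointer is the offset at which the string starts.
function readCString(heap, pointer) {
  // Find the terminating zero byte.
  var end = pointer;
  while (end < heap.length && heap[end] !== 0) {
    end++;
  }
  // Decode the bytes between pointer and terminator as UTF-8.
  return new TextDecoder('utf-8').decode(heap.subarray(pointer, end));
}

// Example: a fake 16-byte "heap" with the string "Hi!" stored at offset 4.
var heap = new Uint8Array(16);
heap.set([72, 105, 33, 0], 4);
console.log(readCString(heap, 4)); // prints "Hi!"
```

In the real plugin you never do this by hand; the helper function provided by Unity takes care of it.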

In C# we still have to declare this function and tell the compiler where to find it, but then we can call it just like any other function. The declaration lives in a class as a private static method, and it does not have to be unique across the project. So we are free to either write a wrapper class that we then use throughout our project, or to simply import the function into every class that needs it.

using UnityEngine;
using System.Runtime.InteropServices;

public class Character : MonoBehaviour {
  // this is supplied by TTS.jslib in the plugins folder
  [DllImport("__Internal")]
  private static extern void Speak(string str);

  // ...

  public void Say(string line) {
    // the jslib only works while in the browser
    if (Application.platform == RuntimePlatform.WebGLPlayer) {
      Speak(line);
    }
  }
}

For some more details on this import see the Unity documentation.
