Using Userscripts to Translate Subtitles On-the-Fly

Userscripts are one of the most underused tools online. Many developers can write JavaScript, yet people often forget that when a website does something they don’t like, it’s usually easy to write a script to change it. Userscripts let you write JavaScript (and CSS) that runs in the page as if it had been there originally, allowing you to change the layout, alter the behavior, and add functionality. This little post is about adding English translations to the website of SVT, Sweden’s public broadcaster.

It’s great when learning a language

As I have been trying to improve my Swedish over the past year, one of the easy things I realized I can do is to immerse myself in the language as much as I can. Luckily, SVT runs a website called SVT Play with a vast amount of Swedish content to watch. While I usually understand the content, sometimes I miss a word or come across an unfamiliar phrase. The website offers subtitles, but they’re almost always only in Swedish.

Testing the hypothesis

Websites stopped using Flash for video about 10 years ago. One of the best things about that is that everything you now see is built from native HTML elements. As I was watching a video I thought: the subtitles must be somewhere in the DOM too.

Opening the devtools and searching for text revealed exactly that:

finding the subtitles element in the DOM

From here on, the path is pretty simple. We write some JavaScript code that takes the text, sends it to some translation API (because browsers don’t do it, yet), and replaces the content with the result.
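
Those three steps can be sketched as a single function. This is a minimal sketch rather than the final script: `translate` is a hypothetical callback-style helper standing in for whichever translation API ends up being used, and the element only needs an `innerText` property.

```javascript
// Minimal sketch of the read -> translate -> replace flow.
// `translate(text, onDone)` is a hypothetical helper; a real one would call
// a translation API and invoke `onDone` with the translated string.
function translateElement(element, translate) {
  const original = element.innerText;
  translate(original, (translated) => {
    element.innerText = translated;
  });
}

// Stand-in translator used only for this demonstration.
const fakeTranslate = (text, onDone) => onDone(`[EN] ${text}`);

// A plain object is enough to try the flow outside the browser.
const subtitle = { innerText: "Hej då" };
translateElement(subtitle, fakeTranslate);
console.log(subtitle.innerText);
```

In the browser, `element` would be the subtitle node found in the DOM, and the callback would fire once the translation request completes.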

Using a translation service

At first I thought I should try using the Workers AI m2m100-1.2b model for the translation. Workers AI has a free tier and writing a Worker that translates is super simple:

export default {
  async fetch(request, env) {
    const { text } = await request.json();
    const result = await env.AI.run("@cf/meta/m2m100-1.2b", {
      text,
      source_lang: "swedish",
      target_lang: "english",
    });
    return new Response(JSON.stringify(result));
  },
};

We receive JSON with the original text, and we return JSON with the translated text.
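
For illustration, the exchange with such a Worker could look like the sketch below. The Worker URL is a placeholder, and the `translated_text` field is an assumption about the shape of the model’s result, so check it against what your Worker actually returns.

```javascript
// Hypothetical URL: replace with the address of your own deployed Worker.
const WORKER_URL = "https://translate.example.workers.dev";

// Build the fetch options for the JSON body the Worker expects: { text }.
function buildWorkerRequest(text) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  };
}

// Pull the translation out of the Worker's JSON reply.
// Assumption: the model result looks like { translated_text: "..." }.
function readWorkerResponse(result) {
  return result.translated_text;
}

// Send one subtitle to the Worker and resolve with the English text.
async function translateWithWorker(text) {
  const response = await fetch(WORKER_URL, buildWorkerRequest(text));
  return readWorkerResponse(await response.json());
}
```

Calling `translateWithWorker("Hej då")` would return a promise that resolves with the translated string once the Worker replies.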

This generally worked, but two issues bothered me:

  1. It would sometimes take more than 5 seconds to get the response. It could be because I’m using the free Workers tier, I am not sure.
  2. The layout of the text would often break, and the model would sometimes translate just one sentence even though there were two. If the subtitles spanned two lines, they would come back as one line, often missing part of the translation.

Then I remembered that DeepL has a free API that one can use for up to 500k characters per month. That should be enough, I’m not watching that much TV…
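
As a rough sanity check on that quota, here is a back-of-the-envelope calculation. The average of 60 characters per subtitle is a made-up figure for illustration, not a measurement.

```javascript
// DeepL free tier limit mentioned above: 500k characters per month.
const MONTHLY_CHARACTER_LIMIT = 500_000;

// Assumption: a typical two-line subtitle is about 60 characters long.
const ASSUMED_CHARS_PER_SUBTITLE = 60;

// How many subtitles of a given average length fit in the monthly quota.
function subtitlesPerMonth(limit, charsPerSubtitle) {
  return Math.floor(limit / charsPerSubtitle);
}

console.log(
  subtitlesPerMonth(MONTHLY_CHARACTER_LIMIT, ASSUMED_CHARS_PER_SUBTITLE),
);
```

Under that assumption the quota covers roughly 8,000 translated subtitles a month, which is plenty when only the occasional line needs translating.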

The final Userscript

I had some issues working around the tight Content Security Policy (CSP) rules of SVT Play’s website. The rules prevented me from sending requests to hosts that were not in the allowlist. After trying some really nasty solutions (I realized localhost was in the allowlist, so I wrote a little proxy server to run locally and call the translation service from it), I figured I should try a different Userscript manager. I have been using FireMonkey for a long time now, but I always had issues with CSP. I gave Tampermonkey a shot (they’re all monkeys because the original Userscript manager was called Greasemonkey) and everything just worked. I ended up with this script:

// ==UserScript==
// @name             SVTsubs
// @match            https://www.svtplay.se/*
// @version          1.0
// @connect          *
// @grant            GM_xmlhttpRequest
// @grant            GM.xmlHttpRequest
// ==/UserScript==

const handleKeydown = async (event) => {
  if (event.key === "Shift") {
    const subtitleElement = document.querySelector(
      "[class*=video-player__text] span",
    );
    // Nothing to translate or toggle when no subtitle is on screen.
    if (!subtitleElement) return;
    const originalText = subtitleElement.getAttribute("data-original-text");
    const translatedText = subtitleElement.getAttribute("data-translated-text");
    if (originalText) {
      // The translation is showing: cache it and restore each span's original.
      for (const span of document.querySelectorAll(
        "[class*=video-player__text] span",
      )) {
        const saved = span.getAttribute("data-original-text");
        if (saved === null) continue;
        span.setAttribute(
          "data-translated-text",
          encodeURIComponent(span.innerText),
        );
        span.innerText = decodeURIComponent(saved);
        span.removeAttribute("data-original-text");
      }
    } else if (translatedText) {
      // The original is showing and a translation is cached: swap back to it.
      for (const span of document.querySelectorAll(
        "[class*=video-player__text] span",
      )) {
        const saved = span.getAttribute("data-translated-text");
        if (saved === null) continue;
        span.setAttribute(
          "data-original-text",
          encodeURIComponent(span.innerText),
        );
        span.innerText = decodeURIComponent(saved);
        span.removeAttribute("data-translated-text");
      }
    } else {
      const videoText = subtitleElement.innerText;

      GM.xmlHttpRequest({
        method: "POST",
        url: "https://api-free.deepl.com/v2/translate",
        data: JSON.stringify({ text: [videoText], target_lang: "EN" }),
        headers: {
          "Content-Type": "application/json",
          Authorization: "DeepL-Auth-Key MY_SECRET_KEY_GOES_HERE",
        },
        onload: function (response) {
          const { translations } = JSON.parse(response.response);
          const translatedText = translations[0].text;

          for (const span of [
            ...document.querySelectorAll("[class*=video-player__text] span"),
          ]) {
            span.setAttribute(
              "data-original-text",
              encodeURIComponent(span.innerText),
            );
            span.innerText = translatedText;
          }
        },
      });
    }
  }
};

document.addEventListener("keydown", handleKeydown);

As you can see, the script ended up doing a little more than I originally intended. I figured a common thing to do would be to pause and flip back and forth between the original text and the translation. Since I don’t want to send a request to DeepL for text that I have already translated, I keep the original text in a new attribute called data-original-text and the translated text in data-translated-text. Lastly, I bound the whole thing to my Shift key: press the key and the translation appears, press it again and it goes back to the original. With DeepL’s API being super fast to begin with, this made for a very smooth experience.

And that’s all! Now I can watch all the Swedish TV shows, and when I don’t understand something, I just press the Shift key. I think the above should be easy to replicate on other websites too, so if you’re learning a language - give it a try!

Netflix version

A friend asked me to create a similar Userscript for Netflix. Here is the same thing adjusted for Netflix subtitles:

// ==UserScript==
// @name             Netflix Translate
// @match            https://www.netflix.com/*
// @version          1.0
// @connect          *
// @grant            GM_xmlhttpRequest
// @grant            GM.xmlHttpRequest
// ==/UserScript==

const handleKeydown = async (event) => {
  if (event.key === "Shift") {
    const subtitleElement = document.querySelector(
      ".player-timedtext-text-container",
    );
    // Nothing to translate or toggle when no subtitle is on screen.
    if (!subtitleElement) return;
    const originalText = subtitleElement.getAttribute("data-original-text");
    const translatedText = subtitleElement.getAttribute("data-translated-text");
    if (originalText) {
      subtitleElement.setAttribute(
        "data-translated-text",
        encodeURIComponent(subtitleElement.innerHTML),
      );
      subtitleElement.innerHTML = decodeURIComponent(originalText);
      subtitleElement.removeAttribute("data-original-text");
    } else if (translatedText) {
      subtitleElement.setAttribute(
        "data-original-text",
        encodeURIComponent(subtitleElement.innerHTML),
      );
      subtitleElement.innerHTML = decodeURIComponent(translatedText);
      subtitleElement.removeAttribute("data-translated-text");
    } else {
      const subtitlesHTML = subtitleElement.innerHTML;
      // Collect every run of text that sits between two tags.
      const regex = />[^<]+</g;
      const matches = [...subtitlesHTML.matchAll(regex)];
      const subtitles = matches.map((match) => match[0].slice(1, -1).trim());
      GM.xmlHttpRequest({
        method: "POST",
        url: "https://api-free.deepl.com/v2/translate",
        data: JSON.stringify({ text: subtitles, target_lang: "EN" }),
        headers: {
          "Content-Type": "application/json",
          Authorization: "DeepL-Auth-Key MY_SECRET_KEY_GOES_HERE",
        },
        onload: function (response) {
          const { translations } = JSON.parse(response.response);
          subtitleElement.setAttribute(
            "data-original-text",
            encodeURIComponent(subtitlesHTML),
          );
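          // Swap each extracted text run for its translation, in order.
          // A function replacer keeps "$" in the translation from being
          // treated as a special replacement pattern.
          translations.forEach((translation, i) => {
            subtitleElement.innerHTML = subtitleElement.innerHTML.replace(
              subtitles[i],
              () => translation.text,
            );
          });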
        },
      });
    }
  }
};

document.addEventListener("keydown", handleKeydown);
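
The text extraction in the script above can be tried on its own. The markup below is a made-up sample in the same spirit, not Netflix’s actual DOM, and `extractTextRuns` is just a name chosen for this sketch.

```javascript
// Hypothetical subtitle markup: two lines of text separated by a <br>.
const sampleHTML =
  '<span style="color:#fff">Hej!</span><br><span>Hur mår du?</span>';

// Same approach as the script: grab every run of text that sits between
// a ">" and the next "<", then strip those two delimiters and trim.
function extractTextRuns(html) {
  const matches = [...html.matchAll(/>[^<]+</g)];
  return matches.map((match) => match[0].slice(1, -1).trim());
}

const lines = extractTextRuns(sampleHTML);
console.log(lines);
```

Each extracted run becomes one entry in the `text` array sent to DeepL. Whitespace-only runs between tags would come out as empty strings, so filtering those out may be worthwhile on pages with more elaborate markup.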

- September 26, 2024