I got the idea to do this when I started reading the pinned post titled ‘Listening-Reading Method and Spanish’.

https://farkastranslations.com/bilingual_books.php

I encoded it with ffmpeg to 112 kbps Opus in an .ogg file (mono, ~731 MB, 15:12:58 long): https://github.com/holdengreen/lingtool/blob/main/streams/center-earth-journey-es-en-original-botched.ogg

I wrote the script to process the text file at: https://farkastranslations.com/books/Verne_Jules-Voyage_au_Centre_de_la_Terre-fr-en-es-hu-nl.zip

Here is the script (https://github.com/holdengreen/lingtool/blob/main/src/center-earth-parallel.py):

import re
import sys
import torch

import numpy as np
MAX_WAV_VALUE = 32768.0  # int16 full scale; Silero returns float audio in [-1, 1]

sample_rate = 48000

accelerator = 'cpu'
device = torch.device(accelerator)

# Thin wrapper around the Silero v3 Spanish TTS model loaded from torch.hub
class SpanishTTS:
    language = 'es'
    model_id = 'v3_es'
    speaker = 'es_1'

    def __init__(self):
        self.model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                model='silero_tts',
                                                language=self.language,
                                                speaker=self.model_id)

        self.model.to(device)  # gpu or cpu

    def apply(self, text):
        # synthesize, then scale Silero's float [-1, 1] output to the int16 range
        return self.model.apply_tts(text=text,
                        speaker=self.speaker,
                        sample_rate=sample_rate) * MAX_WAV_VALUE


# Same wrapper for the Silero v3 English model
class EnglishTTS:
    language = 'en'
    model_id = 'v3_en'
    speaker = 'en_117'

    def __init__(self):
        self.model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                model='silero_tts',
                                                language=self.language,
                                                speaker=self.model_id)

        self.model.to(device)  # gpu or cpu

    def apply(self, text):
        return self.model.apply_tts(text=text,
                        speaker=self.speaker,
                        sample_rate=sample_rate) * MAX_WAV_VALUE


spanishtts = SpanishTTS()
englishtts = EnglishTTS()


FFMPEG_BIN = "ffmpeg"

import subprocess as sp
from fcntl import fcntl, F_GETFL, F_SETFL
from os import O_NONBLOCK, read



fl = open("res/foreign/parallel-translations/Verne_Jules-Voyage_au_Centre_de_la_Terre-fr-en-es-hu-nl.farkastranslations.com/Verne_Jules-Voyage_au_Centre_de_la_Terre-fr-en-es-hu-nl.txt", 'r')
t = fl.read()
fl.close()


errfl = open("log/err.txt", 'a+')  # ffmpeg's stdout/stderr get logged here

# launch ffmpeg: read raw 16-bit mono PCM from stdin and encode it to Opus in an Ogg container
proc = sp.Popen([ FFMPEG_BIN,
       '-y', # overwrite the output file if it already exists
       '-loglevel', 'debug', # verbose logging (global options must come before the output file)
       "-f", 's16le', # raw signed 16-bit little-endian input
       "-acodec", "pcm_s16le", # input codec: 16-bit PCM
       '-ar', str(sample_rate), # input sample rate (48000 Hz)
       '-ac', '1', # the input has 1 channel (mono)
       '-i', 'pipe:0', # read the input from stdin
       '-vn', # no video expected
       '-acodec', "libopus", # output audio codec
       '-b:a', "112k", # output bitrate (= quality)
       'streams/center-earth-journey-es-en-1.ogg',
       ],
        stdin=sp.PIPE, stdout=errfl, stderr=errfl, shell=False)


#flags = fcntl(proc.stdout, F_GETFL) # get current p.stdout flags
#fcntl(proc.stdout, F_SETFL, flags | O_NONBLOCK)


# (unused) debugging helper for draining ffmpeg's output; left disabled
def readlines():
    #print(proc.stdout.readlines())

    #while True:
    while False:
        try:
            print(read(proc.stdout.fileno(), 1024))
        except OSError:
            # the os throws an exception if there is no data
            print('[No more data]')
            break

#print(ascii(t))

t = t.split('\n')  # one parallel row per line

max_ln = len(t)
ln_cnt = 1
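# for each row: synthesize the Spanish paragraph, a 1 s pause, the English paragraph,
# then a 2 s pause, and stream the raw PCM straight into ffmpeg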
for e in t:
    print("processing {0}/{1}".format(str(ln_cnt), str(max_ln)))

    g = re.split(r'\t+', e)

    try:

        # Spanish
        proc.stdin.write(np.asarray(spanishtts.apply(g[2]), dtype=np.int16).tobytes())

        # 1 second pause
        proc.stdin.write(np.asarray([0] * sample_rate, dtype=np.int16).tobytes())


        # English
        proc.stdin.write(np.asarray(englishtts.apply(g[1]), dtype=np.int16).tobytes())

        # 2 second pause
        proc.stdin.write(np.asarray([0] * (sample_rate*2), dtype=np.int16).tobytes())

    except Exception as err:  # don't shadow the loop variable e
        print(repr(err))

        print("occurred for line:")
        print(g)


    ln_cnt += 1

# close ffmpeg's stdin so it sees EOF and can finalize the Ogg file, then wait for it to exit
proc.stdin.close()
proc.wait()
errfl.close()

Run it with python3.9, after installing the dependencies (PyTorch, NumPy, etc.) with python3.9 -m pip install.
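
Before kicking off the full run, it’s worth a quick smoke test that torch.hub can actually fetch and run one of the Silero models (these are the same calls the classes above make; the test sentence is just an example):

import torch

# the first run downloads the model from the snakers4/silero-models hub repo
model, example_text = torch.hub.load('snakers4/silero-models', 'silero_tts',
                                     language='es', speaker='v3_es')
audio = model.apply_tts(text='Hola, mundo.', speaker='es_1', sample_rate=48000)
print(audio.shape)  # 1-D float tensor of samples at 48 kHz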

This took maybe five hours to generate on my i7-6700HQ laptop. By the way, I counted 13 exceptions, either Exception("Model couldn't generate your text, probably it's too long") or, in one case, UserWarning: Text string is longer than 1000 symbols. So I think some passages are too long, or contain symbols the model can’t handle; I will investigate. The ValueErrors don’t seem to be much of an issue though.
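
If it really is the ~1000-character limit, one workaround would be to split an over-long paragraph into sentence-sized chunks, synthesize them one at a time, and join the audio before writing it to ffmpeg. A rough, untested sketch (chunk_text and synth_long are just illustrative helpers, and 900 characters is an assumed safe limit):

import re
import numpy as np

MAX_CHARS = 900  # assumed safe limit, staying under the ~1000-symbol warning

def chunk_text(text, max_chars=MAX_CHARS):
    # split on sentence-ending punctuation, then greedily repack into chunks
    sentences = re.split(r'(?<=[.!?…])\s+', text)
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = (current + " " + s).strip()
    if current:
        chunks.append(current)
    return chunks or [text]

def synth_long(tts, text):
    # synthesize each chunk separately and join the int16 audio back together
    pieces = [np.asarray(tts.apply(c), dtype=np.int16) for c in chunk_text(text)]
    return np.concatenate(pieces)

The loop would then call proc.stdin.write(synth_long(spanishtts, g[2]).tobytes()) instead of writing spanishtts.apply(g[2]) directly.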

There were 2155 translation pairs in total (most of them long paragraphs), so missing 13 isn’t a huge deal. The file format uses \t and \n as separators: each line is one chunk, with the equivalent paragraphs in the different languages separated by tabs, and the chunks themselves separated by newlines. English is at index 1 and Spanish is at index 2.
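
For anyone adapting this to another language pair: judging by the filename, the column order is fr, en, es, hu, nl, so pulling out a different language is just a matter of changing the index. A quick illustration with a made-up row (the real rows are full paragraphs):

import re

row = "Phrase en français\tEnglish sentence\tFrase en español\tMagyar mondat\tZin in het Nederlands"

cols = re.split(r'\t+', row)
print(cols[1])  # English
print(cols[2])  # Spanish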

Can’t wait to use this tomorrow on my walks.

  • @redtea

    Nice one, comrade!

    • @holdengreenOP

      Yeah, I tried it today on my long walk… it’s a bit advanced for my level, and I’m not sure whether the Spanish should come before or after the English. I may have to try it with the stuff @Franfran2424 recommended if I can find proper parallel translations or produce them.

      Also, the passages are too long, and I’ll probably have to speed the English up to around 1.25x, among other tweaks; the TTS isn’t perfect.

        • @holdengreenOP

          No, I haven’t had time for this yet… I still need to do what I said.

          • @redtea

            No worries, I was just curious. I tend to make language plans, forget them, try something else with the language, and come back to them months later. My road to fluency will be paved with discarded grammar books.

  • lemmygrabber

    Do you know of any popular libraries that can do speech-to-text? I have some long audio files that I want to search within. If there’s a speech-to-text tool I can use, I can index these audio files.