I thought to do this when I started reading the pinned post titled ‘Listening-Reading Method and Spanish’.
https://farkastranslations.com/bilingual_books.php
I encoded it in 112kbps opus to an ogg file using ffmpeg (it’s mono, ~731MB, 15:12:58 long): https://github.com/holdengreen/lingtool/blob/main/streams/center-earth-journey-es-en-original-botched.ogg
I wrote the script to process the text file at: https://farkastranslations.com/books/Verne_Jules-Voyage_au_Centre_de_la_Terre-fr-en-es-hu-nl.zip
Here is the script (https://github.com/holdengreen/lingtool/blob/main/src/center-earth-parallel.py):
import re
import sys
import subprocess as sp
from fcntl import fcntl, F_GETFL, F_SETFL
from os import O_NONBLOCK, read

import numpy as np
import torch

MAX_WAV_VALUE = 32768.0
sample_rate = 48000

accelerator = 'cpu'
device = torch.device(accelerator)


class SpanishTTS:
    language = 'es'
    model_id = 'v3_es'
    speaker = 'es_1'

    def __init__(self):
        self.model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                  model='silero_tts',
                                                  language=self.language,
                                                  speaker=self.model_id)
        self.model.to(device)  # gpu or cpu

    def apply(self, text):
        # scale the float waveform up to the 16-bit range
        return self.model.apply_tts(text=text,
                                    speaker=self.speaker,
                                    sample_rate=sample_rate) * MAX_WAV_VALUE


class EnglishTTS:
    language = 'en'
    model_id = 'v3_en'
    speaker = 'en_117'

    def __init__(self):
        self.model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                  model='silero_tts',
                                                  language=self.language,
                                                  speaker=self.model_id)
        self.model.to(device)  # gpu or cpu

    def apply(self, text):
        # scale the float waveform up to the 16-bit range
        return self.model.apply_tts(text=text,
                                    speaker=self.speaker,
                                    sample_rate=sample_rate) * MAX_WAV_VALUE


spanishtts = SpanishTTS()
englishtts = EnglishTTS()

FFMPEG_BIN = "ffmpeg"

fl = open("res/foreign/parallel-translations/Verne_Jules-Voyage_au_Centre_de_la_Terre-fr-en-es-hu-nl.farkastranslations.com/Verne_Jules-Voyage_au_Centre_de_la_Terre-fr-en-es-hu-nl.txt", 'r')
t = fl.read()
fl.close()

errfl = open("log/err.txt", 'a+')

proc = sp.Popen([FFMPEG_BIN,
                 '-y',  # (optional) overwrite the output file if it already exists
                 '-loglevel', 'debug',  # global option, so it goes before the input
                 '-f', 's16le',  # raw 16-bit input
                 '-acodec', 'pcm_s16le',  # raw 16-bit input
                 '-ar', str(sample_rate),  # the input is 48000 Hz
                 '-ac', '1',  # the input has 1 channel (mono)
                 '-i', 'pipe:0',  # the input arrives from the pipe
                 '-vn',  # don't expect any video input
                 '-acodec', 'libopus',  # output audio codec
                 '-b:a', '112k',  # output bitrate (=quality)
                 'streams/center-earth-journey-es-en-1.ogg',
                 ],
                stdin=sp.PIPE, stdout=errfl, stderr=errfl, shell=False)

#flags = fcntl(proc.stdout, F_GETFL) # get current p.stdout flags
#fcntl(proc.stdout, F_SETFL, flags | O_NONBLOCK)


def readlines():
    # debugging helper, currently disabled (the loop never runs)
    #print(proc.stdout.readlines())
    #while True:
    while False:
        try:
            print(read(proc.stdout.fileno(), 1024))
        except OSError:
            # the os throws an exception if there is no data
            print('[No more data]')
            break


#print(ascii(t))

t = t.split('\n')
max_ln = len(t)

for ln_cnt, line in enumerate(t, start=1):
    print("processing {0}/{1}".format(ln_cnt, max_ln))

    g = re.split(r'\t+', line)
    if len(g) < 3:
        # blank or malformed line: there is no English/Spanish pair to read
        continue

    try:
        # spanish
        proc.stdin.write(np.asarray(spanishtts.apply(g[2]), dtype=np.int16).tobytes())
        # 1 second pause
        proc.stdin.write(np.asarray([0] * sample_rate, dtype=np.int16).tobytes())
        # english
        proc.stdin.write(np.asarray(englishtts.apply(g[1]), dtype=np.int16).tobytes())
        # 2 second pause
        proc.stdin.write(np.asarray([0] * (sample_rate * 2), dtype=np.int16).tobytes())
    except Exception as err:
        print(repr(err))
        print("occurred for lines: ")
        print(g[2])
        print(g[1])

# close the pipe so ffmpeg can finish writing the file, then wait for it
proc.stdin.close()
proc.wait()
errfl.close()
Run it with python3.9 and use python3.9 -m pip install to install the dependencies such as PyTorch.
This took maybe five hours to generate on my i7 6700HQ laptop. And btw I counted 13 exceptions: most were Exception("Model couldn't generate your text, probably it's too long") and one was UserWarning: Text string is longer than 1000 symbols. I think this means the text is too big or there were some symbols it can’t handle. I will investigate. ValueError(s) don’t seem to be much of an issue tho.
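A minimal sketch of one possible workaround, assuming the failures really are caused by length (I haven’t confirmed that yet): split each paragraph into smaller chunks at sentence boundaries, run the TTS on each chunk, and write the audio back to back. The split_text helper and the 900 character limit are my own guesses for illustration, not anything Silero documents.

```python
import re

# Assumed limit: the warning mentions 1000 symbols, so stay a bit below it.
MAX_CHARS = 900


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks = []
    current = ''
    for sentence in sentences:
        # A single sentence longer than the limit is cut at hard boundaries.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ''
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = sentence if not current else current + ' ' + sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

In the main loop this would mean calling spanishtts.apply(chunk) for each chunk of g[2] instead of passing the whole paragraph at once, and the same for English.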
There were 2155 translation pairs in total (most being large paragraphs) so missing 13 isn’t a huge deal. The file is delimited by \t and \n: each line holds one set of equivalent paragraphs, with the different languages separated by tabs, and a new line moves on to the next set of equivalent paragraphs. English is at index 1 and Spanish at index 2.
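To make the format concrete, here is a small sketch that pulls the Spanish/English pairs out of text laid out this way. The parse_pairs name and the sample string are made up for illustration; the column order is assumed from the file name (fr-en-es-hu-nl).

```python
import re

# Column order assumed from the file name (fr-en-es-hu-nl):
# English at index 1, Spanish at index 2.
EN_INDEX = 1
ES_INDEX = 2


def parse_pairs(text):
    """Return (spanish, english) tuples, skipping rows with missing columns."""
    pairs = []
    for row in text.split('\n'):
        cols = re.split(r'\t+', row)
        if len(cols) <= max(EN_INDEX, ES_INDEX):
            # blank line or a row without enough languages
            continue
        pairs.append((cols[ES_INDEX], cols[EN_INDEX]))
    return pairs


# Made-up sample in the same layout, not taken from the book.
sample = ("Bonjour.\tHello.\tHola.\tSzia.\tHallo.\n"
          "\n"
          "Au revoir.\tGoodbye.\tAdiós.\tViszlát.\tTot ziens.\n")
print(parse_pairs(sample))
```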
Can’t wait to use this tomorrow on my walks.
Nice one, comrade!
yeah I tried it today on my long walk… it’s a bit advanced for my level of learning and I’m not sure whether Spanish should come before or after. I may have to try it with the stuff @Franfran2424 recommended me if I can find proper parallel translations or produce them.
Also the passages are too big and I will prob have to speed the English up to something like 1.25x, plus other tweaks. The TTS isn’t perfect.
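A rough sketch of how the speed-up could work, not something I have run on the full book: pipe only the English segment through a second ffmpeg process with the atempo filter (which changes tempo without changing pitch) before writing it to the main encoder. The atempo_command and change_tempo helpers are hypothetical names, and I am assuming the same raw mono 16-bit PCM layout the script already uses.

```python
import subprocess as sp

FFMPEG_BIN = "ffmpeg"


def atempo_command(factor, sample_rate=48000):
    """Build an ffmpeg command that reads raw mono s16le PCM on stdin,
    changes the tempo by `factor`, and writes raw s16le PCM to stdout."""
    if not 0.5 <= factor <= 2.0:
        # conservative range that a single atempo filter handles
        raise ValueError("keep the factor between 0.5 and 2.0")
    return [FFMPEG_BIN,
            '-f', 's16le', '-ar', str(sample_rate), '-ac', '1', '-i', 'pipe:0',
            '-filter:a', 'atempo={}'.format(factor),
            '-f', 's16le', '-ar', str(sample_rate), '-ac', '1', 'pipe:1']


def change_tempo(pcm_bytes, factor, sample_rate=48000):
    """Run ffmpeg on one segment and return the time-stretched PCM bytes."""
    result = sp.run(atempo_command(factor, sample_rate),
                    input=pcm_bytes, stdout=sp.PIPE, stderr=sp.DEVNULL,
                    check=True)
    return result.stdout
```

The English bytes from englishtts.apply(g[1]) would go through change_tempo(english_bytes, 1.25) before proc.stdin.write, leaving the Spanish at normal speed.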
Any improvement in your listening, @holdengreen@lemmygrad.ml?
No I haven’t had time for this yet… I still need to do what I said.
No worries, I was just curious. I tend to make language plans, forget them, try something else with the language, and come back to them months later. My road to fluency will be paved with discarded grammar books.
do you know any popular libraries that can do speech to text? i have some long audio files that i want to search within. if there is some speech to text thing i can use i can index these audio files.
thanks