Madhura Gore
5 min read · Nov 29, 2019


How to get rid of the “Long time elapsed without audio” and “Resource exhausted” errors in Google Speech to Text for an m3u8 video stream

Speech to Text

This post is part of the “Generating transcripts for an m3u8 video stream using Google Cloud Speech to Text” series. It is best to read the previous post in the series first.

Note: We assume you know the basics of Google Cloud Speech to Text and have already set up a project on Google Cloud Platform and have activated a service account. If not, see Speech to Text Quickstart.

Let’s first understand what these errors mean. Here’s a quick overview of both:

  • Error 11: This error occurs for one of the following reasons:
  1. Long time elapsed without audio: No audio was sent to the Google Cloud Speech to Text streamingRecognize stream (referred to below as recognizeStream) for a certain duration, 10 seconds to be precise. Here, “no audio” does not mean silence. In speech processing, silence is also a form of audio, so sending continuous silence to the stream does not trigger error 11. For example, if you write chunks of 0’s to the stream, this error won’t occur. To illustrate: suppose you write audio chunks to recognizeStream at 12.01 and 12.02, and then absolutely no data arrives from your audio stream until 12.15, so the next chunk is written to recognizeStream at 12.15. The “no audio” gap is then longer than 10 seconds, and you’ll get error 11: long time elapsed without audio.
  2. Exceeded maximum allowed stream duration of 305 seconds: The current maximum duration (streamingLimit) for which a recognizeStream can stay open is about 5 minutes, 305 seconds to be precise. If a stream is open for longer than that, this error occurs. So we have to make sure that recognizeStream is closed properly before this limit is reached and that a new stream is created for further processing.
  • Error 8 (Resource exhausted): As the name suggests, this error is thrown when content or request limits are exceeded. The usage limits are 900 requests per 60 seconds and 480 hours of audio processing per day; see the Quotas & limits page of Google Speech to Text. In our case, the data coming from the m3u8 stream arrived at a variable rate. Initially we wrote chunks directly into recognizeStream, so a large amount of data was written to the stream in a short interval of time. To fix this, we put a buffer between the m3u8 stream and the Google Cloud Speech to Text stream and sent audio chunks of controlled size from that buffer.
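
To tell these cases apart at runtime, one option is to look at the numeric code on the error object passed to the stream's error handler. The sketch below assumes the error carries the gRPC status code in err.code (11 is OUT_OF_RANGE, 8 is RESOURCE_EXHAUSTED) and that the message text resembles the messages quoted above. classifySpeechError and the action names it returns are hypothetical, not part of the Speech client.

```javascript
// Hypothetical helper: maps a streaming error to the remedy described above.
// Assumes err.code holds the gRPC status code (11 = OUT_OF_RANGE, 8 = RESOURCE_EXHAUSTED).
function classifySpeechError(err) {
  const message = String((err && err.message) || '').toLowerCase();
  if (err && err.code === 11) {
    // Both error 11 variants share one code, so the message text is the only hint.
    return message.includes('maximum allowed stream duration')
      ? 'restart-stream'   // stream stayed open past the duration limit
      : 'send-keepalive';  // nothing was written to the stream for too long
  }
  if (err && err.code === 8) {
    return 'slow-down';    // content or request limits exceeded
  }
  return 'unknown';
}

console.log(classifySpeechError({
  code: 11,
  message: 'Exceeded maximum allowed stream duration of 305 seconds.',
}));
console.log(classifySpeechError({ code: 8, message: 'Resource exhausted' }));
```

A handler like this could replace the plain console.log(err) in the error callback, so that each error leads to a different reaction instead of just being logged.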

Approach for controlling audio data:

  1. Append the audio chunks arriving in the transform method of a Node.js Transform stream to an audio buffer.
  2. At a fixed time interval, remove a certain amount of audio from this buffer and write the removed chunk into recognizeStream.
  3. If the data from your m3u8 stream arrives at a constant rate, you can remove a fixed-size chunk from the buffer each time. Since the audio data from our m3u8 stream arrived at a variable rate, we defined four slabs of write size (in bytes) and picked one based on the current rate.
let out_write_size = {
  high: 160000,
  medium: 80000,
  low: 20000,
  very_low: 1024
};

So, at each interval, we check the size of the audio buffer (this gives us an idea of the current rate at which audio data is arriving), remove a chunk of the matching slab size and write it to recognizeStream. A higher rate means we remove the high slab, a lower rate means we remove the low slab, and so on.
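
The slab selection can be looked at in isolation before reading the full listing. The following is a minimal sketch, assuming the same slab sizes and the buffer-size thresholds that appear in the full code (1280000, 640000 and 320000 bytes). pickWriteSize and takeChunk are hypothetical helper names used only for illustration.

```javascript
// Slab sizes from the post, in bytes.
const OUT_WRITE_SIZE = { high: 160000, medium: 80000, low: 20000, very_low: 1024 };

// Pick how many bytes to drain based on how much audio is currently buffered.
function pickWriteSize(bufferedBytes) {
  if (bufferedBytes > 1280000) return OUT_WRITE_SIZE.high;
  if (bufferedBytes > 640000) return OUT_WRITE_SIZE.medium;
  if (bufferedBytes > 320000) return OUT_WRITE_SIZE.low;
  return OUT_WRITE_SIZE.very_low;
}

// Split the buffer into the chunk to send now and the remainder to keep.
// chunk is null when less data is buffered than the selected slab needs.
function takeChunk(audioBuffer) {
  const size = pickWriteSize(audioBuffer.length);
  if (audioBuffer.length < size) {
    return { chunk: null, rest: audioBuffer };
  }
  return {
    chunk: audioBuffer.subarray(0, size),
    rest: audioBuffer.subarray(size),
  };
}

// 700000 buffered bytes fall into the medium slab.
const { chunk, rest } = takeChunk(Buffer.alloc(700000));
console.log(chunk.length, rest.length); // 80000 620000
```

Because the function is called once per interval, the slab size is also the maximum number of bytes written per tick, which is what keeps bursts from the m3u8 stream from reaching the recognition stream all at once.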

Node.js Code:

// imports
const ffmpeg = require('fluent-ffmpeg');
const { Transform } = require('stream');
const fs = require('fs');
// Google Speech to Text
const speech = require('@google-cloud/speech');
// creates a speech client
const speechClient = new speech.v1p1beta1.SpeechClient();
let recognizeStream, timeout;
let streamingLimit = 210000; // 3.5 minutes
let configurations = {
  config: {
    encoding: 'FLAC',
    sampleRateHertz: 48000,
    languageCode: 'en-US',
    model: 'default',
    audioChannelCount: 2,
    enableWordTimeOffsets: true,
  },
  interimResults: true,
};
let out_write_size = {
  high: 160000,
  medium: 80000,
  low: 20000,
  very_low: 1024
};
// link to your m3u8 stream
let livestream_endpoint = 'https://<your_m3u8_stream...>.m3u8';
let audio_buffer = null;
// a single zero byte, written when there is no audio to send
const empty_buffer = Buffer.alloc(1, 0);
let isFirstProgress = true;

function startStream() {
  console.log("started recognition stream ");
  recognizeStream = speechClient
    .streamingRecognize(configurations)
    .on('error', (err) => {
      console.log(err);
    })
    // register the named function so it can be removed again later
    .on('data', speechCallback);
  // restarting stream every 3.5 mins for infinite streaming
  timeout = setTimeout(restartStream, streamingLimit);
}

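function speechCallback(stream) {
  // a response can arrive without results, so guard before reading the transcript
  const result = stream.results && stream.results[0];
  if (!result || !result.alternatives || !result.alternatives[0]) {
    return;
  }
  const stdoutText = result.alternatives[0].transcript;
  if (result.isFinal) {
    console.log("Final Result : ", stdoutText);
    // write one final transcript per line
    fs.appendFile('transcripts.txt', stdoutText + '\n', (err) => {
      if (err)
        console.log(err);
    });
  } else {
    console.log("Interim Result : ", stdoutText);
  }
}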

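function restartStream() {
  if (recognizeStream) {
    // detach the data handler, then close the currently open stream
    recognizeStream.removeAllListeners('data');
    recognizeStream.destroy();
    recognizeStream = null;
  }
  // open a new stream so recognition continues; this also re-arms the restart timer
  startStream();
}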

let dest = new Transform({
  transform: (chunk, enc, next) => {
    if (!audio_buffer) {
      audio_buffer = chunk;
    } else {
      // append data to audio buffer
      audio_buffer = Buffer.concat([audio_buffer, chunk]);
    }
    console.log('chunk length: ', chunk.length);
    next(null, chunk);
  }
}).on('data', (data) => {}); // drain the readable side so ffmpeg keeps writing

function writeDataToRecognizeStream() {
  // recognizeStream may be null (for example after restartStream() has destroyed it)
  if (!recognizeStream) {
    console.log("no recognise stream");
    return;
  }
  // select a size of data to be removed from buffer depending on current size of audio_buffer
  const buffered = audio_buffer ? audio_buffer.length : 0;
  let write_size = out_write_size.very_low;
  if (buffered > 1280000) {
    write_size = out_write_size.high;
  } else if (buffered > 640000) {
    write_size = out_write_size.medium;
  } else if (buffered > 320000) {
    write_size = out_write_size.low;
  }

  if (buffered >= write_size) {
    // remove the chunk of selected size and write it to recognizeStream
    let data_to_write = audio_buffer.slice(0, write_size);
    audio_buffer = audio_buffer.slice(write_size);
    recognizeStream.write(data_to_write);
  } else {
    // not enough data coming from stream, write 0's into stream
    console.log("Nothing to write, buffer empty, writing dummy chunk");
    recognizeStream.write(empty_buffer);
  }
}

// ffmpeg processing
let command = ffmpeg(livestream_endpoint)
  .on('start', () => {
    startStream();
    console.log("ffmpeg : processing Started");
  })
  .on('progress', (progress) => {
    if (isFirstProgress) {
      // write data into recognizeStream after every 1 second
      setInterval(writeDataToRecognizeStream, 1000);
    }
    isFirstProgress = false;
    console.log('ffmpeg : Processing: ' + progress.targetSize + ' KB converted');
  })
  .on('end', () => {
    console.log('ffmpeg : Processing finished !');
  })
  .on('error', (err) => {
    console.log('ffmpeg : ffmpeg error :' + err.message);
  })
  .format('flac')
  .audioCodec('flac')
  .output(dest);
command.run();

Summary:

  • You won’t face the Resource exhausted error, since the audio data is throttled through a buffer before it is written into the Google Cloud Speech to Text stream.
  • Chunks are written into the stream at a regular interval, and when there’s absolutely no audio data coming from the m3u8 stream, we write 0’s into recognizeStream. This solves Error 11 (Long time elapsed without audio).
  • The key to solving Error 11 (Exceeded maximum allowed stream duration of 305 seconds) is to call recognizeStream.destroy(); in your restartStream() method. This closes the currently open recognizeStream properly before it is set to null. If you don’t destroy it, the stream remains open until it reaches the maximum allowed duration of 305 seconds, which causes the error.
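
The last two points can be tried out without Google credentials by swapping the recognition stream for a stand-in. This is only a sketch under stated assumptions: createRotatingStream is a hypothetical wrapper, openStream is any factory that returns a writable stream (in the post's setting it would wrap speechClient.streamingRecognize), and a PassThrough stream plays the role of recognizeStream.

```javascript
const { PassThrough } = require('stream');

// Hypothetical wrapper around a stream factory; not part of the Speech client.
function createRotatingStream(openStream) {
  let current = openStream();
  let opened = 1;
  return {
    // Write audio, or a single zero byte when there is nothing to send.
    write(chunk) {
      return current.write(chunk && chunk.length ? chunk : Buffer.alloc(1, 0));
    },
    // Destroy the open stream and replace it with a fresh one.
    rotate() {
      const old = current;
      old.destroy();
      current = openStream();
      opened += 1;
      return old;
    },
    openedCount() {
      return opened;
    },
  };
}

// Stand-in for the recognition stream so the sketch runs offline.
const rotating = createRotatingStream(() => new PassThrough());
rotating.write(null);               // no audio available: keep-alive byte
const previous = rotating.rotate(); // in practice called from a timer, e.g. every 210000 ms
console.log(previous.destroyed, rotating.openedCount());
```

Keeping the rotation behind one small object means the code that writes audio never has to deal with a missing stream. Whether that fits a real pipeline depends on how in-flight results from the old stream should be handled.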
