XML Speed control for TTS voices

Compdoggie · July 01, 2014, 09:12:44 PM

First, let me say thanks, Marius, for implementing Text to Speech as a plug-in!

My problem is I do not seem to have the syntax correct for setting the speed of the voice in XML.

I play a lot of ambient space music and I like to have my voices slower than the default voice.

The XML command for voice speed is:

<rate speed="-2">
This text should be spoken at a slower rate.
</rate>

How do I get this to work in the nowplaying.xml file?
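
Whether the plug-in passes this markup through to the speech engine is a separate question, but the tag itself is easy to generate from a script. A minimal Python sketch, assuming the text contains no markup characters of its own; `wrap_rate` is a hypothetical helper, not part of RadioDJ or the plug-in, and the -10 to 10 range is the one SAPI 5 documents for rate adjustments:

```python
# SAPI 5 documents rate adjustments from -10 (slowest) to 10 (fastest).
MIN_SPEED, MAX_SPEED = -10, 10


def wrap_rate(text: str, speed: int) -> str:
    """Wrap text in a SAPI <rate> tag (hypothetical helper, not a plug-in API).

    Assumes the text itself contains no '<' or '>' characters.
    """
    # Clamp out-of-range values instead of passing them to the engine.
    clamped = max(MIN_SPEED, min(MAX_SPEED, speed))
    return f'<rate speed="{clamped}">{text}</rate>'


print(wrap_rate("This text should be spoken at a slower rate.", -2))
```

Negative values slow the voice down and positive values speed it up, relative to the voice's default rate.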

Marius · July 01, 2014, 09:33:33 PM

The speed is not implemented yet, but I have put it on my list.

Compdoggie · July 01, 2014, 09:55:45 PM

Thanks Marius!! Can you please consider adding the following items to your list concerning TTS?

Speed - as we discussed.

Delay - I need a delay value, in seconds, for when the voice begins after a song starts to play. Right now the voice hammers on the song start. When playing soft space ambient, I like the voice to come in up to around 8 seconds after the song starts. There are very few vocals in ambient music, so you can have the voice say, "You have been listening to...".
A silence tag is available in the XML
(Five hundred milliseconds of silence <silence msec="500"/> just occurred), and it could be used in nowplaying.xml on a per-line basis if implemented. Inline silence XML could be handy in all speech text files, since commas only work one time as a pause in the speech engine.
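
For text file announcements built by a script, the silence tag can be generated rather than typed by hand. A small Python sketch of that idea; `silence` and `with_pauses` are hypothetical helper names, and nothing here implies the plug-in currently honors the tag:

```python
def silence(msec: int) -> str:
    """Return a SAPI <silence> tag for the given pause length in milliseconds."""
    if msec < 0:
        raise ValueError("pause length must not be negative")
    return f'<silence msec="{msec}"/>'


def with_pauses(phrases: list[str], msec: int) -> str:
    """Join phrases with the same pause between each pair (hypothetical helper)."""
    return f" {silence(msec)} ".join(phrases)


print(with_pauses(["You have been listening to", "Clock DVA"], 500))
```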

Lastly, a pron lookup file. Many bands need a pronunciation table. For example, Clock DVA is pronounced by the voice as "clockdeeva", and a lookup table could say "Clock DVA"="Clock Dee Vee A". But that has to be a global table and would be harder to set up than the first two I have suggested.
The XML PRON command, <pron sym="h eh 1 l ow & w er 1 l d"> hello world </pron>, is inline and will only be handy for text file announcements.
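
The lookup table idea amounts to a text substitution applied before the string reaches the speech engine. A rough Python sketch, assuming a simple in-memory table; the names `PRONUNCIATIONS` and `apply_pronunciations` are made up for illustration and are not a plug-in feature:

```python
import re

# Hypothetical global table: written form -> spelling the voice reads correctly.
PRONUNCIATIONS = {
    "Clock DVA": "Clock Dee Vee A",
}


def apply_pronunciations(text: str, table: dict[str, str] = PRONUNCIATIONS) -> str:
    """Replace each known name in text with its phonetic spelling."""
    if not table:
        return text
    lowered = {name.lower(): spoken for name, spoken in table.items()}
    # Longest names first, so a longer entry wins over a shorter one it contains.
    names = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(n) for n in names) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lowered[m.group(0).lower()], text)


print(apply_pronunciations("You have been listening to Clock DVA"))
```

Matching is case-insensitive and limited to whole words, so a name embedded inside a longer word is left alone.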

Great work!! Best wishes!

Marius · July 01, 2014, 10:08:50 PM

Delay: Timing cannot be controlled well with the Windows engine, because whenever you send text to the Windows TTS engine it generates a wav file, and that takes some time, which depends on the length of the text, the power of the computer and so on. So you have a variable here which cannot be accurately predicted.
Silence: the same problem.
Pron lookup file: This is a basic plugin, made because it was requested, and it is used by very few users. Future updates are not planned, at least not in the manner you need. I cannot spend more time on a plugin which is used by maybe 1-2% of the users, time in which I can do more important things for the program.

Compdoggie · July 01, 2014, 11:02:10 PM

Marius, I agree about the 1% to 2% feature requests. PRON is not needed. My workaround is to change the spelling in the mp3 tag to make the voice pronounce it properly.

My voice scripts call the TTS voice directly and do not use a .wav file that has to be produced; however, I do have to turn the global volume of RDJ down to volume level 2 to match the computer's voice output level. I see you create a .wav file and mix it internally within RDJ (which is nice), but a delay could be set to wait x seconds before the wav generation is called. I guess that would be a global setting in the .dll. I am currently having no problems with the delay of the artist and title voice announcement due to wave file generation, and I even have my titles modified within the mp3 tag to say the title and the album. For example, the voice says "Here is the Beatles with a track called Yellow Submarine from the album Sgt. Pepper's Lonely Hearts Club Band", where that whole title string is included in the title field. (Another workaround.)
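
One way to combine a fixed delay with the variable generation time is to start generating at song start, measure how long it took, and wait only for the remainder. The Python sketch below illustrates that idea under stated assumptions: `generate` and `play` are placeholder callables, and none of this reflects how RadioDJ or the plug-in actually schedules the voice.

```python
import time


def remaining_delay(target_delay: float, elapsed: float) -> float:
    """Seconds still to wait so playback starts target_delay after song start."""
    return max(0.0, target_delay - elapsed)


def announce_after(target_delay, generate, play,
                   clock=time.monotonic, sleep=time.sleep):
    """Generate the speech first, then wait out whatever delay is left.

    generate and play are hypothetical callables standing in for the
    wav generation and playback steps.
    """
    start = clock()
    audio = generate()                      # takes an unpredictable time
    wait = remaining_delay(target_delay, clock() - start)
    sleep(wait)                             # zero if generation overran
    play(audio)
    return wait
```

If generation takes longer than the target delay, the announcement simply starts as soon as the audio is ready, so the delay is a minimum rather than a guarantee.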

Thanks for your reply and being candid with what to expect from TTS.

How is the voice/wav mixed into RDJ? Is it a VT?

Marius · July 02, 2014, 07:37:54 AM

Yes, it is used very similarly to a VT. Indeed, there is also the possibility of using the TTS engine "on the fly", but then other problems appear: the sound is played outside of RDJ, which means that an encoder plugin will not "hear" the speech, the volume is pretty low, and I cannot process it. I tried both ways and this one seemed much better.

Compdoggie · July 02, 2014, 03:38:56 PM

You know, the text to speech function has been my thing for over a year now, and I have been successful in making it work for me using AutoHotkey scripting. I even show how I use external mixing and effects processing on the voices and call up patches in my YouTube video on the wiki. It is not too hard to use the voices externally, and you can use the Windows internal mixer to get the voice to a standalone encoder. I actually need the voice outside the mix to run my talking head robot, which is triggered from the voice output using a Picotalk controller.

But I do like the way your implementation sounds, and your use of the comments field in the mp3 tag is brilliant. You can now have trivia and general comments thrown into the mix as a track is played in rotation. I have yet to try some of the other modes.
All I can say is keep up the good work. All I need is speed control and a delay so the voice doesn't pounce on the music segue.