Unable to create MULTIPLE TTS “wav” files using MS-SAPI 5.1 in C#

Question

Greetings folks!

I'm working on a project where I will have to create WAV files of names using TTS.

I have the MS-SAPI 5.1 SDK installed on a Windows Server 2003 machine and use C# to write the TTS program. Apart from the default Microsoft Sam voice, I have voices from NeoSpeech TTS installed on the server.

The issue I'm having is that the program does not produce more than one working WAV file.

To be more specific, if I send 4 names to the program, it creates 4 WAV files. However, only the first name is converted correctly: that file is larger than 1 KB and plays in a media player.

The other 3 files are created, but they are only 1 KB in size and do not play in any media player.

I'm new to both C# and MS-SAPI but I believe I have done a decent job creating the code. I have spent days trying to figure this out but I'm out of energy now.

Any insight on this issue is greatly appreciated. Thanks for your time.

Here is my code:

using System;
using System.Collections.Generic;
using System.Collections;
using System.Text;
using SpeechLib;
using System.Threading;

namespace TTS_Text_To_Wav
{
    class Gender
    {
        public static String MALE = "Male";
        public static String FEMALE = "Female";
    }

    class Languages
    {
        public static String ENGLISH = "409;9";
        public static String SPANISH = "40a";
    }

    class Vendor
    {
        public static String VOICEWARE = "Voiceware";
        public static String MICROSOFT = "Microsoft";
    }

    class SampleTTS
    {
        static void Main(string[] args)
        {
            SampleTTS processor = null;

            try
            {
                processor = new SampleTTS();

                // get unprocessed items
                ArrayList unProcessedItems = new ArrayList();
                unProcessedItems.Add("Kate");
                unProcessedItems.Add("Sam");
                unProcessedItems.Add("Paul");
                unProcessedItems.Add("Violeta");

                if (unProcessedItems != null)
                {
                    foreach (string record in unProcessedItems)
                    {
                        // convert text to wav
                        processor.ConvertStringToSpeechWav(record, "c:/temp/" + record + ".wav", Vendor.VOICEWARE, Gender.MALE, Languages.ENGLISH);
                    }
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }

        void ConvertStringToSpeechWav(String textToConvert, String pathToCreateWavFile, String vendor, String gender, String language)
        {
            SpVoice voice = null;
            SpFileStream spFileStream = null;

            try
            {
                spFileStream = new SpFileStream();
                voice = new SpVoice();

                spFileStream.Format.Type = SpeechAudioFormatType.SAFT8kHz16BitMono;
                spFileStream.Open(pathToCreateWavFile, SpeechStreamFileMode.SSFMCreateForWrite, false);

                voice.Voice = voice.GetVoices("Vendor=" + vendor + ";Gender=" + gender, "Language=" + language).Item(0);
                voice.AudioOutputStream = spFileStream;
                voice.Speak(textToConvert, SpeechVoiceSpeakFlags.SVSFlagsAsync | SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
                voice.WaitUntilDone(Timeout.Infinite);
            }
            catch (Exception e)
            {
                // Keep the original exception as InnerException so the COM error details are not lost
                throw new Exception("Error occurred in ConvertStringToSpeechWav()\n" + e.Message, e);
            }
            finally
            {
                if (spFileStream != null)
                {
                    spFileStream.Close();
                }
            }
        }
    }
}

Edit:

I have noticed some new behavior. The code works fine for the Microsoft voices on the system. It is only with the NeoSpeech voices that I have this issue.

Does that mean my code is correct and something is wrong with the voices? For one, I got the voices from my client, so there is nothing I can do about them. Secondly, these are production-ready voices. I'm pretty sure they are well tested, or we would have heard a lot about it.

I'm still inclined to believe something is up with the code I wrote.

Are there any other suggestions available? I'm in a real fix here and any help will be appreciated.

Answer 1

While I don't see anything glaring that is causing the TTS issue, there are some best practices and code simplifications you could be using.

First off, SampleTTS, the class that contains Main(), doesn't need to be instantiated in order to call ConvertStringToSpeechWav():

class SampleTTS
{
    static void Main(string[] args)
    {
        SampleTTS processor = null;

        try
        {
            processor = new SampleTTS();

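The SampleTTS class can be rewritten as follows (note that ConvertStringToSpeechWav() must then be declared static, since it is called directly from the static Main()):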

class SampleTTS
{
    static void Main(string[] args)
    {
        try
        {
            // get unprocessed items
            List<String> unProcessedItems = new List<String>();
            unProcessedItems.Add("Kate");
            unProcessedItems.Add("Sam");
            unProcessedItems.Add("Paul");
            unProcessedItems.Add("Violeta");

            foreach (string record in unProcessedItems)
            {
                // convert text to wav
                ConvertStringToSpeechWav(record, "c:/temp/" + record + ".wav", Vendor.VOICEWARE, Gender.MALE, Languages.ENGLISH);
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }

Note that I also changed the list from ArrayList to List<String> as a best practice, because List<T> is type safe and generally performs better than ArrayList. I also removed the if (unProcessedItems != null) check: you're already instantiating the list just above, so it will either be non-null or an exception will have been thrown.

Lastly you're creating a new voice object each time ConvertStringToSpeechWav() is called:

voice = new SpVoice();

and letting the GC clean it up. Have you tried calling GC.Collect() like PauloPinto suggested above, just to see if it works? You don't have to stick to rigid coding principles just to get something working. The goal should always be to code cleanly and with principles, but it matters more to get your code into a working state first and then refactor as needed.

I hope some of this helps.

Cheers.

Answer 2

It's been a while since I did TTS, but from what I recall the Speak method can run asynchronously, so the subsequent calls are probably being blocked while the first one is still playing.

It looks like you're requesting that explicitly with the "SpeechVoiceSpeakFlags.SVSFlagsAsync" flag, so try changing that first.
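As a rough sketch of what that change could look like, the method below drops SVSFlagsAsync and passes SVSFDefault, which makes Speak() return only after the utterance has been rendered, so no WaitUntilDone() call is needed. The class and method names (SyncTtsSketch, SpeakToWav) are made up for illustration, voice selection is left out so the default voice is used, and whether this actually cures the NeoSpeech problem is untested.

```csharp
using System;
using SpeechLib;

static class SyncTtsSketch
{
    // Hypothetical helper: renders one string to a WAV file and blocks until done.
    public static void SpeakToWav(string text, string wavPath)
    {
        SpFileStream stream = new SpFileStream();
        SpVoice voice = new SpVoice();
        try
        {
            stream.Format.Type = SpeechAudioFormatType.SAFT8kHz16BitMono;
            stream.Open(wavPath, SpeechStreamFileMode.SSFMCreateForWrite, false);
            voice.AudioOutputStream = stream;

            // No SVSFlagsAsync here: with SVSFDefault the call is synchronous
            // and returns once the whole utterance has been written.
            voice.Speak(text, SpeechVoiceSpeakFlags.SVSFDefault);
        }
        finally
        {
            // Close the file so the WAV header is finalized before the next call.
            stream.Close();
        }
    }
}
```

It would be called once per name, for example SyncTtsSketch.SpeakToWav("Kate", "c:/temp/Kate.wav").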

Answer 3

I was having a similar issue, except that I was using voices from a different vendor (not NeoSpeech) and the problem only appeared after some 300 or so WAV files had been generated successfully.

But the symptom was the same: all wav files that didn't work were less than 1K in size.

I noticed that moving the failed lines to the top of the list still produced a similar result: the initial 300 or so lines succeeded (even though some of those lines had failed in the previous run). So the problem was not the lines themselves, but rather an issue to do with how much was being processed.

I couldn't find any way to 'reset' the speech system so I tried calling the Garbage Collector every 100 lines. It worked!

So I'd suggest you try:

GC.Collect();

at the end of your ConvertStringToSpeechWav function.
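For the "every 100 lines" variant, a minimal sketch might look like the following. BatchRunner, ProcessAll and collectEvery are hypothetical names, and the convert delegate stands in for whatever method writes one WAV file. The added GC.WaitForPendingFinalizers() call is my own assumption: SpVoice and SpFileStream are COM wrappers, and their underlying COM references are normally released when the wrappers are finalized, so waiting for finalizers may help, but that has not been verified against the voices discussed here.

```csharp
using System;
using System.Collections.Generic;

static class BatchRunner
{
    // Calls convert for every item and forces a garbage collection after
    // each collectEvery items. Returns the number of items processed.
    public static int ProcessAll(IEnumerable<string> items, Action<string> convert, int collectEvery)
    {
        if (collectEvery <= 0)
        {
            throw new ArgumentOutOfRangeException("collectEvery");
        }

        int processed = 0;
        foreach (string item in items)
        {
            convert(item);
            processed++;

            if (processed % collectEvery == 0)
            {
                // Collect unreachable wrappers, then let their finalizers run.
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }
        }
        return processed;
    }
}
```

With the static version of the method from Answer 1, usage could be along the lines of BatchRunner.ProcessAll(unProcessedItems, name => ConvertStringToSpeechWav(name, "c:/temp/" + name + ".wav", Vendor.VOICEWARE, Gender.MALE, Languages.ENGLISH), 100). Passing 1 as collectEvery is equivalent to collecting at the end of every call.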

solution1
2 2011-02-19 00:02:24

solution2
0 2010-12-05 06:42:50

solution3
0 2011-02-18 00:25:38
