How can I change the audio format of a voice?
I'm messing around with SAPI on Windows, and I've noticed that the audio quality of the voices is quite underwhelming. By comparing the output of a simple test program with the various qualities that eSpeak provides, I've concluded that the default quality is somewhere around 16kHz 16 Bit Mono.
#include <string>
#include <iostream>
#include <Windows.h>
#include <sapi.h>
// Evaluate expr exactly once; on failure, print the HRESULT and jump to cleanup.
#define CHECK_HR(expr, debug_str) \
do { \
HRESULT check_hr_ = (expr); \
if(FAILED(check_hr_)) { \
std::cout << debug_str << ": " << std::hex << "0x" << check_hr_ << std::dec << std::endl; \
goto check_failure; \
} \
} while(0)
#define SAFE_RELEASE(obj) \
if(obj != NULL) { \
obj->Release(); \
obj = NULL; \
}
int main()
{
ISpVoice* voice = NULL;
CHECK_HR(CoInitialize(NULL), "CoInitialize");
CHECK_HR(CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (LPVOID*)&voice), "voice = CoCreateInstance");
CHECK_HR(voice->Speak(L"This is a simple test.", 0, NULL), "voice->Speak");
std::cout << "No errors!" << std::endl;
check_failure:
SAFE_RELEASE(voice);
CoUninitialize();
}
Naturally, I've tried consulting the SAPI documentation, but haven't found out how to change the format. ISpVoice doesn't have a method which sets the format, but it has a SetOutput method, which takes:
either a stream, audio device, or an object token for an output audio device
My next step was creating an IAudioClient with the format provided by SpConvertStreamFormatEnum, and setting its IAudioRenderClient as the voice's output. The attempt failed because I couldn't initialize IAudioClient.
#include <string>
#include <iostream>
#include <Windows.h>
#include <Mmdeviceapi.h>
#include <Audioclient.h>
#include <audiopolicy.h>
#include <sapi.h>
#include <sphelper.h>
// Evaluate expr exactly once; on failure, print the HRESULT and jump to cleanup.
#define CHECK_HR(expr, debug_str) \
do { \
HRESULT check_hr_ = (expr); \
if(FAILED(check_hr_)) { \
std::cout << debug_str << ": " << std::hex << "0x" << check_hr_ << std::dec << std::endl; \
goto check_failure; \
} \
} while(0)
#define SAFE_RELEASE(obj) \
if(obj != NULL) { \
obj->Release(); \
obj = NULL; \
}
#define SAFE_FREE(obj) \
if(obj != NULL) { \
CoTaskMemFree(obj); \
obj = NULL; \
}
int main()
{
ISpVoice* voice = NULL;
IMMDeviceEnumerator* device_enumerator = NULL;
IMMDevice* audio_device = NULL;
WAVEFORMATEX *audio_format = NULL;
GUID format_guid;
IAudioClient* audio_client = NULL;
IAudioRenderClient* audio_render_client = NULL;
CHECK_HR(CoInitialize(NULL), "CoInitialize");
CHECK_HR(CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (LPVOID*)&voice), "CoCreateInstance");
CHECK_HR(CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL, __uuidof(IMMDeviceEnumerator), reinterpret_cast<void**>(&device_enumerator)), "CoCreateInstance");
CHECK_HR(device_enumerator->GetDefaultAudioEndpoint(eRender, eMultimedia, &audio_device), "device_enumerator->GetDefaultAudioEndpoint");
CHECK_HR(audio_device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, NULL, reinterpret_cast<void**>(&audio_client)), "audio_device->Activate");
CHECK_HR(SpConvertStreamFormatEnum(SPSF_48kHz16BitStereo, &format_guid, &audio_format), "SpConvertStreamFormatEnum");
CHECK_HR(audio_client->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_NOPERSIST | AUDCLNT_SESSIONFLAGS_DISPLAY_HIDE, 0, 0, audio_format, NULL), "audio_client->Initialize");
CHECK_HR(audio_client->Start(), "audio_client->Start");
CHECK_HR(audio_client->GetService(__uuidof(IAudioRenderClient), reinterpret_cast<void**>(&audio_render_client)), "audio_client->GetService");
CHECK_HR(voice->SetOutput(audio_render_client, FALSE), "voice->SetOutput");
CHECK_HR(voice->Speak(L"This is a test.", 0, NULL), "voice->Speak");
std::cout << "No errors!" << std::endl;
check_failure:
SAFE_RELEASE(voice);
SAFE_RELEASE(device_enumerator);
SAFE_RELEASE(audio_device);
SAFE_FREE(audio_format);
SAFE_RELEASE(audio_client);
SAFE_RELEASE(audio_render_client);
CoUninitialize();
}
Besides that, I've poked around the SAPI Audio Interfaces, finding a bunch of other interfaces and implementations, none of which seem particularly useful for this task. I feel like I'm running in circles here.
The question: How can I change the audio format of a voice as eSpeak's TTSApp does?
Try:
ATL::CComPtr<ISpVoice> voice;
voice.CoCreateInstance(CLSID_SpVoice);
CSpStreamFormat format;
format.AssignFormat(SPSF_44kHz16BitMono);
ATL::CComPtr<ISpAudio> audio;
SpCreateDefaultObjectFromCategoryId(SPCAT_AUDIOOUT, &audio);
audio->SetFormat(format.FormatId(), format.WaveFormatExPtr());
voice->SetOutput(audio, FALSE);
NOTE: This does not include any error handling, so your code will need to check HRESULT return codes and object/pointer validity.
Also note that eSpeak's native output format is 16-bit 22050Hz mono.
For a C version, you will need to handle the COM object lifetime yourself, and look at what CSpStreamFormat is doing in the AssignFormat, FormatId and WaveFormatExPtr methods.