How to access audio result from Speech Synthesis API?

The Speech Synthesis API provides text-to-speech functionality in Chrome Beta. However, the browser automatically plays the results of TTS requests. How do I access the audio results for post-processing and disable the default behavior of the API?

There is no standard audio output for the TTS system, and that seems quite intentional, so it is unlikely to change anytime soon.

To understand why, look at the other side of this interface, where a browser extension can act as a TTS Engine and provide the voices the client can use:

Being a valid TTS Engine accessible through this API in Chrome means supporting the starting, pausing, canceling, and resuming of TTS requests, and sending progress updates as events of the following types:

https://developer.chrome.com/extensions/tts#type-TtsEvent
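As a rough illustration of the engine side described above, here is a minimal sketch of a speak handler that reports progress only through events. The names `handleSpeak` and `synthesizeAndPlay` are hypothetical and not part of any Chrome API. The sketch also assumes synchronous playback for brevity; a real engine would normally send `end` only once its audio has actually finished, and it must declare its voices in the extension manifest.

```javascript
// Sketch of a TTS Engine speak handler for an extension background script.
// `synthesizeAndPlay` is a hypothetical helper standing in for whatever the
// engine does to produce sound; it is NOT part of the Chrome API.
function handleSpeak(utterance, options, sendTtsEvent, synthesizeAndPlay) {
  // Tell the browser that speech has started at the first character.
  sendTtsEvent({ type: 'start', charIndex: 0 });
  try {
    // The audio is produced and played entirely inside the engine.
    synthesizeAndPlay(utterance, options);
    // Report completion; note that no audio data is attached to the event.
    sendTtsEvent({ type: 'end', charIndex: utterance.length });
  } catch (err) {
    sendTtsEvent({ type: 'error', errorMessage: String(err.message || err) });
  }
}

// Register the handler only when running inside an extension context.
if (typeof chrome !== 'undefined' && chrome.ttsEngine) {
  chrome.ttsEngine.onSpeak.addListener((utterance, options, sendTtsEvent) =>
    handleSpeak(utterance, options, sendTtsEvent, () => {
      /* engine-specific synthesis and playback would go here */
    }));
  chrome.ttsEngine.onStop.addListener(() => {
    /* engine-specific code to stop playback would go here */
  });
}
```

The only information flowing back to the browser is the event type and a character index, which is consistent with the point that the audio itself never crosses this interface.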

As such, there is no standard way for a TTS Engine to expose the resulting audio other than by actually playing it. Depending on the specific TTS Engine, it may not use a standard audio format, or even the browser's normal audio device access. (For example, it may be forwarding the text to the platform's accessibility system.)
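Seen from the page, the same limitation looks roughly like the sketch below: the Web Speech API hands back events, not audio. The function name `speakAndLog` is hypothetical, and the `synth` and `Utterance` parameters exist only so the function can be exercised outside a browser; in a page they default to the real `speechSynthesis` and `SpeechSynthesisUtterance` globals.

```javascript
// Sketch: speak text from a page and record the only feedback the API offers.
// `synth` and `Utterance` default to the browser globals; they are injectable
// purely for testing and are an assumption of this example.
function speakAndLog(text, synth = globalThis.speechSynthesis,
                     Utterance = globalThis.SpeechSynthesisUtterance) {
  const log = [];
  const utterance = new Utterance(text);
  // Progress arrives as events carrying positions in the text, not samples.
  utterance.onstart = () => log.push('start');
  utterance.onboundary = (event) => log.push(`boundary@${event.charIndex}`);
  utterance.onend = () => log.push('end');
  // speak() queues the utterance for playback and returns nothing.
  const result = synth.speak(utterance);
  return { log, result };
}
```

In a browser, calling `speakAndLog('Hello')` plays the speech immediately, and the returned `result` is `undefined`, so there is no buffer or stream to post-process.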

If you know something about a specific TTS Engine (or create your own), you can build your own interface (see note 1) to retrieve the audio file. But that TTS Engine must then be installed in every client browser where you want to use it. This is why any solution must point you to a specific TTS Engine or an outside TTS solution if you want to control playback beyond adjusting the valid inputs to a TTS Engine request (relative pitch, relative volume, relative rate, gender).
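For completeness, the inputs that can be adjusted look roughly like this when calling `chrome.tts.speak` from an extension. This is a sketch under assumptions: `buildSpeakOptions` and `clamp` are hypothetical helpers, and the ranges used for `rate`, `pitch`, and `volume` are the ones given in the Chrome documentation as I understand it.

```javascript
// Clamp a number into the closed interval [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Build an options object for chrome.tts.speak from loose settings.
// The ranges below are assumed from the Chrome docs: rate 0.1 to 10,
// pitch 0 to 2, volume 0 to 1.
function buildSpeakOptions({ rate = 1.0, pitch = 1.0, volume = 1.0 } = {}, onEvent) {
  return {
    rate: clamp(rate, 0.1, 10.0),
    pitch: clamp(pitch, 0.0, 2.0),
    volume: clamp(volume, 0.0, 1.0),
    onEvent, // receives TtsEvent objects; still no audio data
  };
}

// Only runs inside an extension that has the "tts" permission.
if (typeof chrome !== 'undefined' && chrome.tts) {
  chrome.tts.speak('Hello, world.', buildSpeakOptions({ rate: 1.5 }, (event) => {
    console.log(event.type, event.charIndex);
  }));
}
```

These options shape how the engine speaks, but none of them redirects or captures the output, which is why anything beyond them requires a specific engine or an external TTS service.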

Notes-

1. If you give a TTS Engine such an interface, it cannot trivially extend the existing TTS event API, since the browser validates the events the engine sends:

// attempt to add properties to an otherwise legal event in an Engine:
sendTTSev({'type': 'end', 'charIndex': len, foo:'george'});
...
Uncaught Error: Invalid value for argument 2. Property 'foo': Unexpected property.
    at validate (extensions::schemaUtils:34:13)
    at Object.normalizeArgumentsAndValidate  (extensions::schemaUtils:117:3)
    at Object.<anonymous> (extensions::binding:361:30)
    at sendTtsEvent (extensions::ttsEngine:17:22)
