Adding a pause in Google-text-to-speech

I am looking for a small pause, wait, break, or anything else that allows a short break (about 2 seconds, ideally configurable) when speaking the desired text.

People online have said that adding three full stops followed by a space creates a break, but that does not seem to work for me. The code below is my test, which sadly has no pauses. Any ideas or suggestions?

Edit: Ideally there would be some gTTS command that lets me do this, or some trick like the three full stops, if that actually worked.

from gtts import gTTS
import os

tts = gTTS(text=" Testing ... if there is a pause ... ... ... ... ...  longer pause? ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... insane pause   " , lang='en', slow=False)

tts.save("temp.mp3")
os.system("temp.mp3")

OK, you need Speech Synthesis Markup Language (SSML) to achieve this.
Be aware that you need to set up Google Cloud Platform credentials.

First, in bash:

pip install --upgrade google-cloud-texttospeech

Then here is the code:

import html
from google.cloud import texttospeech

def ssml_to_audio(ssml_text, outfile):
    # Instantiates a client
    client = texttospeech.TextToSpeechClient()

    # Sets the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml_text)

    # Builds the voice request, selects the language code ("en-US") and
    # the SSML voice gender ("MALE")
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
    )

    # Selects the type of audio file to return
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    # Performs the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Writes the synthetic audio to the output file.
    with open(outfile, "wb") as out:
        out.write(response.audio_content)
        print("Audio content written to file " + outfile)

def text_to_ssml(raw_text):
    # Replace special characters with HTML character references.
    # These prevent the API from confusing text with SSML commands.
    # For example, '<' --> '&lt;' and '&' --> '&amp;'
    escaped_lines = html.escape(raw_text)

    # Convert plaintext to SSML:
    # insert a two-second break at every line break
    ssml = "<speak>{}</speak>".format(
        escaped_lines.replace("\n", '\n<break time="2s"/>')
    )

    # Return the SSML string
    return ssml



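# Plain text, one phrase per line: text_to_ssml() escapes it and inserts a
# 2-second break at each line break.
text = """Testing
if there is a pause
longer pause"""
ssml_to_audio(text_to_ssml(text), "test.mp3")

# Hand-written SSML must bypass text_to_ssml(), because html.escape() would
# turn the tags into literal text.
ssml = '<speak>I can pause <break time="3s"/> for three seconds.</speak>'
ssml_to_audio(ssml, "test_ssml.mp3")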

More documentation:
"Speaking addresses with SSML" in the Google Cloud Text-to-Speech documentation
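
Since the question asks for configurable pauses, here is a minimal sketch of building the SSML string from phrases with a per-phrase pause. The helper name phrases_to_ssml and the (text, pause_seconds) input shape are my own assumptions for illustration, not part of the Google client library. It uses only the standard library and produces a string that could be passed as the ssml argument of SynthesisInput.

```python
import html


def phrases_to_ssml(phrases):
    """Build an SSML string from (text, pause_seconds) pairs.

    Hypothetical helper for illustration; not part of any Google library.
    """
    parts = []
    for text, pause_seconds in phrases:
        # Escape the text so characters like '<' and '&' are not read as markup
        parts.append(html.escape(text))
        if pause_seconds > 0:
            # Express the pause in milliseconds, e.g. <break time="2000ms"/>
            parts.append('<break time="{}ms"/>'.format(int(pause_seconds * 1000)))
    return "<speak>" + "".join(parts) + "</speak>"


ssml = phrases_to_ssml([("Testing", 2), ("if there is a pause", 0.5), ("done", 0)])
print(ssml)
```

Because the break tags are added after escaping, the phrases themselves can contain any characters without being mistaken for SSML.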

But if you don't have Google Cloud Platform credentials, a cheaper and easier way is to use time.sleep(1).

If any background waits are required, you can use the time module to wait, as below.

import time
# SLEEP FOR 5 SECONDS AND START THE PROCESS
time.sleep(5)

Or you can check up to 3 times, waiting between attempts:

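import time

# someprocess() stands in for your own check; it should return True on success.
# Try up to 3 times, waiting 3 seconds after each failed attempt.
for tries in range(3):
    if someprocess():
        break
    time.sleep(3)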

Sadly, the answer is no: the gTTS package has no dedicated function for pauses. An issue asking for a pause function was already opened in 2018. However, gTTS is smart enough to add natural pauses through its tokenizer.

What is a tokenizer?

A tokenizer is a function that takes text and returns it split into a list of tokens (strings). In the gTTS context, its goal is to cut the text into smaller segments that do not exceed the maximum character size allowed (100) for each TTS API request, while making the speech sound natural and continuous. It does so by splitting text where speech would naturally pause (for example on ".") while handling cases where it should not (for example on "10.5" or "USA"). Such rules are called tokenizer cases, and the tokenizer takes a list of them.
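
To make the idea concrete, here is a rough sketch of that kind of splitting using only the standard library. This is not gTTS's actual tokenizer; the function name split_for_tts and the single regex rule are assumptions chosen for illustration. It splits after punctuation only when whitespace follows (so "10.5" stays whole), then wraps any piece that is still too long on word boundaries.

```python
import re

MAX_CHARS = 100  # per-request limit mentioned above


def split_for_tts(text, max_chars=MAX_CHARS):
    """Split text at natural pause points, then enforce a length cap.

    Illustrative only; gTTS uses its own configurable tokenizer cases.
    """
    # Split after . ! ? , ; : only when whitespace follows the punctuation
    pieces = re.split(r"(?<=[.!?,;:])\s+", text.strip())
    tokens = []
    for piece in pieces:
        current = ""
        for word in piece.split():
            candidate = (current + " " + word).strip()
            if len(candidate) > max_chars and current:
                # Adding this word would exceed the cap: close the token
                tokens.append(current)
                current = word
            else:
                current = candidate
        if current:
            tokens.append(current)
    return tokens


print(split_for_tts("It costs 10.5 dollars. Really! Yes, really."))
```

A single word longer than max_chars is kept whole in this sketch, so the cap is only guaranteed for text made of shorter words.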

Here is an example:

text = "regular text speed no pause regular text speed comma pause, regular text speed period pause. regular text speed exclamation pause! regular text speed ellipses pause... regular text speed new line pause \n regular text speed "

So in this case, adding a sleep() seems like the only answer. But tricking the tokenizer is worth mentioning.

You can save multiple mp3 files, then use time.sleep() between playing them to get the pause you want:

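from gtts import gTTS
import os
from time import sleep

# Synthesize each phrase into its own file
tts1 = gTTS(text="Testing", lang='en', slow=False)
tts2 = gTTS(text="if there is a pause", lang='en', slow=False)
tts3 = gTTS(text="insane pause", lang='en', slow=False)

tts1.save("temp1.mp3")
tts2.save("temp2.mp3")
tts3.save("temp3.mp3")

# Play the files in order, sleeping between them.
# Note: os.system("file.mp3") relies on Windows file associations and may
# return before playback ends, so the pause lengths are approximate.
os.system("temp1.mp3")
sleep(2)
os.system("temp2.mp3")
sleep(3)
os.system("temp3.mp3")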
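
The same pattern can be written as a loop so the pauses are configurable per phrase. This is only a sketch: speak_with_pauses and its speak argument are hypothetical names, where speak is assumed to be any callable that synthesizes and plays one phrase and blocks until playback ends (for example, a gTTS save followed by a blocking player call). The demo uses print() in place of real speech so it runs without gTTS.

```python
import time


def speak_with_pauses(segments, speak, sleep=time.sleep):
    """Call speak(text) for each segment, then wait pause_seconds.

    segments: iterable of (text, pause_seconds) pairs.
    speak:    hypothetical callable that plays one phrase and blocks
              until playback has finished.
    sleep:    injectable so the waiting can be replaced in tests.
    """
    for text, pause_seconds in segments:
        speak(text)
        if pause_seconds > 0:
            sleep(pause_seconds)


# Demo: print() stands in for real speech output
speak_with_pauses(
    [("Testing", 0.2), ("if there is a pause", 0.3), ("insane pause", 0)],
    speak=print,
)
```

Keeping the pause lengths in the data rather than in the code makes them easy to adjust without touching the playback logic.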

Late to the party here, but you might consider trying out the audio_program_generator package. You provide a text file made up of individual phrases, each of which has a configurable pause at the end. In return, it gives you an mp3 file that "stitches together" all the phrases and their pauses into one continuous audio file. You can optionally mix in a background sound file as well. It also implements several of the other bells and whistles that Google TTS provides, like accents, slow-play speech, etc.

Disclaimer: I am the author of the package.

You can add an arbitrary pause with Pydub by saving and concatenating temporary mp3 files, using a silent audio clip for the pause. Mark the places where you want a pause with any symbol of your choice (here $):

from pydub import AudioSegment
from gtts import gTTS

contents = "Hello with $$ 2 seconds pause"
# "$" is the symbol chosen for the pause; "$$" yields an empty list element
parts = contents.split("$")
# silent.mp3 contains 2 seconds of silence
pause2s = AudioSegment.from_mp3("silent.mp3")
langue = "en"  # language code (assumed; not defined in the original snippet)
combined = AudioSegment.empty()

for cnt, p in enumerate(parts):
    # The pause is inserted for each empty element of the list
    if not p:
        combined += pause2s
    else:
        tts = gTTS(text=p, lang=langue, slow=False)
        tmpFileName = "tmp" + str(cnt) + ".mp3"
        tts.save(tmpFileName)
        combined += AudioSegment.from_mp3(tmpFileName)

combined.export("out.mp3", format="mp3")
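
The marker parsing can be separated from the audio work, which makes the pause length configurable and easy to check. The sketch below is an assumption-laden illustration: plan_segments and its ("speech", text) / ("pause", ms) output format are hypothetical names of my own. Unlike the snippet above, it also treats whitespace-only parts as pauses and merges consecutive pauses into one longer pause.

```python
def plan_segments(contents, marker="$", pause_ms=2000):
    """Turn marked-up text into ("speech", text) and ("pause", ms) steps.

    Hypothetical helper: an empty (or whitespace-only) element produced
    by split() sits between two markers and is treated as one pause.
    """
    steps = []
    for part in contents.split(marker):
        text = part.strip()
        if text:
            steps.append(("speech", text))
        elif steps and steps[-1][0] == "pause":
            # Consecutive pauses are merged into one longer pause
            steps[-1] = ("pause", steps[-1][1] + pause_ms)
        else:
            steps.append(("pause", pause_ms))
    return steps


for step in plan_segments("Hello with $$ 2 seconds pause"):
    print(step)
```

Each "speech" step could then be synthesized with gTTS and each "pause" step rendered as silence. Pydub's AudioSegment.silent(duration=ms) can generate that silence directly, which would avoid needing a prepared silent.mp3. As in the original snippet, a single marker between two words produces no pause; it takes a doubled marker to create the empty element.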
