
Unreliable timepoints with Google Text-to-Speech v1beta1 (Russian)

I send sixteen sentences written in Russian to the v1beta1 version of Google Cloud Text-to-Speech for synthesis (one request per sentence). This is what the request looks like:

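# Assumes: from google.cloud import texttospeech_v1beta1 as tts
# and client = tts.TextToSpeechClient(); ssml and voice are built per sentence.
response = client.synthesize_speech(
    request=tts.SynthesizeSpeechRequest(
        input=tts.SynthesisInput(ssml=ssml),
        voice=voice,
        audio_config=tts.AudioConfig(audio_encoding=tts.AudioEncoding.MP3),
        # Ask the service to report a timestamp for every <mark/> in the SSML.
        enable_time_pointing=[
            tts.SynthesizeSpeechRequest.TimepointType.SSML_MARK
        ],
    )
)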

timepointsList = [t.time_seconds for t in response.timepoints] returns the timepoints for some of the sentences but an empty list for others. Repeating the process yields exactly the same result: the same sentences fail and the same ones succeed on every attempt.
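To spot the failing requests automatically, one option is to compare the mark names in the SSML against the mark names that come back. The sketch below is only an illustration: `missing_marks` is a hypothetical helper, not part of the client library, and it assumes each returned timepoint exposes a `mark_name` attribute alongside `time_seconds`.

```python
import re

# Matches self-closing marks such as <mark name="timepoint_1"/>.
MARK_RE = re.compile(r'<mark\s+name="([^"]+)"\s*/>')


def missing_marks(ssml, timepoints):
    """Return mark names present in the SSML but absent from the timepoints.

    `timepoints` is any iterable of objects with a `mark_name` attribute,
    for example `response.timepoints`.
    """
    expected = MARK_RE.findall(ssml)
    returned = {t.mark_name for t in timepoints}
    return [name for name in expected if name not in returned]
```

Calling `missing_marks(ssml, response.timepoints)` after each request gives an empty list when every mark was timed, and the full list of mark names when the service returned no timepoints at all.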

Below each of the sixteen sentences heavily tagged with SSML you will find the (sometimes empty) list of timepoints. If someone could help me figure out what makes some sentences fail while others are correctly handled, I'd be most grateful!

Cheers,

<speak>Oн <break time="1500ms"/><mark name="timepoint_1"/>нaпрaвил<mark name="timepoint_2"/> <break time="1500ms"/><mark name="timepoint_3"/>точный<mark name="timepoint_4"/> <break time="1500ms"/><mark name="timepoint_5"/>удaр<mark name="timepoint_6"/> в <break time="1500ms"/><mark name="timepoint_7"/>вeрхний<mark name="timepoint_8"/> угол.
</speak>
[1.7029999494552612, 2.282916307449341, 3.792332887649536, 4.215541362762451, 5.7257914543151855, 6.073583126068115, 7.669666290283203, 8.160123825073242]
<speak>Eё рaботa <break time="1500ms"/><mark name="timepoint_1"/>до того<mark name="timepoint_2"/> <break time="1500ms"/><mark name="timepoint_3"/>тщaтeльнa<mark name="timepoint_4"/>, что <break time="1500ms"/><mark name="timepoint_5"/>вышe<mark name="timepoint_6"/> всяких <break time="1500ms"/><mark name="timepoint_7"/>похвaл<mark name="timepoint_8"/>.
</speak>
[]
<speak>Oнa принялa aктивноe <break time="1500ms"/><mark name="timepoint_1"/>учaстиe<mark name="timepoint_2"/> в <break time="1500ms"/><mark name="timepoint_3"/>движeнии<mark name="timepoint_4"/> зa мир.
</speak>
[2.778916597366333, 3.298166275024414, 4.856874465942383, 5.408415794372559]
<speak>Aктивныe <break time="1500ms"/><mark name="timepoint_1"/>члeны<mark name="timepoint_2"/> нaшeго <break time="1500ms"/><mark name="timepoint_3"/>кружкa<mark name="timepoint_4"/> <break time="1500ms"/><mark name="timepoint_5"/>устроили<mark name="timepoint_6"/> мaссу интeрeсных <break time="1500ms"/><mark name="timepoint_7"/>мeроприятии<mark name="timepoint_8"/>̆.
</speak>
[2.1442081928253174, 2.534416437149048, 4.409416675567627, 4.8694167137146, 6.36870813369751, 6.8685832023620605, 9.532875061035156, 10.318541526794434]
<speak>При eго дeятeльной нaтурe Пaвeл <break time="1500ms"/><mark name="timepoint_1"/>годится<mark name="timepoint_2"/> в <break time="1500ms"/><mark name="timepoint_3"/>руководитeли<mark name="timepoint_4"/>.
</speak>
[3.2302498817443848, 3.726041316986084, 5.303624629974365, 6.093166828155518]
<speak>Taкой инициaтивный коллeгa <break time="1500ms"/><mark name="timepoint_1"/>добьeтся своeго<mark name="timepoint_2"/>.
</speak>
[3.073458433151245, 4.06374979019165]
<speak>инициaтивнaя группa <break time="1500ms"/><mark name="timepoint_1"/>прeодолeлa<mark name="timepoint_2"/> всe <break time="1500ms"/><mark name="timepoint_3"/>прeпятствия<mark name="timepoint_4"/>.
</speak>
[2.738416910171509, 3.3536667823791504, 5.129375457763672, 5.811582565307617]
<speak><break time="1500ms"/><mark name="timepoint_1"/>Прeдприимчивыи<mark name="timepoint_2"/>̆ бизнeсмeн зaключил контрaкт нa <break time="1500ms"/><mark name="timepoint_3"/>выгодных<mark name="timepoint_4"/> для сeбя <break time="1500ms"/><mark name="timepoint_5"/>условиях<mark name="timepoint_6"/>.
</speak>
[]
<speak>Энeргичными движeниями милиция <break time="1500ms"/><mark name="timepoint_1"/>устрaнилa<mark name="timepoint_2"/> бaррикaду.
</speak>
[3.5212082862854004, 4.151208400726318]
<speak>Энeргичныe <break time="1500ms"/><mark name="timepoint_1"/>мeры<mark name="timepoint_2"/>, <break time="1500ms"/><mark name="timepoint_3"/>принятыe<mark name="timepoint_4"/> <break time="1500ms"/><mark name="timepoint_5"/>прaвитeльством<mark name="timepoint_6"/>, имeли <break time="1500ms"/><mark name="timepoint_7"/>жeлaнныи<mark name="timepoint_8"/>̆ эффeкт.
</speak>
[]
<speak>дeтскaя <break time="1500ms"/><mark name="timepoint_1"/>прeступность<mark name="timepoint_2"/> – однa из <break time="1500ms"/><mark name="timepoint_3"/>нaиболee<mark name="timepoint_4"/> aктуaльных проблeм нaшeго врeмeни.
</speak>
[]
<speak>Bопросы <break time="1500ms"/><mark name="timepoint_1"/>воспитaния<mark name="timepoint_2"/> <break time="1500ms"/><mark name="timepoint_3"/>вeсьмa<mark name="timepoint_4"/> aктуaльны в нaстоящee врeмя.
</speak>
[2.083750009536743, 2.8084166049957275, 4.3009161949157715, 4.7409162521362305]
<speak>B новостных <break time="1500ms"/><mark name="timepoint_1"/>пeрeдaчaх<mark name="timepoint_2"/> <break time="1500ms"/><mark name="timepoint_3"/>освeщaются<mark name="timepoint_4"/> сaмыe <break time="1500ms"/><mark name="timepoint_5"/>злободнeвныe<mark name="timepoint_6"/> и <break time="1500ms"/><mark name="timepoint_7"/>острыe<mark name="timepoint_8"/> вопросы.
</speak>
[2.2923333644866943, 2.9478750228881836, 4.4217915534973145, 5.112041473388672, 7.107583045959473, 7.9861674308776855, 9.604375839233398, 10.009376525878906]
<speak><break time="1500ms"/><mark name="timepoint_1"/>Haзрeвший<mark name="timepoint_2"/> кризис <break time="1500ms"/><mark name="timepoint_3"/>обострит<mark name="timepoint_4"/> <break time="1500ms"/><mark name="timepoint_5"/>отношeния<mark name="timepoint_6"/> мeжду фрaкциями пaртии.
</speak>
[]
<speak>иммигрaнт <break time="1500ms"/><mark name="timepoint_1"/>зaполнил<mark name="timepoint_2"/> aнкeту, укaзывaя имя, фaмилию и другиe <break time="1500ms"/><mark name="timepoint_3"/>свeдeния<mark name="timepoint_4"/>.
</speak>
[2.1513330936431885, 2.7224581241607666, 7.254290580749512, 7.844290256500244]
<speak>институт <break time="1500ms"/><mark name="timepoint_1"/>проводит<mark name="timepoint_2"/> рeгулярныe aнкeты срeди <break time="1500ms"/><mark name="timepoint_3"/>нaсeлeния<mark name="timepoint_4"/>.
</speak>
[2.0933332443237305, 2.6083333492279053, 5.566958427429199, 6.203290939331055]

The issue stemmed from the encoding of the Russian characters. The sentences shown above were copied from a PDF. I noticed that pasting their words into the Dictionary application returned no results even though they were spelled correctly, as if something was amiss with the letters themselves; typing the same words manually did bring up the corresponding dictionary entry. I suppose the synthesis attempt went awry for the same reason. I hope this failure of mine will prove useful to someone else!
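For anyone who wants to check their own text before sending it, here is a minimal sketch using only Python's standard `unicodedata` module. It flags letters that are not Cyrillic (for instance Latin look-alikes of "а" or "е") and combining marks (for instance a decomposed "й" stored as "и" plus a combining breve), both of which text copied from a PDF may contain. `find_suspect_chars` is a hypothetical helper name, and which of these characters actually upsets the API is not something established here. It is meant to be run on the plain sentence, before the SSML tags are added, since the tag names themselves are Latin.

```python
import unicodedata


def find_suspect_chars(text):
    """Return (index, char, unicode_name) for characters that look wrong
    in Russian text: non-Cyrillic letters and combining marks."""
    suspects = []
    for i, ch in enumerate(text):
        name = unicodedata.name(ch, "UNKNOWN")
        if ch.isalpha() and not name.startswith("CYRILLIC"):
            # A letter from another script, e.g. LATIN SMALL LETTER A.
            suspects.append((i, ch, name))
        elif unicodedata.combining(ch):
            # A combining mark, e.g. COMBINING BREVE after "и".
            suspects.append((i, ch, name))
    return suspects


def compose(text):
    """Recombine decomposed sequences (NFC), e.g. "и" + breve into "й".

    This does not replace look-alike letters from other scripts.
    """
    return unicodedata.normalize("NFC", text)
```

NFC normalization only repairs the decomposed characters; look-alike letters from another script still have to be retyped or mapped back by hand, which matches the observation that typing the words manually fixed the lookup.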
