
Create a text file from a TextGrid using Praat or any other audio processing tool

I have a TextGrid file generated by Prosodylab-Aligner, which I can open in Praat. Is there a way to extract from it a text file that looks like this:

Word in text | Pronunciation started at
Hello          0:0:0.000
my             0:0:1.125
friends        0:0:2.750

EDIT

Attached textGrid file:

File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0.0
xmax = 2.53
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0.0
        xmax = 2.53
        intervals: size = 13
            intervals [1]:
                xmin = 0.0
                xmax = 0.62
                text = "sil"
            intervals [2]:
                xmin = 0.62
                xmax = 0.78
                text = "K"
            intervals [3]:
                xmin = 0.78
                xmax = 0.81
                text = "L"
            intervals [4]:
                xmin = 0.81
                xmax = 0.92
                text = "IH1"
            intervals [5]:
                xmin = 0.92
                xmax = 1.02
                text = "K"
            intervals [6]:
                xmin = 1.02
                xmax = 1.07
                text = ""
            intervals [7]:
                xmin = 1.07
                xmax = 1.22
                text = "T"
            intervals [8]:
                xmin = 1.22
                xmax = 1.31
                text = "UW1"
            intervals [9]:
                xmin = 1.31
                xmax = 1.51
                text = "S"
            intervals [10]:
                xmin = 1.51
                xmax = 1.67
                text = "T"
            intervals [11]:
                xmin = 1.67
                xmax = 1.85
                text = "AA1"
            intervals [12]:
                xmin = 1.85
                xmax = 1.88
                text = "P"
            intervals [13]:
                xmin = 1.88
                xmax = 2.53
                text = "sil"
    item [2]:
        class = "IntervalTier"
        name = "words"
        xmin = 0.0
        xmax = 2.53
        intervals: size = 6
            intervals [1]:
                xmin = 0.0
                xmax = 0.62
                text = "sil"
            intervals [2]:
                xmin = 0.62
                xmax = 1.02
                text = "CLICK"
            intervals [3]:
                xmin = 1.02
                xmax = 1.07
                text = "sp"
            intervals [4]:
                xmin = 1.07
                xmax = 1.31
                text = "TO"
            intervals [5]:
                xmin = 1.31
                xmax = 1.88
                text = "STOP"
            intervals [6]:
                xmin = 1.88
                xmax = 2.53
                text = "sil"

The syntax of TextGrid files is a little bit odd. For your restricted purpose, a list of the words and their starting points, your parser could be quite simple:

  1. Find the text line containing 8 spaces and the string 'name = "words"'

  2. Inspect all following lines and stop at the next occurrence of 8 spaces and the string 'name = "' (or at the end of the file)

    2a. Save the floating point numbers immediately following the interval-level indentation (16 spaces in the file shown above) and the string 'xmin = '

    2b. Save the strings immediately following the same indentation and the string 'text = '

The result of this procedure would be:

0.0 0.62 1.02 1.07 1.31 1.88

"sil" "CLICK" "sp" "TO" "STOP" "sil"

Now just pair up these two arrays element by element (first number with first label, and so on) and you will have your table. The numbers are the starting points, given in seconds.
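The steps above can be sketched in Python roughly as follows. This is only a minimal sketch, not a full TextGrid parser: the function name `parse_tier` and the `SAMPLE` string (a shortened excerpt of the file in the question) are illustrative assumptions. Instead of counting spaces, it treats an `xmin` line as belonging to an interval only if it comes after an `intervals [n]:` header, which is how it skips the tier-level `xmin`. It also assumes labels contain no embedded double quotes.

```python
import re

# Matches interval headers such as "intervals [3]:" (but not "intervals: size = 6").
INTERVAL_HEADER = re.compile(r"intervals \[\d+\]:")

def parse_tier(lines, tier_name="words"):
    """Return (start_times, labels) for one tier of a long-format TextGrid."""
    starts, labels = [], []
    in_tier = False      # are we inside the requested tier?
    in_interval = False  # have we passed an "intervals [n]:" header in it?
    for raw in lines:
        line = raw.strip()
        if line.startswith('name = "'):
            # Every tier has exactly one name line; switch collection on or off.
            in_tier = (line == f'name = "{tier_name}"')
            in_interval = False
        elif in_tier and INTERVAL_HEADER.fullmatch(line):
            in_interval = True
        elif in_tier and in_interval and line.startswith("xmin = "):
            starts.append(float(line[len("xmin = "):]))
        elif in_tier and in_interval and line.startswith("text = "):
            labels.append(line[len("text = "):].strip('"'))
    return starts, labels

# Shortened excerpt of the TextGrid from the question (most intervals omitted).
SAMPLE = '''\
File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0.0
xmax = 2.53
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0.0
        xmax = 2.53
        intervals: size = 1
            intervals [1]:
                xmin = 0.0
                xmax = 0.62
                text = "sil"
    item [2]:
        class = "IntervalTier"
        name = "words"
        xmin = 0.0
        xmax = 2.53
        intervals: size = 3
            intervals [1]:
                xmin = 0.0
                xmax = 0.62
                text = "sil"
            intervals [2]:
                xmin = 0.62
                xmax = 1.02
                text = "CLICK"
            intervals [3]:
                xmin = 1.02
                xmax = 1.07
                text = "sp"
'''

starts, labels = parse_tier(SAMPLE.splitlines())
# Pair each label with its starting point to get the table rows.
for label, start in zip(labels, starts):
    print(f"{label}\t{start}")
```

For a real file, the lines could be read with `open(path, encoding="utf-8")`; the encoding is an assumption, since Praat can also write UTF-16 text files.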

Keep in mind that "sil" is an abbreviation for the meta tag "silence" and "sp" for "speech pause". Silence at the beginning and end of an utterance is expected, but the speech pause here might be spurious: the plosive /t/ of the word "TO" begins with an articulatory occlusion, which sounds much like a pause but is actually part of the plosive.
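If the meta tags should not appear in the final table, they can be filtered out while pairing the arrays. The sketch below is one possible way to do that and to print the start times in the `0:0:1.125` style used in the question. The helper name `format_timestamp` and the `META_LABELS` set are illustrative assumptions, and the two lists are simply the result arrays shown above.

```python
# Labels treated as non-words; the empty string covers unlabeled intervals.
META_LABELS = {"sil", "sp", ""}

def format_timestamp(seconds):
    """Format seconds as h:m:s.mmm, e.g. 1.125 -> '0:0:1.125'."""
    total_ms = round(seconds * 1000)  # work in whole milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours}:{minutes}:{secs}.{ms:03d}"

# The two arrays extracted from the "words" tier.
starts = [0.0, 0.62, 1.02, 1.07, 1.31, 1.88]
labels = ["sil", "CLICK", "sp", "TO", "STOP", "sil"]

# Keep only real words, paired with their formatted start time.
rows = [
    (label, format_timestamp(start))
    for label, start in zip(labels, starts)
    if label not in META_LABELS
]

for label, stamp in rows:
    print(f"{label:<15}{stamp}")
```

Dropping the "sp" interval does not move the start of "TO"; if the pause really belongs to the plosive, the start time of "TO" would have to be corrected by hand.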

Since this is a Praat file, and you say you can open it in Praat, I think a better solution is to let Praat itself do the work. A script like the following involves far fewer leaps of faith:

# Ask for the TextGrid path and the tier to list (tier 2 is "words" in the file above)
form Parse TextGrid...
  sentence File /path/to/your.TextGrid
  integer Tier 2
endform
Read from file: file$
intervals = Get number of intervals: tier
# Header row, tab-separated
writeInfoLine: "Word in text", tab$, "Pronunciation started at"
for i to intervals
  label$ = Get label of interval: tier, i
  # Skip intervals with an empty label
  if label$ != ""
    start = Get start point: tier, i
    appendInfoLine: label$, tab$, string$(start)
  endif
endfor

If you save that as a script somewhere, you can then call Praat from the command line, for example praat /path/to/your/script.praat "/path/to/your.TextGrid" 2, and get the desired output on stdout. With Praat 6 and later, add the --run option before the script path.
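To drive that command line from another program, something like the following Python sketch could be used. It assumes a `praat` executable on the PATH that accepts `--run` (Praat 6 or later) and that the output arrives as plain tab-separated text. The function names `build_praat_command`, `parse_table` and `word_starts` are hypothetical helpers, not part of Praat.

```python
import subprocess

def build_praat_command(script, textgrid, tier=2, praat="praat"):
    """Build the argument list; 'praat' on PATH and '--run' are assumptions."""
    return [praat, "--run", str(script), str(textgrid), str(tier)]

def parse_table(stdout_text):
    """Turn the script's tab-separated output into (label, start_seconds) rows."""
    rows = []
    for line in stdout_text.splitlines()[1:]:  # skip the header row
        if "\t" in line:
            label, start = line.split("\t", 1)
            rows.append((label, float(start)))
    return rows

def word_starts(script, textgrid, tier=2):
    """Run the Praat script and return the parsed rows (needs Praat installed)."""
    result = subprocess.run(
        build_praat_command(script, textgrid, tier),
        capture_output=True,
        text=True,
        check=True,  # raise if Praat exits with an error
    )
    return parse_table(result.stdout)
```

Calling `word_starts("/path/to/your/script.praat", "/path/to/your.TextGrid")` would then return a list of rows that can be written to a text file in whatever layout is needed.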

You could also run it manually, and maybe use this to write your file.
