Convert Speech Start and End Time into Time Series

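I would like to convert the following R data frame into a data frame indexed by time in seconds, but I don't know how to do it. Maybe dcast, but then I get confused about how to expand the word being spoken. The first table below is my data; the second is the output I want.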

startTime endTime           word
1     1.900s  2.300s         hey
2     2.300s  2.800s         I'm
3     2.800s      3s        John
4         3s  3.400s       right
5     3.400s  3.500s         now
6     3.500s  3.800s           I
7     3.800s  4.300s        help

Time           word
1.900s         hey
2.000s         hey
2.100s         hey
2.200s         hey
2.300s         I'm
2.400s         I'm
2.500s         I'm
2.600s         I'm
2.700s         I'm
2.800s         John
2.900s         John
3.000s         right
3.100s         right
3.200s         right
3.300s         right


Edit: updated based on the OP's feedback, because his data contains duplicate startTime values.

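library(dplyr)
library(tidyr)

step = 0.1
df %>% group_by(rnum = row_number()) %>%   # one group per row, so duplicate startTime values stay separate
  expand(Time = seq(startTime, max(startTime, (endTime-step)), by=step), word = word) %>%
  arrange(Time) %>% 
  ungroup() %>%
  select(-rnum)   # drop the helper row-number column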

# # A tibble: 24 x 2
# # Groups: word [7]
#    Time word 
#   <dbl> <chr>
# 1  1.90 hey  
# 2  2.00 hey  
# 3  2.10 hey  
# 4  2.20 hey  
# 5  2.30 I'm  
# 6  2.40 I'm  
# 7  2.50 I'm  
# 8  2.60 I'm  
# 9  2.70 I'm  
# 10  2.80 John
# ... with 14 more rows


df <- read.table(text = 
"startTime endTime           word
     1.900  2.300         hey
     2.300  2.800         I'm
     2.800      3        John
     3      3.400       right
     3.400  3.500         now
     3.500  3.800           I
     3.800  4.300        help",
header = TRUE, stringsAsFactors = FALSE)

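dcast() reshapes data from long to wide format (aggregating along the way), whereas the OP wants to go from wide to long format and fill in the missing timestamps.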
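To make the wide-to-long expansion concrete, here is a minimal base R sketch. The helper name expand_words and the three-row sample are assumptions for illustration, not part of either answer. It counts steps in whole tenths of a second so that floating-point drift cannot add or drop a row, and it keeps at least one row per word, mirroring the max(startTime, endTime - step) guard in the dplyr answer.

```r
# Hypothetical helper (not from the original answers): expand each interval
# [startTime, endTime) into rows spaced `step` seconds apart.
expand_words <- function(df, step = 0.1) {
  # Work in whole steps so that 2.3 / 0.1 cannot end up as 22.999...
  start_i <- round(df$startTime / step)
  end_i   <- round(df$endTime / step)
  # Rows per word; at least one, so zero-length words are not dropped
  n <- pmax(end_i - start_i, 1)
  idx    <- rep(seq_len(nrow(df)), times = n)  # source row for each output row
  offset <- sequence(n) - 1                    # 0, 1, 2, ... within each word
  data.frame(Time = (start_i[idx] + offset) * step,
             word = df$word[idx],
             stringsAsFactors = FALSE)
}

# Small made-up sample in the same shape as the question's data
df <- data.frame(
  startTime = c(1.9, 2.3, 2.8),
  endTime   = c(2.3, 2.8, 3.0),
  word      = c("hey", "I'm", "John"),
  stringsAsFactors = FALSE
)
res <- expand_words(df)
print(res)
```

As in the accepted answer, endTime itself is excluded, so each timestamp belongs to exactly one word.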

An alternative approach uses a non-equi join.



library(data.table)

# strip the trailing "s" from the time columns and convert them to numeric
cols <- stringr::str_subset(names(DF), "Time$")
setDT(DF)[, (cols) := lapply(.SD, function(x) as.numeric(stringr::str_replace(x, "s", ""))), 
          .SDcols = cols]


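Create a sequence of timestamps covering the whole time period and right join it to the data set, keeping only those timestamps that fall within one of the given intervals. Judging from the accepted answer, endTime must not be included in the result, so the join condition has to be adjusted accordingly.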

DF[DF[, CJ(time = seq(min(startTime), max(endTime), 0.1))], 
   on = .(startTime <= time, endTime > time), nomatch = 0L][
     , endTime := NULL][]   # a bit of clean-up
    startTime  word
 1:       1.9   hey
 2:       2.0   hey
 3:       2.1   hey
 4:       2.2   hey
 5:       2.3   I'm
 6:       2.4   I'm
 7:       2.5   I'm
 8:       2.6   I'm
 9:       2.7   I'm
10:       2.8  John
11:       2.9  John
12:       3.0 right
13:       3.1 right
14:       3.2 right
15:       3.3 right
16:       3.4   now
17:       3.5     I
18:       3.6     I
19:       3.7     I
20:       3.8  help
21:       3.9  help
22:       4.0  help
23:       4.1  help
24:       4.2  help
    startTime  word


nomatch = 0L avoids NA rows when there are gaps in the conversation.
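The effect of nomatch is easiest to see on data that actually contains a pause. The sketch below uses made-up values, not the OP's data, with a gap between the two words. Times are stored as integer tenths of a second, an assumption made here only so that the join comparisons are exact.

```r
library(data.table)

# Hypothetical sample: "hey" covers tenths 19-22, then a pause, "John" covers 25-27
DT <- data.table(
  startTime = c(19L, 25L),
  endTime   = c(23L, 28L),
  word      = c("hey", "John")
)

# One row per tenth of a second over the whole period
grid <- data.table(time = 19:27)

# Default nomatch (NA): timestamps inside the pause are kept, with word = NA
with_gaps <- DT[grid, on = .(startTime <= time, endTime > time)]

# nomatch = 0L: timestamps that fall in no interval are dropped
without_gaps <- DT[grid, on = .(startTime <= time, endTime > time), nomatch = 0L]

print(with_gaps)
print(without_gaps)
```

Keeping the default is useful when silent stretches should stay visible in the time series; nomatch = 0L fits when only spoken timestamps are wanted.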


DF <- fread("
rn startTime endTime           word
1     1.900s  2.300s         hey
2     2.300s  2.800s         I'm
3     2.800s      3s        John
4         3s  3.400s       right
5     3.400s  3.500s         now
6     3.500s  3.800s           I
7     3.800s  4.300s        help
", drop = 1L)


