Merging two data frames in R under specific conditions

Question

I have two data frames:

df1
Syllable Duration Pitch
@         0.08    93
@         0.05    107
@         0.13    56
@         0.07    95
@         0.07    123

df2
Syllable Duration 
@        0.08 
@        0.05 
@        0.07
@        0.07

I want to merge them into another data frame:

df3
Syllable Duration Pitch
@        0.08     93
@        0.05     107
@        0.07     95
@        0.07     123

The problem is that I have repeated Syllable and Duration values. I have tried this code, but it gives me incorrect Pitch values:

df3 <- merge(df2, df1[!duplicated(df1$Syllable),], by="Syllable")

df3
Syllable Duration Pitch
@        0.08     93
@        0.05     93
@        0.07     93
@        0.07     93

Answer 1

With data.table you can do the following:

library("data.table")
df1 <- fread(
"Syllable Duration Pitch
@ 0.08 93
@ 0.05 107
@ 0.13 56
@ 0.07 95
@ 0.07 123")
df2 <- fread(
"Syllable Duration 
@ 0.08 
@ 0.05 
@ 0.07
@ 0.07")
merge(df1, unique(df2))
# > merge(df1, unique(df2))
#    Syllable Duration Pitch
# 1:        @     0.05   107
# 2:        @     0.07    95
# 3:        @     0.07   123
# 4:        @     0.08    93

Or, without sorting:

merge(df1, unique(df2), sort=FALSE)
# > merge(df1, unique(df2), sort=FALSE)
#    Syllable Duration Pitch
# 1:        @     0.08    93
# 2:        @     0.05   107
# 3:        @     0.07    95
# 4:        @     0.07   123

Finally, this gives the same result:

df1[unique(df2), on=c("Syllable", "Duration")]
# > df1[unique(df2), on=c("Syllable", "Duration")]
#    Syllable Duration Pitch
# 1:        @     0.08    93
# 2:        @     0.05   107
# 3:        @     0.07    95
# 4:        @     0.07   123
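
The `on=` form matches the `merge()` calls here because every row of `unique(df2)` has a match in `df1`. A minimal sketch of where the two could differ, assuming a hypothetical `df2_extra` (not part of the question) that contains one Duration absent from `df1`:

```r
library(data.table)

df1 <- data.table(
  Syllable = "@",
  Duration = c(0.08, 0.05, 0.13, 0.07, 0.07),
  Pitch    = c(93, 107, 56, 95, 123)
)

# Hypothetical variant of df2: the last Duration (0.99) does not occur in df1
df2_extra <- data.table(
  Syllable = "@",
  Duration = c(0.08, 0.05, 0.07, 0.07, 0.99)
)

keys <- c("Syllable", "Duration")

# X[Y, on = ...] keeps every row of Y; unmatched rows get NA in X's columns
all_of_y <- df1[unique(df2_extra), on = keys]

# nomatch = NULL drops the unmatched rows, as merge() does by default
matched_only <- df1[unique(df2_extra), on = keys, nomatch = NULL]

nrow(all_of_y)      # 5, including one row with Pitch NA
nrow(matched_only)  # 4
```

With the data from the question this distinction does not arise, so either form should be fine there.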

Using base `R`:

df1 <- read.table(header=TRUE, text=
"Syllable Duration Pitch
@         0.08    93
@         0.05    107
@         0.13    56
@         0.07    95
@         0.07    123")

df2 <- read.table(header=TRUE, text=
"Syllable Duration 
@        0.08 
@        0.05 
@        0.07
@        0.07 ")
merge(df1, unique(df2))
merge(df1, unique(df2), sort=FALSE)
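
To illustrate why `unique(df2)` matters, here is a small sketch that builds the same data with `data.frame()` and compares row counts. The variable names (`keys`, `first_only`, `with_dups`, `deduped`) are made up for this example; the `by` argument is spelled out only for clarity, since `merge()` joins on the shared column names by default.

```r
df1 <- data.frame(
  Syllable = "@",
  Duration = c(0.08, 0.05, 0.13, 0.07, 0.07),
  Pitch    = c(93, 107, 56, 95, 123)
)
df2 <- data.frame(
  Syllable = "@",
  Duration = c(0.08, 0.05, 0.07, 0.07)
)

keys <- c("Syllable", "Duration")

# The attempt in the question keeps only the first row of df1,
# because every Syllable is "@", so every Pitch comes out as 93
first_only <- df1[!duplicated(df1$Syllable), ]
nrow(first_only)  # 1

# Without unique(), the two 0.07 rows in df2 each match both 0.07 rows in df1
with_dups <- merge(df1, df2, by = keys)
nrow(with_dups)   # 6

# Deduplicating df2 first gives one output row per matching row of df1
deduped <- merge(df1, unique(df2), by = keys)
nrow(deduped)     # 4
```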

Answer 2

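I suggest using the dplyr package. With it, you can choose which columns to join on. For this join, use semi_join rather than inner_join. The difference is that inner_join keeps all combinations and may duplicate rows ("If there are multiple matches between x and y, all combination of the matches are returned.")

semi_join, by contrast, does this: "A semi join differs from an inner join because an inner join will return one row of x for each matching row of y, where a semi join will never duplicate rows of x."

In your case, you can merge the data frames with semi_join(df1, df2, by = c("Syllable", "Duration")). The by vector names the columns you want to join on.

This gives you what you want: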

  Syllable Duration Pitch
1        @     0.08    93 
2        @     0.05   107
3        @     0.07    95
4        @     0.07   123
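
As a runnable sketch of the difference, assuming dplyr is installed, the following compares the two joins on the data from the question. The names `kept` and `all_matches` are only for illustration, and recent dplyr versions may warn about a many-to-many relationship on the `inner_join` call.

```r
library(dplyr)

df1 <- data.frame(
  Syllable = "@",
  Duration = c(0.08, 0.05, 0.13, 0.07, 0.07),
  Pitch    = c(93, 107, 56, 95, 123)
)
df2 <- data.frame(
  Syllable = "@",
  Duration = c(0.08, 0.05, 0.07, 0.07)
)

keys <- c("Syllable", "Duration")

# semi_join filters df1 to rows with a match in df2 and never duplicates them
kept <- semi_join(df1, df2, by = keys)
nrow(kept)         # 4

# inner_join returns every combination of matches, so the 0.07 rows multiply
all_matches <- inner_join(df1, df2, by = keys)
nrow(all_matches)  # 6
```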

Answer 3

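# Merge on the shared columns (Syllable and Duration), then drop the duplicate
# rows created by the repeated 0.07 entries. Deduplicating on Syllable alone
# would leave a single row, because every Syllable is "@".
df1 <- df1[order(df1$Syllable),]
df4 <- merge(df2, df1)
df5 <- unique(df4)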

3 solutions

Solution 1: 4, 2018-06-05 11:49:29
Solution 2: 1, accepted, 2018-06-05 11:45:52
Solution 3: 1, 2018-06-05 12:27:51
