R merge strange behavior
Using the merge function to join two data.frames, I found that its sort argument makes no difference to the result. Example:
id_df <- structure(list(id = c("click", "event", "funnel", "impression",
"tracker", "visibility"),
id_Havas = c("a1", "a2", "a3", "a4", "a5", "a6")),
.Names = c("id", "my_id"), class = "data.frame",
row.names = c(NA, -6L))
my_df <- data.frame("id" = c("click", "click", "impression", "visibility", "click"),
stringsAsFactors = FALSE)
Result:
my_df
# id
# 1 click
# 2 click
# 3 impression
# 4 visibility
# 5 click
merge(my_df, id_df, by = "id", all.x = TRUE, sort = FALSE)
# id my_id
# 1 click a1
# 2 click a1
# 3 click a1
# 4 impression a4
# 5 visibility a6
merge(my_df, id_df, by = "id", all.x = TRUE, sort = TRUE)
# id my_id
# 1 click a1
# 2 click a1
# 3 click a1
# 4 impression a4
# 5 visibility a6
Am I missing something?
To keep the original order, you can use match:
my_df$my_id <- id_df$my_id[match(my_df$id, id_df$id)]
my_df
# id my_id
#1 click a1
#2 click a1
#3 impression a4
#4 visibility a6
#5 click a1
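To make the match step above more concrete, here is a minimal sketch of what match returns and how it is used as an index. The lookup table is rebuilt so the block runs on its own, and the last id ("unknown") is a hypothetical value that is not in the question, added only to show what happens when an id has no match.

```r
# Lookup table as in the question, rebuilt so this block runs alone
id_df <- data.frame(id = c("click", "event", "funnel", "impression",
                           "tracker", "visibility"),
                    my_id = c("a1", "a2", "a3", "a4", "a5", "a6"),
                    stringsAsFactors = FALSE)

# "unknown" is a hypothetical id with no entry in id_df
my_df <- data.frame(id = c("click", "click", "impression", "visibility", "unknown"),
                    stringsAsFactors = FALSE)

# match() gives, for each element of my_df$id, the position of its first
# occurrence in id_df$id, or NA when there is none
idx <- match(my_df$id, id_df$id)
idx
# [1]  1  1  4  6 NA

# Indexing with NA yields NA, so unmatched ids are kept with a missing
# my_id, much like all.x = TRUE in merge
my_df$my_id <- id_df$my_id[idx]
```

Note that match only ever uses the first matching position, so this is equivalent to the merge call only when each id appears once in id_df.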
For your specific case, consider a benchmark comparing merge and match, with 60000 distinct ids and a my_df of 100000 rows:
library(microbenchmark)
f_merge <- function(){merge(my_df, id_df, by = "id", all.x = TRUE, sort = FALSE)}
f_match <- function(){my_df$my_id <- id_df$my_id[match(my_df$id, id_df$id)]}
microbenchmark(f_match(), f_merge(), unit="relative")
# expr min lq mean median uq max neval cld
#f_match() 1.00000 1.00000 1.00000 1.00000 1.00000 1.000000 100 a
#f_merge() 41.16602 46.42379 26.62328 47.59711 17.28836 7.176999 100 b
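The answer does not show how the benchmark data was built. The following is one possible sketch, assuming ids of the form "id1" to "id60000" and rows sampled with replacement; the names n_ids and n_rows, the id format, and the seed are assumptions for illustration, not the data actually used. It relies on base R's system.time for a rough single-run timing, which is far less precise than microbenchmark.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n_ids  <- 60000   # number of distinct ids, as stated in the text
n_rows <- 100000  # number of rows in my_df, as stated in the text

# Hypothetical lookup table: "id1".."id60000" mapped to "a1".."a60000"
id_df <- data.frame(id = paste0("id", seq_len(n_ids)),
                    my_id = paste0("a", seq_len(n_ids)),
                    stringsAsFactors = FALSE)

# Hypothetical data: ids sampled with replacement from the lookup table
my_df <- data.frame(id = sample(id_df$id, n_rows, replace = TRUE),
                    stringsAsFactors = FALSE)

# Rough single-run timings
t_match <- system.time(
  res_match <- id_df$my_id[match(my_df$id, id_df$id)]
)
t_merge <- system.time(
  res_merge <- merge(my_df, id_df, by = "id", all.x = TRUE, sort = FALSE)
)

# Both approaches attach the same values; only the row order may differ
same_values <- identical(sort(res_match), sort(res_merge$my_id))
```

Printing t_match and t_merge gives an idea of the gap on your machine; the exact ratio will vary with hardware and R version.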
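How the sort argument of merge works:
In the "Value" section of ?merge, you can read:
"The rows are by default lexicographically sorted on the common columns, but for sort = FALSE are in an unspecified order."
So rows that share the same key are still put "together", but with sort=FALSE the unique keys themselves are not sorted.
Example: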
my_df <- data.frame("id" = c("impression", "click", "click", "impression", "visibility", "click"), stringsAsFactors = FALSE)
merge(my_df, id_df, by = "id", all.x = TRUE, sort = FALSE)
# id my_id
#1 impression a4
#2 impression a4
#3 click a1
#4 click a1
#5 click a1
#6 visibility a6
merge(my_df, id_df, by = "id", all.x = TRUE, sort = TRUE)
# id my_id
#1 click a1
#2 click a1
#3 click a1
#4 impression a4
#5 impression a4
#6 visibility a6
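If you still want to use merge but need the rows back in their original order, one common workaround is to record the row positions before merging and reorder afterwards. This is a sketch under that assumption, not something merge does for you; the helper column name row_order is hypothetical.

```r
# Lookup table and data as in the example above, rebuilt so this block runs alone
id_df <- data.frame(id = c("click", "event", "funnel", "impression",
                           "tracker", "visibility"),
                    my_id = c("a1", "a2", "a3", "a4", "a5", "a6"),
                    stringsAsFactors = FALSE)
my_df <- data.frame(id = c("impression", "click", "click", "impression",
                           "visibility", "click"),
                    stringsAsFactors = FALSE)

# Remember the original row positions in a helper column (hypothetical name)
my_df$row_order <- seq_len(nrow(my_df))

merged <- merge(my_df, id_df, by = "id", all.x = TRUE, sort = FALSE)

# Restore the original order, drop the helper column, reset the row names
merged <- merged[order(merged$row_order), c("id", "my_id")]
rownames(merged) <- NULL
merged
#           id my_id
# 1 impression    a4
# 2      click    a1
# 3      click    a1
# 4 impression    a4
# 5 visibility    a6
# 6      click    a1
```

Unlike the match approach, this also works when the lookup table has several rows per id, at the cost of the slower merge.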