Adding a column to a data frame based on a long list and values in another column is too slow

Question

I am adding a new column to a data frame using apply() and mutate. It works, but it is very slow. I have 24M rows, and the new column is based on values in a long list (58 items). It was bearable with a smaller list, but not anymore. Here is my example:

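library(dplyr)  # provides mutate()

large_df <- data.frame(A = 1:4,
                       B = c('a', 'b', 'c', 'd'),
                       C = c('e', 'f', 'g', 'h'))
long_list <- c('e', 'f', 'g')

# Row by row: TRUE if any value in columns B or C appears in long_list
large_df <- mutate(large_df,
                   new_C = apply(large_df[, 2:3], 1,
                                 function(r) any(r %in% long_list)))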

The new column (new_C) will read TRUE or FALSE. It works, but I am looking for a faster alternative.

Thank you so much. Serhiy

Bonus Q. I couldn't select just one column within apply(); I needed a range. Why?

Answer 1

Try one of these alternatives using lapply:

large_df$new_c <- Reduce(`|`, lapply(large_df[, 2:3], `%in%`, long_list))

or sapply:

large_df$new_c <- rowSums(sapply(large_df[, 2:3], `%in%`, long_list)) > 0

Both of which return:

large_df
#  A B C new_c
#1 1 a e  TRUE
#2 2 b f  TRUE
#3 3 c g  TRUE
#4 4 d h FALSE
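
To see why these are faster, and to address the bonus question, here is a minimal sketch on the same toy data. The names per_column, one_col_apply and one_col_direct are made up for illustration. apply() converts the data frame to a matrix and calls the function once per row (24M calls on the real data), whereas the lapply version makes one vectorized %in% call per column. As for the bonus question, a likely cause is that large_df[, 3] drops to a plain vector, which has no dimensions, so apply() stops with an error; keeping the data frame shape with drop = FALSE avoids that.

```r
# Toy data from the question.
large_df <- data.frame(A = 1:4,
                       B = c("a", "b", "c", "d"),
                       C = c("e", "f", "g", "h"))
long_list <- c("e", "f", "g")

# One vectorized %in% call per column, instead of one function call per row.
per_column <- lapply(large_df[, 2:3], `%in%`, long_list)

# Combine the per-column logical vectors element-wise with OR.
new_c <- Reduce(`|`, per_column)

# Single column via apply(): keep the data frame shape with drop = FALSE,
# because large_df[, 3] alone would be a plain vector without dimensions.
one_col_apply <- apply(large_df[, 3, drop = FALSE], 1,
                       function(r) any(r %in% long_list))

# Simpler for a single column: %in% is already vectorized, no apply() needed.
one_col_direct <- large_df$C %in% long_list
```

For a single column, the direct %in% form is the one to prefer; the apply() variant is shown only to explain the error.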

Answer 1: 0 | accepted | 2020-06-22 02:38:17