
Append values to a data frame from another data frame, given a condition, and improve the efficiency of the code in R

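I have a dataset called train, and I want to append values to it from total whenever both the created_at attribute and the user_id attribute match in the two datasets. Below is the code I wrote.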

total = read.csv('Data.csv')
train = read.csv('train.csv', sep='\t')

# New columns to be filled in from total, initialised to NA
train$lang=NA
train$tweet_lang=NA
train$time_zone=NA
train$instrumentalness=NA
train$liveness=NA

for (i in 1:nrow(train))
{
    # Row(s) of total whose created_at AND user_id both match row i of train.
    # Element-wise `&` is needed here: `&&` expects single values (older R
    # silently uses only the first element, R >= 4.3 raises an error).
    # If no row matches, idx is integer(0) and the assignments below fail
    # with "replacement has length zero".
    idx = which( total$created_at == as.character(train[i,'created_at']) & total$user_id == as.character(train[i,'user_id']) )
    train[i,'lang'] = total[idx,'lang']
    train[i,'tweet_lang'] = total[idx,'tweet_lang']
    train[i,'time_zone'] = total[idx,'time_zone']
    train[i,'instrumentalness'] = total[idx,'instrumentalness']
    train[i,'liveness'] = total[idx,'liveness']
}

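However, for i=3 I get the error: Error in x[...] <- m : replacement has length zero. How can I fill in the values in train even if the value is an empty string? Also, this implementation (using a loop) is very slow. Is there a way to vectorize or parallelize the code so that it runs faster?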
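One base-R way to drop the row-by-row loop, sketched here under assumptions: combine the two key columns into a single string key and look up every row of train at once with match(), which returns NA (instead of integer(0)) for rows that have no match. The data frames below are hypothetical toy stand-ins that only reuse column names from the question, and make_key is a hypothetical helper.

```r
# Hypothetical toy versions of `train` and `total`; only the column names
# come from the question, the values are made up for illustration.
total <- data.frame(
  created_at = c("2019-01-01", "2019-01-02", "2019-01-03"),
  user_id    = c("u1", "u2", "u3"),
  lang       = c("en", "fr", "de"),
  liveness   = c(0.1, 0.5, 0.9),
  stringsAsFactors = FALSE
)
train <- data.frame(
  created_at = c("2019-01-02", "2019-01-01", "2019-01-05"),
  user_id    = c("u2", "u1", "u9"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: combine both key columns into one string per row.
# The "\r" separator is an assumption; any string that cannot occur in
# the keys themselves works.
make_key <- function(df) paste(df$created_at, df$user_id, sep = "\r")

# For every row of train: index of the first matching row of total,
# or NA when there is no match.
idx <- match(make_key(train), make_key(total))

# Indexing a vector with NA yields NA, so unmatched rows get NA
# instead of triggering "replacement has length zero".
for (col in c("lang", "liveness")) {
  train[[col]] <- total[[col]][idx]
}

print(train)
```

The remaining loop runs over the handful of columns, not over the rows, so its cost does not grow with nrow(train). If empty strings are preferred over NA in a character column, something like train$lang[is.na(train$lang)] <- "" can be applied afterwards. When total contains duplicate key pairs, match() only uses the first one.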

What you want is a left join, as implemented for example in the dplyr package.

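library(dplyr)

# Left table: every row of df1 is kept
df1 <- data.frame(
    a = 1:4
    , b = letters[1:4]
)

# Right table: has no row with a == 4
df2 <- data.frame(
    a = 1:3
    , c = LETTERS[1:3]
)

# Match rows on column a; rows of df1 without a match get NA in c
df1 %>%
    left_join(df2, by = "a")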

This results in

  a b    c
1 1 a    A
2 2 b    B
3 3 c    C
4 4 d <NA>
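Applied to the question, the join has to match on both key columns at once, which by = c("created_at", "user_id") does. A minimal sketch, assuming hypothetical toy data (column names from the question, values made up) and key columns of the same type in both data frames:

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the question
total <- data.frame(
  created_at = c("2019-01-01", "2019-01-02", "2019-01-03"),
  user_id    = c("u1", "u2", "u3"),
  lang       = c("en", "fr", "de"),
  time_zone  = c("UTC", "CET", "EST"),
  liveness   = c(0.1, 0.5, 0.9),
  stringsAsFactors = FALSE
)
train <- data.frame(
  created_at = c("2019-01-02", "2019-01-01", "2019-01-05"),
  user_id    = c("u2", "u1", "u9"),
  stringsAsFactors = FALSE
)

result <- train %>%
  left_join(
    # Keep only the key columns plus the columns to copy over
    total %>% select(created_at, user_id, lang, time_zone),
    # A row matches only when BOTH key columns are equal
    by = c("created_at", "user_id")
  )

print(result)
```

Rows of train with no partner in total (here u9) are kept and get NA, so no error is raised. If total holds several rows with the same created_at/user_id pair, left_join returns one output row per match and train grows; deduplicating first, e.g. with distinct(total, created_at, user_id, .keep_all = TRUE), avoids that.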

%>% is called the pipe operator. You can learn more about the dplyr package in the book "R for Data Science", which is also available to read online.
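If adding the dplyr dependency is not an option, base R's merge() with all.x = TRUE performs the same kind of left join. A sketch on similar hypothetical toy data; the temporary row_id column is an assumption added only to restore train's original row order, because merge() sorts the result by the key columns:

```r
# Hypothetical toy data: column names from the question, values made up
total <- data.frame(
  created_at = c("2019-01-01", "2019-01-02", "2019-01-03"),
  user_id    = c("u1", "u2", "u3"),
  lang       = c("en", "fr", "de"),
  stringsAsFactors = FALSE
)
train <- data.frame(
  created_at = c("2019-01-02", "2019-01-01", "2019-01-05"),
  user_id    = c("u2", "u1", "u9"),
  stringsAsFactors = FALSE
)

# Remember train's original row order (merge() reorders rows)
train$row_id <- seq_len(nrow(train))

# all.x = TRUE keeps every row of train (a left join);
# rows without a match in total get NA in lang
result <- merge(
  train,
  total[, c("created_at", "user_id", "lang")],
  by = c("created_at", "user_id"),
  all.x = TRUE
)

# Restore the original order and drop the helper column
result <- result[order(result$row_id), ]
result$row_id <- NULL
rownames(result) <- NULL

print(result)
```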


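Note: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.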

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM