
Faster looping in R

I have two data frames, Test and User.

Test has 100,000 rows and User has 1,400,000 rows. I want to pull specific columns from User and add them to Test: for every row in Test, I want the Income and Cat values from User. Names repeat in Test, and I want to keep Test as it is, without removing those duplicates. When a name appears more than once in User, I only want one of its values.

For example, for Name A, Income is 100 and Cat is both M and L. Since M occurs first, I need M.

> Test  
Name Income  Cat    
A  
B  
C  
D  
...  

User Cat Income  
A    M     100  
B    M     320  
C    U     400  
D    L     900  
A    L     100  
..  

I used a for loop, but it takes a lot of time. I do not want to use the merge function.

for (i in 1:nrow(Test)) {
  # take the first matching User row for this name
  Test[i, "Cat"] <- User[which(User$Name == Test[i, "Name"]), "Cat"][1]
  Test[i, "Income"] <- User[which(User$Name == Test[i, "Name"]), "Income"][1]
}

I tried merge as well, but the result has more than 100k rows: it appends extra rows to Test.

I want a faster way that avoids both the for loop and merge. Can someone suggest a function from the apply family?

You can use match to find the first matching row (then vectorize the copying):

# Set up the data
User=data.frame(User=c('A','B','C','D','A'),Cat=c('M','M','U','L','L'),
                Income=c(100,320,400,900,100))
Test=data.frame(Name=c('A','B','C','D'))
Test$Income<-NA
Test$Cat<-NA

> Test
  Name Income Cat
1    A     NA  NA
2    B     NA  NA
3    C     NA  NA
4    D     NA  NA


## Copy only the first match from User to Test
Test[,c("Income","Cat")]<-User[match(Test$Name,User$User),c("Income","Cat")]

> Test
  Name Income Cat
1    A    100   M
2    B    320   M
3    C    400   U
4    D    900   L
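
The question's Test data has repeated names, which the small example above does not exercise. As a minimal sketch, assuming the key column in User is called Name (as in the question's loop, rather than User as in the setup above), the same match approach keeps one output row per Test row and returns NA for names missing from User. The sample values, including the name "E", are made up for illustration.

```r
# Hypothetical sample data: Test repeats "A" and contains one name ("E")
# that does not occur in User.
User <- data.frame(Name   = c("A", "B", "C", "D", "A"),
                   Cat    = c("M", "M", "U", "L", "L"),
                   Income = c(100, 320, 400, 900, 100),
                   stringsAsFactors = FALSE)
Test <- data.frame(Name = c("A", "B", "A", "E", "D"),
                   stringsAsFactors = FALSE)

# match() gives, for each Test$Name, the position of its first occurrence
# in User$Name, or NA when the name is absent.
idx <- match(Test$Name, User$Name)

# Index each User column once; the result has one entry per row of Test.
Test$Income <- User$Income[idx]
Test$Cat    <- User$Cat[idx]

print(Test)
```

Because match is computed once for the whole column, there is no per-row scan of User, which is where the loop spends its time.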

Using the dplyr package, you can do something like this:

library(dplyr)
df %>% group_by(Name) %>% slice(1)

For your example, you get:

Original data frame:

df
  Name Cat Income
1    A   M    100
2    B   M    320
3    C   U    400
4    D   L    900
5    A   L    100

Picking first occurrence:

df %>% group_by(Name) %>% slice(1)
Source: local data frame [4 x 3]
Groups: Name [4]

   Name   Cat Income
  (chr) (chr)  (int)
1     A     M    100
2     B     M    320
3     C     U    400
4     D     L    900
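
The output above is the deduplicated User data; it still has to be attached to Test. A join grows the row count when the lookup table has duplicate keys, since each Test row is paired with every matching User row, which is the likely cause of the extra rows seen with merge. A rough sketch of one way to finish the job, assuming the key column is Name in both data frames and using made-up sample values, is to deduplicate first and then left-join:

```r
library(dplyr)

# Hypothetical sample data; Test repeats the name "A".
User <- data.frame(Name   = c("A", "B", "C", "D", "A"),
                   Cat    = c("M", "M", "U", "L", "L"),
                   Income = c(100, 320, 400, 900, 100),
                   stringsAsFactors = FALSE)
Test <- data.frame(Name = c("A", "B", "A", "D"),
                   stringsAsFactors = FALSE)

# Keep the first row for each Name, so every key is unique in the lookup.
first_user <- User %>% distinct(Name, .keep_all = TRUE)

# left_join keeps every row of Test in its original order; with unique
# keys on the right-hand side, no rows are added.
result <- Test %>% left_join(first_user, by = "Name")

print(result)
```

Here distinct(Name, .keep_all = TRUE) stands in for group_by(Name) %>% slice(1); both keep the first occurrence per name.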

