Faster looping in R

Question

I have two data frames, Test and User.

Test has 100,000 rows and User has 1,400,000 rows. I want to extract specific columns from the User data frame and add them to the Test data frame. For example, I want Income and Cat from User for every row in Test. Test contains repeated names, and where User has several rows for a name I want the value from just one of them. I want to keep Test as it is, without removing duplicates.

For example, for Name A the Income is 100 and Cat is M and L. Since M occurs first, I need M.

> Test  
Name Income  Cat    
A  
B  
C  
D  
...  

User Cat Income  
A    M     100  
B    M     320  
C    U     400  
D    L     900  
A    L     100  
..

I used a for loop, but it takes a lot of time. I do not want to use the merge function.

for (i in seq_len(nrow(Test))) {
  # First row of User whose Name equals the current Test name
  idx <- which(User$Name == Test[i, "Name"])[1]
  Test[i, "Cat"] <- User[idx, "Cat"]
  Test[i, "Income"] <- User[idx, "Income"]
}

I tried merge as well, but the result has more rows than the 100k rows in Test: it appends extra rows.

I want a faster way that avoids both the for loop and merge. Can someone suggest a function from the apply family?

Answer 1

You can use match to find the first matching row (then vectorize the copying):

# Setup the data
User=data.frame(User=c('A','B','C','D','A'),Cat=c('M','M','U','L','L'),
                Income=c(100,320,400,900,100))
Test=data.frame(Name=c('A','B','C','D'))
Test$Income<-NA
Test$Cat<-NA

> Test
  Name Income Cat
1    A     NA  NA
2    B     NA  NA
3    C     NA  NA
4    D     NA  NA


## Copy only the first match from User to Test
Test[,c("Income","Cat")]<-User[match(Test$Name,User$User),c("Income","Cat")]

> Test
  Name Income Cat
1    A    100   M
2    B    320   M
3    C    400   U
4    D    900   L
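
As a quick check that this also covers the situation in the question (repeated names in Test), here is a minimal sketch. The Test rows, including the name "E" that does not occur in User, are made up for illustration, and stringsAsFactors = FALSE is set only so that the columns are plain character vectors in every R version.

```r
# Hypothetical data: Test repeats "A" and contains "E", which User lacks
User <- data.frame(User = c("A", "B", "C", "D", "A"),
                   Cat = c("M", "M", "U", "L", "L"),
                   Income = c(100, 320, 400, 900, 100),
                   stringsAsFactors = FALSE)
Test <- data.frame(Name = c("A", "B", "A", "E"), stringsAsFactors = FALSE)

# match() gives, for each Test$Name, the index of its FIRST occurrence
# in User$User, or NA when the name does not occur at all
idx <- match(Test$Name, User$User)

# Index User once with the whole vector instead of once per row
Test$Income <- User$Income[idx]
Test$Cat <- User$Cat[idx]

print(Test)
```

Test keeps its four rows: both "A" rows receive the first match (M, 100), and the unmatched "E" row is filled with NA rather than being dropped.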

Answer 2

Using the dplyr package, you can do something like this:

library(dplyr)
df %>% group_by(Name) %>% slice(1)

For your example, you get:

Original data frame:

df
  Name Cat Income
1    A   M    100
2    B   M    320
3    C   U    400
4    D   L    900
5    A   L    100

Picking first occurrence:

df %>% group_by(Name) %>% slice(1)
Source: local data frame [4 x 3]
Groups: Name [4]

   Name   Cat Income
  (chr) (chr)  (int)
1     A     M    100
2     B     M    320
3     C     U    400
4     D     L    900
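
Note that the pipeline above only reduces the data to one row per Name; the result still has to be attached to Test. Below is a minimal sketch of that step, written in base R so that it runs without extra packages. It uses duplicated() in place of group_by/slice(1), which is an alternative way to keep the first row per name and not the answerer's code, and the Test rows are made up for illustration. It also shows the likely reason merge added rows in the question: a name that occurs more than once in User matches more than once.

```r
# Hypothetical data, with column names as used in the question
User <- data.frame(Name = c("A", "B", "C", "D", "A"),
                   Cat = c("M", "M", "U", "L", "L"),
                   Income = c(100, 320, 400, 900, 100),
                   stringsAsFactors = FALSE)
Test <- data.frame(Name = c("A", "B", "A", "D"), stringsAsFactors = FALSE)

# Merging against the raw User table: each "A" in Test matches two User rows,
# so the result has more rows than Test
raw <- merge(Test, User, by = "Name")
nrow(raw)

# Keep only the first row for every Name
first_user <- User[!duplicated(User$Name), ]

# Every Name now matches at most once, so the row count of Test is preserved
merged <- merge(Test, first_user, by = "Name", all.x = TRUE)
nrow(merged) == nrow(Test)
```

One caveat: merge sorts the result by the key column by default, so the rows may not come back in the original order of Test. If the order matters, the match-based approach keeps it.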

2 answers

solution1
1 ACCEPTED 2016-01-28 19:37:56

solution2
0 2016-01-28 19:13:19
