
How to create a variable based on the number of unique values in another data frame?

This is a simplified example of what I want to do.

Dataset 1 (DF1) contains data about apples (such as their size or number of holes), and a second dataset (DF2) contains information about the worms found inside them, including each worm's color and the apple it was found in. I want to add a variable to DF1 with the number of unique worm colors found in each apple.

DF1 <- data.frame(x = c("A1","A2","A3","A4","A5"), y = c(3,26,5,27,5))
DF2 <- data.frame(Q = c("A1","A1","A1","A1","A1","A1","A2","A2","A3","A3","A3","A4","A5","A5","A5","A5"),
                  R = c("red","red","blue","yellow","yellow","blue","orange","orange","green","red","red","blue","blue","purple","black","red"),
                  S = c(4,5,3,5,4,3,5,4,3,5,4,3,5,4,3,5))

I am new to R, and my first attempt was:

DF1$N.Colors<-length(unique(DF2$R[match(DF1$X,DF2$Q)]))

But it gives me a new variable filled with 0s instead of the desired vector:

DF1$N.Colors <- c(3, 1, 2, 1, 4)
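One way to see where the 0s come from is to run the pieces of the attempt one at a time. The sketch below rebuilds the example data (leaving out the unused S column); `DF1_try` and `first_hits` are hypothetical scratch names introduced only for this illustration.

```r
# Example data from the question (the unused S column is left out)
DF1 <- data.frame(x = c("A1", "A2", "A3", "A4", "A5"), y = c(3, 26, 5, 27, 5))
DF2 <- data.frame(
  Q = c("A1", "A1", "A1", "A1", "A1", "A1", "A2", "A2",
        "A3", "A3", "A3", "A4", "A5", "A5", "A5", "A5"),
  R = c("red", "red", "blue", "yellow", "yellow", "blue", "orange", "orange",
        "green", "red", "red", "blue", "blue", "purple", "black", "red")
)

# 1. Column names are case-sensitive: DF1 has "x", not "X", so DF1$X is NULL.
is.null(DF1$X)                     # TRUE

# 2. match(NULL, ...) returns integer(0), so no colors are selected at all,
#    and length(unique(...)) is a single 0 that is recycled down the column.
DF1_try <- DF1
DF1_try$N.Colors <- length(unique(DF2$R[match(DF1_try$X, DF2$Q)]))
DF1_try$N.Colors                   # 0 0 0 0 0

# 3. Even with the right column name, match() returns only the FIRST row of
#    DF2 for each apple, and length(unique(...)) collapses everything into
#    one number that would again be recycled to every row.
first_hits <- match(DF1$x, DF2$Q)  # 1 7 9 12 13
length(unique(DF2$R[first_hits]))  # 4 (red, orange, green, blue)
```

So the attempt has two problems: the column is `x`, not `X`, and the count has to be computed per apple rather than once over a single vector, which is what the answers below do.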

I'd very much appreciate your help with this.

This can be done with a join on the 'Q' and 'x' columns of the two datasets: count the unique values of 'R' for each apple and assign the result to a new column in 'DF1'.

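library(data.table)
# setDT() converts DF2 to a data.table by reference; by = .EACHI evaluates uniqueN(R) once for each row of DF1
DF1$N.Colors <- setDT(DF2)[DF1, uniqueN(R), on = .(Q = x), by = .EACHI]$V1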
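The same join-then-count idea can also be sketched in base R without data.table. This is an alternative technique, not part of the answer above: `split()` groups the colors by apple, `vapply()` counts the distinct colors in each group, and indexing the resulting named vector by `DF1$x` plays the role of the join. The name `colour_counts` is hypothetical.

```r
# Example data from the question (the unused S column is left out)
DF1 <- data.frame(x = c("A1", "A2", "A3", "A4", "A5"), y = c(3, 26, 5, 27, 5))
DF2 <- data.frame(
  Q = c("A1", "A1", "A1", "A1", "A1", "A1", "A2", "A2",
        "A3", "A3", "A3", "A4", "A5", "A5", "A5", "A5"),
  R = c("red", "red", "blue", "yellow", "yellow", "blue", "orange", "orange",
        "green", "red", "red", "blue", "blue", "purple", "black", "red")
)

# Group the worm colors by apple and count the distinct colors in each group;
# the result is an integer vector named by apple ID (A1, A2, ...).
colour_counts <- vapply(split(DF2$R, DF2$Q),
                        function(r) length(unique(r)),
                        integer(1))

# Look up each apple of DF1 by name. as.character() guards against x being a
# factor, where indexing would use the integer codes instead of the labels.
DF1$N.Colors <- unname(colour_counts[as.character(DF1$x)])

DF1  # N.Colors is 3 1 2 1 4
```

Because the lookup is by name, DF1 keeps its original row order, and an apple that never appears in `DF2$Q` gets `NA`; replace those with `0L` afterwards if "no worms" should count as zero colors.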

Or, using the tidyverse (dplyr):

library(dplyr)
DF2 %>%
   group_by(x = Q) %>%
   summarise(N.Colors = n_distinct(R)) %>%
   right_join(DF1, by = "x")   # returns a new data frame; assign it (e.g. to DF1) to keep the result

A base R solution with aggregate() and merge():

merge(DF1,
      aggregate(N.Colors ~ Q, list(N.Colors = DF2$R, Q = DF2$Q), function(x) length(unique(x))),
      all.x = TRUE, by.x = "x", by.y = "Q")

#    x  y N.Colors
# 1 A1  3        3
# 2 A2 26        1
# 3 A3  5        2
# 4 A4 27        1
# 5 A5  5        4
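The `all.x` argument is what keeps apples that contain no worms at all. To illustrate, the sketch below adds a hypothetical apple "A6" (with a made-up `y` value) that never appears in `DF2$Q`, and compares the left join with merge()'s default inner join. The names `counts`, `res` and `res_inner` are illustrative only.

```r
# "A6" and its y value are hypothetical, added to show an apple with no worms
DF1 <- data.frame(x = c("A1", "A2", "A3", "A4", "A5", "A6"),
                  y = c(3, 26, 5, 27, 5, 12))
DF2 <- data.frame(
  Q = c("A1", "A1", "A1", "A1", "A1", "A1", "A2", "A2",
        "A3", "A3", "A3", "A4", "A5", "A5", "A5", "A5"),
  R = c("red", "red", "blue", "yellow", "yellow", "blue", "orange", "orange",
        "green", "red", "red", "blue", "blue", "purple", "black", "red")
)

# Distinct worm colors per apple, computed as in the answer above
counts <- aggregate(N.Colors ~ Q, list(N.Colors = DF2$R, Q = DF2$Q),
                    function(x) length(unique(x)))

# Left join: every apple in DF1 is kept; unmatched apples get NA
res <- merge(DF1, counts, all.x = TRUE, by.x = "x", by.y = "Q")
# Default (inner) join: apples without worms are dropped
res_inner <- merge(DF1, counts, by.x = "x", by.y = "Q")

res              # the A6 row has N.Colors = NA
nrow(res_inner)  # 5
```

Note also that merge() sorts the result by the key column by default (`sort = TRUE`), so the row order of the output can differ from the row order of DF1.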
