Create rows in a data frame based on other rows and column combinations in R
I have a problem with a data frame in R. My data have two dimensions and one metric, but some combinations of categories have no data. My data look like this:
interestAffinityCategory userGender users
1 Music Lovers male 198
2 Music Lovers female 190
3 News Junkies & Avid Readers male 134
4 News Junkies & Avid Readers female 115
5 Sports Fans male 109
6 Movie Lovers male 108
7 Technophiles male 93
8 TV Lovers male 88
9 TV Lovers female 79
10 Technophiles female 70
For example, Sports Fans only has data for the male gender. I need every combination of categories, even with a 0 value in the users column, e.g. Sports Fans, female, 0. This is how my data need to look (see rows 6 and 8):
interestAffinityCategory userGender users
1 Music Lovers male 198
2 Music Lovers female 190
3 News Junkies & Avid Readers male 134
4 News Junkies & Avid Readers female 115
5 Sports Fans male 109
6 Sports Fans female 0
7 Movie Lovers male 108
8 Movie Lovers female 0
9 Technophiles male 93
10 TV Lovers male 88
11 TV Lovers female 79
12 Technophiles female 70
I tried to find a solution, but I only found similar cases with a single dimension, and they didn't work for me.
P.S.: This data is from the Google Analytics API. I want to get the top 10 categories and make a graph of visits by gender, but for that I need data for every combination of category and gender, even those with 0 visits.
You should use the complete function from tidyr. The first argument is your data, the second and third are the columns whose possible combinations you want to fill in (if you have more, you can just list them one by one), and fill is a list with the default values to use for the new rows.
library(tidyr)
complete(data, interestAffinityCategory, userGender, fill = list(users = 0))
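To make the call above easy to try, here is a minimal sketch that rebuilds the sample data from the question and completes it. The data frame is typed in by hand from the table in the question, and the name completed is only chosen for illustration.

```r
library(tidyr)

# Sample data copied from the question (character columns, not factors).
data <- data.frame(
  interestAffinityCategory = c("Music Lovers", "Music Lovers",
                               "News Junkies & Avid Readers",
                               "News Junkies & Avid Readers",
                               "Sports Fans", "Movie Lovers", "Technophiles",
                               "TV Lovers", "TV Lovers", "Technophiles"),
  userGender = c("male", "female", "male", "female", "male",
                 "male", "male", "male", "female", "female"),
  users = c(198, 190, 134, 115, 109, 108, 93, 88, 79, 70),
  stringsAsFactors = FALSE
)

# Add every missing category/gender pair, with users set to 0.
completed <- complete(data, interestAffinityCategory, userGender,
                      fill = list(users = 0))

# The two combinations that were missing now appear with users = 0.
completed[completed$users == 0, ]
```

With the question's data this should give 12 rows (6 categories times 2 genders), two of which are the new zero rows for Sports Fans and Movie Lovers.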
You could create a data frame of all combinations of categories with users set to zero. Then you can combine the two data frames and, for each combination of categories, keep the maximum value of users.
You can create a data frame with all combinations using expand.grid():
all_levels_0 <- expand.grid(levels(data$interestAffinityCategory), levels(data$userGender))
all_levels_0$users <- 0
names(all_levels_0) <- names(data)
head(all_levels_0)
## interestAffinityCategory userGender users
## 1 Movie Lovers female 0
## 2 Music Lovers female 0
## 3 News Junkies & Avid Readers female 0
## 4 Sports Fans female 0
## 5 Technophiles female 0
## 6 TV Lovers female 0
(This assumes that data$interestAffinityCategory and data$userGender are both factors. If they are character vectors, use unique() instead of levels().)
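As a rough sketch of the character-column case, the grid can be built with unique() as follows. The data frame is retyped from the question, and stringsAsFactors = FALSE is an assumption made here so that the columns stay character.

```r
# Same data as in the question, stored as character columns.
data <- data.frame(
  interestAffinityCategory = c("Music Lovers", "Music Lovers",
                               "News Junkies & Avid Readers",
                               "News Junkies & Avid Readers",
                               "Sports Fans", "Movie Lovers", "Technophiles",
                               "TV Lovers", "TV Lovers", "Technophiles"),
  userGender = c("male", "female", "male", "female", "male",
                 "male", "male", "male", "female", "female"),
  users = c(198, 190, 134, 115, 109, 108, 93, 88, 79, 70),
  stringsAsFactors = FALSE
)

# levels() returns NULL for character vectors, so use unique() instead.
all_levels_0 <- expand.grid(unique(data$interestAffinityCategory),
                            unique(data$userGender),
                            stringsAsFactors = FALSE)
all_levels_0$users <- 0
names(all_levels_0) <- names(data)

nrow(all_levels_0)  # 12: 6 categories x 2 genders
```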
For the second step, I use the dplyr package:
library(dplyr)
all_levels <- bind_rows(data, all_levels_0) %>%
  group_by(interestAffinityCategory, userGender) %>%
  summarise(users = max(users))
head(all_levels)
## Source: local data frame [6 x 3]
## Groups: interestAffinityCategory [3]
##
## interestAffinityCategory userGender users
## (fctr) (fctr) (dbl)
## 1 Movie Lovers female 0
## 2 Movie Lovers male 108
## 3 Music Lovers female 190
## 4 Music Lovers male 198
## 5 News Junkies & Avid Readers female 115
## 6 News Junkies & Avid Readers male 134
If you prefer not to use dplyr, you can do the same with rbind() and aggregate() from base R:
combined <- rbind(data, all_levels_0)
all_levels <- aggregate(users ~ interestAffinityCategory + userGender,
data = combined, FUN = max)
head(all_levels)
## interestAffinityCategory userGender users
## 1 Movie Lovers female 0
## 2 Music Lovers female 190
## 3 News Junkies & Avid Readers female 115
## 4 Sports Fans female 0
## 5 Technophiles female 70
## 6 TV Lovers female 79
(This orders the rows differently, so the first few rows are not the same as in the dplyr example.)
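If the dplyr-style ordering (category first, then gender) is wanted from the base R version, one possible way is to sort the result with order(). This is only a sketch: the data frame is retyped from the question with character columns, so unique() is used to build the grid.

```r
# Sample data from the question, as character columns.
data <- data.frame(
  interestAffinityCategory = c("Music Lovers", "Music Lovers",
                               "News Junkies & Avid Readers",
                               "News Junkies & Avid Readers",
                               "Sports Fans", "Movie Lovers", "Technophiles",
                               "TV Lovers", "TV Lovers", "Technophiles"),
  userGender = c("male", "female", "male", "female", "male",
                 "male", "male", "male", "female", "female"),
  users = c(198, 190, 134, 115, 109, 108, 93, 88, 79, 70),
  stringsAsFactors = FALSE
)

# Grid of all category/gender pairs with users = 0.
all_levels_0 <- expand.grid(
  interestAffinityCategory = unique(data$interestAffinityCategory),
  userGender = unique(data$userGender),
  stringsAsFactors = FALSE
)
all_levels_0$users <- 0

# Stack the real rows on the zero rows and keep the maximum per pair.
combined <- rbind(data, all_levels_0)
all_levels <- aggregate(users ~ interestAffinityCategory + userGender,
                        data = combined, FUN = max)

# Sort by category, then gender, and reset the row names.
all_levels <- all_levels[order(all_levels$interestAffinityCategory,
                               all_levels$userGender), ]
rownames(all_levels) <- NULL
head(all_levels)
```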