Create rows in a data frame based on combinations of other rows and columns in R

Question

I have a problem with a data frame in R. I have data with two dimensions and one metric, but some combinations of categories have no data. My data look like this:

      interestAffinityCategory userGender users
1                 Music Lovers       male   198
2                 Music Lovers     female   190
3  News Junkies & Avid Readers       male   134
4  News Junkies & Avid Readers     female   115
5                  Sports Fans       male   109
6                 Movie Lovers       male   108
7                 Technophiles       male    93
8                    TV Lovers       male    88
9                    TV Lovers     female    79
10                Technophiles     female    70

For example, Sports Fans only has data for the male gender. I need every combination of categories, even when the value in the users column is 0, e.g. Sports Fans, female, 0. This is how my data need to look (rows 6 and 8):

      interestAffinityCategory userGender users
1                 Music Lovers       male   198
2                 Music Lovers     female   190
3  News Junkies & Avid Readers       male   134
4  News Junkies & Avid Readers     female   115
5                  Sports Fans       male   109
6                  Sports Fans     female     0
7                 Movie Lovers       male   108
8                 Movie Lovers     female     0
9                 Technophiles       male    93
10                   TV Lovers       male    88
11                   TV Lovers     female    79
12                Technophiles     female    70

I tried to find a solution, but I only found similar cases with a single dimension, and they didn't work for me.

P.S.: This data comes from the Google Analytics API. I want to get the top 10 categories and make a graph of visits by gender, but to do that I need data for every combination of category and gender, even those with 0 visits.

Answer 1

You should use the complete() function from tidyr. The first argument is your data; the second and third are the columns whose possible combinations you want to find (if you have more, just list them one by one); and fill is a named list that gives, for each column, the value to use instead of NA in the missing combinations.

library(tidyr)
complete(data, interestAffinityCategory, userGender, fill = list(users = 0))
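A self-contained sketch of the call above, assuming the question's data is rebuilt by hand from its printout with character columns (stringsAsFactors = FALSE) and that tidyr is installed. The variable name result is only for illustration.

```r
library(tidyr)

# Assumed reconstruction of the question's data, with character columns
data <- data.frame(
  interestAffinityCategory = c("Music Lovers", "Music Lovers",
                               "News Junkies & Avid Readers",
                               "News Junkies & Avid Readers",
                               "Sports Fans", "Movie Lovers", "Technophiles",
                               "TV Lovers", "TV Lovers", "Technophiles"),
  userGender = c("male", "female", "male", "female", "male",
                 "male", "male", "male", "female", "female"),
  users = c(198, 190, 134, 115, 109, 108, 93, 88, 79, 70),
  stringsAsFactors = FALSE
)

# Add every missing category/gender pair; users would be NA there,
# so fill supplies 0 instead
result <- complete(data, interestAffinityCategory, userGender,
                   fill = list(users = 0))
print(result)
```

With 6 categories and 2 genders the result has 12 rows, and the two added rows (Sports Fans/female and Movie Lovers/female) carry 0 users.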

Answer 2

You could create a data frame of all combinations of categories with users set to zero. Then you can combine the two data frames and, for each combination of categories, keep the maximum value of users. Because user counts are never negative, the maximum is the real count where one exists and 0 otherwise.

You can create a data frame with all combinations using expand.grid():

all_levels_0 <- expand.grid(levels(data$interestAffinityCategory), levels(data$userGender))
all_levels_0$users <- 0
names(all_levels_0) <- names(data)
head(all_levels_0)
##        interestAffinityCategory  userGender users
## 1                  Movie Lovers      female     0
## 2                  Music Lovers      female     0
## 3   News Junkies & Avid Readers      female     0
## 4                   Sports Fans      female     0
## 5                  Technophiles      female     0
## 6                     TV Lovers      female     0

(This assumes that data$interestAffinityCategory and data$userGender are both factors. If they are character vectors, use unique() instead of levels(), because levels() returns NULL for a character vector.)
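For the character-column case in the note above, a minimal sketch of the first step with unique(), assuming the question's data is rebuilt by hand with stringsAsFactors = FALSE. Naming the expand.grid() arguments sets the column names directly, so the separate names() assignment is not needed.

```r
# Assumed character-column reconstruction of the question's data
data <- data.frame(
  interestAffinityCategory = c("Music Lovers", "Music Lovers",
                               "News Junkies & Avid Readers",
                               "News Junkies & Avid Readers",
                               "Sports Fans", "Movie Lovers", "Technophiles",
                               "TV Lovers", "TV Lovers", "Technophiles"),
  userGender = c("male", "female", "male", "female", "male",
                 "male", "male", "male", "female", "female"),
  users = c(198, 190, 134, 115, 109, 108, 93, 88, 79, 70),
  stringsAsFactors = FALSE
)

# unique() works on character vectors, where levels() would return NULL;
# stringsAsFactors = FALSE keeps the new columns as characters too
all_levels_0 <- expand.grid(
  interestAffinityCategory = unique(data$interestAffinityCategory),
  userGender = unique(data$userGender),
  stringsAsFactors = FALSE
)
all_levels_0$users <- 0
nrow(all_levels_0)  # 12: 6 categories x 2 genders
```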

For the second step, I use the dplyr package:

library(dplyr)
all_levels <- bind_rows(data, all_levels_0) %>%
              group_by(interestAffinityCategory, userGender) %>%
              summarise(users = max(users))
head(all_levels)
## Source: local data frame [6 x 3]
## Groups: interestAffinityCategory [3]
## 
##        interestAffinityCategory  userGender users
##                          (fctr)      (fctr) (dbl)
## 1                  Movie Lovers      female     0
## 2                  Movie Lovers        male   108
## 3                  Music Lovers      female   190
## 4                  Music Lovers        male   198
## 5   News Junkies & Avid Readers      female   115
## 6   News Junkies & Avid Readers        male   134
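The dplyr step above can also be run end to end as a sketch, assuming character columns (built with unique() rather than levels()) and dplyr 1.0.0 or later, which is where the .groups argument of summarise() was introduced. Passing .groups = "drop" returns an ungrouped result instead of the grouped one shown in the printout.

```r
library(dplyr)

# Assumed character-column reconstruction of the question's data
data <- data.frame(
  interestAffinityCategory = c("Music Lovers", "Music Lovers",
                               "News Junkies & Avid Readers",
                               "News Junkies & Avid Readers",
                               "Sports Fans", "Movie Lovers", "Technophiles",
                               "TV Lovers", "TV Lovers", "Technophiles"),
  userGender = c("male", "female", "male", "female", "male",
                 "male", "male", "male", "female", "female"),
  users = c(198, 190, 134, 115, 109, 108, 93, 88, 79, 70),
  stringsAsFactors = FALSE
)

# Every category/gender pair with a placeholder count of 0
all_levels_0 <- expand.grid(
  interestAffinityCategory = unique(data$interestAffinityCategory),
  userGender = unique(data$userGender),
  stringsAsFactors = FALSE
)
all_levels_0$users <- 0

all_levels <- bind_rows(data, all_levels_0) %>%
  group_by(interestAffinityCategory, userGender) %>%
  # max() keeps the real count where one exists, otherwise the 0 placeholder
  summarise(users = max(users), .groups = "drop")
print(all_levels)
```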

If you prefer not to use dplyr, you can do the same in base R with rbind() and aggregate():

combined <- rbind(data, all_levels_0)
all_levels <- aggregate(users ~ interestAffinityCategory + userGender,
                        data = combined, FUN = max)
head(all_levels)
##        interestAffinityCategory  userGender users
## 1                  Movie Lovers      female     0
## 2                  Music Lovers      female   190
## 3   News Junkies & Avid Readers      female   115
## 4                   Sports Fans      female     0
## 5                  Technophiles      female    70
## 6                     TV Lovers      female    79

(aggregate() sorts its result with the first grouping variable varying fastest, so all female rows come first and the first few rows differ from the dplyr example.)
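A base R variant that is not part of either answer: instead of the max() trick, left-join the full grid of combinations onto the data with merge(all.x = TRUE) and replace the resulting NA values with 0. This is a sketch assuming character columns rebuilt from the question; all_combos is a hypothetical name.

```r
# Assumed character-column reconstruction of the question's data
data <- data.frame(
  interestAffinityCategory = c("Music Lovers", "Music Lovers",
                               "News Junkies & Avid Readers",
                               "News Junkies & Avid Readers",
                               "Sports Fans", "Movie Lovers", "Technophiles",
                               "TV Lovers", "TV Lovers", "Technophiles"),
  userGender = c("male", "female", "male", "female", "male",
                 "male", "male", "male", "female", "female"),
  users = c(198, 190, 134, 115, 109, 108, 93, 88, 79, 70),
  stringsAsFactors = FALSE
)

# Full grid of category/gender pairs (hypothetical name all_combos)
all_combos <- expand.grid(
  interestAffinityCategory = unique(data$interestAffinityCategory),
  userGender = unique(data$userGender),
  stringsAsFactors = FALSE
)

# Left join on the shared key columns: every combination is kept,
# and users is NA where the original data had no matching row
all_levels <- merge(all_combos, data, all.x = TRUE)
all_levels$users[is.na(all_levels$users)] <- 0
print(all_levels)
```

Unlike taking the maximum, this does not rely on the metric being non-negative; on the other hand, if the data had several rows for the same pair, merge() would keep all of them rather than collapsing them.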

2 answers

Answer 1
Score 4, accepted, 2016-03-11 22:44:54

Answer 2
Score 1, 2016-03-11 21:39:34
