Create rows in a data frame based on other rows and column combination in R

I have a problem with a data frame in R. My data has two dimensions and one metric, but some combinations of categories have no data. My data looks like this:

      interestAffinityCategory userGender users
1                 Music Lovers       male   198
2                 Music Lovers     female   190
3  News Junkies & Avid Readers       male   134
4  News Junkies & Avid Readers     female   115
5                  Sports Fans       male   109
6                 Movie Lovers       male   108
7                 Technophiles       male    93
8                    TV Lovers       male    88
9                    TV Lovers     female    79
10                Technophiles     female    70

For example, Sports Fans only has data for male. I need all the combinations, even those with a 0 value in the users column, like: Sports Fans, female, 0. This is how my data needs to look (rows 6 and 8 are the new ones):

      interestAffinityCategory userGender users
1                 Music Lovers       male   198
2                 Music Lovers     female   190
3  News Junkies & Avid Readers       male   134
4  News Junkies & Avid Readers     female   115
5                  Sports Fans       male   109
6                  Sports Fans     female     0
7                 Movie Lovers       male   108
8                 Movie Lovers     female     0
9                 Technophiles       male    93
10                   TV Lovers       male    88
11                   TV Lovers     female    79
12                Technophiles     female    70

I tried to find a solution, but I only found similar cases with a single dimension, and those approaches didn't work for me.

P.S.: This data comes from the Google Analytics API. I want to get the top 10 categories and make a graph of visits by gender, but for that I need data for every combination of category and gender, even those with 0 visits.

You should use the complete() function from tidyr. The first argument is your data; the second and third are the columns for which you want all possible combinations (if you have more, just list them one by one); and fill is a list of default values for the newly created rows.

library(tidyr)
complete(data, interestAffinityCategory, userGender, fill = list(users = 0))
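If tidyr is not available, roughly the same result can be sketched in base R: build every category/gender pair with expand.grid(), left-join the data onto it with merge(all.x = TRUE), and turn the resulting NAs into 0, which plays the role of fill = list(users = 0). The sample data is typed in by hand from the question, the names combos and full are only illustrative, and the row order may differ from complete()'s output.

```r
# Sample data from the question, typed in by hand
data <- data.frame(
  interestAffinityCategory = c("Music Lovers", "Music Lovers",
                               "News Junkies & Avid Readers",
                               "News Junkies & Avid Readers",
                               "Sports Fans", "Movie Lovers", "Technophiles",
                               "TV Lovers", "TV Lovers", "Technophiles"),
  userGender = c("male", "female", "male", "female", "male",
                 "male", "male", "male", "female", "female"),
  users = c(198, 190, 134, 115, 109, 108, 93, 88, 79, 70),
  stringsAsFactors = FALSE
)

# Every category/gender pair that could exist
combos <- expand.grid(
  interestAffinityCategory = unique(data$interestAffinityCategory),
  userGender = unique(data$userGender),
  stringsAsFactors = FALSE
)

# Left join on the shared key columns: pairs missing from `data` get users = NA
full <- merge(combos, data, all.x = TRUE)

# Replace those NAs with 0, like fill = list(users = 0) in complete()
full$users[is.na(full$users)] <- 0

print(full)
```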

You could create a data frame of all combinations of categories with users set to zero. Then combine the two data frames and, for each combination of categories, keep the maximum value of users. Because a real count is never negative, the maximum is the real count where one exists and 0 otherwise.

You can create a data frame with all combinations using expand.grid():

# One row per category/gender combination, with users set to 0
all_levels_0 <- expand.grid(levels(data$interestAffinityCategory), levels(data$userGender))
all_levels_0$users <- 0
names(all_levels_0) <- names(data)
head(all_levels_0)
##        interestAffinityCategory  userGender users
## 1                  Movie Lovers      female     0
## 2                  Music Lovers      female     0
## 3   News Junkies & Avid Readers      female     0
## 4                   Sports Fans      female     0
## 5                  Technophiles      female     0
## 6                     TV Lovers      female     0

(This assumes that data$interestAffinityCategory and data$userGender are both factors. If they are character vectors, use unique() instead of levels().)
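Since R 4.0.0, data.frame() no longer converts strings to factors by default, so the character case is common. For a character column levels() returns NULL, and the expand.grid() step could look like this sketch, which uses unique() and passes stringsAsFactors = FALSE so the new columns stay character and can be stacked onto data. The sample data is retyped from the question.

```r
# Sample data from the question, typed in by hand (character columns)
data <- data.frame(
  interestAffinityCategory = c("Music Lovers", "Music Lovers",
                               "News Junkies & Avid Readers",
                               "News Junkies & Avid Readers",
                               "Sports Fans", "Movie Lovers", "Technophiles",
                               "TV Lovers", "TV Lovers", "Technophiles"),
  userGender = c("male", "female", "male", "female", "male",
                 "male", "male", "male", "female", "female"),
  users = c(198, 190, 134, 115, 109, 108, 93, 88, 79, 70),
  stringsAsFactors = FALSE
)

print(levels(data$userGender))  # NULL: character vectors have no levels

# unique() works for characters; keep the combinations as character, not factor
all_levels_0 <- expand.grid(unique(data$interestAffinityCategory),
                            unique(data$userGender),
                            stringsAsFactors = FALSE)
all_levels_0$users <- 0
names(all_levels_0) <- names(data)
head(all_levels_0)
```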

For the second step, I use the dplyr package:

library(dplyr)
# Stack the real rows and the zero rows, then keep the larger users value per pair
all_levels <- bind_rows(data, all_levels_0) %>%
              group_by(interestAffinityCategory, userGender) %>%
              summarise(users = max(users))
head(all_levels)
## Source: local data frame [6 x 3]
## Groups: interestAffinityCategory [3]
## 
##        interestAffinityCategory  userGender users
##                          (fctr)      (fctr) (dbl)
## 1                  Movie Lovers      female     0
## 2                  Movie Lovers        male   108
## 3                  Music Lovers      female   190
## 4                  Music Lovers        male   198
## 5   News Junkies & Avid Readers      female   115
## 6   News Junkies & Avid Readers        male   134

If you prefer not to use dplyr, you can do the same with rbind() and aggregate() from base R:

# Stack the real rows and the zero rows, then keep the larger users value per pair
combined <- rbind(data, all_levels_0)
all_levels <- aggregate(users ~ interestAffinityCategory + userGender,
                        data = combined, FUN = max)
head(all_levels)
##        interestAffinityCategory  userGender users
## 1                  Movie Lovers      female     0
## 2                  Music Lovers      female   190
## 3   News Junkies & Avid Readers      female   115
## 4                   Sports Fans      female     0
## 5                  Technophiles      female    70
## 6                     TV Lovers      female    79

(aggregate() varies the first grouping variable fastest, so all the female rows come first; that is why the first few rows differ from the dplyr example.)
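If the row order matters (for example, to match the dplyr output), the aggregate() result can be re-sorted with order(), by category first and then by gender. A minimal sketch, assuming character columns and the unique()-based zero rows; note that the relative order of Technophiles and TV Lovers can depend on the locale's collation.

```r
# Sample data from the question, typed in by hand
data <- data.frame(
  interestAffinityCategory = c("Music Lovers", "Music Lovers",
                               "News Junkies & Avid Readers",
                               "News Junkies & Avid Readers",
                               "Sports Fans", "Movie Lovers", "Technophiles",
                               "TV Lovers", "TV Lovers", "Technophiles"),
  userGender = c("male", "female", "male", "female", "male",
                 "male", "male", "male", "female", "female"),
  users = c(198, 190, 134, 115, 109, 108, 93, 88, 79, 70),
  stringsAsFactors = FALSE
)

# A zero row for every category/gender pair
all_levels_0 <- expand.grid(unique(data$interestAffinityCategory),
                            unique(data$userGender),
                            stringsAsFactors = FALSE)
all_levels_0$users <- 0
names(all_levels_0) <- names(data)

# Stack the real rows and the zero rows, keep the larger users value per pair
combined <- rbind(data, all_levels_0)
all_levels <- aggregate(users ~ interestAffinityCategory + userGender,
                        data = combined, FUN = max)

# Sort by category, then gender, and reset the row names
all_levels <- all_levels[order(all_levels$interestAffinityCategory,
                               all_levels$userGender), ]
rownames(all_levels) <- NULL
head(all_levels)
```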


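Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.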
