
Calculating row sums in data frame based on column names

I have a data frame with media spending for different media channels:

TV <- c(200,500,700,1000)
Display <- c(30,33,47,55)
Social <- c(20,21,22,23)
Facebook <- c(30,31,32,33)
Print <- c(50,51,52,53)
Newspaper <- c(60,61,62,63)

df_media <- data.frame(TV,Display,Social,Facebook, Print, Newspaper)

My goal is to calculate the row sums of specific columns based on their names. For example: by definition, Facebook falls into the category Social, so I want to add the Facebook column to the Social column and keep only the Social column. The same goes for Newspaper, which should be added to Print, and so on.

The challenge is that the names and the number of columns that belong to one category change from data set to data set, e.g. the next data set could contain Social, Facebook and Instagram, which should all be summed into Social.

There is a list of rules that defines which media types (column names) belong together. I have to admit I'm a bit clueless and can only think of a long chain of if statements right now, but I hope there is a better solution.

I'm thinking about putting all the names that belong together into vectors and using them to find and sum the relevant columns, but I have no idea how to do this. Any help is appreciated.

You could do something along these lines, which allows columns to be absent from some data sets (using intersect and setdiff):

  1. Define a set of rules, i.e. the columns that are going to be united/grouped together.

  2. Create a vector d of the remaining columns.

  3. Compute the rowSums of every subset of the data set defined in the rules.

  4. append the remaining columns.

  5. cbind the columns of the list using do.call.
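
As a small illustration of why intersect and setdiff make the rules tolerant of missing columns, here is a sketch with made-up vectors (the names cols and social_rule are only for this example; the data set is assumed to have no Instagram column):

```r
# Column names of a hypothetical data set without an Instagram column
cols <- c("TV", "Display", "Social", "Facebook", "Print", "Newspaper")
social_rule <- c("Social", "Facebook", "Instagram")

# intersect() keeps only the rule members that are actually present
present <- intersect(cols, social_rule)
present
# [1] "Social"   "Facebook"

# setdiff() returns the columns that the rule does not cover
remaining <- setdiff(cols, social_rule)
remaining
# [1] "TV"        "Display"   "Print"     "Newspaper"
```

Because Instagram is simply dropped by intersect, the same rule can be applied to data sets with or without that column.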

#Rules
rules <- list(social = c("Social", "Facebook", "Instagram"),
              printed = c("Print", "Newspaper"))
d <- setdiff(colnames(df_media), unlist(rules)) #columns that are not going to be united

#data frame (the `_` placeholder requires R >= 4.2)
#drop = FALSE and df_media[d] keep data frames even if only one column matches
lapply(rules, function(x) rowSums(df_media[, intersect(colnames(df_media), x), drop = FALSE])) |>
  append(df_media[d]) |>
  do.call(cbind.data.frame, args = _)
  social printed   TV Display
1     50     110  200      30
2     52     112  500      33
3     54     114  700      47
4     56     116 1000      55
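
Since the column names change from data set to data set, the same steps could be wrapped in a function. The following is only a sketch: sum_by_rules and df_next are hypothetical names that do not appear in the original answer, and the values in df_next are made up. It also skips rules that match no column at all in a given data set.

```r
# Hypothetical helper: sums the columns named in each rule, keeps the rest
sum_by_rules <- function(df, rules) {
  # Columns that no rule mentions are carried over unchanged
  rest <- setdiff(colnames(df), unlist(rules))
  # For each rule, keep only the members present in this data set
  present <- lapply(rules, function(x) intersect(x, colnames(df)))
  # Drop rules that match no column at all
  present <- present[lengths(present) > 0]
  # drop = FALSE keeps a data frame even when only one column matches
  sums <- lapply(present, function(x) rowSums(df[, x, drop = FALSE]))
  # df[rest] (list-style indexing) stays a data frame for a single column
  do.call(cbind.data.frame, c(sums, df[rest]))
}

# Made-up data set with Instagram but without Newspaper
df_next <- data.frame(TV = c(200, 500), Social = c(20, 21),
                      Facebook = c(30, 31), Instagram = c(5, 6),
                      Print = c(50, 51))
rules <- list(social = c("Social", "Facebook", "Instagram"),
              printed = c("Print", "Newspaper"))

res <- sum_by_rules(df_next, rules)
res
#   social printed  TV
# 1     55      50 200
# 2     58      51 500
```

The rules list is the only thing that needs maintaining; new channels such as Instagram are picked up wherever they occur, and rule members that are absent are ignored.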
