Create a new column by aggregating multiple columns in R

Background

I have a dataset, df, and I would like to create a new column from several existing columns. Specifically, I need to multiply the Type, Span, and Population columns and store the result in a new Output column.

ID   Status   Type   Span   State   Population
A    Yes      2      70%    Ga      10000

Desired output

ID   Status   Type   Span   State   Population   Output
A    Yes      2      70%    Ga      10000        14000

dput

structure(list(ID = structure(1L, .Label = "A ", class = "factor"), 
Status = structure(1L, .Label = "Yes", class = "factor"), 
Type = 2L, Span = structure(1L, .Label = "70%", class = "factor"), 
State = structure(1L, .Label = "Ga", class = "factor"), Population = 10000L), class = "data.frame", 
row.names = c(NA, 
-1L))

This is what I have tried

 df %>% 
 mutate(Output = Type * Span * Population)

Here, we create a new column from the values in other columns. We can use mutate to take the Span percentage of Population and multiply it by 'Type'. Note that 'Span' is not numeric: it contains a % sign (and in the dput above it is a factor). So we extract the numeric part with parse_number, divide by 100, and then multiply by 'Population' and 'Type'.

library(dplyr)
df %>%
  mutate(Output = Type * Population * readr::parse_number(as.character(Span))/100)
#   ID Status Type Span State Population Output
#1 A     Yes    2  70%    Ga      10000  14000
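To illustrate why the conversion step matters, here is a minimal base R sketch. The standalone variables (span, type, population) are made up for illustration and mirror the single row in the question; the sub() call is only a base R stand-in for readr::parse_number.

```r
# Hypothetical standalone values mirroring the row in the question
span <- factor("70%")
type <- 2L
population <- 10000L

# Arithmetic on a factor is not meaningful: R warns and returns NA
naive <- suppressWarnings(type * span * population)
is.na(naive)

# Turn the label into a proportion first, then multiply
span_prop <- as.numeric(sub("%", "", as.character(span), fixed = TRUE)) / 100
result <- type * population * span_prop
result
```

Going through as.character() makes the intent explicit: the calculation should use the label "70%", not the factor's internal integer code.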

If the columns 'Type' and 'Population' are not numeric, it is better to convert them first, e.g. with as.numeric(as.character(df$Type)) and likewise for 'Population' (assuming they are of class factor). Another option is type.convert(df, as.is = TRUE), and then to work with the converted dataset.
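As a rough sketch of the two conversion options, assume a hypothetical data frame df2 (not part of the question) in which 'Type' and 'Population' are stored as factors. It also assumes an R version in which type.convert accepts a data frame.

```r
# Hypothetical data: numeric-looking values stored as factors
df2 <- data.frame(
  Type = factor(c("5", "10")),
  Population = factor(c("10000", "500")),
  Span = factor(c("70%", "50%"))
)

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the values 5 and 10
codes <- as.numeric(df2$Type)

# Going through character recovers the actual values
type_num <- as.numeric(as.character(df2$Type))

# Alternative: let type.convert() pick a suitable class for every column.
# With as.is = TRUE, columns that cannot be parsed as numbers (here Span)
# become character rather than factor
df2_conv <- type.convert(df2, as.is = TRUE)
sapply(df2_conv, class)
```

After type.convert, 'Span' still holds strings such as "70%", so the % sign still has to be stripped before multiplying.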

We can remove the '%' sign using sub, convert the result to numeric, and multiply the values.

This can be done in base R as:

df$Output <- with(df, Type * as.numeric(sub('%', '', Span)) * Population/100)
df

#  ID Status Type Span State Population  Output
#1 A     Yes    2  70%    Ga      10000   14000
