Create a new column by aggregating multiple columns in R
Background
I have a dataset, df, in which I would like to combine multiple columns into a new column. I need to multiply the Type, Span and Population columns and create a new Output column.
ID Status Type Span State Population
A Yes 2 70% Ga 10000
Desired output
ID Status Type Span State Population Output
A Yes 2 70% Ga 10000 14000
dput
structure(list(ID = structure(1L, .Label = "A ", class = "factor"),
Status = structure(1L, .Label = "Yes", class = "factor"),
Type = 2L, Span = structure(1L, .Label = "70%", class = "factor"),
State = structure(1L, .Label = "Ga", class = "factor"), Population = 10000L), class = "data.frame",
row.names = c(NA,
-1L))
This is what I have tried
df %>%
mutate(Output = Type * Span * Population)
Here, we are creating a new column based on the values of several other columns. We can use mutate to get the Span percent of Population and multiply it by 'Type'. Note that 'Span' is not numeric because it contains a % sign, so we extract the numeric part with parse_number, divide by 100, and then multiply by Population and 'Type'.
library(dplyr)
df %>%
mutate(Output = Type * Population * readr::parse_number(as.character(Span))/100)
# ID Status Type Span State Population Output
#1 A Yes 2 70% Ga 10000 14000
If the columns 'Type' and 'Population' are not numeric, it is better to convert them to numeric with as.numeric(as.character(df$Type)), and likewise for 'Population' (assuming they are of factor class). Another option is type.convert(df, as.is = TRUE), after which you work on the dataset with the converted column classes.
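As a rough illustration of why the as.character() step matters, here is a minimal sketch. The data frame df2 is made up for this example (it assumes 'Type' and 'Population' were read in as factors, which is not the case in the dput above), and df3 is just a name chosen here.

```r
# Hypothetical data: Type and Population stored as factors (an assumption,
# the original dput has them as integers)
df2 <- data.frame(
  Type = factor("2"),
  Span = factor("70%"),
  Population = factor("10000")
)

# as.numeric() on a factor returns the internal level code, not the label
codes_only <- as.numeric(df2$Population)                # 1
real_value <- as.numeric(as.character(df2$Population))  # 10000

# type.convert() re-guesses each column's type from its text;
# as.is = TRUE leaves non-numeric columns such as Span as character
df3 <- type.convert(df2, as.is = TRUE)
str(df3)
```

After the conversion, 'Span' still holds the text "70%", so the % sign has to be stripped separately before multiplying.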
We can remove the '%' sign using sub, convert the result to numeric, and multiply the values.
This can be done in base R as:
df$output <- with(df, Type * as.numeric(sub('%', '', Span)) * Population/100)
df
# ID Status Type Span State Population output
#1 A Yes 2 70% Ga 10000 14000
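If the same conversion is needed in more than one place, the sub() step could be wrapped in a small function. This is only a sketch: percent_to_fraction and df_demo are hypothetical names introduced here, and the two rows are invented to show that the calculation is vectorised over rows.

```r
# Hypothetical helper (not from the original answer): "70%" -> 0.7
percent_to_fraction <- function(x) {
  # as.character() guards against factor input;
  # fixed = TRUE makes sub() treat "%" as a literal character
  as.numeric(sub("%", "", as.character(x), fixed = TRUE)) / 100
}

# Made-up rows for illustration only
df_demo <- data.frame(
  Type = c(2L, 3L),
  Span = c("70%", "50%"),
  Population = c(10000L, 4000L)
)

df_demo$Output <- with(df_demo, Type * percent_to_fraction(Span) * Population)
df_demo
```

Dividing by 100 inside the helper keeps the final expression a plain product of the three columns.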