How to find the max and min values of string rows of a dataframe in R?
For each row of my data, I want to get the min and max of values that are stored together as a comma-separated character string. For example, consider the following data:
df <- data.frame(id=c(1:3),
yr=c("2000,2009,1999,2022","2019,2018,2006,2007","1998,2012,2000,2020"))
Output needed:
id yr min_yr max_yr
1 2000,2009,1999,2022 1999 2022
2 2019,2018,2006,2007 2006 2019
3 1998,2012,2000,2020 1998 2020
This also works for years like 860, 1543, 2023, ...
df[c("min_yr", "max_yr")] <-
t(sapply(strsplit(df$yr, ","), \(x) range(as.numeric(x))))
df
# id yr min_yr max_yr
#1 1 2000,2009,1999,2022 1999 2022
#2 2 2019,2018,2006,2007 2006 2019
#3 3 1998,2012,2000,2020 1998 2020
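To see why the as.numeric() conversion matters for years with fewer than four digits, here is a minimal sketch. The input string is made up for illustration and is not part of the question's data.

```r
# Hypothetical input mixing 3- and 4-digit years
yrs <- strsplit("860,1543,2023", ",")[[1]]

# On character values, min() compares strings character by character,
# so "1543" sorts before "860"
chr_min <- min(yrs)

# After conversion the comparison is numeric
num_min <- min(as.numeric(yrs))

chr_min  # "1543"
num_min  # 860
```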
Here's a one-liner in base R that also works on any number.
df[c('min_yr', 'max_yr')] <- t(sapply(df$yr, \(x) range(scan(text=x, sep = ','))))
Resulting in
df
#> id yr min_yr max_yr
#> 1 1 2000,2009,1999,2022 1999 2022
#> 2 2 2019,2018,2006,2007 2006 2019
#> 3 3 1998,2012,2000,2020 1998 2020
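One detail worth noting: by default scan() reports how many items it read on every call. A possible variant, shown here only as a sketch, passes quiet = TRUE to suppress those messages and USE.NAMES = FALSE so that sapply() does not carry the input strings along as names.

```r
df <- data.frame(id = 1:3,
                 yr = c("2000,2009,1999,2022",
                        "2019,2018,2006,2007",
                        "1998,2012,2000,2020"))

# range() returns c(min, max) for each row; t() turns the 2 x n result into n x 2
rng <- t(sapply(df$yr,
                \(x) range(scan(text = x, sep = ",", quiet = TRUE)),
                USE.NAMES = FALSE))
df[c("min_yr", "max_yr")] <- rng
df
```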
# Convert to numeric before taking min/max; on character values the comparison is lexicographic
yrs <- lapply(strsplit(df$yr, ","), as.numeric)
df$min_yr <- sapply(yrs, min)
df$max_yr <- sapply(yrs, max)
id yr min_yr max_yr
1 1 2000,2009,1999,2022 1999 2022
2 2 2019,2018,2006,2007 2006 2019
3 3 1998,2012,2000,2020 1998 2020
Using dplyr and purrr:
library(dplyr)
library(purrr)
mutate(df, strsplit(yr, ",") |>
map(as.numeric) |>
map(range) |>
map_dfr(setNames, c("min", "max")))
##> id yr min max
##> 1 1 2000,2009,1999,2022 1999 2022
##> 2 2 2019,2018,2006,2007 2006 2019
##> 3 3 1998,2012,2000,2020 1998 2020
library(stringr)
library(dplyr)
df %>%
  rowwise() %>%
  mutate(min_yr = min(as.numeric(str_split_1(yr, ","))),
         max_yr = max(as.numeric(str_split_1(yr, ","))))
     id yr                  min_yr max_yr
  <int> <chr>                <dbl>  <dbl>
1     1 2000,2009,1999,2022   1999   2022
2     2 2019,2018,2006,2007   2006   2019
3     3 1998,2012,2000,2020   1998   2020
Using pmin/pmax from base R - read the yr column with read.csv to create a data.frame and then use pmin/pmax
d1 <- read.csv(text = df$yr, header = FALSE)
df$min_yr <- do.call(pmin, d1)
df$max_yr <- do.call(pmax, d1)
-output
> df
id yr min_yr max_yr
1 1 2000,2009,1999,2022 1999 2022
2 2 2019,2018,2006,2007 2006 2019
3 3 1998,2012,2000,2020 1998 2020
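If the rows do not all contain the same number of years, read.csv pads the shorter rows with NA, and pmin/pmax then return NA for those rows unless na.rm = TRUE is passed. A minimal sketch follows, using a made-up data frame df2 in which the first row is the longest (an assumption made here so that read.csv sees the full column count from the first line):

```r
# Hypothetical data with a varying number of years per row
df2 <- data.frame(id = 1:3,
                  yr = c("2000,2009,1999,2022", "2019,2018", "860,1543,2023"))

# Shorter rows are padded with NA
d2 <- read.csv(text = df2$yr, header = FALSE)

# Append na.rm = TRUE to the argument list so the padding is ignored
df2$min_yr <- do.call(pmin, c(d2, na.rm = TRUE))
df2$max_yr <- do.call(pmax, c(d2, na.rm = TRUE))
df2
```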