
How to find the max and min values of string rows of a dataframe in R?

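For each row of my data, I want to get the minimum and maximum of values that are stored as a comma-separated character string. For example, consider the following data: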

df <- data.frame(id=c(1:3),
                 yr=c("2000,2009,1999,2022","2019,2018,2006,2007","1998,2012,2000,2020"))

Desired output:

id                   yr  min_yr    max_yr
1   2000,2009,1999,2022    1999      2022
2   2019,2018,2006,2007    2006      2019
3   1998,2012,2000,2020    1998      2020

This also works for years such as 860, 1543, 2023, ...

df[c("min_yr", "max_yr")] <-
   t(sapply(strsplit(df$yr, ","), \(x) range(as.numeric(x))))

df
#  id                  yr min_yr max_yr
#1  1 2000,2009,1999,2022   1999   2022
#2  2 2019,2018,2006,2007   2006   2019
#3  3 1998,2012,2000,2020   1998   2020
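The conversion with as.numeric matters because comparing the raw strings is lexicographic rather than numeric. A small sketch, using made-up years of different lengths (the vector yrs is an assumption for illustration, not part of the question's data):

```r
# Hypothetical years with different numbers of digits (not from the question)
yrs <- c("860", "1543", "2023")

# On character vectors, range() compares strings character by character,
# so "1543" sorts first and "860" sorts last
chr_range <- range(yrs)

# After conversion, range() compares the values numerically
num_range <- range(as.numeric(yrs))

print(chr_range)
print(num_range)
```

With four-digit years only, both comparisons happen to agree, which is why the difference is easy to miss.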

Here is a one-liner in base R that also works for numbers of any length.

df[c('min_yr', 'max_yr')] <- t(sapply(df$yr, \(x) range(scan(text=x, sep = ','))))

Result:

df
#>   id                  yr min_yr max_yr
#> 1  1 2000,2009,1999,2022   1999   2022
#> 2  2 2019,2018,2006,2007   2006   2019
#> 3  3 1998,2012,2000,2020   1998   2020
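
# Convert to numeric before min/max; comparing the raw strings is lexicographic
df$min_yr <- sapply(strsplit(df$yr, ","), function(x) min(as.numeric(x)))
df$max_yr <- sapply(strsplit(df$yr, ","), function(x) max(as.numeric(x)))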

  id                  yr min_yr max_yr
1  1 2000,2009,1999,2022   1999   2022
2  2 2019,2018,2006,2007   2006   2019
3  3 1998,2012,2000,2020   1998   2020
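The two lines above split every string twice. One possible variant, sketched here with the question's data, splits once and uses vapply, which also checks that each row yields exactly one numeric value. The intermediate name parts is an assumption for illustration:

```r
# Same example data as in the question
df <- data.frame(id = 1:3,
                 yr = c("2000,2009,1999,2022",
                        "2019,2018,2006,2007",
                        "1998,2012,2000,2020"))

# Split each string once and convert the pieces to numeric
parts <- lapply(strsplit(df$yr, ",", fixed = TRUE), as.numeric)

# vapply() errors if any result is not a single numeric value
df$min_yr <- vapply(parts, min, numeric(1))
df$max_yr <- vapply(parts, max, numeric(1))

print(df)
```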

Using dplyr and purrr:

library(dplyr)
library(purrr)
mutate(df, strsplit(yr, ",") |>
           map(as.numeric) |>
           map(range) |>
           map_dfr(setNames, c("min", "max")))

##>   id                  yr  min  max
##> 1  1 2000,2009,1999,2022 1999 2022
##> 2  2 2019,2018,2006,2007 2006 2019
##> 3  3 1998,2012,2000,2020 1998 2020

library(stringr)
library(dplyr)
df %>%
  rowwise() %>%
  mutate(min_yr = min(as.numeric(str_split_1(yr, ","))),
         max_yr = max(as.numeric(str_split_1(yr, ","))))

     id yr                  min_yr max_yr
  <int> <chr>                <dbl>  <dbl>
1     1 2000,2009,1999,2022   1999   2022
2     2 2019,2018,2006,2007   2006   2019
3     3 1998,2012,2000,2020   1998   2020

Using pmin/pmax from base R: read the yr column with read.csv to create a data.frame, then apply pmin/pmax

d1 <- read.csv(text = df$yr, header = FALSE)
df$min_yr <- do.call(pmin, d1)
df$max_yr <- do.call(pmax, d1)

-output

> df
  id                  yr min_yr max_yr
1  1 2000,2009,1999,2022   1999   2022
2  2 2019,2018,2006,2007   2006   2019
3  3 1998,2012,2000,2020   1998   2020
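In the example every row holds four values. If some rows were shorter, read.csv would pad them with NA, and pmin/pmax would then return NA for those rows. A minimal sketch of one way around that, assuming made-up ragged data (the vector yr below is hypothetical), passes na.rm = TRUE:

```r
# Hypothetical data: the second row has fewer values than the first
yr <- c("2000,2009,1999,2022", "2019,2018")

# read.csv() fills the missing fields of the short row with NA
d1 <- read.csv(text = yr, header = FALSE)

# Append na.rm = TRUE to the argument list so the padding NAs are ignored
min_yr <- do.call(pmin, c(d1, na.rm = TRUE))
max_yr <- do.call(pmax, c(d1, na.rm = TRUE))

print(min_yr)
print(max_yr)
```

Note that read.csv decides the number of columns from the first few lines of input, so this sketch is only a starting point for very uneven data; the strsplit-based answers do not have that restriction.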
