
R: Extracting the highest numeric value from each character value in a column

I have a character field in a data frame that contains comma-separated numbers, e.g. "0.5, 3.5, 7.8, 2.4".

For every record, I am trying to extract the largest value from the string and put it in a new column.

For example:

x  csi
1  0.5, 6.7, 2.3   
2  9.5, 2.6, 1.1
3  0.7, 2.3, 5.1
4  4.1, 2.7, 4.7

The desired output would be:

x  csi            csi_max
1  0.5, 6.7, 2.3  6.7
2  9.5, 2.6, 1.1  9.5
3  0.7, 2.3, 5.1  5.1
4  4.1, 2.7, 4.7  4.7

I have made various attempts. My latest one, shown below, returns the maximum csi value across the entire column rather than the maximum within each individual row's csi numbers:

library(stringr)
numextract <- function(string){ 
  str_extract(string, "\\-*\\d+\\.*\\d*")
} 
df$max_csi <- max(numextract(df$csi))

Thank you

We can use the tidyverse:

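library(dplyr)
library(tidyr)
df1 %>%
    # one row per value; convert = TRUE turns the split pieces into numbers
    separate_rows(csi, convert = TRUE) %>%
    group_by(x) %>%
    # numeric maximum within each original row
    summarise(csi_max = max(csi)) %>%
    # attach the result to the original data
    left_join(df1, ., by = "x")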
#  x           csi csi_max
#1 1 0.5, 6.7, 2.3     6.7
#2 2 9.5, 2.6, 1.1     9.5
#3 3 0.7, 2.3, 5.1     5.1
#4 4 4.1, 2.7, 4.7     4.7
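
One thing to watch with any split-then-max approach is that the pieces are still character strings until they are converted, and max() on character values compares them as text rather than as numbers. The following is a minimal base R sketch of that pitfall; the input string and the variable name pieces are made up for illustration and are not part of the question's data.

```r
# Hypothetical input: one of the values has two digits before the decimal point
pieces <- trimws(strsplit("9.5, 10.2, 1.1", ",", fixed = TRUE)[[1]])

# Character comparison: "9.5" sorts after "10.2" because "9" > "1"
max_as_text <- max(pieces)

# Numeric comparison: convert first, then take the maximum
max_as_number <- max(as.numeric(pieces))

print(max_as_text)
print(max_as_number)
```

With the sample data in the question every value has a single digit before the decimal point, so the two comparisons happen to agree, which can hide the problem.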

Or this can be done with pmax from base R, after splitting the 'csi' column into a data.frame with read.table:

df1$csi_max <- do.call(pmax, read.table(text=df1$csi, sep=","))
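
The pmax approach relies on every row having the same number of values. If that cannot be guaranteed, a possible variation (a sketch only, using made-up ragged data in a hypothetical data frame named ragged) is to let read.table pad short rows with NA via fill = TRUE and have pmax ignore them with na.rm = TRUE.

```r
# Hypothetical data where rows hold different numbers of values
ragged <- data.frame(x = 1:3,
                     csi = c("0.5, 6.7, 2.3", "9.5, 2.6", "0.7"),
                     stringsAsFactors = FALSE)

# fill = TRUE pads short rows with NA; as.character() guards against factor input.
# Note: read.table infers the column count from the first five lines of input.
parts <- read.table(text = as.character(ragged$csi), sep = ",",
                    fill = TRUE, header = FALSE)

# Pass each column to pmax as a separate argument, skipping the NA padding
ragged$csi_max <- do.call(pmax, c(parts, na.rm = TRUE))

print(ragged)
```

Because the column count is inferred from the first few lines, a longer row appearing further down a large data set may need col.names to be supplied explicitly.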

Hope this helps!

df$csi_max <- sapply(df$csi, function(x) max(as.numeric(unlist(strsplit(as.character(x), split=",")))))

Output is:

  x           csi csi_max
1 1 0.5, 6.7, 2.3     6.7
2 2 9.5, 2.6, 1.1     9.5
3 3 0.7, 2.3, 5.1     5.1
4 4 4.1, 2.7, 4.7     4.7


#sample data
> dput(df)
structure(list(x = 1:4, csi = structure(c(1L, 4L, 2L, 3L), .Label = c("0.5, 6.7, 2.3", 
"0.7, 2.3, 5.1", "4.1, 2.7, 4.7", "9.5, 2.6, 1.1"), class = "factor")), .Names = c("x", 
"csi"), class = "data.frame", row.names = c(NA, -4L))


Edit:
As suggested by @RichScriven, a more efficient way could be to call strsplit once on the whole column instead of once per row:

df$csi_max <- sapply(strsplit(as.character(df$csi), ","), function(x) max(as.numeric(x)))
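
If this is needed in more than one place, the same idea can be wrapped in a small helper. The sketch below is one possible way to do it, not part of the original answer: the function name row_max is hypothetical, and vapply is swapped in for sapply so that the result is guaranteed to be a numeric vector with one value per row.

```r
# Hypothetical helper: largest number in each comma-separated string
row_max <- function(s) {
  # as.character() handles factor columns; fixed = TRUE treats "," literally
  parts <- strsplit(as.character(s), ",", fixed = TRUE)
  # vapply enforces exactly one numeric result per element
  vapply(parts, function(p) max(as.numeric(trimws(p))), numeric(1))
}

# Sample data from the question
df <- data.frame(x = 1:4,
                 csi = c("0.5, 6.7, 2.3", "9.5, 2.6, 1.1",
                         "0.7, 2.3, 5.1", "4.1, 2.7, 4.7"),
                 stringsAsFactors = FALSE)

df$csi_max <- row_max(df$csi)
print(df)
```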

A solution using the splitstackshape package.

library(splitstackshape)

dat$csi_max <- apply(cSplit(dat, "csi")[, -1], 1, max)
dat
#   x           csi csi_max
# 1 1 0.5, 6.7, 2.3     6.7
# 2 2 9.5, 2.6, 1.1     9.5
# 3 3 0.7, 2.3, 5.1     5.1
# 4 4 4.1, 2.7, 4.7     4.7

DATA

dat <- read.table(text = "x  csi
1  '0.5, 6.7, 2.3'
2  '9.5, 2.6, 1.1'
3  '0.7, 2.3, 5.1'
4  '4.1, 2.7, 4.7'",
                  header = TRUE, stringsAsFactors = FALSE)
