R: Extracting the highest numeric value from each character value in a column

I have a character column in a data frame that contains comma-separated numbers, e.g. "0.5, 3.5, 7.8, 2.4".

For every record I am trying to extract the largest value from the string and put it in a new column.

For example:

x  csi
1  0.5, 6.7, 2.3   
2  9.5, 2.6, 1.1
3  0.7, 2.3, 5.1
4  4.1, 2.7, 4.7

The desired output would be:

x  csi            csi_max
1  0.5, 6.7, 2.3  6.7
2  9.5, 2.6, 1.1  9.5
3  0.7, 2.3, 5.1  5.1
4  4.1, 2.7, 4.7  4.7

I have made various attempts. My latest one is below, but it returns the maximum csi value across the entire column rather than the maximum within each row's csi values:

library(stringr)
numextract <- function(string){ 
  str_extract(string, "\\-*\\d+\\.*\\d*")
} 
df$max_csi <- max(numextract(df$csi))

Thank you

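We can use the tidyverse: split 'csi' into one row per value, take the maximum within each 'x', and join the result back onto the original data.

library(dplyr)
library(tidyr)
df1 %>%
    # one row per value; convert = TRUE turns the pieces into numbers,
    # otherwise max() would compare them as strings
    separate_rows(csi, convert = TRUE) %>%
    group_by(x) %>%
    summarise(csi_max = max(csi)) %>%
    left_join(df1, ., by = "x")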
#  x           csi csi_max
#1 1 0.5, 6.7, 2.3     6.7
#2 2 9.5, 2.6, 1.1     9.5
#3 3 0.7, 2.3, 5.1     5.1
#4 4 4.1, 2.7, 4.7     4.7

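Or this can be done with pmax from base R, after splitting the 'csi' column into a data.frame with read.table. This assumes every row holds the same number of values; as.character is there in case 'csi' is a factor.

df1$csi_max <- do.call(pmax, read.table(text = as.character(df1$csi), sep = ","))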
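
If the rows do not all contain the same number of values, the read.table/pmax approach probably needs two adjustments: fill = TRUE so that short rows are padded with NA, and na.rm = TRUE so that pmax ignores the padding. A minimal sketch, assuming made-up ragged data (the df1 below is illustrative, not the data from the question):

```r
# Illustrative data: rows hold one, two or three values (an assumption)
df1 <- data.frame(x = 1:4,
                  csi = c("0.5, 6.7", "9.5, 2.6, 1.1", "0.7", "4.1, 2.7, 4.7"),
                  stringsAsFactors = FALSE)

# Widest row decides how many columns read.table should create
n_fields <- max(lengths(strsplit(df1$csi, ",")))

# fill = TRUE pads short rows with NA; col.names fixes the column count up front
parts <- read.table(text = df1$csi, sep = ",", fill = TRUE,
                    col.names = paste0("V", seq_len(n_fields)))

# Row-wise maximum across the columns, skipping the NA padding
df1$csi_max <- do.call(pmax, c(parts, na.rm = TRUE))
df1$csi_max
# 6.7 9.5 0.7 4.7
```

Passing col.names explicitly avoids relying on read.table guessing the number of columns from the first few lines.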

Hope this helps!

df$csi_max <- sapply(df$csi, function(x) max(as.numeric(unlist(strsplit(as.character(x), split=",")))))

Output is:

  x           csi csi_max
1 1 0.5, 6.7, 2.3     6.7
2 2 9.5, 2.6, 1.1     9.5
3 3 0.7, 2.3, 5.1     5.1
4 4 4.1, 2.7, 4.7     4.7


#sample data
> dput(df)
structure(list(x = 1:4, csi = structure(c(1L, 4L, 2L, 3L), .Label = c("0.5, 6.7, 2.3", 
"0.7, 2.3, 5.1", "4.1, 2.7, 4.7", "9.5, 2.6, 1.1"), class = "factor")), .Names = c("x", 
"csi"), class = "data.frame", row.names = c(NA, -4L))


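Edit:
As suggested by @RichScriven, a more efficient way is to call strsplit once on the whole column (it is vectorised) and then take the maximum of each list element: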

df$csi_max <- sapply(strsplit(as.character(df$csi), ","), function(x) max(as.numeric(x)))
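
As a side note on why the attempt in the question returns a single value: str_extract() keeps only the first match in each string, and max() is then applied once to the whole column instead of once per row. The sketch below shows the same regex idea in base R, where regmatches() with gregexpr() plays the role that stringr::str_extract_all() would play. The data frame is rebuilt here only so the snippet is self-contained, and the pattern is a simplified assumption about what the numbers look like:

```r
df <- data.frame(x = 1:4,
                 csi = c("0.5, 6.7, 2.3", "9.5, 2.6, 1.1",
                         "0.7, 2.3, 5.1", "4.1, 2.7, 4.7"),
                 stringsAsFactors = FALSE)

# Optional minus sign, digits, optional decimal part (simplified pattern)
num_pattern <- "-?[0-9]+\\.?[0-9]*"

# gregexpr() finds every match in each string; regmatches() returns
# them as a list with one character vector per row
all_numbers <- regmatches(df$csi, gregexpr(num_pattern, df$csi))

# Convert to numeric and take the maximum within each row
df$csi_max <- vapply(all_numbers, function(v) max(as.numeric(v)), numeric(1))
df$csi_max
# 6.7 9.5 5.1 4.7
```

vapply is used instead of sapply only so that the result is guaranteed to be a numeric vector.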

A solution using the splitstackshape package.

library(splitstackshape)

dat$csi_max <- apply(cSplit(dat, "csi")[, -1], 1, max)
dat
#   x           csi csi_max
# 1 1 0.5, 6.7, 2.3     6.7
# 2 2 9.5, 2.6, 1.1     9.5
# 3 3 0.7, 2.3, 5.1     5.1
# 4 4 4.1, 2.7, 4.7     4.7

DATA

dat <- read.table(text = "x  csi
1  '0.5, 6.7, 2.3'
2  '9.5, 2.6, 1.1'
3  '0.7, 2.3, 5.1'
4  '4.1, 2.7, 4.7'",
                  header = TRUE, stringsAsFactors = FALSE)
