Convert string column values to numeric and find the maximum of those values in R

I have a data frame with a column called "XYZ", and this column is of string type. The values of the "XYZ" column look like this:

Example:

   XYZ
new_value_1
new_value_2
new_value_4
new_value_3

I need to extract the trailing digit from each string, convert it to a number, and find the maximum of those numbers. After finding the maximum in that column, I need to generate a sequence that starts from that maximum and has one value per row (n rows).

For example, every string in "XYZ" above ends in a digit. I have to extract that trailing number and find the maximum, which is 4 in this case. I then have to add an ID column (with mutate) whose values start at the number after the maximum.

Output:

 XYZ             ID
new_value_1      5
new_value_2      6
new_value_4      7
new_value_3      8

In the future, please provide a minimal reproducible input data set using dput. I've recreated the data set for convenience.

Using the dplyr package for ease:

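library(dplyr)

# stringsAsFactors = FALSE keeps XYZ as character (before R 4.0 the default was factor)
raw_data <- data.frame(XYZ = c("new_value_1", "new_value_2", "new_value_3", "new_value_4"),
                       stringsAsFactors = FALSE)

# Get the max value: split each string on "_" and convert the third piece to numeric
max_value <- max(sapply(strsplit(raw_data$XYZ, "_"), function(parts) as.numeric(parts[3])))

# Make the resulting data: IDs start right after max_value, one per row
final_data <- raw_data %>% mutate(ID = (max_value + 1):(max_value + nrow(raw_data)))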

Let me know if dplyr is not allowed.
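The strsplit call above takes the third piece, which assumes every value has exactly two underscores before the number. If that might not hold, a minimal sketch that keeps the last piece instead could look like this. The input vector `x` and the names `last_piece`, `suffix`, and `ids` are made up for illustration and are not part of the question's data.

```r
# Hypothetical input: prefixes with differing numbers of underscores
x <- c("new_value_1", "another_new_value_12", "value_7")

# Split each string on a literal "_" and keep only the last piece
last_piece <- vapply(strsplit(x, "_", fixed = TRUE),
                     function(parts) parts[length(parts)],
                     character(1))

# Convert the suffixes to integers and find the largest one
suffix <- as.integer(last_piece)
max_value <- max(suffix)

# IDs continue after the current maximum, one per input value
ids <- max_value + seq_along(x)
```

Here `suffix` is 1, 12, 7 and `ids` is 13, 14, 15. This still assumes that every value ends in "_" followed by digits only; anything else would become NA in `as.integer`.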

Here is a base R way. It uses a regex to extract the trailing digit or digits, and seq.int to create a sequence like the one in the question.

m <- max(as.integer(sub("^[^[:digit:]]*([[:digit:]]+$)", "\\1", df1$XYZ)))
df1$ID <- m + seq.int(nrow(df1))

df1
#          XYZ ID
#1 new_value_1  5
#2 new_value_2  6
#3 new_value_4  7
#4 new_value_3  8

Data

df1 <- read.table(text = "
   XYZ
new_value_1
new_value_2
new_value_4
new_value_3
", header = TRUE)
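To make the base R one-liner easier to follow, the same steps can be split apart as in the sketch below. The vector `x` is a hypothetical variant of the question's data with one multi-digit suffix added, to show that the regex captures the whole trailing run of digits and not only the last character.

```r
# Hypothetical values, including a multi-digit suffix
x <- c("new_value_1", "new_value_2", "new_value_10", "new_value_3")

# Step 1: the capture group keeps only the trailing run of digits
digits <- sub("^[^[:digit:]]*([[:digit:]]+$)", "\\1", x)

# Step 2: convert to integer and take the maximum
m <- max(as.integer(digits))

# Step 3: continue the numbering after the maximum, one ID per value
ids <- m + seq.int(length(x))
```

With these values `digits` is "1", "2", "10", "3", `m` is 10, and `ids` is 11, 12, 13, 14. Note that the pattern only matches when the prefix contains no digits at all; a value such as "v2_value_3" would be returned unchanged by `sub` and turn into NA in `as.integer`.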
