
Replacing elements within a string in R

I have a row in a data frame in R that is made up of sequences of 0s, 1s, and 2s of undetermined length, stored as characters: "01", "010", "201", "102", "00012", and so on.

I'd like to find a way to determine whether the last character in each string is NUMERICALLY the largest. It's important that I keep the row in the data frame as characters for other purposes. So basically I want to take substr(x, nchar(x), nchar(x)) and determine whether it, as a number, is the largest of the numbers in the character string.

I'm super lost as to how to do this, since I'm not all that familiar with regular expressions and I have to go back and forth between treating elements as characters and as numbers.

Thanks in advance.

~Maureen

Let df be the name of the data frame, and suppose the row holding the string sequences "01", "010", "201", "102", "00012" is row 2. The following returns a logical vector that tells you, for each string, whether its last character is numerically the largest:

sapply(strsplit(as.character(df[2,]),""),function(x) x[length(x)] >= max(x))
[1]  TRUE FALSE FALSE  TRUE TRUE
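For a self-contained check of the line above, here is a minimal sketch. The data frame `df`, its column names, and the contents of row 1 are assumptions made up for illustration (the original post does not show the data), and `last_is_max` is a hypothetical helper name. The digits are converted with `as.integer()` so the comparison is explicitly numeric.

```r
# Hypothetical data: row 2 holds the digit strings from the question.
df <- data.frame(a = c("x", "01"), b = c("y", "010"), c = c("z", "201"),
                 d = c("w", "102"), e = c("v", "00012"),
                 stringsAsFactors = FALSE)

# TRUE when the last digit of a string equals the largest digit in it.
last_is_max <- function(s) {
  digits <- as.integer(strsplit(s, "")[[1]])
  digits[length(digits)] == max(digits)
}

# Pull row 2 out as a plain character vector, then test each string.
row_strings <- unlist(df[2, ], use.names = FALSE)
result <- vapply(row_strings, last_is_max, logical(1), USE.NAMES = FALSE)
print(result)
```

The data frame itself is never modified, so the row stays stored as characters.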

One way would be:

p <- as.numeric(strsplit("0120102","")[[1]])
if (max(p) == p[length(p)]) {
   print("yes")
}

Actually, you can skip as.numeric(), since "2" > "1" > "0" also holds when the digits are compared as strings:

p <- strsplit("0120102", "")[[1]]
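To finish the check without `as.numeric()`, the same comparison can be run on the character vector directly. This is only a sketch: it relies on single digits sorting in numeric order under string comparison, which is the case in common locales, and the variable name `last_is_largest` is introduced here for illustration.

```r
# Split into single characters and compare them as strings.
p <- strsplit("0120102", "")[[1]]

# max() on a character vector returns the largest string, here a digit.
last_is_largest <- max(p) == p[length(p)]
print(last_is_largest)
```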

If you wanted to apply this to your data.frame A:

apply(A, c(1,2), function(z) {p<-strsplit(z, "")[[1]];(max(p) == p[length(p)])})
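As a quick illustration of what that call returns, here is a small sketch. The contents of `A` and the column names `u` and `v` are made up for the example, and `flags` is a hypothetical name for the result.

```r
# Hypothetical data frame of digit strings (not from the original post).
A <- data.frame(u = c("01", "201"), v = c("010", "102"),
                stringsAsFactors = FALSE)

# apply() coerces A to a character matrix and visits every cell,
# returning a logical matrix with the same shape as A.
flags <- apply(A, c(1, 2), function(z) {
  p <- strsplit(z, "")[[1]]
  max(p) == p[length(p)]
})
print(flags)
```

Because the result has the same dimensions as `A`, each flag lines up with the cell it describes.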

The regex to get the last digit is [0-9]$; the rest of the logic depends on the environment you are developing in.

I think your best bet will be to look at how regex works in the R language:

http://www.regular-expressions.info/rlanguage.html

Like Dan Heberden said in the above post, you'll need to tokenize the string you gave as an example in your post, and then grep( ...? ) the tokens for the regex "[0-9]$". By the way, with regex you can treat everything as characters, so you shouldn't have to shuttle back and forth between numeric and character mode, except when you take the results of the grep function and parse them to numeric form for your comparison.
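One possible reading of this suggestion in base R is sketched below. It uses `regexpr()` with `regmatches()` rather than `grep()` to pull out the trailing digit, which is a substitution on my part, and the vector `x` plus the names `last_digit`, `max_digit`, and `is_largest` are assumptions for illustration. Note that `regmatches()` drops elements with no match, so the sketch assumes every string ends in a digit.

```r
x <- c("01", "010", "201", "102", "00012")

# Grab the trailing digit of each string with the regex "[0-9]$".
last_digit <- regmatches(x, regexpr("[0-9]$", x))

# Largest digit in each string, found by splitting into characters.
max_digit <- vapply(strsplit(x, ""), max, character(1))

# Convert to numbers only for the final comparison.
is_largest <- as.integer(last_digit) == as.integer(max_digit)
print(is_largest)
```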


 