How to remove fixed digits in column in R that also has words?
I have a vector that has a series of numbers and words.
df <- as.character(c(1234, "Other", 5678, "Abstain"))
I would like to remove the last two digits of the numbers without affecting the words in the string.
df <- as.character(c(12, "Other", 56, "Abstain"))
Probably a bit more robust/versatile/safe than the solution suggested by @r2evans in the comments.
gsub("(^\\d{2,})\\d{2}$", "\\1", df)
what it does:
pattern = "(^\\d{2,})\\d{2}$"
^
matches the start of the string
\\d{2,}
matches any substring of at least two digits (delete the comma if you only want to match digit strings of exactly 4 digits)
(^\\d{2,})
the round brackets define the start of the string and the following repetition of at least two digits as a group
\\d{2}
a repetition of exactly two digits
$
matches the end of the string
in short: it matches any string that consists solely of digits, that starts with a minimum of two digits, and ends with two digits (so the minimum length of the digit string = 4)
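A quick way to check the pattern breakdown above is to ask which elements match at all. The following is only a sketch: the variable names (x, pattern, pattern_exact, matched, matched_exact) are made up for illustration, and pattern_exact is the "delete the comma" variant that only accepts digit strings of exactly four characters.

```r
x <- c("123", "Other", "5678", "Abstain", "b12345", "123456", "123aa345")

# Pattern from the answer: digits only, at least four of them
pattern <- "(^\\d{2,})\\d{2}$"
matched <- grepl(pattern, x)
print(setNames(matched, x))

# Variant without the comma: digits only, exactly four of them
pattern_exact <- "(^\\d{2})\\d{2}$"
matched_exact <- grepl(pattern_exact, x)
print(setNames(matched_exact, x))
```

Only the elements that match are changed by gsub(); everything else is returned as it is.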
replacement = "\\1"
replaces the entire matched string with the first defined group, (^\\d{2,}), from the pattern described above.
df <- c(123, "Other", 5678, "Abstain", "b12345", 123456, "123aa345")
gsub("(^\\d{2,})\\d{2}$", "\\1", df)
#[1] "123" "Other" "56" "Abstain" "b12345" "1234" "123aa345"
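If the number of digits to drop may change, the same idea can be wrapped in a small function. This is a minimal sketch and not part of the original answer: the helper name drop_trailing_digits and its arguments n and min_keep are hypothetical. It uses sub() instead of gsub(), which should be equivalent here because a pattern anchored at both ends can match at most once per element.

```r
# Hypothetical helper (not from the original answer): drop the last `n` digits
# from elements that consist solely of digits, keeping at least `min_keep` digits.
drop_trailing_digits <- function(x, n = 2, min_keep = 2) {
  # Build e.g. "^(\\d{2,})\\d{2}$" from the two numeric arguments
  pattern <- sprintf("^(\\d{%d,})\\d{%d}$", min_keep, n)
  # Elements that do not match the pattern are returned unchanged
  sub(pattern, "\\1", as.character(x))
}

x <- c(123, "Other", 5678, "Abstain", "b12345", 123456, "123aa345")
result <- drop_trailing_digits(x)
print(result)
```

With the defaults this behaves like the gsub() call above; drop_trailing_digits("12345", n = 3, min_keep = 1) would keep only the first two digits.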