How to remove fixed digits in a column in R that also has words?

I have a vector that has a series of numbers and words.

df <- as.character(c(1234, "Other", 5678, "Abstain"))

I would like to remove the last two digits of the numbers without affecting the words in the string, so that the result is:

df <- as.character(c(12, "Other", 56, "Abstain"))

This is probably a bit more robust/versatile/safe than the solution suggested by @r2evans in the comments.

gsub("(^\\d{2,})\\d{2}$", "\\1", df)

What it does:

pattern = "(^\\d{2,})\\d{2}$"

  • ^ matches the start of the string
  • \\d{2,} matches a run of at least two digits (delete the comma if you only want to match strings that are exactly 4 digits long)
  • (^\\d{2,}) the round brackets capture the start of the string plus the following run of at least two digits as a group
  • \\d{2} matches exactly two digits
  • $ matches the end of the string

In short: it matches any string that consists solely of digits, starts with a minimum of two digits, and ends with two digits (so the minimum length of the digit string is 4).
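
To check that reading of the pattern, here is a minimal sketch that only tests which strings match, without replacing anything. The vector `x` is made up for illustration and is not part of the original question.

```r
# Pattern from the answer; `x` is an illustrative test vector (an assumption).
pattern <- "(^\\d{2,})\\d{2}$"
x <- c("1234", "Other", "123", "b12345", "123456")

# grepl() returns TRUE only where the whole string is a run of 4+ digits
matched <- grepl(pattern, x)
print(setNames(matched, x))
```

Only "1234" and "123456" should come back TRUE: "123" is one digit too short, and "b12345" fails because of the leading ^.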

replacement = "\\1"

  • replaces the entire matched string with the first defined group ( (^\\d{2,}) ) from the pattern described above.
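
A small sketch of how the backreference decides what survives. The second call is a hypothetical variant (not from the answer) that moves the brackets to the last two digits, just to show the contrast.

```r
x <- "123456"

# Group 1 is the leading digits, so "\\1" keeps them and drops the final two
kept <- sub("(^\\d{2,})\\d{2}$", "\\1", x)

# Hypothetical variant: group 1 is now the final two digits instead
dropped <- sub("^\\d{2,}(\\d{2})$", "\\1", x)

print(kept)     # "1234"
print(dropped)  # "56"
```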

Sample data:

df <- c(123, "Other", 5678, "Abstain", "b12345", 123456, "123aa345")

gsub("(^\\d{2,})\\d{2}$", "\\1", df)
#[1] "123"      "Other"    "56"       "Abstain"  "b12345"   "1234"     "123aa345" 
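
If this is needed in more than one place, the call could be wrapped in a small helper. This is only a sketch: the name `drop_last_two` and the `exact` argument are made up here, and `exact = TRUE` applies the "delete the comma" variant mentioned above, so that only strings of exactly 4 digits are changed.

```r
# Hypothetical helper (name and arguments are assumptions, not from the answer)
drop_last_two <- function(x, exact = FALSE) {
  x <- as.character(x)
  # exact = TRUE: only 4-digit strings; otherwise any digit string of length >= 4
  pattern <- if (exact) "(^\\d{2})\\d{2}$" else "(^\\d{2,})\\d{2}$"
  # The pattern is anchored at both ends, so sub() and gsub() behave the same
  sub(pattern, "\\1", x)
}

df <- c(123, "Other", 5678, "Abstain", "b12345", 123456, "123aa345")

res_any   <- drop_last_two(df)
res_exact <- drop_last_two(df, exact = TRUE)
print(res_any)
print(res_exact)
```

With `exact = TRUE`, "5678" still becomes "56" but "123456" is left untouched.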
