
R function to find and remove a string with a given numerical range

I have a column named "edited_address" in a data frame named CB_Edit. The "edited_address" column holds many addresses. Some of them include a token ("L#") that I want to remove completely wherever it appears. For instance:

edited_address:

100 S Smith Street
200 S Smith L100 Street
300 S Smith Street
400 S L1 Smith Street
500 S Smith Street L999
600 N Jacobs Blvd
900 L53 Cascades Street

I want to remove the "L#" from the column. There are two problems. The first is that the L is followed by a number that can range anywhere from 0 to 9999. The second is that the L# can be anywhere in the column cell. Let me know what I can do, thank you!

I have tried several gsub functions, expecting to capture everything in the range. That did not happen.

For a base R option, we can use sub() to remove the L# token, wrapped in an outer gsub() that trims any leftover leading or trailing whitespace (the data frame is called df here):

df$edited_address <- gsub("^\\s+|\\s+$", "", sub("\\s*\\bL\\d{1,4}\\b\\s*", " ", df$edited_address))
df

      edited_address
1 100 S Smith Street
2 200 S Smith Street
3 300 S Smith Street
4 400 S Smith Street
5 500 S Smith Street
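
Note that sub() replaces only the first match in each string, so an address containing more than one L# token would keep the later ones. As a rough sketch of a variant that handles that case, the code below rebuilds the seven example addresses from the question, removes every L# token with gsub(), then collapses the leftover whitespace. The helper name remove_lot_code is hypothetical and chosen only for illustration, and perl = TRUE is an assumption made so that \\b, \\d and \\s are handled by the PCRE engine.

```r
# Data frame rebuilt from the seven example addresses in the question
CB_Edit <- data.frame(
  edited_address = c(
    "100 S Smith Street",
    "200 S Smith L100 Street",
    "300 S Smith Street",
    "400 S L1 Smith Street",
    "500 S Smith Street L999",
    "600 N Jacobs Blvd",
    "900 L53 Cascades Street"
  ),
  stringsAsFactors = FALSE
)

# remove_lot_code() is a hypothetical helper, not a base R function
remove_lot_code <- function(x) {
  # Drop every standalone "L" followed by 1 to 4 digits (L0 to L9999)
  x <- gsub("\\bL\\d{1,4}\\b", "", x, perl = TRUE)
  # Collapse the whitespace left behind, then trim both ends
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

CB_Edit$edited_address <- remove_lot_code(CB_Edit$edited_address)
CB_Edit$edited_address
```

The word boundaries (\\b) keep the pattern from touching ordinary words that merely start with L, and from matching an L followed by five or more digits.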


 