R: Removing words from a string in a dataframe
Let's say I have the following dataset:
Date_Received = c("Addition 1/2/2018", "Swimming Pool 1/8/2018", "Abandonment 1/9/2018", "Existing Approval 3/14/2018", "Holding Tank 5/11/2018")
Date_Approved = c("1/2/2018", "1/8/2018", "1/9/2018", "SB 3/21/2018", "JW 5/11/2018")
I want to remove the characters before the date in the Date_Received column, so that I can later convert it to a date type using lubridate.
I tried the following code, but it only removes the first uppercase letter.
How can I fix this?
Desired output:
Date_Received Date_Approved
1/2/2018 1/2/2018
1/8/2018 1/8/2018
1/9/2018 1/9/2018
3/14/2018 SB 3/21/2018
5/11/2018 JW 5/11/2018
Code:
library(tidyverse)
df <- data.frame(Date_Received, Date_Approved)
# Attempt: treat "[A-Z]*\\s*" as the whitespace to trim
df <- df %>%
  mutate(Date.Received = trimws(Date_Received, whitespace = "[A-Z]*\\s*")) %>%
  filter(nzchar(Date.Received))
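To see why the attempt stops after one character, the sketch below applies the question's pattern directly with base R's sub(), anchored at the start of the string. This is an approximation of the left-hand trim, not trimws() itself. [A-Z]* consumes only the leading capital, and \\s* then matches zero characters because the next character is a lowercase letter. Widening the character class to letters and spaces fixes it; the variable names here are illustrative only.

```r
x <- c("Addition 1/2/2018", "Swimming Pool 1/8/2018", "Existing Approval 3/14/2018")

# The question's pattern, anchored at the start of the string.
# [A-Z]* matches just the first capital; \\s* matches nothing after it.
attempt <- sub("^[A-Z]*\\s*", "", x, perl = TRUE)
print(attempt)

# Allowing any letter or space lets the match run up to the first digit.
fixed <- sub("^[A-Za-z ]+", "", x, perl = TRUE)
print(fixed)

# Same idea with trimws(): trim the left side only, treating letters
# and spaces as "whitespace" (the whitespace argument needs R >= 3.6.0).
fixed_trimws <- trimws(x, which = "left", whitespace = "[A-Za-z ]")
print(fixed_trimws)
```

Note that a letters-and-spaces class would also stop early on a prefix containing digits or punctuation, which is one reason the non-digit class \\D is a more forgiving choice.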
We can use trimws, whose whitespace argument (which you already used in your code) specifies which characters to trim. Setting it to \\D (any non-digit) and trimming only the left side removes everything before the first digit of the date.
library(dplyr)
df %>%
  # Trim the leading run of non-digits (\\D) from the left side only
  mutate(Date_Received = trimws(Date_Received, which = "left", whitespace = "\\D"))
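A base-R check of the same trimws() call, without dplyr, to show that trimming stops at the first digit. The slashes inside the date are non-digits too, but they are never reached because trimming from the left ends as soon as a digit appears. A value with no prefix passes through unchanged.

```r
Date_Received <- c("Addition 1/2/2018", "Swimming Pool 1/8/2018",
                   "Abandonment 1/9/2018", "Existing Approval 3/14/2018",
                   "Holding Tank 5/11/2018")

# Remove the leading run of non-digits; stop at the first digit.
cleaned <- trimws(Date_Received, which = "left", whitespace = "\\D")
print(cleaned)

# A string that already starts with a digit is left as-is.
unchanged <- trimws("1/2/2018", which = "left", whitespace = "\\D")
print(unchanged)
```

The same call would also strip "SB " from Date_Approved, so it is applied only to Date_Received here to match the desired output.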
Or with str_replace_all:
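library(stringr)
df %>%
  # "^\\D+" matches the run of non-digits at the start of each value.
  # Because the pattern is anchored it can match at most once, so
  # str_replace() would work just as well.
  mutate(Date_Received = str_replace_all(Date_Received, "^\\D+", ""))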
Output:
Date_Received Date_Approved
1 1/2/2018 1/2/2018
2 1/8/2018 1/8/2018
3 1/9/2018 1/9/2018
4 3/14/2018 SB 3/21/2018
5 5/11/2018 JW 5/11/2018
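Instead of deleting the prefix, you can extract the date itself. The sketch below uses base R's regexpr() and regmatches(). It assumes every value ends in an m/d/yyyy date; the pattern is fitted to the sample data, not a general date matcher.

```r
Date_Received <- c("Addition 1/2/2018", "Swimming Pool 1/8/2018",
                   "Abandonment 1/9/2018", "Existing Approval 3/14/2018",
                   "Holding Tank 5/11/2018")

# One or two digits, slash, one or two digits, slash, four digits,
# at the end of the string (assumed m/d/yyyy layout).
date_pattern <- "\\d{1,2}/\\d{1,2}/\\d{4}$"
extracted <- regmatches(Date_Received,
                        regexpr(date_pattern, Date_Received, perl = TRUE))
print(extracted)
```

One caveat: regmatches() with regexpr() drops elements that have no match, so the result can be shorter than the input. stringr::str_extract() with the same pattern returns NA for those elements instead, which is safer inside mutate().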
Another option, using base R's sub:
df$Date_Received <- sub("^\\D+", "", df$Date_Received)
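The question's end goal is a date column. Once the prefix is stripped with the sub() call above, base R's as.Date() with an explicit month/day/year format is one way to parse it; lubridate::mdy() should give the same Date values. A minimal base-R sketch, rebuilding the sample data frame so it runs on its own:

```r
df <- data.frame(
  Date_Received = c("Addition 1/2/2018", "Swimming Pool 1/8/2018",
                    "Abandonment 1/9/2018", "Existing Approval 3/14/2018",
                    "Holding Tank 5/11/2018"),
  Date_Approved = c("1/2/2018", "1/8/2018", "1/9/2018",
                    "SB 3/21/2018", "JW 5/11/2018"),
  stringsAsFactors = FALSE
)

# Strip the leading run of non-digits, then parse as month/day/year.
df$Date_Received <- sub("^\\D+", "", df$Date_Received)
df$Date_Received <- as.Date(df$Date_Received, format = "%m/%d/%Y")
str(df)
```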
Keep it simple and take the last space-separated word with stringr::word():
Date_Received = c("Addition 1/2/2018", "Swimming Pool 1/8/2018", "Abandonment 1/9/2018", "Existing Approval 3/14/2018", "Holding Tank 5/11/2018")
stringr::word(Date_Received, -1)
[1] "1/2/2018" "1/8/2018" "1/9/2018" "3/14/2018" "5/11/2018"
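If you would rather avoid the stringr dependency, a base-R sketch of the same "last word" idea is a greedy sub(). It assumes the date is the final token and is preceded by a single space, as in the sample data.

```r
Date_Received <- c("Addition 1/2/2018", "Swimming Pool 1/8/2018",
                   "Abandonment 1/9/2018", "Existing Approval 3/14/2018",
                   "Holding Tank 5/11/2018")

# ".* " is greedy, so it matches everything up to the LAST space;
# replacing it with "" leaves only the final word (the date).
last_word <- sub(".* ", "", Date_Received)
print(last_word)
```

Multi-word prefixes such as "Swimming Pool" are handled because the greedy match runs to the last space, not the first.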