My data looks like this:
L/S Price
$555,000Previous Price: $575,000
$865,000Previous Price: $875,000
$995,000
$1,325,000Previous Price: $1,459,000
The result I want is this:
555000
865000
995000
1325000
The best regex I could come up with was ([0-9,])+ but that has several problems, such as also matching the "Previous Price" amount, which is just noise. I included the comma in my regex so that I can match the entire price, even though I need to remove the comma eventually.
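To make the over-matching concrete, here is a minimal base R sketch run on the first row of the data. The variable names x and all_matches are only illustrative.

```r
x <- "$555,000Previous Price: $575,000"

# gregexpr() finds every match of the pattern, not just the first one,
# so the unanchored digit-and-comma pattern also picks up the previous price.
all_matches <- regmatches(x, gregexpr("[0-9,]+", x))[[1]]
print(all_matches)
## [1] "555,000" "575,000"
```

Both prices come back, so the pattern needs either an anchor to the start of the string or a step that discards the trailing text first.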
Alternatively, I am thinking that I could select the part I DON'T want with something like ([a-zA-Z]).+ and then remove it, though I'm having trouble implementing this.
Here's a dput:
> dput(mls_res$`L/S Price`[1:4])
c("$555,000Previous Price: $575,000", "$865,000Previous Price: $875,000",
"$995,000 ", "$1,325,000Previous Price: $1,459,000")
With the stringr library, you can do something like this:
library(stringr)
df <- c('$555,000Previous Price: $575,000', '$865,000Previous Price: $875,000', '$995,000', '$1,325,000Previous Price: $1,459,000')
as.numeric(gsub('\\$|,', '', str_extract(df, '^\\$[0-9,]*')))
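The same extract-then-clean idea can probably be done without stringr. The following is a sketch using base R's regexpr() and regmatches(); the names prices, first_price and result are made up for illustration. One caveat: regmatches() silently drops elements that do not match, whereas str_extract() returns NA for them, so this assumes every string starts with a price.

```r
prices <- c("$555,000Previous Price: $575,000", "$995,000 ")

# regexpr() locates only the first match; the ^ anchor pins it to the start
# of the string, so the "Previous Price" amount is never considered.
first_price <- regmatches(prices, regexpr("^\\$[0-9,]+", prices))

# Strip the dollar sign and the thousands separators, then convert.
result <- as.numeric(gsub("[$,]", "", first_price))
print(result)
## [1] 555000 995000
```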
This approach is simple and needs no packages. It removes the first P and everything after it, then removes all non-digits from what is left, and finally converts the result to numeric.
as.numeric(gsub("\\D", "", sub("P.*", "", s)))
## [1] 555000 865000 995000 1325000
If the last digit may be followed by some letter other than P, replace P with [[:alpha:]].
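As a sketch of that variant: the second element below, with the label "Was:", is a made-up example and not part of the original data; it is there only to show a trailing label that does not start with P.

```r
# Hypothetical input: the "Was:" label is invented for illustration.
s2 <- c("$555,000Previous Price: $575,000", "$750,000Was: $800,000", "$995,000 ")

# Drop everything from the first letter onward, then drop all non-digits.
result2 <- as.numeric(gsub("\\D", "", sub("[[:alpha:]].*", "", s2)))
print(result2)
## [1] 555000 750000 995000
```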
Note: We used this input:
s <- c("$555,000Previous Price: $575,000", "$865,000Previous Price: $875,000",
"$995,000 ", "$1,325,000Previous Price: $1,459,000")
We can either use capture groups, (...), to capture the numeric parts of the string and then replace the whole match with backreferences to those groups (str1 is the character vector of prices):
as.numeric(gsub("^\\D*([0-9]+),*([0-9]+),([0-9]+).*", "\\1\\2\\3", str1))
#[1] 555000 865000 995000 1325000
Or just match the non-numeric characters and replace them with "".
as.numeric(gsub("[$,]|[[:alpha:]]+.*", "", str1))
#[1] 555000 865000 995000 1325000