I need to extract the author and the time from a string in R.
test = "Postedby BeauHDon Friday November 24, 2017 @10:30PM from the cost-effective dept."
I am currently using gsub() to find the desired output.
Expected output would be:
#author
"BeauHDon"
#Month
"November"
#Date
24
#Time
22:30
I got as far as gsub("Postedby (.*).*", "\\1", test)
but the output is
"BeauHDon Friday November 24, 2017 @10:30PM from the cost-effective dept."
I also understand that the time requires more coding after extracting 10:30. Is it possible to add 12 if the next two characters are PM?
Thank you.
We can extract the values with capture groups (assuming the pattern is always as shown in the example). From the start (^) of the string, the pattern matches one or more non-whitespace characters (\\S+) followed by whitespace (\\s+), then captures the next word (\\w+), which is the author. It skips the following word (the weekday) and its whitespace, captures the word after that (the month), captures the digits ((\\d+)) of the day, and finally captures the time that follows the @. sub() joins the four groups with commas and scan() splits them into a character vector.
v1 <- scan(text=sub("^\\S+\\s+(\\w+)\\s+\\w+\\s+(\\w+)\\s+(\\d+)[^@]+@(\\S+).*",
"\\1,\\2,\\3,\\4", test), what = "", sep=",", quiet = TRUE)
As the last entry is the time, we can convert it to a date-time with strptime, change it to 24-hour form with format, and assign the result back to the last element.
v1[4] <- format(strptime(v1[4], "%I:%M %p"), "%H:%M")
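The question also asks whether 12 can simply be added when the time ends in PM. A minimal sketch of that arithmetic follows, assuming the time always looks like 10:30PM; the variable names (time_12h, tm, hour24, time_24h) are illustrative and not part of the answer above. Unlike %p in strptime, which is documented as locale-dependent, this only looks for the literal letters AM or PM.

```r
time_12h <- "10:30PM"  # the text captured after the "@"

# Split into hour, minute and AM/PM marker with capture groups.
tm <- regmatches(time_12h,
                 regexec("^(\\d{1,2}):(\\d{2})\\s*([AP]M)$", time_12h))[[1]]
hour   <- as.integer(tm[2])
minute <- as.integer(tm[3])
is_pm  <- tm[4] == "PM"

# hour %% 12 maps 12 to 0, so 12AM becomes 00 and 12PM becomes 12;
# every PM hour then gets 12 added.
hour24 <- (hour %% 12L) + (if (is_pm) 12L else 0L)

time_24h <- sprintf("%02d:%02d", hour24, minute)
time_24h
# [1] "22:30"
```

Adding 12 unconditionally for PM would turn 12:15PM into 24:15, which is why the hour is reduced modulo 12 first.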
If needed, set the names of the elements (author, Month, etc.).
names(v1) <- c("#author", "#Month", "#Date", "#Time")
v1
# #author #Month #Date #Time
#"BeauHDon" "November" "24" "22:30"
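As a variation, regexec() together with regmatches() returns the capture groups directly, without joining them with commas and splitting them again with scan(). The sketch below uses the same pattern and also stores the day as an integer, matching the unquoted 24 in the expected output. The names pattern, m and res, and the list layout, are assumptions made for illustration.

```r
test <- "Postedby BeauHDon Friday November 24, 2017 @10:30PM from the cost-effective dept."

# Same pattern as in the answer, minus the trailing ".*" that sub() needed.
pattern <- "^\\S+\\s+(\\w+)\\s+\\w+\\s+(\\w+)\\s+(\\d+)[^@]+@(\\S+)"

# regexec() records where each capture group matched; regmatches() extracts the text.
m <- regmatches(test, regexec(pattern, test))[[1]]

# m[1] is the full match, so the four groups are m[2] to m[5].
res <- list(
  author = m[2],
  month  = m[3],
  date   = as.integer(m[4]),  # numeric day rather than the string "24"
  time   = m[5]               # still "10:30PM"; convert with strptime() as in the answer
)
str(res)
```

A list is used here because a plain character vector cannot hold the day as a number alongside the text fields.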