Input is
the row 3 and row 5 had incorrtct format, if I want
sale_date | produst_model | store_code |
---|---|---|
20210208 | ASUS_DE552 | AAE_08072 |
20210305 | ASUS_AC693 | AAE_08072 |
20210107 | ASUS_DE551 | AAR_7461 |
20210325 | ASUS_DB341 | CMHT_654 |
20210227 | ASUS_HG0982 | BR_981 |
If this table have 20,000 rows, Do I have more efficiency way to check every row is match rule?
From looking at the data posted my hunch is that the strings in the three columns were at some point extracted from a composite string such as 20210227_ASUS_HG0982_BR_981
but the extraction seems to have gone wrong in some places. If this assumption is correct then I would recommend going back to the original strings and fixing the extraction, for example like this using the extract
function:
library(tidyverse)
data.frame(original) %>%
extract(original,
into = c("sale_date", "produst_model", "store_code"),
regex = "(\\d+)_(\\w+\\d+)_(\\w+)")
sale_date produst_model store_code
1 20210227 ASUS_HG0982 BR_981
Data:
original = "20210227_ASUS_HG0982_BR_981"
Obviously, the regex here is based only on a single string and will likely have to be adapted as soon as you have more strings.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.