R - grab the exact 8-digit number in a string and transform it
I have 2 problems in extracting and transforming data using R. Here's the dataset:
messageID | msg
1111111111 | hey id 18271801, fix it asap
2222222222 | please fix it soon id12901991 and 91222911. dissapointed
3333333333 | wow $300 expensive man, come on
4444444444 | number 2837169119 test
The problem is:
as.matrix(unlist(apply(df[2],1,function(x){regmatches(x,gregexpr('([0-9]){8}', x))})))
However, with this line of code, message 444... is included because it contains a number with more than 8 digits.
message_id | customer_ID
1111111111 | 18271801
2222222222 | 12901991
2222222222 | 91222911
I don't know how to efficiently transform the data. The output of dput(df):
structure(list(id = c(1111111111, 2222222222, 3333333333, 4444444444 ), msg = c("hey id 18271801, fix it asap", "please fix it soon id12901991 and 91222911. dissapointed", "wow $300 expensive man, come on", "number 2837169119 test")), .Names = c("id", "msg"), row.names = c(NA, 4L), class = "data.frame")
Thanks
Use rebus to create your regular expression, and stringr to extract the matches.
You may need to adjust the exact form of the regular expression. This code works on your examples, but you'll probably need to adapt it for your dataset.
library(rebus)
library(stringr)
# Create regex
rx <- negative_lookbehind(DGT) %R%
  dgt(8) %R%
  negative_lookahead(DGT)
rx
## <regex> (?<!\d)[\d]{8}(?!\d)
# Extract the IDs
# stringr's ICU regex engine supports lookarounds, so the deprecated perl() wrapper is not needed
extracted_ids <- str_extract_all(df$msg, rx)
# Stuff the IDs into a data frame.
data.frame(
  messageID = rep(
    df$id,
    vapply(extracted_ids, length, integer(1))
  ),
  extractedID = unlist(extracted_ids, use.names = FALSE)
)
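If you would rather not add rebus and stringr as dependencies, the same idea can probably be sketched in base R. This is a minimal sketch, not a tested solution for your real data: it rebuilds df from the dput() output in the question, writes the lookaround pattern by hand, and the names pattern, matches and result are just illustrative. Lookarounds need the PCRE engine, so gregexpr is called with perl = TRUE.

```r
# Sample data copied from the question's dput() output
df <- data.frame(
  id  = c(1111111111, 2222222222, 3333333333, 4444444444),
  msg = c(
    "hey id 18271801, fix it asap",
    "please fix it soon id12901991 and 91222911. dissapointed",
    "wow $300 expensive man, come on",
    "number 2837169119 test"
  ),
  stringsAsFactors = FALSE
)

# Exactly 8 digits that are neither preceded nor followed by another digit.
# Lookbehind/lookahead are only available with perl = TRUE.
pattern <- "(?<!\\d)\\d{8}(?!\\d)"

# One character vector of matches per message (empty when nothing matches)
matches <- regmatches(df$msg, gregexpr(pattern, df$msg, perl = TRUE))

# Repeat each message ID once per match, then flatten the matches
result <- data.frame(
  messageID   = rep(df$id, lengths(matches)),
  extractedID = unlist(matches, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(result)
```

On the sample data this gives three rows: one for message 1111111111 and two for 2222222222. Messages 3333333333 and 4444444444 drop out because they have no standalone 8-digit number. Note that extractedID stays a character column; wrap it in as.numeric() if you need numbers.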