R - grab the exact 8 digits number in a string and transform it

Question

I have 2 problems in extracting and transforming data using R. Here's the dataset:

messageID | msg
1111111111 | hey id 18271801, fix it asap
2222222222 | please fix it soon id12901991 and 91222911. dissapointed
3333333333 | wow $300 expensive man, come on
4444444444 | number 2837169119 test

The problem is:

First, I want to grab only the numbers that are exactly 8 digits long. In the dataset above, message 3333... (300, 3 digits) and message 4444... (2837169119, 10 digits) should not be included. Here's my best shot so far:

 as.matrix(unlist(apply(df[2],1,function(x){regmatches(x,gregexpr('([0-9]){8}', x))})))

However, with this line of code, message 4444... is included because it contains a number with more than 8 digits, and the pattern matches the first 8 of them.

Second, I want to transform the data into another form, like this:

 message_id | customer_ID
 1111111111 | 18271801
 2222222222 | 12901991
 2222222222 | 91222911

I don't know how to transform the data efficiently. The output of dput(df):

 structure(list(id = c(1111111111, 2222222222, 3333333333, 4444444444 ), msg = c("hey id 18271801, fix it asap", "please fix it soon id12901991 and 91222911. dissapointed", "wow $300 expensive man, come on", "number 2837169119 test")), .Names = c("id", "msg"), row.names = c(NA, 4L), class = "data.frame")

Thanks

Answer 1

Use rebus to create your regular expression, and stringr to extract the matches.

You may need to play with the exact form of the regular expression. This code works on your examples, but you'll probably need to adapt it for your dataset.

library(rebus)
library(stringr)

# Create regex
rx <- negative_lookbehind(DGT) %R%
  dgt(8) %R%  
  negative_lookahead(DGT)
rx
## <regex> (?<!\d)[\d]{8}(?!\d)

# Extract the IDs
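# stringr's regex engine (ICU) supports lookarounds, so the pattern
# can be passed directly; perl() is deprecated in stringr >= 1.0.0.
extracted_ids <- str_extract_all(df$msg, rx)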

# Stuff the IDs into a data frame.
data.frame(
  messageID = rep(
    df$id, 
    vapply(extracted_ids, length, integer(1))
  ),
  extractedID = unlist(extracted_ids, use.names = FALSE)
)
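
For readers who would rather not add packages, the same idea can probably be expressed in base R. The following is a minimal sketch, not part of the original answer's code: it assumes the lookaround pattern printed above (with [0-9] written out in place of \d), rebuilds df from the question's dput output, and borrows the column names message_id and customer_ID from the question's desired output.

```r
# Sample data, copied from the question's dput(df) output.
df <- data.frame(
  id  = c(1111111111, 2222222222, 3333333333, 4444444444),
  msg = c("hey id 18271801, fix it asap",
          "please fix it soon id12901991 and 91222911. dissapointed",
          "wow $300 expensive man, come on",
          "number 2837169119 test"),
  stringsAsFactors = FALSE
)

# Exactly 8 digits: neither preceded nor followed by another digit.
# Lookarounds need perl = TRUE in base R's regex functions.
pattern <- "(?<![0-9])[0-9]{8}(?![0-9])"
matches <- regmatches(df$msg, gregexpr(pattern, df$msg, perl = TRUE))

# Repeat each message id once per match, then flatten the matches.
# Messages without a match contribute zero rows.
result <- data.frame(
  message_id  = rep(df$id, lengths(matches)),
  customer_ID = unlist(matches, use.names = FALSE),
  stringsAsFactors = FALSE
)
result
```

The lookbehind and lookahead are what exclude message 4444...: without them, [0-9]{8} matches the first 8 digits of the 10-digit number. Note that lengths() requires R 3.2.0 or later; on older versions, vapply(matches, length, integer(1)) does the same job.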

solution1
1 ACCEPTED 2015-03-22 06:45:04
