Extract information from a string using regex in R

I have data like this, and I want to extract some information from x and y:

x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
y <- '{"percent_incoming_nighttime": 0.88, "percent_outgoing_daytime": 9.29}'

The desired result:

device_codename   brand     percent_incoming_nighttime percent_outgoing_daytime
nikel             Xiaomi    0.88                       9.29

I have tried using grep, but I am getting errors. Any suggestions?

grep("device_codename", x, perl=TRUE, value=TRUE)
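One likely reason this attempt falls short: grep() with value = TRUE returns each whole element that matches the pattern, not the piece of interest. The sketch below contrasts it with sub() and a capture group. The pattern assumes the value is a double-quoted string directly after the key, and the names whole and codename are only illustrative.

```r
x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"

# grep(value = TRUE) hands back the entire matching element unchanged
whole <- grep("device_codename", x, value = TRUE)

# sub() with a capture group keeps only the quoted value after the key
codename <- sub('.*"device_codename":\\s*"([^"]*)".*', "\\1", x)

print(whole)
print(codename)
```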

This is possibly JSON format. There are tools to handle that.

library(jsonlite)

x = "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}" 
y = '{"percent_incoming_nighttime": 0.88, "percent_outgoing_daytime": 9.29}'

> unlist(fromJSON(x))
device_codename           brand 
        "nikel"        "Xiaomi" 
> unlist(fromJSON(y))
percent_incoming_nighttime   percent_outgoing_daytime 
                      0.88                       9.29
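To get from these parsed objects to the one-row table asked for, the two results can be concatenated before converting. This is a minimal sketch, assuming both strings are flat JSON objects (no nesting) so that fromJSON() returns a named list for each; the name out is illustrative.

```r
library(jsonlite)

x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
y <- '{"percent_incoming_nighttime": 0.88, "percent_outgoing_daytime": 9.29}'

# c() joins the two named lists; each element then becomes one column
out <- as.data.frame(c(fromJSON(x), fromJSON(y)), stringsAsFactors = FALSE)
print(out)
```

Because the values never pass through a single atomic vector, the numeric fields stay numeric and the text fields stay character.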

After removing the braces ({}) and double quotes with gsub, strip each key (the substring before a colon) so that only the values remain, read those into a data.frame with read.csv, and then set the column names from the keys extracted before each colon:

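# drop the braces and double quotes, leaving "key: value, key: value"
v1 <- gsub('"|[{}]', "", c(x, y))
# remove each "key: " prefix and read the remaining values as one CSV row;
# strip.white = TRUE trims the space left after each comma
out <- read.csv(text=paste(gsub("\\w+:\\s+", "", v1), collapse=", "),
       header=FALSE, stringsAsFactors = FALSE, strip.white = TRUE)
# the keys (words directly before a colon) become the column names
colnames(out) <- unlist(regmatches(v1, gregexpr("\\w+(?=:)", v1, perl = TRUE)))

out
#  device_codename  brand percent_incoming_nighttime percent_outgoing_daytime
#1           nikel Xiaomi                       0.88                     9.29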

NOTE: No external packages used
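The same base-R idea can also be written as a small function that splits each string into key/value pairs. This is only a sketch: parse_flat is a hypothetical helper name, and it assumes flat objects whose keys and values contain no commas, colons, or quotes. For anything more complex, a real JSON parser is the safer choice.

```r
x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
y <- "{\"percent_incoming_nighttime\": 0.88, \"percent_outgoing_daytime\": 9.29}"

# Hypothetical helper: turn one flat '{"key": value, ...}' string into a named list
parse_flat <- function(s) {
  s <- gsub('[{}"]', "", s)                        # drop braces and double quotes
  fields <- strsplit(s, ",\\s*")[[1]]              # one "key: value" string per field
  pairs <- strsplit(fields, ":\\s*")               # split each field into key and value
  vals <- lapply(pairs, function(p) type.convert(p[2], as.is = TRUE))
  names(vals) <- vapply(pairs, `[`, character(1), 1)
  vals
}

out <- as.data.frame(c(parse_flat(x), parse_flat(y)), stringsAsFactors = FALSE)
print(out)
```

type.convert() with as.is = TRUE turns "0.88" into a number while leaving "nikel" as a character string, so each column keeps a sensible type.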


Or using RJSONIO and tidyverse:

library(tidyverse)
library(RJSONIO)
list(x, y) %>%
    map(~ fromJSON(.x) %>% 
            as.list %>%
            as_tibble) %>%
       bind_cols
# A tibble: 1 x 4
#  device_codename brand  percent_incoming_nighttime percent_outgoing_daytime
#  <chr>           <chr>                       <dbl>                    <dbl>
#1 nikel           Xiaomi                       0.88                     9.29

Data

x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
y <- "{\"percent_incoming_nighttime\": 0.88, \"percent_outgoing_daytime\": 9.29}"

A complete jsonlite solution (building on Roman Luštrik's answer):

library(jsonlite)
library(dplyr)
library(tidyr)  # provides spread(), which is used below

xx_x= "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}" 
xx_y= "{\"percent_incoming_nighttime\": 0.88, \"percent_outgoing_daytime\": 9.29}"

c(jsonlite::fromJSON(xx_x), jsonlite::fromJSON(xx_y)) %>% 
  reshape2::melt() %>% mutate(myrow = 1) %>% 
  spread(L1, value)

Result:

  myrow  brand device_codename percent_incoming_nighttime percent_outgoing_daytime
1     1 Xiaomi           nikel                       0.88                     9.29


 