R: Extract different patterns from a KeyValue list

Question

I have a dataset that looks similar to this:

quest<-data.frame(city=c("Atlanta","New York","Atlanta","Tampa"), key_value=c("rev=63;code=ATL;qty=1;zip=45987","rev=10.60|34;qty=1|2;zip=12686|12694;code=NY","code=ATL;rev=12;qty=1;zip=74268","rev=3|24|8;qty=1|6|3;code=TPA;zip=33684|36842|30254"))

which corresponds to:

    city                                           key_value
1  Atlanta                     rev=63;code=ATL;qty=1;zip=45987
2 New York        rev=10.60|34;qty=1|2;zip=12686|12694;code=NY
3  Atlanta                     code=ATL;rev=12;qty=1;zip=74268
4    Tampa rev=3|24|8;qty=1|6|3;code=TPA;zip=33684|36842|30254

I am trying to extract just one of the key-value pairs ("code") from the data, so that the result looks like this:

      city code
1  Atlanta  ATL
2 New York   NY
3  Atlanta  ATL
4    Tampa  TPA

Answer 1

We can do this with a regular expression that uses a positive lookbehind:

quest$code <- gsub(".*(?<=code=)(\\w+)(;|$).*", "\\1", quest$key_value, perl = TRUE)

.* - match everything up to the lookbehind position

(?<=code=) - match the position in the string where the preceding characters are "code="

(\\w+) - match the code and capture it in group one

(;|$) - match a semicolon or the end of the string (in the case of NY there is no semicolon afterwards)

.* - match the remainder of the string

Because the pattern consumes the whole string, replacing the match with "\\1" leaves only the captured code.

      city                                           key_value code
1  Atlanta                     rev=63;code=ATL;qty=1;zip=45987  ATL
2 New York        rev=10.60|34;qty=1|2;zip=12686|12694;code=NY   NY
3  Atlanta                     code=ATL;rev=12;qty=1;zip=74268  ATL
4    Tampa rev=3|24|8;qty=1|6|3;code=TPA;zip=33684|36842|30254  TPA

Live example

https://regex101.com/r/UM7Cim/4
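
One thing to keep in mind with the gsub approach: when a string contains no "code=" pair, the pattern does not match and gsub returns the string unchanged rather than a missing value. The following is a minimal sketch of one way around that, using regexpr and regmatches from base R. The third row is a hypothetical example that is not in the original data, and the variable names kv, m and code are chosen here for illustration only.

```r
# Two rows from the question plus one hypothetical row with no "code=" pair
kv <- c("rev=63;code=ATL;qty=1;zip=45987",
        "rev=10.60|34;qty=1|2;zip=12686|12694;code=NY",
        "rev=5;qty=2;zip=11111")  # hypothetical: no code= key

# gsub() leaves a non-matching string untouched
unchanged <- gsub(".*(?<=code=)(\\w+)(;|$).*", "\\1", kv, perl = TRUE)

# regexpr() reports -1 where nothing matched, so those rows can stay NA
m <- regexpr("(?<=code=)\\w+", kv, perl = TRUE)
code <- rep(NA_character_, length(kv))
code[m != -1] <- regmatches(kv, m)  # regmatches() drops non-matches
code
```

Here code holds NA for the row without a code, while the gsub result for that row is still the full key_value string.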

Answer 2

You can use strcapture, which returns the captured parts of a regex:

cbind(quest, 
   strcapture(
     "code=([^;]*)",
     quest$key_value,
     data.frame(code=character())))

The regex "code=([^;]*)" looks for the text code= and then captures everything up to the next semicolon (or the end of the string). The data frame argument specifies the name and type of the returned column. Here I use cbind to return a data frame with the extra column.

> cbind(quest, strcapture("code=([^;]*)",quest$key_value,data.frame(code=character())))
      city                                           key_value code
1  Atlanta                     rev=63;code=ATL;qty=1;zip=45987  ATL
2 New York        rev=10.60|34;qty=1|2;zip=12686|12694;code=NY   NY
3  Atlanta                     code=ATL;rev=12;qty=1;zip=74268  ATL
4    Tampa rev=3|24|8;qty=1|6|3;code=TPA;zip=33684|36842|30254  TPA
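
The same idea can be wrapped in a small helper so that any key can be pulled out, not just code. This is only a sketch: extract_key is a hypothetical helper name, not part of base R, and it assumes the key consists of plain letters or digits, since it is pasted into the regex unescaped. The leading (?:^|;) is an added assumption that keys always start the string or follow a semicolon, which prevents "code" from also matching a longer key that merely ends in "code".

```r
# Two rows from the question, kept as character rather than factor
quest <- data.frame(
  city = c("Atlanta", "New York"),
  key_value = c("rev=63;code=ATL;qty=1;zip=45987",
                "rev=10.60|34;qty=1|2;zip=12686|12694;code=NY"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the value stored under `key`, or NA if absent
extract_key <- function(x, key) {
  # the key must sit at the start of the string or right after a semicolon
  pattern <- paste0("(?:^|;)", key, "=([^;]*)")
  strcapture(pattern, x, data.frame(value = character()), perl = TRUE)$value
}

quest$code <- extract_key(quest$key_value, "code")
quest$qty  <- extract_key(quest$key_value, "qty")
quest[, c("city", "code", "qty")]
```

Values that hold several entries, such as qty for New York, come back as a single string like "1|2" and can be split afterwards with strsplit(x, "|", fixed = TRUE).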

Answer 1
2 2017-11-30 22:48:28

Answer 2
2 Accepted 2017-11-30 23:21:17
