Extract text between two strings using regex in Python

Question

I am trying to use a regex to extract the text associated with the 'value' key in the dataset below. Here is what my data looks like:

[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'},
 {'self': 'text234', 'value': 'Keyword 2', 'id': '202'}, 
 {'self': 'text456', 'value': 'Keyword 3', 'id': '203'}, 
 {'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]

This is what I tried:

re.findall(r'value (.*?) id', data)

The above code throws an error: TypeError: expected string or bytes-like object

Expected output:

Keyword 1, Keyword 2, Keyword 3, Keyword 4
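The TypeError happens because re.findall only accepts a string (or bytes), and the data shown above is printed like a Python list of dicts. The sketch below assumes data really is such a list in memory, not a string read from a file or a column. It shows two options: read the 'value' key directly, or convert the list to its string form with str() before applying a regex.

```python
import re

# Assumption: data is an in-memory list of dicts, as printed in the question.
data = [
    {'self': 'text123', 'value': 'Keyword 1', 'id': '201'},
    {'self': 'text234', 'value': 'Keyword 2', 'id': '202'},
    {'self': 'text456', 'value': 'Keyword 3', 'id': '203'},
    {'self': 'text789', 'value': 'Keywork 4', 'id': '204'},
]

# Option 1: no regex needed, just read the 'value' key of each dict.
direct = ", ".join(d["value"] for d in data)

# Option 2: re.findall needs a string, so convert the list to its repr first.
# str() of a dict prints single-quoted strings as 'key': 'value' pairs.
as_text = str(data)
via_regex = ", ".join(re.findall(r"'value': '(.*?)'", as_text))

print(direct)     # Keyword 1, Keyword 2, Keyword 3, Keywork 4
print(via_regex)  # Keyword 1, Keyword 2, Keyword 3, Keywork 4
```

Option 2 is fragile. If a value contains an apostrophe, repr switches to double quotes for that string and the pattern no longer matches it.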

Answer 1

This would probably work better with a JSON deserializer. If you really want to use a regex, though, I tried this one and it worked. It is super clunky, but it works.

\'value\': '(.*?)', \'id\'

Full code:

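import re

# The dict list stored as one string; a backslash at the end of a line
# continues the string literal onto the next line.
data = "[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'}, \
 {'self': 'text234', 'value': 'Keyword 2', 'id': '202'}, \
 {'self': 'text456', 'value': 'Keyword 3', 'id': '203'}, \
 {'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]"

# The non-greedy group (.*?) captures only the text between 'value': ' and ', 'id'
print(re.findall(r"\'value\': '(.*?)', \'id\'", data))
# ['Keyword 1', 'Keyword 2', 'Keyword 3', 'Keywork 4']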
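Answer 1 suggests a JSON deserializer. However, json.loads would reject this particular string, because JSON requires double-quoted strings and the data uses single quotes. The sketch below uses the standard library's ast.literal_eval in place of JSON, since it parses Python literal syntax. It assumes the same kind of data string as above, shortened to two records.

```python
import ast
import json

# Same single-quoted format as in Answer 1, shortened to two records.
data = (
    "[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'}, "
    "{'self': 'text234', 'value': 'Keyword 2', 'id': '202'}]"
)

# json.loads fails here: JSON only allows double-quoted strings.
try:
    json.loads(data)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# ast.literal_eval parses Python literals without executing arbitrary code.
records = ast.literal_eval(data)
values = [record["value"] for record in records]

print(json_ok)  # False
print(values)   # ['Keyword 1', 'Keyword 2']
```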

Answer 2

If you have dictionaries stored as a string in a column, you can still access their values without regex:

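import ast

import pandas as pd


def extract_keyword(s):
    # Parse the string into a list of dicts. ast.literal_eval only accepts
    # Python literals, so unlike eval it cannot run arbitrary code.
    result = []
    for d in ast.literal_eval(s):
        result.append(d["value"])
    return ", ".join(result)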


df = pd.DataFrame({
    "col": ["""[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'},
 {'self': 'text234', 'value': 'Keyword 2', 'id': '202'}, 
 {'self': 'text456', 'value': 'Keyword 3', 'id': '203'}, 
 {'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]"""]
})


df["col"].apply(extract_keyword)

0    Keyword 1, Keyword 2, Keyword 3, Keywork 4
Name: col, dtype: object
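If the strings live in a pandas column and you still prefer a regex, pandas' vectorized string methods can apply the pattern to every row without a Python-level helper. The sketch below assumes a column named col like the one above, with hypothetical sample rows. Series.str.findall returns a list of captured groups per row, and Series.str.join joins each of those lists.

```python
import pandas as pd

# Assumption: each cell holds the dict list as a single-quoted string.
df = pd.DataFrame({
    "col": [
        "[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'}, "
        "{'self': 'text234', 'value': 'Keyword 2', 'id': '202'}]",
        "[{'self': 'text456', 'value': 'Keyword 3', 'id': '203'}]",
    ]
})

# findall keeps only the captured group, giving one list of values per row;
# str.join then turns each list into a comma-separated string.
keywords = df["col"].str.findall(r"'value': '(.*?)'").str.join(", ")
print(keywords.tolist())  # ['Keyword 1, Keyword 2', 'Keyword 3']
```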

Answer 1: score 1, accepted, posted 2020-07-29 11:08:30

Answer 2: score 0, posted 2020-07-29 11:18:32
