Extracting text between two strings using regex in Python
I am trying to extract the text stored under the 'value' key in the dataset below using regex. This is what my data looks like:
[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'},
{'self': 'text234', 'value': 'Keyword 2', 'id': '202'},
{'self': 'text456', 'value': 'Keyword 3', 'id': '203'},
{'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]
This is what I tried:
re.findall(r'value (.*?) id', data)
The above code throws an error:
TypeError: expected string or bytes-like object
Expected output:
Keyword 1, Keyword 2, Keyword 3, Keyword 4
This would probably work better with a JSON deserializer, but if you really want to use a regex, I tried this one and it worked. It is super clunky, but it does the job:
\'value\': '(.*?)', \'id\'
Full code:
import re
data = "[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'}, \
{'self': 'text234', 'value': 'Keyword 2', 'id': '202'}, \
{'self': 'text456', 'value': 'Keyword 3', 'id': '203'}, \
{'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]"
print(re.findall(r"\'value\': '(.*?)', \'id\'", data))
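The TypeError in the question suggests that `data` was not a string when it was passed to `re.findall`. A minimal sketch, assuming `data` is already a Python list of dicts (the question does not confirm this), shows the error being raised and two ways around it: reading the key directly, or converting the list with `str()` before applying the same pattern.

```python
import re

# Assumption: `data` is a list of dicts, not a string.
data = [
    {'self': 'text123', 'value': 'Keyword 1', 'id': '201'},
    {'self': 'text234', 'value': 'Keyword 2', 'id': '202'},
]

# re.findall only accepts str or bytes, so a list raises TypeError.
try:
    re.findall(r"'value': '(.*?)', 'id'", data)
    raised = False
except TypeError:
    raised = True

# Option 1: no regex needed, read the key from each dict.
values = [d['value'] for d in data]

# Option 2: turn the list into its string representation first,
# then apply the regex from the answer above.
regex_values = re.findall(r"'value': '(.*?)', 'id'", str(data))

print(", ".join(values))
```

Option 1 is usually the simpler choice when the data is already structured; option 2 depends on the exact key order and quoting of the string representation.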
If you have dictionaries stored in a column as a string, you can still access their values without regex:
import pandas as pd

def extract_keyword(s):
    # eval() executes arbitrary code, so only use it on trusted strings
    result = []
    for d in eval(s):
        result.append(d["value"])
    return ", ".join(result)

df = pd.DataFrame({
"col": ["""[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'},
{'self': 'text234', 'value': 'Keyword 2', 'id': '202'},
{'self': 'text456', 'value': 'Keyword 3', 'id': '203'},
{'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]"""]
})
df["col"].apply(extract_keyword)
0    Keyword 1, Keyword 2, Keyword 3, Keywork 4
Name: col, dtype: object
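Since `eval` will run any code contained in the string, a safer variant may be worth considering. The sketch below swaps in `ast.literal_eval` from the standard library, which only parses Python literals; the function name `extract_keyword_safe` and the shortened sample string are hypothetical and used only for illustration. Note that `json.loads` would not parse this data as-is, because the strings use single quotes.

```python
import ast

def extract_keyword_safe(s):
    """Hypothetical variant of extract_keyword that avoids eval()."""
    # literal_eval parses literals (lists, dicts, strings, numbers)
    # and raises an error on anything else instead of executing it.
    records = ast.literal_eval(s)
    return ", ".join(d["value"] for d in records)

# Shortened sample in the same shape as the column values above.
sample = """[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'},
 {'self': 'text234', 'value': 'Keyword 2', 'id': '202'}]"""

result = extract_keyword_safe(sample)
print(result)
```

It can be passed to `apply` in the same way as the original function, for example `df["col"].apply(extract_keyword_safe)`.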