Extracting text between two strings using regex in python

I am trying to use a regex to extract the text stored under the 'value' key in the dataset below. Here is what my data looks like:

[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'},
 {'self': 'text234', 'value': 'Keyword 2', 'id': '202'}, 
 {'self': 'text456', 'value': 'Keyword 3', 'id': '203'}, 
 {'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]

This is what I tried:

re.findall(r'value (.*?) id', data)

The above code throws an error: TypeError: expected string or bytes-like object

Expected output:

Keyword 1, Keyword 2, Keyword 3, Keyword 4 
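A minimal reproduction of the TypeError above, assuming data is the actual Python list of dicts shown (not a string). re.findall only accepts a str or bytes-like object, so passing a list raises the error. Wrapping the list in str() avoids the error, but the original pattern then matches nothing, because the text contains 'value': followed by a quote and never "value " followed by a space. If the data really is a list, plain indexing gives the expected output without any regex.

```python
import re

# Assumption: data is a real Python list of dicts, exactly as shown in the question
data = [
    {'self': 'text123', 'value': 'Keyword 1', 'id': '201'},
    {'self': 'text234', 'value': 'Keyword 2', 'id': '202'},
    {'self': 'text456', 'value': 'Keyword 3', 'id': '203'},
    {'self': 'text789', 'value': 'Keywork 4', 'id': '204'},
]

# re.findall needs a str (or bytes), so passing the list raises TypeError
try:
    re.findall(r'value (.*?) id', data)
except TypeError as exc:
    print("TypeError:", exc)

# str(data) avoids the error, but "value " (with a space) never occurs in the text,
# so the original pattern finds no matches
print(re.findall(r'value (.*?) id', str(data)))  # []

# With a real list, no regex is needed: read the key from each dict
keywords = [d['value'] for d in data]
print(", ".join(keywords))  # Keyword 1, Keyword 2, Keyword 3, Keywork 4
```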

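This would probably work better with a deserializer than with a regex. Note that the data uses single quotes, so it is not valid JSON: json.loads rejects it, while ast.literal_eval can parse it. But if you really want to use a regex, I tried this one and it worked. It is super clunky, but it works. The TypeError itself happens because re.findall expects a string and your data is a list, so the full code below defines data as a string.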

\'value\': '(.*?)', \'id\'

Full code:

import re

data = "[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'}, \
 {'self': 'text234', 'value': 'Keyword 2', 'id': '202'}, \
 {'self': 'text456', 'value': 'Keyword 3', 'id': '203'}, \
 {'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]"

print(re.findall(r"\'value\': '(.*?)', \'id\'", data))
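For comparison, a sketch of the deserializer route mentioned above, assuming the string has exactly the layout shown. Because JSON requires double-quoted strings, json.loads fails on this text; ast.literal_eval from the standard library parses it as a Python literal and, unlike eval, does not execute arbitrary code.

```python
import ast
import json

# Same data as a single string (assumed to be a Python-literal repr of a list of dicts)
data = (
    "[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'}, "
    "{'self': 'text234', 'value': 'Keyword 2', 'id': '202'}, "
    "{'self': 'text456', 'value': 'Keyword 3', 'id': '203'}, "
    "{'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]"
)

# json.loads fails: JSON keys and strings must use double quotes
try:
    json.loads(data)
except json.JSONDecodeError as exc:
    print("Not valid JSON:", exc)

# ast.literal_eval safely turns the text back into a list of dicts
records = ast.literal_eval(data)
keywords = [record["value"] for record in records]
print(", ".join(keywords))  # Keyword 1, Keyword 2, Keyword 3, Keywork 4
```

This also keeps working if a value happens to contain a comma or the key order changes, which the regex above depends on.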

If you have dictionaries stored as strings in a DataFrame column, you can still access their values without regex:

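import ast

import pandas as pd


def extract_keyword(s):
    # ast.literal_eval parses the string as a Python literal without executing code (unlike eval)
    result = []
    for d in ast.literal_eval(s):
        result.append(d["value"])
    return ", ".join(result)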


df = pd.DataFrame({
    "col": ["""[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'},
 {'self': 'text234', 'value': 'Keyword 2', 'id': '202'}, 
 {'self': 'text456', 'value': 'Keyword 3', 'id': '203'}, 
 {'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]"""]
})


df["col"].apply(extract_keyword)
0    Keyword 1, Keyword 2, Keyword 3, Keywork 4
Name: col, dtype: object
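If you would rather keep a regex inside pandas, the string accessor can apply the pattern to every row without a helper function: Series.str.findall returns a list of captured groups per row, and Series.str.join joins each list into one string. A sketch reusing the layout from the question; it assumes every entry is written as 'value': '...' with no quote characters inside the value itself.

```python
import pandas as pd

df = pd.DataFrame({
    "col": ["[{'self': 'text123', 'value': 'Keyword 1', 'id': '201'}, "
            "{'self': 'text234', 'value': 'Keyword 2', 'id': '202'}, "
            "{'self': 'text456', 'value': 'Keyword 3', 'id': '203'}, "
            "{'self': 'text789', 'value': 'Keywork 4', 'id': '204'}]"]
})

# Non-greedy group captures everything up to the next single quote after 'value': '
matches = df["col"].str.findall(r"'value': '(.*?)'")

# Join each row's list of matches into one comma-separated string
keywords = matches.str.join(", ")
print(keywords.iloc[0])  # Keyword 1, Keyword 2, Keyword 3, Keywork 4
```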
