Extracting a specific pattern from a string in Python

Question

I have the data below in a column of a DataFrame (approximately 100 rows).

I need to extract the CK string (for example CK-36799-1523333) from the DataFrame for each row.

Note: the key receipt_id is not fixed. The CK value may appear under a different key (for example order_id or suborder_id).

Data:

{"currency":"US","Cost":129,"receipt_id":"CK-36799-1523333","af_customer_user_id":"33738413"}

{"currency":"INR","Cost":429,"receipt_id":"CK-33711-15293046","af_customer_user_id":"33738414"}

{"currency":"US","Cost":229,"receipt_id":"CK-36798-1523333","af_customer_user_id":"33738423"}

{"currency":"INR","Cost":829,"receipt_id":"CK-33716-152930456","af_customer_user_id":"33738214"}

  {"currency":"INR","Cost":829,"order_id":"CK-33716-152930456","af_customer_user_id":"33738214"}

  {"currency":"INR","Cost":829,"suborder_id":"CK-33716-152930456","af_customer_user_id":"33738214"}

Result

CK-36799-1523333
CK-33711-15293046
CK-36798-1523333
CK-33716-152930456

I tried the str.find('CK-') function but did not get the expected result. Any suggestions?

Answer 1

Use Series.str.extract:

df['new'] = df['col'].str.extract(r"(CK\-\d+\-\d+)", expand=False).fillna('no match')
print(df)
                                                 col                 new
0  {"currency":"US","Cost":129,"receipt_id":"CK-3...    CK-36799-1523333
1  {"currency":"INR","Cost":429,"receipt_id":"CK-...   CK-33711-15293046
2  {"currency":"US","Cost":229,"receipt_id":"CK-3...    CK-36798-1523333
3  {"currency":"INR","Cost":829,"receipt_id":"CK-...  CK-33716-152930456
4    {"currency":"INR","Cost":829,"order_id":"CK-...  CK-33716-152930456
5    {"currency":"INR","Cost":829,"suborder_id":"...  CK-33716-152930456
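To see what the pattern does outside pandas, it can be tried on a single string with the standard re module. This is only a sketch: the names CK_PATTERN, row and no_ck are made up for illustration, and the second string is a hypothetical row with no CK value. The hyphens are left unescaped here, which is equivalent because a hyphen has no special meaning outside a character class.

```python
import re

# Same pattern as in the answer; hyphens need no escaping outside a character class.
CK_PATTERN = re.compile(r"(CK-\d+-\d+)")

row = '{"currency":"US","Cost":129,"receipt_id":"CK-36799-1523333","af_customer_user_id":"33738413"}'
no_ck = '{"currency":"US","Cost":129,"af_customer_user_id":"33738413"}'  # hypothetical row without a CK value

# group(1) is the text captured by the parentheses, which is what str.extract returns.
match = CK_PATTERN.search(row)
first = match.group(1) if match else "no match"

# search() returns None when nothing matches, so fall back to a default,
# mirroring the fillna('no match') step above.
miss = CK_PATTERN.search(no_ck)
second = miss.group(1) if miss else "no match"

print(first)   # CK-36799-1523333
print(second)  # no match
```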

Another solution is to parse each row as a dictionary and select the first value that starts with 'CK-'; if there is none, use a default value:

import ast

f = lambda x: next(iter(v for v in ast.literal_eval(x).values() 
                        if str(v).startswith('CK-')), 'no match')
df['new'] = df['col'].apply(f)
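The same idea can be written as a named function. The sketch below assumes every row is a JSON object string, so it uses json.loads instead of ast.literal_eval, and it returns the default for rows that cannot be parsed. The function name extract_ck and its default parameter are hypothetical, not part of the original answer.

```python
import json

def extract_ck(raw, default="no match"):
    """Return the first value starting with 'CK-' in a JSON object string."""
    try:
        record = json.loads(raw)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON: nothing to extract.
        return default
    if not isinstance(record, dict):
        return default
    # Check every value, since the key holding the CK string is not fixed.
    for value in record.values():
        if isinstance(value, str) and value.startswith("CK-"):
            return value
    return default

sample = '{"currency":"INR","Cost":829,"order_id":"CK-33716-152930456","af_customer_user_id":"33738214"}'
print(extract_ck(sample))  # CK-33716-152930456
```

It could then be applied to the column in the same way as the lambda, for example df['col'].apply(extract_ck).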

Answer 2

Try using regular expressions:

import re

...
for line in data:
    res = re.findall(r"CK\-[0-9]+\-[0-9]+", line)
    if len(res) != 0:
        print(res[0])
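This also explains why str.find alone did not give the expected result: it reports where the substring starts, not the text itself. A small comparison on one of the sample rows follows; the variable names are illustrative only.

```python
import re

line = '{"currency":"INR","Cost":429,"receipt_id":"CK-33711-15293046","af_customer_user_id":"33738414"}'

# str.find returns the index where 'CK-' starts (or -1 if absent), not the matched text.
start = line.find("CK-")

# re.search returns a match object; group(0) is the whole matched substring.
match = re.search(r"CK-[0-9]+-[0-9]+", line)
ck = match.group(0) if match else None

print(type(start).__name__)  # int
print(ck)                    # CK-33711-15293046
```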

Answer 3

Suppose this is a CSV file; then we can find the values with code like this.

import re

# Match any CK identifier, not just one hard-coded value
pattern = re.compile(r'CK-\d+-\d+')
ck_list = []

with open('ck.csv', 'r') as f:  # where ck.csv is the file you shared above
    for line in f:
        match = pattern.search(line)
        if match:
            # Keep the matched CK string itself
            ck_list.append(match.group(0))
