Extract a specific pattern from a string in Python
I have the data below in a column of a DataFrame (approximately 100 rows).
I need to extract the CK string (e.g. CK-36799-1523333) from each row.
Data:
{"currency":"US","Cost":129,"receipt_id":"CK-36799-1523333","af_customer_user_id":"33738413"}
{"currency":"INR","Cost":429,"receipt_id":"CK-33711-15293046","af_customer_user_id":"33738414"}
{"currency":"US","Cost":229,"receipt_id":"CK-36798-1523333","af_customer_user_id":"33738423"}
{"currency":"INR","Cost":829,"receipt_id":"CK-33716-152930456","af_customer_user_id":"33738214"}
{"currency":"INR","Cost":829,"order_id":"CK-33716-152930456","af_customer_user_id":"33738214"}
{"currency":"INR","Cost":829,"suborder_id":"CK-33716-152930456","af_customer_user_id":"33738214"}
Expected result:
CK-36799-1523333
CK-33711-15293046
CK-36798-1523333
CK-33716-152930456
I tried the str.find('CK-') function but did not get the expected result. Any suggestions?
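For context on why str.find falls short: pandas Series.str.find returns the integer position where the substring starts (or -1 when it is absent), not the matched text. A small sketch with two illustrative rows; the first comes from the data above, the second is made up and has no CK ID:

```python
import pandas as pd

# Two illustrative rows: the first is from the data above, the second is hypothetical
s = pd.Series([
    '{"currency":"US","Cost":129,"receipt_id":"CK-36799-1523333","af_customer_user_id":"33738413"}',
    '{"currency":"US","Cost":129,"af_customer_user_id":"33738413"}',
])

# str.find gives the start index of 'CK-' in each string, or -1 if it is not found
positions = s.str.find('CK-')
print(positions.tolist())
```

The positions could be used to slice the strings, but the IDs vary in length, so a regular expression that matches the whole ID is simpler.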
Use Series.str.extract:
df['new'] = df['col'].str.extract(r"(CK\-\d+\-\d+)", expand=False).fillna('no match')
print(df)
   col                                                               new
0  {"currency":"US","Cost":129,"receipt_id":"CK-3...    CK-36799-1523333
1  {"currency":"INR","Cost":429,"receipt_id":"CK-...   CK-33711-15293046
2  {"currency":"US","Cost":229,"receipt_id":"CK-3...    CK-36798-1523333
3  {"currency":"INR","Cost":829,"receipt_id":"CK-...  CK-33716-152930456
4  {"currency":"INR","Cost":829,"order_id":"CK-...    CK-33716-152930456
5  {"currency":"INR","Cost":829,"suborder_id":"...    CK-33716-152930456
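A self-contained version of the str.extract approach, assuming the column is named 'col' as in the answer. The third row is hypothetical and shows the fallback: with expand=False and a single capture group, str.extract returns a Series with NaN where the pattern does not match, and fillna replaces those values.

```python
import pandas as pd

rows = [
    '{"currency":"US","Cost":129,"receipt_id":"CK-36799-1523333","af_customer_user_id":"33738413"}',
    '{"currency":"INR","Cost":829,"order_id":"CK-33716-152930456","af_customer_user_id":"33738214"}',
    '{"currency":"US","Cost":10,"af_customer_user_id":"1"}',  # hypothetical row with no CK ID
]
df = pd.DataFrame({'col': rows})

# One capture group + expand=False -> a Series; unmatched rows become NaN, then 'no match'
df['new'] = df['col'].str.extract(r'(CK-\d+-\d+)', expand=False).fillna('no match')
print(df['new'].tolist())
```

Because the regex ignores the key name, it picks up the ID whether it is stored under receipt_id, order_id or suborder_id.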
Another solution is to parse each row into a dictionary, loop over its values, and select the first one that starts with 'CK-'; if none exists, return a default value:
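import ast

# Parse each row into a dict and return the first value that starts with 'CK-',
# or 'no match' if there is none
f = lambda x: next((v for v in ast.literal_eval(x).values()
                    if str(v).startswith('CK-')), 'no match')
df['new'] = df['col'].apply(f)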
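Since each row appears to be JSON, json.loads is an alternative to ast.literal_eval: it also accepts JSON literals such as true and null, which literal_eval rejects. A sketch under that assumption; first_ck is a hypothetical helper name, and the second row is made up to show the default value:

```python
import json

import pandas as pd


def first_ck(s, default='no match'):
    """Return the first string value starting with 'CK-' in a JSON object string."""
    for value in json.loads(s).values():
        # Only string values can be CK IDs; numbers, booleans and None are skipped
        if isinstance(value, str) and value.startswith('CK-'):
            return value
    return default


df = pd.DataFrame({'col': [
    '{"currency":"INR","Cost":829,"suborder_id":"CK-33716-152930456","af_customer_user_id":"33738214"}',
    '{"currency":"US","Cost":129,"paid":true,"note":null}',  # hypothetical row
]})
df['new'] = df['col'].apply(first_ck)
print(df['new'].tolist())
```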
Try using regular expressions:
import re
...
for line in data:
    # findall returns every match in the line; print the first one, if any
    res = re.findall(r"CK\-[0-9]+\-[0-9]+", line)
    if res:
        print(res[0])
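A runnable version of this loop, assuming data is simply a list of the row strings (the answer elides how it is loaded). It uses re.search, which stops at the first match and returns a match object or None, instead of building a full list with findall:

```python
import re

# Hypothetical stand-in for the elided `data`: a list of row strings
data = [
    '{"currency":"US","Cost":129,"receipt_id":"CK-36799-1523333","af_customer_user_id":"33738413"}',
    '{"currency":"INR","Cost":429,"receipt_id":"CK-33711-15293046","af_customer_user_id":"33738414"}',
    '{"currency":"US","Cost":10}',  # hypothetical row with no CK ID
]

ck_pattern = re.compile(r"CK-[0-9]+-[0-9]+")

ids = []
for line in data:
    m = ck_pattern.search(line)  # first match, or None
    if m:
        ids.append(m.group())
print(ids)
```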
Suppose the data is stored in a CSV file; then we can find the IDs with code like this:
import re

# Match any CK ID such as CK-36799-1523333
pattern = re.compile(r'CK-\d+-\d+')
ck_list = []
with open('ck.csv', 'r') as f:  # where ck.csv is the file you shared above
    for line in f:
        match = pattern.search(line)
        if match:
            # Keep only the matched ID, not the whole line or field
            ck_list.append(match.group())
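If every CK ID in the file is wanted, including any row that might contain more than one, the pattern can also be run once over the whole file with findall, which returns all non-overlapping matches in order. A sketch, where find_ck_ids is a hypothetical helper and a temporary file stands in for ck.csv; reading the whole file into memory is reasonable for roughly 100 rows:

```python
import re
import tempfile
from pathlib import Path

ck_pattern = re.compile(r"CK-\d+-\d+")


def find_ck_ids(path):
    """Return every CK ID in the file, in order of appearance (hypothetical helper)."""
    text = Path(path).read_text()
    # No capture groups, so findall returns the full matched strings
    return ck_pattern.findall(text)


# Demo with a temporary file standing in for ck.csv
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "ck.csv"
    sample.write_text(
        '{"currency":"US","Cost":129,"receipt_id":"CK-36799-1523333","af_customer_user_id":"33738413"}\n'
        '{"currency":"INR","Cost":429,"receipt_id":"CK-33711-15293046","af_customer_user_id":"33738414"}\n'
    )
    ids = find_ck_ids(sample)
print(ids)
```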