Extract all the characters after a keyword up to the next space in a pandas DataFrame
I have a DataFrame with a string column whose values contain the keyword "order". The order number doesn't follow any fixed format, as it can contain special characters. I want to extract the order number including the special characters inside it (a leading "#" is dropped, as the expected output shows). The idea is to extract all the characters after the keyword "order" up to the next space, in a single expression.
message Model
order 769707-134432 has reached EARLY. LG
Delivered : order 1765412456 Samsung
No New Updates : order RS1765123404 Sony
order #769707-41213-4355 is EARLY Dell
No New Updates : order 3FA1765404 Samsung
order #76923407 has reached EARLY LG
No New Updates : order R-176543123 Sony
Recheduled : order 100251283_415731301 Sony
order #9T_0312330 delivered Dell
order #000090223532 has arrived at pickup. LG
The required output should be:
message order Model
order 769707-134432 has reached EARLY 769707-134432 LG
Delivered : order 1765412456 1765412456 Samsung
No New Updates : order RS1765123404 RS1765123404 Sony
order #769707-41213-4355 is EARLY 769707-41213-4355 Dell
No New Updates : order 3FA1765404 3FA1765404 Samsung
order #76923407 has reached EARLY 76923407 LG
No New Updates : order R-176543123 R-176543123 Sony
Recheduled : order 100251283_415731301 100251283_415731301 Sony
order #9T_0312330 delivered 9T_0312330 Dell
order #000090223532 has arrived at pickup 000090223532 LG
When I tried using a regex, I got results such as "#000090223532 has", "769707-", and "3FA".
Using str.replace, we can try:
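# Raw strings keep \b and \1 intact; regex=True is required on pandas 2.0+, where str.replace defaults to a literal match
data["order"] = data["message"].str.replace(r"^.*\border #?(\S+)\b.*$", r"\1", regex=True)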
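A quick way to sanity-check this pattern is to run it through the standard re module, which is a reasonable stand-in for what str.replace does per row on ordinary object-dtype strings (that equivalence is an assumption for illustration). The helper name extract_with_sub is hypothetical; the first two sample strings come from the question and the third is made up to show a row without the keyword. Note that such a row comes back unchanged rather than as a missing value.

```python
import re

# Same pattern as in the str.replace call; the r prefix stops Python from
# turning \b into a backspace and \1 into a control character.
PATTERN = r"^.*\border #?(\S+)\b.*$"


def extract_with_sub(message: str) -> str:
    """Replace the whole message with the captured order number."""
    return re.sub(PATTERN, r"\1", message)


samples = [
    "order #769707-41213-4355 is EARLY",
    "No New Updates : order 3FA1765404",
    "no keyword here",  # hypothetical row that does not contain "order"
]
results = [extract_with_sub(s) for s in samples]
print(results)
```

Because a non-matching message passes through untouched, the new column can end up holding full messages for such rows, which is worth filtering for if the data is not guaranteed to contain the keyword.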
In my opinion, the cleanest way would be to use str.extract:
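import pandas as pd
# dct is assumed to be a dict holding the message and model columns from the question
df = pd.DataFrame(dct)
# Capture the run of non-space characters after "order", skipping an optional "#"
df['order'] = df['message'].str.extract(r'order\s+\#?(\S+)')
print(df)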
This yields:
message model order
0 order 769707-134432 has reached EARLY. LG 769707-134432
1 Delivered : order 1765412456 Samsung 1765412456
2 No New Updates : order RS1765123404 Sony RS1765123404
3 order #769707-41213-4355 is EARLY Dell 769707-41213-4355
4 No New Updates : order 3FA1765404 Samsung 3FA1765404
5 order #76923407 has reached EARLY LG 76923407
6 No New Updates : order R-176543123 Sony R-176543123
7 Recheduled : order 100251283_415731301 Sony 100251283_415731301
8 order #9T_0312330 delivered Dell 9T_0312330
9 order #000090223532 has arrived at pickup. LG 000090223532
See a demo for the expression on regex101.com.
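For anyone who wants to run the str.extract approach end to end, here is a minimal self-contained sketch. It assumes the column names message and model, uses three rows copied from the question, and adds one hypothetical row without the keyword to show what happens when nothing matches. Passing expand=False is a small variation on the call above: it makes str.extract return a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Three sample rows from the question; the last row is a hypothetical
# message without the keyword, added only to show the no-match case.
dct = {
    "message": [
        "order 769707-134432 has reached EARLY.",
        "order #9T_0312330 delivered",
        "Recheduled : order 100251283_415731301",
        "status unknown",
    ],
    "model": ["LG", "Dell", "Sony", "LG"],
}
df = pd.DataFrame(dct)

# expand=False returns a Series, which maps directly onto one new column.
df["order"] = df["message"].str.extract(r"order\s+#?(\S+)", expand=False)

orders = df["order"].tolist()
print(df)
```

Unlike the str.replace variant, a row with no match ends up as a missing value (NaN) in the order column, which makes such rows easy to find with isna().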