Cut string in dataframe column up to and including a certain string
I have data similar to the following:
import numpy as np
import pandas as pd
df = pd.DataFrame({'pagePath': ['/my/retour/details/n8hWu7iWtuRXzSvDvCAUZRAlPda6LM/',
                                '/my/orders/details/151726/',
                                '/my/retours/retourmethod/']})
print(df)
pagePath
0 /my/retour/details/n8hWu7iWtuRXzSvDvCAUZRAlPda...
1 /my/orders/details/151726/
2 /my/retours/retourmethod/
What I want to do is cut each string after details (keeping details and its trailing slash). Rows that do not contain details should stay unchanged.
Expected output:
pagePath
0 /my/retour/details/
1 /my/orders/details/
2 /my/retours/retourmethod/
The following works, but it's slow:
df['pagePath'] = np.where(df.pagePath.str.contains('details'),
                          df.pagePath.apply(lambda x: x[0:x.find('details')+8]),
                          df.pagePath)
print(df)
pagePath
0 /my/retour/details/
1 /my/orders/details/
2 /my/retours/retourmethod/
I tried a regex, but could only get it to work excluding details:
df['pagePath'] = np.where(df.pagePath.str.contains('details'),
                          df.pagePath.str.extract('(.+?(?=details))'),
                          df.pagePath)
print(df)
pagePath
0 /my/retour/
1 /my/orders/
2 NaN
In addition, the regex code returns NaN when the row does not contain details.
So I feel there is an easier and more elegant way to do this. How would I write a regex to solve my problem? Or is my solution already sufficient?
Would you like to try str.extract?
('/'+df.pagePath.str.extract('/(.*)details')+'details')[0].fillna(df.pagePath)
Out[130]:
0 /my/retour/details
1 /my/orders/details
2 /my/retours/retourmethod/
Name: 0, dtype: object
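Note that the output above stops at details without the trailing slash, so it differs slightly from the expected output in the question. A minimal sketch of one way to keep the slash, assuming every details segment is followed by /: put details/ inside the capture group and fill the non-matching rows from the original column. The variable name trimmed is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'pagePath': ['/my/retour/details/n8hWu7iWtuRXzSvDvCAUZRAlPda6LM/',
                                '/my/orders/details/151726/',
                                '/my/retours/retourmethod/']})

# Lazily capture everything up to and including the first 'details/'.
# expand=False returns a Series; rows without a match become NaN.
trimmed = df.pagePath.str.extract(r'^(.*?details/)', expand=False)

# Fall back to the original value where the pattern did not match.
trimmed = trimmed.fillna(df.pagePath)

print(trimmed.tolist())
```

The lazy quantifier .*? stops at the first occurrence of details/, whereas the greedy .* in the one-liner above would run to the last occurrence if a path contained the word more than once.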
All you need to do is provide a fallback in the regex for when there is no 'details':
>>> df.pagePath.str.extract('(.+?details/?|.*)')
0
0 /my/retour/details/
1 /my/orders/details/
2 /my/retours/retourmethod/
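As a possible alternative to extracting, the same result can be reached by deleting everything after details/ with str.replace, which leaves rows without a match untouched and so needs no fallback. This is only a sketch under the assumption that details is always followed by /; the name trimmed is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'pagePath': ['/my/retour/details/n8hWu7iWtuRXzSvDvCAUZRAlPda6LM/',
                                '/my/orders/details/151726/',
                                '/my/retours/retourmethod/']})

# Match 'details/' plus everything after it, then put back only 'details/'.
# Strings that do not contain 'details/' are returned unchanged.
trimmed = df.pagePath.str.replace(r'(details/).*', r'\1', regex=True)

print(trimmed.tolist())
```

Because str.replace returns a Series, the result can be assigned straight back with df['pagePath'] = trimmed, while str.extract with its default expand=True returns a DataFrame with a column named 0.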