Cut string in DataFrame column up to and including a certain substring

I have data similar to the following:

import pandas as pd
import numpy as np

df = pd.DataFrame({'pagePath': ['/my/retour/details/n8hWu7iWtuRXzSvDvCAUZRAlPda6LM/',
                                '/my/orders/details/151726/',
                                '/my/retours/retourmethod/']})
print(df)
                                            pagePath
0  /my/retour/details/n8hWu7iWtuRXzSvDvCAUZRAlPda...
1                         /my/orders/details/151726/
2                          /my/retours/retourmethod/

What I want to do is cut each string up to and including 'details' (and the slash right after it), leaving rows that do not contain 'details' unchanged.

Expected output:

                    pagePath
0  /my/retour/details/
1  /my/orders/details/
2  /my/retours/retourmethod/

The following works, but it's slow:

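# Keep everything up to and including 'details/': find() index + len('details') + 1 == +8.
# Rows without 'details' are left unchanged by np.where.
df['pagePath'] = np.where(df.pagePath.str.contains('details'),
                          df.pagePath.apply(lambda x: x[0:x.find('details')+8]),
                          df.pagePath)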

print(df)

                    pagePath
0        /my/retour/details/
1        /my/orders/details/
2  /my/retours/retourmethod/
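The magic number 8 in the slow version is len('details') + 1, i.e. the word plus the slash that follows it. A minimal pure-Python sketch of the same per-string logic makes that explicit and handles the "not found" case without np.where; the helper name cut_after is hypothetical, not part of pandas:

```python
def cut_after(path: str, marker: str = "details") -> str:
    """Hypothetical helper: keep `path` up to and including `marker` plus one
    following character (the '/'), or return it unchanged if `marker` is absent."""
    idx = path.find(marker)
    if idx == -1:
        # No marker: leave the path as-is (what np.where does in the question)
        return path
    # idx + len(marker) + 1 == idx + 8 for "details": the word and the trailing '/'
    return path[: idx + len(marker) + 1]


paths = [
    "/my/retour/details/n8hWu7iWtuRXzSvDvCAUZRAlPda6LM/",
    "/my/orders/details/151726/",
    "/my/retours/retourmethod/",
]
print([cut_after(p) for p in paths])
# ['/my/retour/details/', '/my/orders/details/', '/my/retours/retourmethod/']
```

This is still a per-row Python call, so it is about as fast as the apply version; it only documents what the slicing does.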

I tried a regex, but could only get it to work excluding 'details':

df['pagePath'] = np.where(df.pagePath.str.contains('details'),
                          df.pagePath.str.extract('(.+?(?=details))'), 
                          df.pagePath)

print(df)
      pagePath
0  /my/retour/
1  /my/orders/
2          NaN

Also, the regex returns NaN when the row does not contain 'details'.
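The reason the attempt above stops short is that the lookahead (?=details) is zero-width: it checks that 'details' follows but does not consume it, so the capture group ends just before it. Consuming the text instead puts it inside the group. A small sketch with the standard re module, assuming str.extract applies the pattern to each element with search semantics:

```python
import re

path = "/my/retour/details/n8hWu7iWtuRXzSvDvCAUZRAlPda6LM/"

# Lookahead: asserts that 'details' follows, but it is not part of the match
lookahead = re.search(r"(.+?(?=details))", path)
print(lookahead.group(1))  # /my/retour/

# Consuming 'details/' instead places it inside the capture group
consuming = re.search(r"(.+?details/)", path)
print(consuming.group(1))  # /my/retour/details/

# A row without 'details' has no match at all; pandas reports that as NaN
no_match = re.search(r"(.+?details/)", "/my/retours/retourmethod/")
print(no_match)  # None
```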

So I feel there must be an easier and more elegant way to do this. How would I write a regex to solve my problem? Or is my solution already sufficient?

You could try str.extract (note that this version drops the slash after 'details'):

('/'+df.pagePath.str.extract('/(.*)details')+'details')[0].fillna(df.pagePath)
Out[130]: 
0           /my/retour/details
1           /my/orders/details
2    /my/retours/retourmethod/
Name: 0, dtype: object
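What the answer's pattern does, sketched per string with the standard re module: '/(.*)details' captures the text between the first '/' and 'details', and the answer glues '/' and 'details' back around it. Because the slash after 'details' is never captured, it is missing from the result, and because (.*) is greedy, a path containing 'details' twice is cut at the last occurrence. The fallback in the else branch plays the role of .fillna(df.pagePath):

```python
import re

paths = [
    "/my/retour/details/n8hWu7iWtuRXzSvDvCAUZRAlPda6LM/",
    "/my/orders/details/151726/",
    "/my/retours/retourmethod/",
]

result = []
for p in paths:
    m = re.search(r"/(.*)details", p)
    # Rebuild the prefix around the captured text; keep the original path
    # when there is no match (what .fillna(df.pagePath) does in pandas)
    result.append("/" + m.group(1) + "details" if m else p)

print(result)
# ['/my/retour/details', '/my/orders/details', '/my/retours/retourmethod/']

# Greedy (.*): with two occurrences, everything up to the LAST 'details' is kept
greedy = re.search(r"/(.*)details", "/a/details/b/details/c")
print("/" + greedy.group(1) + "details")  # /a/details/b/details
```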

All you need to do is provide a fallback in the regex for when there is no 'details':

>>> df.pagePath.str.extract('(.+?details/?|.*)')

                           0
0        /my/retour/details/
1        /my/orders/details/
2  /my/retours/retourmethod/
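Why the fallback works, sketched with the standard re module: the alternation tries the lazy branch .+?details/? first (the optional /? keeps the trailing slash when present); if no 'details' exists, the second branch .* matches the whole string, so every row produces a match and no NaN appears. This assumes str.extract uses search semantics per element, as above:

```python
import re

# Branch 1: shortest prefix ending in 'details' (plus an optional '/').
# Branch 2: if branch 1 cannot match, take the whole string unchanged.
pattern = re.compile(r"(.+?details/?|.*)")

paths = [
    "/my/retour/details/n8hWu7iWtuRXzSvDvCAUZRAlPda6LM/",
    "/my/orders/details/151726/",
    "/my/retours/retourmethod/",
]

cut = [pattern.search(p).group(1) for p in paths]
print(cut)
# ['/my/retour/details/', '/my/orders/details/', '/my/retours/retourmethod/']
```

To write the result back into the column in pandas, passing expand=False to str.extract returns a Series instead of a one-column DataFrame, e.g. df['pagePath'] = df.pagePath.str.extract(r'(.+?details/?|.*)', expand=False).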
