Extracting a substring between 2 strings in Python

Question

I have a pandas DataFrame with a string column that I want to split into several more columns.

Some rows of the DataFrame look like this:

COLUMN

ORDP//NAME/iwantthispart/REMI/MORE TEXT
/REMI/SOMEMORETEXT
/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS
/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT

So basically I want everything after '/NAME/' and up to the next '/'. However, not every row has the '/NAME/iwantthispart/' field, as can be seen in the second row.

I've tried using split functions, but ended up with the wrong results.

mt['COLUMN'].apply(lambda x: x.split('/NAME/')[-1])

This just gave me everything after the /NAME/ part, and in the cases where there was no /NAME/ it returned the full string.

Does anyone have some tips or solutions? Help is much appreciated! (The bullets are to make it more readable and are not actually in the data.)

Answer 1

You could use str.extract with a regex to extract the pattern of choice:

# Generally, to match all word characters:
df.COLUMN.str.extract(r'NAME/(\w+)')

OR

# More specifically, to match everything up to the next slash:
df.COLUMN.str.extract(r'NAME/([^/]*)')

Both return:

0    iwantthispart
1              NaN
2    iwantthispart
3    iwantthispart
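
As a self-contained sketch of the above, assuming the four sample rows from the question are loaded into a DataFrame named df: the new column name NAME and the use of expand=False (which returns a Series instead of a one-column DataFrame) are choices made for this illustration and are not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "COLUMN": [
        "ORDP//NAME/iwantthispart/REMI/MORE TEXT",
        "/REMI/SOMEMORETEXT",
        "/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS",
        "/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT",
    ]
})

# Capture everything after "/NAME/" up to the next slash.
# Rows without a /NAME/ field produce NaN.
names = df["COLUMN"].str.extract(r"/NAME/([^/]*)", expand=False)

# Store the result as a new column (hypothetical column name).
df["NAME"] = names
print(df["NAME"])
```

Rows that lack the field can then be filtered with df["NAME"].notna() if only the matching rows are needed.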

Answer 2

These two lines give you the second word of the first word/word/ pair, regardless of whether the first word is NAME or not:

mt["COLUMN"] = mt["COLUMN"].str.extract(r"(\w+/\w+/)")
mt["COLUMN"].str.extract(r"(\/\w+)")

For the sample rows shown in the question, this gives the following result as a column of the pandas DataFrame. The second row contains no word/word/ pair followed by a slash, so it yields NaN:

/iwantthispart
NaN
/iwantthispart
/iwantthispart

And in case you are only interested in the rows that contain NAME, this will work just fine:

mt["COLUMN"] = mt["COLUMN"].str.extract(r"(NAME/\w+/)")
mt["COLUMN"].str.extract(r"(\/\w+)")

This will give the following result:

/iwantthispart
NaN
/iwantthispart
/iwantthispart
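
The two-step extraction can also be collapsed into a single pattern. The following is a minimal sketch using only the standard re module; the helper name extract_name is hypothetical, and returning None for rows without a /NAME/ field is an assumption about the desired behavior.

```python
import re

# Capture everything between "/NAME/" and the next "/" (or end of string).
NAME_FIELD = re.compile(r"/NAME/([^/]*)")

def extract_name(text):
    """Return the field that follows /NAME/, or None if there is none."""
    match = NAME_FIELD.search(text)
    return match.group(1) if match else None

# Sample rows copied from the question.
rows = [
    "ORDP//NAME/iwantthispart/REMI/MORE TEXT",
    "/REMI/SOMEMORETEXT",
    "/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS",
    "/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT",
]

results = [extract_name(row) for row in rows]
print(results)
# ['iwantthispart', None, 'iwantthispart', 'iwantthispart']
```

On a DataFrame the same helper can be applied with mt["COLUMN"].apply(extract_name), which, unlike the split-based attempt in the question, does not return the full string when /NAME/ is missing.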

2 answers

Answer 1
2 2018-07-21 16:02:11

Answer 2
0 2018-07-21 16:16:25
