Find and replace a semi-common string in a dataframe?

Question

I am attempting to find a semi-common string and remove all other data in the column. pandas and re have been imported. For instance, I have this dataframe:

>>>df
COLUMN COUNT   DATA
           1   this row RA-123: data 8b43a
           2   here RA-5372: data 94h63c

I need to keep just the 'RA-' string plus the number that follows it, and remove everything before and after. The numbers that follow are not always the same length, and the 'RA-' string does not always occur in the same position. There is a colon after every instance that can be used as a delimiter.

I tried this (a friend wrote the regex search piece for me because I am not familiar with it):

df.assign(DATA= df['DATA'].str.extract(re.search('RA[^:]+')))

But Python returned:

TypeError: search() missing 1 required positional argument: 'string'

What am I missing here? Thanks in advance!

Answer 1

You should use a capturing group with extract:

df['DATA'].str.extract(r'(RA-\d+)')

Here, (RA-\d+) is a capturing group matching RA, then a hyphen, and then one or more digits.

You may use your own pattern, but you still need to wrap it in capturing parentheses: r'(RA[^:]+)'.
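As a minimal sketch of the capturing-group approach, the snippet below rebuilds the two sample rows from the question and writes the extracted value back into the column. The column labels COUNT and DATA are read off the printout in the question and are an assumption, as is the choice of expand=False, which makes extract return a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample frame rebuilt from the question's printout (column labels assumed).
df = pd.DataFrame({
    "COUNT": [1, 2],
    "DATA": ["this row RA-123: data 8b43a", "here RA-5372: data 94h63c"],
})

# The parentheses form the capturing group that extract returns.
# expand=False yields a Series, so it can be assigned straight back.
df["DATA"] = df["DATA"].str.extract(r"(RA-\d+)", expand=False)

print(df["DATA"].tolist())  # ['RA-123', 'RA-5372']
```

Rows where the pattern does not match come back as missing values rather than raising an error.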

Answer 2

Looking at the docs, you don't need the re.search method. You just pass the pattern string (which must contain a capturing group) to extract: df['DATA'] = df['DATA'].str.extract(r'(RA[^:]+)', expand=False)
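To illustrate why the parentheses matter, here is a small sketch comparing the colon-delimited pattern with and without a capturing group. The Series s and the variable names are made up for the illustration. As far as I know, pandas raises a ValueError when the pattern has no capturing group.

```python
import pandas as pd

s = pd.Series(["this row RA-123: data 8b43a", "here RA-5372: data 94h63c"])

# No parentheses means no capturing group, so extract rejects the pattern.
try:
    s.str.extract(r"RA[^:]+")
    raised = False
except ValueError:
    raised = True

# Same pattern wrapped in a capturing group: match "RA" and everything up to the colon.
extracted = s.str.extract(r"(RA[^:]+)", expand=False)

print(raised)              # True
print(extracted.tolist())  # ['RA-123', 'RA-5372']
```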

Answer 3

As I mentioned earlier, there is no need for re here.

The other answers cover how to use extract directly. However, to answer your question specifically: if you really want to use re, the way to go is re.compile instead of re.search. Here regex_str stands for your pattern string, which still needs a capturing group.

df.assign(DATA= df['DATA'].str.extract(re.compile(regex_str)))
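For context on the original TypeError: re.search takes both a pattern and the string to search, so calling it with only the pattern fails, whereas re.compile takes just the pattern and returns a reusable pattern object. The sketch below uses the re module on its own, without pandas, to show the two calls side by side. The helper first_ra_code and the value given to regex_str are hypothetical names chosen for the illustration.

```python
import re

# Hypothetical value for regex_str; the parentheses form the capturing group.
regex_str = r"(RA-\d+)"
pattern = re.compile(regex_str)  # compile needs only the pattern


def first_ra_code(text):
    """Return the first RA-<digits> code in text, or None if absent."""
    match = pattern.search(text)  # search needs the string to look in
    return match.group(1) if match else None


rows = ["this row RA-123: data 8b43a", "here RA-5372: data 94h63c", "no code here"]
codes = [first_ra_code(row) for row in rows]

print(codes)  # ['RA-123', 'RA-5372', None]
```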

3 answers

Answer 1
3 (accepted) 2019-04-08 15:37:32

Answer 2
1 2019-04-08 15:36:59

Answer 3
0 2019-04-08 15:41:17
