Find and replace semi-common strings in dataframe?

I am attempting to find a semi-commonly occurring string and remove all other data in the column. pandas and re have been imported. For instance, I have this dataframe...

>>>df
COLUMN COUNT   DATA
           1   this row RA-123: data 8b43a
           2   here RA-5372: data 94h63c

I need to keep just the 'RA-' and the number that follows it, and remove everything before and after. The numbers that follow are not always the same length, and the 'RA-' string does not always occur in the same position. There is a colon after every instance that can be used as a delimiter.

I tried this (a friend wrote the regex search piece for me because I am not familiar with it).

df.assign(DATA= df['DATA'].str.extract(re.search('RA[^:]+')))

But Python returned

TypeError: search() missing 1 required positional argument: 'string'

What am I missing here? Thanks in advance!

You should use a capturing group with extract:

df['DATA'].str.extract(r'(RA-\d+)')

Here, (RA-\d+) is a capturing group matching RA, then a hyphen, and then one or more digits.
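As a quick check, here is a minimal sketch of that call on data shaped like the question's. The DataFrame is rebuilt by hand from the two rows shown, and expand=False is my addition so that the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data mirroring the two rows shown in the question.
df = pd.DataFrame({
    "COUNT": [1, 2],
    "DATA": ["this row RA-123: data 8b43a", "here RA-5372: data 94h63c"],
})

# One capturing group: "RA", a hyphen, then one or more digits.
# expand=False returns a Series instead of a one-column DataFrame.
extracted = df["DATA"].str.extract(r"(RA-\d+)", expand=False)
print(extracted.tolist())  # ['RA-123', 'RA-5372']
```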

You may use your own pattern, but you still need to wrap it in capturing parentheses: r'(RA[^:]+)'.
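To see why the parentheses matter, the sketch below runs the pattern with and without a capturing group on a hand-built Series (the sample strings are copied from the question). Without a group, pandas raises a ValueError; with one, the match comes back as expected.

```python
import pandas as pd

s = pd.Series(["this row RA-123: data 8b43a", "here RA-5372: data 94h63c"])

# Without a capturing group, str.extract refuses the pattern.
try:
    s.str.extract(r"RA[^:]+")
    raised = False
except ValueError:
    raised = True

# With the parentheses added, the text up to (not including) the colon is captured.
with_group = s.str.extract(r"(RA[^:]+)", expand=False)
print(raised, with_group.tolist())
```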

Looking at the docs, you don't need the re.search method. You can just call df['DATA'] = df['DATA'].str.extract(r'(RA[^:]+)'). Note that extract requires the capturing parentheses around the pattern.
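A small sketch of assigning the result back to the column follows. The second row is a made-up example with no 'RA-' string, included only to show that rows without a match end up as missing values; expand=False is again my own choice to get a Series.

```python
import pandas as pd

# The second row is hypothetical: it has no "RA-" string at all.
df = pd.DataFrame({"DATA": ["this row RA-123: data 8b43a", "no identifier here"]})

# Overwrite the column in place with the extracted text.
df["DATA"] = df["DATA"].str.extract(r"(RA[^:]+)", expand=False)
print(df)
```

If unmatched rows should be dropped rather than kept as missing values, df.dropna(subset=['DATA']) would be one way to do it.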

As I mentioned earlier, there is no need for re here.

Other answers have covered how to use extract directly. However, to answer your question specifically: if you really want to use re, the way to go is to use re.compile instead of re.search.

df.assign(DATA= df['DATA'].str.extract(re.compile(regex_str)))
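The pandas documentation describes the extract pattern argument as a string, so as an alternative sketch, the compiled pattern can be applied with plain re instead. The value of regex_str and the helper first_match are assumptions for illustration; neither appears in the answer above.

```python
import re
import pandas as pd

regex_str = r"RA[^:]+"  # assumed value; the answer leaves regex_str undefined
pattern = re.compile(regex_str)

def first_match(text):
    """Hypothetical helper: return the first match in text, or None if absent."""
    match = pattern.search(text)
    return match.group(0) if match else None

df = pd.DataFrame({
    "COUNT": [1, 2],
    "DATA": ["this row RA-123: data 8b43a", "here RA-5372: data 94h63c"],
})

# assign returns a new DataFrame and leaves df itself unchanged.
result = df.assign(DATA=df["DATA"].map(first_match))
print(result["DATA"].tolist())  # ['RA-123', 'RA-5372']
```

Because the match is taken with match.group(0), no capturing group is needed in this version.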
