Apply function is not working on a data-frame column

I am trying to remove special characters such as ".", "-" and spaces (everything except letters and commas) from the "Actors" column of my pandas data frame. For this I use the apply method on the "Actors" column:

df['Actors'] = df['Actors'].apply(lambda x: x.lower().replace("[^a-zA-Z,]", ""))
df['Actors'].head()

The output of the above snippet is shown below. The text has been lowercased, but no special characters have been removed:

1    tim robbins, morgan freeman, bob gunton, willi...
2    marlon brando, al pacino, james caan, richard ...
3    al pacino, robert duvall, diane keaton, robert...
4    christian bale, heath ledger, aaron eckhart, m...
5    martin balsam, john fiedler, lee j. cobb, e.g....
Name: Actors, dtype: object

But when I use the snippet below instead, the code works:

df['Actors'] = df['Actors'].str.lower().str.replace("[^a-zA-Z,]","")
df['Actors'].head()

1    timrobbins,morganfreeman,bobgunton,williamsadler
2    marlonbrando,alpacino,jamescaan,richardscastel...
3    alpacino,robertduvall,dianekeaton,robertdeniro
4    christianbale,heathledger,aaroneckhart,michael...
5    martinbalsam,johnfiedler,leejcobb,egmarshall
Name: Actors, dtype: object

Why does the apply version fail to replace the characters, while the str.replace version works?

You call apply on a Series, so x in the lambda is a single string, one per row of the Series. That means x.lower().replace is Python's built-in str.replace, which does not support regex. It treats "[^a-zA-Z,]" as a literal string and looks for that exact substring in each x. It does not find it, so nothing gets replaced.
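A minimal sketch of that difference, using a made-up sample string (the names sample and pattern are hypothetical, not from the question): the built-in str.replace searches for the literal text of the pattern, while re.sub treats the same text as a regular expression.

```python
import re

# Hypothetical sample value standing in for one row of the "Actors" column.
sample = "Lee J. Cobb, E.G. Marshall"
pattern = "[^a-zA-Z,]"

# Built-in str.replace looks for the literal substring "[^a-zA-Z,]",
# which does not occur in the sample, so only lower() has an effect.
literal_result = sample.lower().replace(pattern, "")

# re.sub interprets the same text as a regex character class and
# removes every character that is not a letter or a comma.
regex_result = re.sub(pattern, "", sample.lower())

print(literal_result)  # lee j. cobb, e.g. marshall
print(regex_result)    # leejcobb,egmarshall
```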

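On the other hand, pandas str.replace has a regex parameter. In the pandas version used in the question its default was regex=True, so "[^a-zA-Z,]" is treated as a regex pattern and everything is replaced properly. Note that since pandas 2.0 the default is regex=False, so on current versions you need to pass regex=True explicitly to get the same result.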
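Because the default of the regex parameter depends on the pandas version (True in older releases, False from pandas 2.0 on), a version-independent sketch passes it explicitly. The small Series below is made-up sample data, not the asker's data frame, and the variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for df['Actors'].
actors = pd.Series(
    ["Lee J. Cobb, E.G. Marshall", "Al Pacino, Robert De-Niro"],
    name="Actors",
)

# regex=True: the pattern is compiled as a regular expression.
cleaned = actors.str.lower().str.replace("[^a-zA-Z,]", "", regex=True)

# regex=False: the pattern is literal text, so nothing matches.
unchanged = actors.str.lower().str.replace("[^a-zA-Z,]", "", regex=False)

print(cleaned.tolist())
print(unchanged.tolist())
```

Passing regex explicitly keeps the behaviour the same regardless of which pandas version is installed.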

It does not work because you call replace on a plain Python string; in effect you do str.replace("[^a-zA-Z,]", ""). Your string does not contain the literal text [^a-zA-Z,], so nothing is removed. In other words, Python does not interpret those characters as a regex, but simply as literal characters.

To make it work with apply, you could do it like this. This is only to answer your question; the preferred way is your second example.

import re

remove = re.compile(r"[^a-zA-Z,]")
df['Actors'] = df['Actors'].apply(lambda x: remove.sub("", x.lower()))

Here is some documentation:
