str.contains() returns the wrong answer regardless of regex flag

I have a pandas Series with dtype object that contains currency strings, like '$ 50000'. Not all of them are in dollars; some, for example, are 'GBP 43000'. I am trying to use pandas to figure out which entries contain $ and which don't.

The series is called valid_usa_gross. To find out which rows in it contain $, I first tried this:

valid_usa_gross_USD = valid_usa_gross.str.contains('$')

The series it returned had True everywhere, so I assumed all rows had $, but I was wrong. By inspecting the CSV file, I found entries with different currencies. I then discovered that, unless I specify regex=False, the string I pass to contains() is interpreted as a regular expression, and I had no idea what $ means in that case. So I tried:

valid_usa_gross_USD = valid_usa_gross.str.contains('$', regex=False)

which resulted in a series with False everywhere. That is also incorrect, because the vast majority of the entries in valid_usa_gross do contain a $ symbol. I even tried escaping the $ and adding an 'r' prefix to the search string, but the result is always wrong (either all True or all False).

What am I doing wrong?
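For reference, the all-True result from the first attempt is what one would expect on clean data: in a regular expression, $ is the end-of-string anchor, so it matches every string. The following is a minimal sketch on made-up sample values (not the original dataset) showing the three variants mentioned above and how they behave when the strings really do contain an ASCII $:

```python
import pandas as pd

# Made-up sample values for illustration; not the original IMDB data.
s = pd.Series(["$ 50000", "GBP 43000", "$ 1200"])

# Default regex=True: "$" is the end-of-string anchor, so every row matches.
as_regex = s.str.contains("$")

# Literal substring search: only rows that contain an actual "$" match.
literal = s.str.contains("$", regex=False)

# Escaped regex: "\$" matches a literal dollar sign.
escaped = s.str.contains(r"\$")

print(as_regex.tolist())  # [True, True, True]
print(literal.tolist())   # [True, False, True]
print(escaped.tolist())   # [True, False, True]
```

If regex=False returns False for rows that visibly contain a $, as described in the question, the matching logic is probably not at fault and the stored characters are worth inspecting.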

Looks like the problem was the encoding. I was reading the whole CSV as

imdb_raw = pd.read_csv(dataset_path)

and that didn't throw any errors, so I had no reason to suspect anything. However, reading it as

imdb_raw = pd.read_csv(dataset_path, encoding='ISO-8859-1')

results in the $ being displayed correctly.
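One way to check whether the characters in a column are what they appear to be is to print their Unicode code points. The sketch below is only an illustration: the sample values, including the fullwidth dollar sign (U+FF04) used as a lookalike, and the helper name leading_code_points are assumptions, and it is not known what the original file actually contained.

```python
import pandas as pd

# Hypothetical data: the second entry starts with U+FF04 (fullwidth dollar
# sign), which looks like "$" but is a different character.
s = pd.Series(["$ 50000", "\uff04 1200", "GBP 43000"])

# A literal search for ASCII "$" misses the lookalike character.
has_dollar = s.str.contains("$", regex=False)
print(has_dollar.tolist())  # [True, False, False]


def leading_code_points(text, n=3):
    """Return the first n characters of text as U+XXXX code point labels."""
    return [f"U+{ord(ch):04X}" for ch in text[:n]]


# Inspect what is really stored at the start of each string.
code_points = s.map(leading_code_points)
print(code_points.tolist())
```

A real ASCII dollar sign shows up as U+0024; anything else in that position suggests the file was decoded with the wrong encoding or contains a different symbol.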
