How to find the value_count of a specific string in each column of the dataframe
I want to find the number of times the string '\N' appears in each column of the dataframe df.
I've tried this:
for col in df.columns:
    print(df[col].value_counts()['\N'])
And the system returns an error like:
unicode error unicode cannot decode in the position 0-1
Do you know how to solve it?
The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character (see Python lexical analysis). In a normal string literal, \N starts a named Unicode escape of the form \N{name}, so '\N' on its own is malformed and Python rejects it before the code even runs.
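To make the escaping rule concrete, here is a minimal sketch in plain Python (no pandas needed). The variable names are only illustrative. It shows two equivalent ways of writing the two-character string \N, and what a well-formed \N{...} escape looks like:

```python
# Two ways to write a literal backslash followed by N.
escaped = "\\N"   # doubled backslash in a normal string literal
raw = r"\N"       # raw string literal: backslashes are not treated as escapes

print(escaped == raw)  # True
print(len(escaped))    # 2 (one backslash, one N)

# A well-formed named Unicode escape needs a character name in braces.
named = "\N{LATIN SMALL LETTER A}"
print(named)           # a
```

Either spelling, '\\N' or r'\N', can be used wherever the string needs to be matched.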
Assume this df:
    a   b
0  \N   1
1  \N   4
2   K  \N
Using your code will yield:
for col in df.columns:
    print(df[col].value_counts()['\N'])
  File "<ipython-input-83-64eb7c05f66f>", line 2
    print(df[col].value_counts()['\N'])
                                 ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: malformed \N character escape
If you add an extra backslash, you will get:
for col in df.columns:
    print(f"{col} has",df[col].value_counts()['\\N']," \\N in it")
a has 2 \N in it
b has 1 \N in it
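As an alternative, here is a minimal sketch that rebuilds the example frame and counts the matches with a raw string. The DataFrame construction is an assumption based on the example df above. Using .get on the value counts is a suggested safeguard, not part of the original code: indexing with ['\\N'] raises a KeyError for any column that contains no \N at all.

```python
import pandas as pd

# Assumed reconstruction of the example frame; r"\N" is the same string as "\\N".
df = pd.DataFrame({"a": [r"\N", r"\N", "K"], "b": ["1", "4", r"\N"]})

# Compare every cell with the two-character string \N and sum the matches per column.
counts = df.eq(r"\N").sum()
print(counts)

# Per-column loop; .get returns 0 instead of raising KeyError when \N is absent.
for col in df.columns:
    print(col, df[col].value_counts().get(r"\N", 0))
```

The df.eq(...).sum() form returns all column counts at once as a Series, which is convenient when the frame has many columns.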
You can also see this clearly if you use df.to_dict():
>>> df.to_dict()
Out[901]: {'a': {0: '\\N', 1: '\\N', 2: 'K'}, 'b': {0: '1', 1: '4', 2: '\\N'}}
                     ^         ^                                        ^