How to find the value_count of a specific string in each column of the dataframe

Question

I want to find the number of times the string '\N' appears in each column of the dataframe df.

I've tried this:

for col in df.columns: 
   print(df[col].value_counts()['\N'])

The system returns an error like:

unicode error unicode cannot decode in the position 0-1

Do you know how to solve it?

Answer 1

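The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character (see the Python lexical analysis documentation). In a normal string literal, \N starts a named Unicode escape of the form \N{NAME}, so a bare '\N' is malformed and raises a SyntaxError before the code even runs.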
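As a minimal sketch of how these escapes behave (the variable names here are only for illustration), the following compares a one-character escape, a complete named Unicode escape, and the two ways to spell a literal backslash followed by N:

```python
# "\n" is a single newline character, not backslash + n.
newline = "\n"

# A complete named escape is valid and yields one character.
named = "\N{LATIN SMALL LETTER A}"

# A literal backslash followed by N: escape the backslash, or use a raw string.
literal = "\\N"
raw = r"\N"

print(len(newline))    # 1
print(named)           # a
print(len(literal))    # 2
print(literal == raw)  # True
```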

Assume this df:

    a   b
0  \N   1
1  \N   4
2   K  \N

Using your code will yield:

for col in df.columns:    
    print(df[col].value_counts()['\N'])

  File "<ipython-input-83-64eb7c05f66f>", line 2
    print(df[col].value_counts()['\N'])
                                ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: malformed \N character escape

If you add an extra backslash, you will get:

for col in df.columns:
    print(f"{col} has", df[col].value_counts()['\\N'], " \\N in it")

a has 2  \N in it
b has 1  \N in it
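Indexing value_counts() with a key raises a KeyError for any column that does not contain '\N' at all. A possible alternative, sketched here on the same example data (the names target and counts are just illustrative), is to use a raw string and either compare the whole dataframe at once or fall back to 0 with .get():

```python
import pandas as pd

# Same data as the example df above; "\\N" is a backslash followed by N.
df = pd.DataFrame({"a": ["\\N", "\\N", "K"], "b": ["1", "4", "\\N"]})

target = r"\N"  # raw string literal, equal to "\\N"

# Compare every cell with the target, then sum the True values per column.
counts = (df == target).sum()
print(counts)

# value_counts() with .get() returns 0 instead of raising KeyError
# when a column does not contain the target at all.
for col in df.columns:
    print(col, df[col].value_counts().get(target, 0))
```

The vectorized comparison avoids the explicit loop, while the .get() version stays closest to the code in the question.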

You can also see that the stored value is a backslash followed by N if you use df.to_dict(), since the output shows each backslash in its escaped form:

>>> df.to_dict()
Out[901]: {'a': {0: '\\N', 1: '\\N', 2: 'K'}, 'b': {0: '1', 1: '4', 2: '\\N'}}
                      ^         ^                                         ^
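The doubled backslashes in that output are only how Python displays the string, not extra characters in the data. A small sketch (the variable name value is illustrative):

```python
value = "\\N"        # two characters: a backslash and N

print(len(value))    # 2
print(value)         # \N
print(repr(value))   # '\\N'
```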

solution1
1 2021-05-20 10:41:59
