
Pandas fillna exception for 'NA' string

sample_file.txt

6|test|3|4
5|test||8
9|test|NA|12

Script

import pandas as pd
df = pd.read_csv('sample_file.txt', dtype='str', sep='|', names=['upc_cd', 'chr_typ', 'chr_vl','chr_vl_typ'])
df["chr_vl"] = df["chr_vl"].fillna("NOT AVLBL")
print(df)

Current output

  upc_cd chr_typ     chr_vl chr_vl_typ
0      6    test          3          4
1      5    test  NOT AVLBL          8
2      9    test  NOT AVLBL         12

Required output

  upc_cd chr_typ     chr_vl chr_vl_typ
0      6    test          3          4
1      5    test  NOT AVLBL          8
2      9    test         NA         12

Basically, I need NA to stay as it is in the output, while null values are replaced with the text 'NOT AVLBL'. I tried the replace method as well, but couldn't get the desired output.

The pandas read_csv function already defines a set of strings that are interpreted as NaN when you load a CSV file. You can either extend that list with other strings or overwrite it completely. In your case you have to overwrite it, because NA is one of the default values used by pandas. To do so, you could try something like:

df = pd.read_csv('sample_file.txt', dtype='str', sep='|',
                 names=['upc_cd', 'chr_typ', 'chr_vl','chr_vl_typ'],
                 na_values=[''], keep_default_na=False)
...

This interprets only the empty string as NA, because keep_default_na is set to False and '' is the only NA value passed through the na_values argument. If you want to learn more, have a look at the pandas docs.
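To make the difference between extending and overwriting the default list concrete, here is a small sketch. It reads the sample data from an in-memory string instead of sample_file.txt, and the helper name load and the extra NA string "test" are only assumptions made for illustration.

```python
import io

import pandas as pd

# Same content as sample_file.txt, kept in memory so the sketch is self-contained.
SAMPLE = "6|test|3|4\n5|test||8\n9|test|NA|12\n"
COLS = ["upc_cd", "chr_typ", "chr_vl", "chr_vl_typ"]


def load(**na_options):
    """Hypothetical helper: read the sample with the given NA-related options."""
    return pd.read_csv(io.StringIO(SAMPLE), dtype="str", sep="|",
                       names=COLS, **na_options)


# Extending: the defaults stay active, and "test" is treated as missing too.
extended = load(na_values=["test"])

# Overwriting: the defaults are dropped, so only the empty string is missing.
overwritten = load(na_values=[""], keep_default_na=False)

print(extended["chr_typ"].isna().tolist())    # every "test" became NaN
print(extended["chr_vl"].isna().tolist())     # '' and 'NA' are both NaN
print(overwritten["chr_vl"].isna().tolist())  # only '' is NaN, 'NA' survives
```

With keep_default_na left at its default of True, na_values only adds to the built-in list, which is why the NA string is still lost in the first call.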

Pandas read_csv is a bit too clever here. The problem is that many strings are commonly used to identify missing values in CSV files.

According to the official documentation:

... By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.

So your dataframe really does contain a NaN where the file had the string NA, and fillna fills it just like any other missing value.
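You can check this yourself before calling fillna. The sketch below assumes the sample data is read from an in-memory string rather than from sample_file.txt, and the variable names are only illustrative.

```python
import io

import pandas as pd

# Same content as sample_file.txt, kept in memory for a self-contained check.
sample = "6|test|3|4\n5|test||8\n9|test|NA|12\n"

# Default parsing: no na_values, keep_default_na left at True.
df_default = pd.read_csv(io.StringIO(sample), dtype="str", sep="|",
                         names=["upc_cd", "chr_typ", "chr_vl", "chr_vl_typ"])

# True marks the cells that read_csv turned into NaN.
missing_mask = df_default["chr_vl"].isna()
print(missing_mask.tolist())
```

Both the empty field in the second row and the NA string in the third row come back as missing, so fillna cannot tell them apart afterwards.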

To accept only the empty string as NaN, you have to set both na_values to '' and keep_default_na to False:

df = pd.read_csv('sample_file.txt', dtype='str', sep='|',
                 names=['upc_cd', 'chr_typ', 'chr_vl','chr_vl_typ'],
                 na_values='', keep_default_na=False)
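Putting the pieces together, a minimal end-to-end sketch could look like this. It assumes the sample data comes from an in-memory string instead of sample_file.txt, and it assigns the result of fillna back to the column rather than using inplace=True, which is a choice made here and not part of the original script.

```python
import io

import pandas as pd

# Same content as sample_file.txt, kept in memory so the sketch runs anywhere.
sample = "6|test|3|4\n5|test||8\n9|test|NA|12\n"

# Only the empty string counts as missing; the literal 'NA' is kept as text.
df_fixed = pd.read_csv(io.StringIO(sample), dtype="str", sep="|",
                       names=["upc_cd", "chr_typ", "chr_vl", "chr_vl_typ"],
                       na_values="", keep_default_na=False)

# Fill the remaining real gaps with the placeholder text.
df_fixed["chr_vl"] = df_fixed["chr_vl"].fillna("NOT AVLBL")

print(df_fixed)
```

The chr_vl column should then hold 3, NOT AVLBL and NA, which matches the required output in the question.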


 