
Remove 'nan' from Dictionary of list

My data contains columns with empty cells that pandas reads as nan. I want to create a dictionary of lists from this data. However, some of the lists contain nan, and I want to remove it.

If I use dropna(), as in data.dropna().to_dict(orient='list'), it removes every row that contains at least one nan, so I lose data.
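The row-dropping behaviour described above can be reproduced with a small sketch. It assumes the table from the question is comma-separated text, inlined here through io.StringIO instead of being read from sys.argv[2]; the names csv_text and rows_kept are only illustrative.

```python
import io

import pandas as pd

# The question's table as CSV text (assumed layout); empty fields are read as NaN.
csv_text = "Col1,Col2,Col3\na,x,r\nb,y,v\nc,,x\n,,z\n"
data = pd.read_csv(io.StringIO(csv_text))

# DataFrame.dropna() with default arguments drops every row
# that has at least one NaN, so only the first two rows survive.
rows_kept = data.dropna().to_dict(orient="list")
print(rows_kept)
# {'Col1': ['a', 'b'], 'Col2': ['x', 'y'], 'Col3': ['r', 'v']}
```

The values 'c', 'x' and 'z' from the last two rows are gone, which is the data loss the question is about.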

Col1 Col2  Col3
a     x     r
b     y     v
c           x
            z



import sys
import pandas as pd

data = pd.read_csv(sys.argv[2], sep=',')
dict = data.to_dict(orient='list')

Current output:
dict = {'Col1': ['a', 'b', 'c', nan], 'Col2': ['x', 'y', nan, nan], 'Col3': ['r', 'v', 'x', 'z']}

Desired output:
dict = {'Col1': ['a', 'b', 'c'], 'Col2': ['x', 'y'], 'Col3': ['r', 'v', 'x', 'z']}

My goal: get a dictionary of lists, with nan removed from each list.

I'm not sure exactly what format you're expecting, but you can do this with a list comprehension and itertuples.

First create some data.

import pandas as pd
import numpy as np

data = pd.DataFrame.from_dict({'Col1': (1, 2, 3), 'Col2': (4, 5, 6), 'Col3': (7, 8, np.nan)})
print(data)

Giving a data frame of:

   Col1  Col2  Col3
0     1     4   7.0
1     2     5   8.0
2     3     6   NaN

Then we create a dictionary keyed by row index using the iterator.

dict_1 = {x[0]: [y for y in x[1:] if not pd.isna(y)] for x in data.itertuples(index=True) }

print(dict_1)
>>>{0: [1, 4, 7.0], 1: [2, 5, 8.0], 2: [3, 6]}

To do the same for the columns is even easier:

dict_2 = {column: [y for y in data[column] if not pd.isna(y)] for column in data}

print(dict_2)
>>>{'Col1': [1, 2, 3], 'Col2': [4, 5, 6], 'Col3': [7.0, 8.0]}
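As a possible variation on the column-wise comprehension, Series.dropna() can do the filtering instead of an explicit pd.isna check. This is only a sketch on the same sample frame; dict_3 is an illustrative name, not part of the original answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"Col1": [1, 2, 3], "Col2": [4, 5, 6], "Col3": [7, 8, np.nan]})

# DataFrame.items() yields (column name, Series) pairs; dropna() removes
# the NaN entries of each column independently, so no whole rows are lost.
dict_3 = {name: column.dropna().tolist() for name, column in data.items()}
print(dict_3)
# {'Col1': [1, 2, 3], 'Col2': [4, 5, 6], 'Col3': [7.0, 8.0]}
```

Because each column is filtered on its own, the resulting lists may have different lengths, which matches the desired output in the question.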

I'm not sure I understand your question correctly, but if what you want is to replace nan with a value so that you don't lose data, then you are looking for the pandas.DataFrame.fillna function. You mentioned that the original values are empty cells, so data.fillna('') fills each nan with an empty string.
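A minimal sketch of the fillna approach, assuming a frame shaped like the one in the question (built by hand here rather than read from the CSV file; filled is an illustrative name):

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the question's data, with NaN where cells were empty.
data = pd.DataFrame({
    "Col1": ["a", "b", "c", np.nan],
    "Col2": ["x", "y", np.nan, np.nan],
    "Col3": ["r", "v", "x", "z"],
})

# fillna('') replaces every NaN with an empty string; no rows are removed.
filled = data.fillna("").to_dict(orient="list")
print(filled)
# {'Col1': ['a', 'b', 'c', ''], 'Col2': ['x', 'y', '', ''], 'Col3': ['r', 'v', 'x', 'z']}
```

Note that every list keeps its full length of four, with '' in place of nan, so this replaces the missing values rather than removing them.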

EDIT: Now that you have provided the desired output, the answer to your question changes a bit. You'll need a dict comprehension with a nested list comprehension to build the dictionary, looping over the columns and filtering out nan. I see that Andrew has already provided the code for this in his answer, so have a look there.


 