
Transform a pd.DataFrame into a list of lists while replacing np.nan values with empty strings ('')

I wrote a function that transforms my pandas DataFrame into a list of lists so I can use it with a Google API, which is why it has to be a list of lists.

The issue is that I have a lot of np.nan values that I would like to replace with empty strings before they are converted into a list. Once they get into the list, they become 'nan' strings.

I only want to get rid of those np.nan values and keep all the rest of the data intact.

This is the function that transforms the DataFrame into a list of lists:

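def updated_values_list(df):
    # reset_index() on the transposed frame turns the column labels into the
    # first column; transposing back makes them the first row (the header).
    updated_values = df.T.reset_index().values.T.tolist()
    # Stringify every cell; a float NaN becomes the string 'nan' here.
    return [[str(j) for j in i] for i in updated_values]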

list = updated_values_list(df)

Actual output:

>> list[0]
['header1', 'header2' ... 'headern'] # this one is ok
>> list[1]
['val1', 'val2', 'nan', 'nan', ...] # my actual output

Expected output:

>> list[1]
['val1', 'val2', '', '', ...] # the output I want
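A minimal reproduction shows where the 'nan' strings come from: after the conversion, each missing cell is still a float NaN, and str() turns it into 'nan'. The one-row sample frame below is hypothetical and only mirrors the shape of the output above.

```python
import numpy as np
import pandas as pd


def updated_values_list(df):
    # Same conversion as in the question: header row first, then data rows.
    updated_values = df.T.reset_index().values.T.tolist()
    return [[str(j) for j in i] for i in updated_values]


# Hypothetical one-row frame: two text cells and two missing cells.
df = pd.DataFrame({
    'header1': ['val1'],
    'header2': ['val2'],
    'header3': [np.nan],
    'header4': [np.nan],
})
rows = updated_values_list(df)
print(rows[1])  # ['val1', 'val2', 'nan', 'nan']
```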

You can use df.fillna. Since your data seems to be numeric and you want to replace the missing values with a string, convert the frame to object dtype first so its columns can hold both numbers and strings:

df = df.astype(object).fillna('')

You should run this as the first line inside your function, before converting the whole DataFrame to your list of lists.
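A sketch of the question's function with the fillna step added as its first line. The sample frame is hypothetical and chosen to match the expected output in the question.

```python
import numpy as np
import pandas as pd


def updated_values_list(df):
    # Fill missing values first; object dtype lets float columns hold ''.
    df = df.astype(object).fillna('')
    # Header row first, then one list per data row.
    updated_values = df.T.reset_index().values.T.tolist()
    return [[str(j) for j in i] for i in updated_values]


# Hypothetical one-row frame: two text cells and two missing cells.
df = pd.DataFrame({
    'header1': ['val1'],
    'header2': ['val2'],
    'header3': [np.nan],
    'header4': [np.nan],
})
rows = updated_values_list(df)
print(rows[1])  # ['val1', 'val2', '', '']
```

Numbers in partially missing columns are still stringified as before; only the NaN cells change.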

For simple tasks like this, pandas methods can be slower than plain Python because of their overhead, as this benchmark shows:

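import numpy as np
import pandas as pd

# 60,000 rows per column: half 6.5, half NaN
df = pd.DataFrame({'a':[6.5]*30000 + [np.nan]*30000, 'b':[6.5]*30000 + [np.nan]*30000})

# Solution 1: fill NaN with '' in pandas, then convert
def solution1(df):
    updated_values = df.astype(object).fillna('').T.reset_index().values.T.tolist()
    return [[str(j) for j in i] for i in updated_values]

# Solution 2: convert first, then map NaN to '' while building the strings
# (the isinstance check keeps np.isnan from being called on the header strings)
def solution2(df):
    updated_values = df.T.reset_index().values.T.tolist()
    return [[str(j) if not (not isinstance(j, str) and np.isnan(j)) else '' for j in i] for i in updated_values]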

%timeit solution1(df)

1.92 s ± 96.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit solution2(df)

284 ms ± 23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
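The np.isnan check in solution2 only works for numbers and strings: np.isnan(None) raises TypeError, so a column containing None would break it. A sketch of the same convert-then-map idea using pd.isna, which accepts any scalar (including None and pd.NaT), follows. It assumes every cell is a scalar (pd.isna on a list-valued cell returns an array, not a bool), and it has not been benchmarked against the timings above.

```python
import numpy as np
import pandas as pd


def updated_values_list(df):
    # Convert first: missing cells stay as NaN/None objects in the nested list.
    updated_values = df.T.reset_index().values.T.tolist()
    # pd.isna handles NaN, None and NaT without raising on strings.
    # Assumption: each cell is a scalar, so pd.isna returns a single bool.
    return [['' if pd.isna(j) else str(j) for j in row] for row in updated_values]


# Hypothetical frame mixing text, NaN and None.
df = pd.DataFrame({
    'header1': ['val1'],
    'header2': ['val2'],
    'header3': [np.nan],
    'header4': [None],
})
rows = updated_values_list(df)
print(rows[1])  # ['val1', 'val2', '', '']
```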
