简体   繁体   中英

Nan in pd.DataFrame (simmetrical matrix)

I've got a dataframe like this one. I'd like to remove the nans and shift up the cells. Then add a date column and set it as index.

                ciao      google    microsoft
Search Volume   368000    NaN       NaN
Search Volume   368000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       NaN       135000
Search Volume   NaN       NaN       135000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000

The output should be like:

date = ['20140115', '20140215', '20140315', '20140415', '20140515', '20140615']

date        ciao    google      microsoft
20140115    368000  37200000    135000
20140215    368000  37200000    135000
20140315    450000  37200000    110000
20140415    450000  37200000    110000
20140515    450000  37200000    110000
20140615    450000  37200000    110000

Looks simple but I don't know how to do it. Thanks

This should work:

denulled = {col: df.loc[df[col].notnull(),col].values for col in df.columns}

df_out = pd.DataFrame(denulled, index=date)

You can also use dropna on the columns as series

df1=pd.DataFrame(data=[df[i].dropna().values for i in df.columns]).T
df1.index=dates

My proposition is:

pd.DataFrame(data={ colName: df[colName].dropna().values for colName in df.columns },
    index=['20140115', '20140215', '20140315', '20140415', '20140515', '20140615'])

The main point is a dictionary comprehension , executed for each column.

dropna removes NaN items and values allows to free oneself from index values.

One tricky solution cause by you have duplicate index

pd.concat([df[x].dropna() for x in df.columns],1)
Out[24]: 
                  ciao      google  microsoft
SearchVolume  368000.0  37200000.0   135000.0
SearchVolume  368000.0  37200000.0   135000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0

you could use apply with dropna:

df = df.apply(lambda x: pd.Series(x.dropna().values)).fillna('')
df['date'] = date
print(df)

output:

     ciao      google   microsoft  date     
 368000.0  37200000.0   135000.0   20140115 
 368000.0  37200000.0   135000.0   20140215 
 450000.0  37200000.0   110000.0   20140315 
 450000.0  37200000.0   110000.0   20140415 
 450000.0  37200000.0   110000.0   20140515 
 450000.0  37200000.0   110000.0   20140615 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM