I've got a DataFrame like the one below. I'd like to remove the NaNs and shift the remaining cells up, then add a date column and set it as the index.
ciao google microsoft
Search Volume 368000 NaN NaN
Search Volume 368000 NaN NaN
Search Volume 450000 NaN NaN
Search Volume 450000 NaN NaN
Search Volume 450000 NaN NaN
Search Volume 450000 NaN NaN
Search Volume NaN 37200000 NaN
Search Volume NaN 37200000 NaN
Search Volume NaN 37200000 NaN
Search Volume NaN 37200000 NaN
Search Volume NaN 37200000 NaN
Search Volume NaN 37200000 NaN
Search Volume NaN NaN 135000
Search Volume NaN NaN 135000
Search Volume NaN NaN 110000
Search Volume NaN NaN 110000
Search Volume NaN NaN 110000
Search Volume NaN NaN 110000
The output should be like:
date = ['20140115', '20140215', '20140315', '20140415', '20140515', '20140615']
date ciao google microsoft
20140115 368000 37200000 135000
20140215 368000 37200000 135000
20140315 450000 37200000 110000
20140415 450000 37200000 110000
20140515 450000 37200000 110000
20140615 450000 37200000 110000
Looks simple but I don't know how to do it. Thanks
This should work:
denulled = {col: df.loc[df[col].notnull(),col].values for col in df.columns}
df_out = pd.DataFrame(denulled, index=date)
You can also call dropna on each column as a Series. Passing index=df.columns keeps the column names after the transpose:
df1 = pd.DataFrame(data=[df[i].dropna().values for i in df.columns], index=df.columns).T
df1.index = date
My suggestion is:
pd.DataFrame(data={ colName: df[colName].dropna().values for colName in df.columns },
index=['20140115', '20140215', '20140315', '20140415', '20140515', '20140615'])
The key is the dictionary comprehension, which runs once per column.
dropna removes the NaN items, and values returns the bare array, so the result no longer depends on the original (duplicated) index labels.
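For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the dictionary comprehension. The way the frame is constructed and the names compacted and df_out are my own assumptions; the final astype call is optional and only works because no NaN is left after compaction.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: every row is labelled "Search Volume",
# and each column holds six real values and twelve NaNs.
nan = np.nan
df = pd.DataFrame(
    {
        "ciao": [368000] * 2 + [450000] * 4 + [nan] * 12,
        "google": [nan] * 6 + [37200000] * 6 + [nan] * 6,
        "microsoft": [nan] * 12 + [135000] * 2 + [110000] * 4,
    },
    index=["Search Volume"] * 18,
)
date = ['20140115', '20140215', '20140315', '20140415', '20140515', '20140615']

# dropna() keeps the six real values of each column; .values strips the
# duplicated row labels so the arrays can be realigned on the new index.
compacted = {col: df[col].dropna().values for col in df.columns}

# The columns are float because of the original NaNs; cast back to integers.
df_out = pd.DataFrame(compacted, index=date).astype("int64")
df_out.index.name = "date"
print(df_out)
```

Note that this relies on every column having the same number of non-NaN values as there are dates; otherwise the DataFrame constructor raises an error.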
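A somewhat tricky solution that works only because your index labels are all duplicates (recent pandas versions require axis to be passed by keyword):
pd.concat([df[x].dropna() for x in df.columns], axis=1)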
Out[24]:
ciao google microsoft
SearchVolume 368000.0 37200000.0 135000.0
SearchVolume 368000.0 37200000.0 135000.0
SearchVolume 450000.0 37200000.0 110000.0
SearchVolume 450000.0 37200000.0 110000.0
SearchVolume 450000.0 37200000.0 110000.0
SearchVolume 450000.0 37200000.0 110000.0
You could use apply with dropna:
df = df.apply(lambda x: pd.Series(x.dropna().values)).fillna('')
df['date'] = date
print(df)
output:
ciao google microsoft date
368000.0 37200000.0 135000.0 20140115
368000.0 37200000.0 135000.0 20140215
450000.0 37200000.0 110000.0 20140315
450000.0 37200000.0 110000.0 20140415
450000.0 37200000.0 110000.0 20140515
450000.0 37200000.0 110000.0 20140615
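The output above leaves date as an ordinary column, while the question asked for it as the index. A minimal sketch of that last step, using a shortened stand-in frame (the column names a and b, the values, and the name shifted are hypothetical):

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's frame, with the same duplicated labels.
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan, np.nan], "b": [np.nan, np.nan, 3.0, 4.0]},
    index=["Search Volume"] * 4,
)
date = ["20140115", "20140215"]

# apply() runs once per column; wrapping the values in a fresh Series gives
# every column a 0..n-1 index, so the compacted columns line up row by row.
shifted = df.apply(lambda col: pd.Series(col.dropna().values))

# Add the dates as a column, then promote that column to the index.
shifted["date"] = date
shifted = shifted.set_index("date")
print(shifted)
```

set_index returns a new frame by default, which is why the result is assigned back.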