NaN in pd.DataFrame (symmetrical matrix)

Question

I've got a DataFrame like this one. I'd like to remove the NaNs and shift the remaining cells up, then add a date column and set it as the index.

                ciao      google    microsoft
Search Volume   368000    NaN       NaN
Search Volume   368000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   450000    NaN       NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       37200000  NaN
Search Volume   NaN       NaN       135000
Search Volume   NaN       NaN       135000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000
Search Volume   NaN       NaN       110000

The output should be like:

date = ['20140115', '20140215', '20140315', '20140415', '20140515', '20140615']

date        ciao    google      microsoft
20140115    368000  37200000    135000
20140215    368000  37200000    135000
20140315    450000  37200000    110000
20140415    450000  37200000    110000
20140515    450000  37200000    110000
20140615    450000  37200000    110000

Looks simple but I don't know how to do it. Thanks

Answer 1

This should work:

denulled = {col: df.loc[df[col].notnull(),col].values for col in df.columns}

df_out = pd.DataFrame(denulled, index=date)
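For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same dictionary comprehension. The construction of df (and the helper name nan6) is my own assumption about how the data was built. Note that it only works when every column has the same number of non-null values.

```python
import numpy as np
import pandas as pd

date = ['20140115', '20140215', '20140315', '20140415', '20140515', '20140615']

# Rebuild the block-diagonal frame from the question (assumed construction).
nan6 = [np.nan] * 6
df = pd.DataFrame(
    {
        'ciao': [368000, 368000, 450000, 450000, 450000, 450000] + nan6 + nan6,
        'google': nan6 + [37200000] * 6 + nan6,
        'microsoft': nan6 + nan6 + [135000, 135000, 110000, 110000, 110000, 110000],
    },
    index=['Search Volume'] * 18,
)

# Keep only the non-null values of each column as a plain array.
denulled = {col: df.loc[df[col].notnull(), col].values for col in df.columns}

# All arrays have length 6, so they line up with the 6 dates.
df_out = pd.DataFrame(denulled, index=date)
df_out.index.name = 'date'
print(df_out)
```

The values come out as floats because the original columns had to hold NaN.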

Answer 2

You can also use dropna on each column as a Series:

# index=df.columns keeps the column names after the transpose
df1 = pd.DataFrame(data=[df[i].dropna().values for i in df.columns], index=df.columns).T
df1.index = date

Answer 3

My proposition is:

pd.DataFrame(data={ colName: df[colName].dropna().values for colName in df.columns },
    index=['20140115', '20140215', '20140315', '20140415', '20140515', '20140615'])

The main point is the dictionary comprehension, which is evaluated once for each column.

dropna removes the NaN items, and values returns the bare array, which discards the original index labels so the new index can be applied.
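To see why values matters, here is a small sketch on a toy Series (the names s, aligned and positional are made up for illustration). When a dict of Series is passed together with an explicit index, pandas aligns on labels, so mismatching labels give NaN. A bare array is placed by position instead.

```python
import numpy as np
import pandas as pd

# Toy column with one missing value and a default integer index.
s = pd.Series([1.0, np.nan, 2.0])
new_index = ['x', 'y']

# Passing the Series itself: pandas looks up labels 'x' and 'y' in the
# Series index (0 and 2 after dropna), finds nothing, and fills with NaN.
aligned = pd.DataFrame({'a': s.dropna()}, index=new_index)

# Passing the bare array: values are assigned by position.
positional = pd.DataFrame({'a': s.dropna().values}, index=new_index)

print(aligned)
print(positional)
```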

Answer 4

A tricky solution, which works because your index labels are all duplicates of each other:

pd.concat([df[x].dropna() for x in df.columns], axis=1)
Out[24]: 
                  ciao      google  microsoft
SearchVolume  368000.0  37200000.0   135000.0
SearchVolume  368000.0  37200000.0   135000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0
SearchVolume  450000.0  37200000.0   110000.0

Answer 5

You could use apply with dropna:

df = df.apply(lambda x: pd.Series(x.dropna().values)).fillna('')
df['date'] = date
print(df)

output:

     ciao      google   microsoft  date     
 368000.0  37200000.0   135000.0   20140115 
 368000.0  37200000.0   135000.0   20140215 
 450000.0  37200000.0   110000.0   20140315 
 450000.0  37200000.0   110000.0   20140415 
 450000.0  37200000.0   110000.0   20140515 
 450000.0  37200000.0   110000.0   20140615

5 answers

solution1
0 2019-03-25 16:54:22

solution2
0 2019-03-25 17:00:39

solution3
0 2019-03-25 17:02:46

solution4
0 2019-03-25 17:03:14

solution5
0 ACCEPTED 2019-03-25 17:06:28
