I have a pandas dataframe with multiple columns and I want to "flatten" it to just two columns - one with column name and the other with values. Eg
df1 = pd.DataFrame({'A':[1,2],'B':[2,3], 'C':[3,4]})
How can I convert it to look like:
df2 = pd.DataFrame({'column name': ['A','A','B','B','C','C'], 'value': [1,2,2,3,3,4]})
You can stack
to stack all column values into a single, column, then drop the first level index calling reset_index
, overwrite the column names with the ones you desire and then finally sort using sort_values
:
In [37]:
df2 = df1.stack().reset_index(level=0, drop=True).reset_index()
df2.columns = ['column name', 'value']
df2.sort_values(['column name', 'value'], inplace=True)
df2
Out[37]:
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4
You can reshape by stack
to MultiIndex
Series
and then reset_index
with sort_values
:
df2 = df1.stack().reset_index(level=0, drop=True).reset_index().sort_values('index')
df2.columns = ['column name','value']
print (df2)
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4
One row solution with rename
column index
to column name
:
df2 = df1.stack()
.reset_index(level=0, drop=True)
.reset_index(name='value')
.sort_values(['index'])
.rename(columns={'index':'column name'})
print (df2)
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4
If need sort by both columns:
df2 = df1.stack().reset_index(level=0, drop=True).reset_index().sort_values(['index',0])
df2.columns = ['column name','value']
print (df2)
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.