I have a dataframe containing some data, which I want to transform, so that the values of one column define the new columns.
>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'], 'B': [6, 7, 8, 9]})
>>> df
   A  B
0  a  6
1  a  7
2  b  8
3  b  9
The values of column A shall be the column names of the new dataframe. The result of the transformation should look like this:
   a  b
0  6  8
1  7  9
What I came up with so far didn't work completely:
>>> pd.DataFrame({ k : df.loc[df['A'] == k, 'B'] for k in df['A'].unique() })
     a    b
0    6  NaN
1    7  NaN
2  NaN    8
3  NaN    9
Besides this being incorrect, I guess there probably is a more efficient way anyway. I'm just really having a hard time understanding how to handle things with pandas.
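You were almost there, but you need .values so that each column is passed as a plain array instead of a Series. Each Series keeps its original row labels (0, 1 for a and 2, 3 for b), so pandas aligns them on the index and fills the gaps with NaN. The arrays carry no index, so both columns start at row 0. The dictionary keys already provide the column names, so no extra wrapping is needed.
pd.DataFrame({k: df.loc[df['A'] == k, 'B'].values for k in df['A'].unique()})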
Output:
   a  b
0  6  8
1  7  9
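To make the index alignment visible, here is a minimal sketch that builds the frame both ways. The names a_part, b_part, aligned and stripped are only illustrative, and to_numpy() is used in place of .values (both return the underlying array here).

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'], 'B': [6, 7, 8, 9]})

# Each filtered Series keeps the row labels it had in df.
a_part = df.loc[df['A'] == 'a', 'B']   # index 0, 1
b_part = df.loc[df['A'] == 'b', 'B']   # index 2, 3

# Passing Series: pandas aligns on the union of the indexes (0..3)
# and fills the positions a column does not cover with NaN.
aligned = pd.DataFrame({'a': a_part, 'b': b_part})

# Passing plain arrays: there is no index to align on,
# so both columns are laid out from row 0.
stripped = pd.DataFrame({'a': a_part.to_numpy(), 'b': b_part.to_numpy()})

print(aligned)
print(stripped)
```

The first frame reproduces the four-row result with NaN from the question; the second has the two rows that were asked for.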
Use set_index, groupby, cumcount, and unstack:
(df.set_index(['A', df.groupby('A').cumcount()])['B']
   .unstack(0)
   .rename_axis([None], axis=1))
Output:
   a  b
0  6  8
1  7  9
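If the chained expression is hard to follow, it can be split into steps. This is just a sketch of the same chain with intermediate variables; the names pos, indexed and wide are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'], 'B': [6, 7, 8, 9]})

# Position of each row within its group of A: 0, 1, 0, 1
pos = df.groupby('A').cumcount()

# Two-level index (A, position within group), keeping only column B
indexed = df.set_index(['A', pos])['B']

# Move index level 0 (the values of A) into the columns
wide = indexed.unstack(0)

# Drop the leftover "A" label on the column axis
wide = wide.rename_axis(None, axis=1)

print(wide)
```

The cumcount step is what gives rows from different groups a shared row label, so unstack can place them side by side.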
Using a dictionary comprehension with groupby:
res = pd.DataFrame({col: vals['B'].values for col, vals in df.groupby('A')})
print(res)
   a  b
0  6  8
1  7  9
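One caveat with the dictionary approaches: plain arrays must all have the same length, so they only work when every group has the same number of rows. A possible variation, assuming you want shorter groups padded with NaN, is to pass Series whose index has been reset instead of arrays. The five-row frame below is made-up example data, not the frame from the question.

```python
import pandas as pd

# Hypothetical data with unequal group sizes: three rows for a, two for b
df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b'], 'B': [5, 6, 7, 8, 9]})

# reset_index(drop=True) renumbers each group from 0, so the Series
# align on 0, 1, 2 and the shorter group gets NaN in its last row.
res = pd.DataFrame({
    key: grp['B'].reset_index(drop=True)
    for key, grp in df.groupby('A')
})

print(res)
```

Note that the padded column is converted to float, because NaN cannot be stored in an integer column.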