I have a dataframe like this:
id other_id_1 other_id_2 other_id_3
1 100 101 102
2 200 201 202
3 300 301 302
I want this:
id other_id
1 100
1 101
1 102
2 200
2 201
2 202
3 300
3 301
3 302
I can get my desired output easily like this:
# Map each id to the list of its other_id_* values, row by row
to_keep = {}
for idx in df.index:
    identifier = df.loc[idx]['id']
    to_keep[identifier] = []
    for col in ['other_id_1', 'other_id_2', 'other_id_3']:
        row_val = df.loc[idx][col]
        to_keep[identifier].append(row_val)
Which gives me this:
{1: [100, 101, 102], 2: [200, 201, 202], 3: [300, 301, 302]}
I can easily write that to a file. However, I am struggling to do this in native pandas. I would have expected what looks like a simple transposition to be more straightforward, but I'm stuck.
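For comparison, the dictionary from the loop can also be built without explicit iteration. This is a minimal sketch that assumes the column names shown above. The DataFrame is rebuilt from the sample data, and cols is just an illustrative name:

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    'id': [1, 2, 3],
    'other_id_1': [100, 200, 300],
    'other_id_2': [101, 201, 301],
    'other_id_3': [102, 202, 302],
})

cols = ['other_id_1', 'other_id_2', 'other_id_3']  # illustrative name
# Each row of the selected columns becomes one Python list, keyed by its id
to_keep = dict(zip(df['id'], df[cols].values.tolist()))
print(to_keep)
# {1: [100, 101, 102], 2: [200, 201, 202], 3: [300, 301, 302]}
```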
Well, if you haven't already, set id as the index:
>>> df
id other_id_1 other_id_2 other_id_3
0 1 100 101 102
1 2 200 201 202
2 3 300 301 302
>>> df.set_index('id', inplace=True)
>>> df
other_id_1 other_id_2 other_id_3
id
1 100 101 102
2 200 201 202
3 300 301 302
Then, you can simply use pd.concat:
>>> df = pd.concat([df[col] for col in df])
>>> df
id
1 100
2 200
3 300
1 101
2 201
3 301
1 102
2 202
3 302
dtype: int64
And if you need the values sorted:
>>> df.sort_values()
id
1 100
1 101
1 102
2 200
2 201
2 202
3 300
3 301
3 302
dtype: int64
>>>
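Note that sort_values() orders rows by the other_id values themselves. With this sample data, that happens to group the rows by id, but it would not if, for example, id 1 had larger values than id 2. The alternative is to sort on the id index with a stable sort, so each id keeps its original column order. The sketch below uses hypothetical data chosen so that the two orderings differ:

```python
import pandas as pd

# Hypothetical data: id 1's values are larger than id 2's
df = pd.DataFrame({
    'id': [1, 2],
    'other_id_1': [900, 200],
    'other_id_2': [901, 201],
}).set_index('id')

s = pd.concat([df[col] for col in df])

# Sorting by value interleaves the ids differently from sorting by id
print(s.sort_values().tolist())   # [200, 201, 900, 901]

# Stable sort on the index: groups rows by id, keeps column order within each id
by_id = s.sort_index(kind='mergesort')
print(by_id.tolist())             # [900, 901, 200, 201]
```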
By using pd.wide_to_long:
pd.wide_to_long(df,'other_id_',i='id',j='drop').reset_index().drop('drop',axis=1).sort_values('id')
Out[36]:
id other_id_
0 1 100
3 1 101
6 1 102
1 2 200
4 2 201
7 2 202
2 3 300
5 3 301
8 3 302
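Here is a variant of the same call, offered as a sketch. Passing sep='_' with the stub other_id makes wide_to_long name the value column other_id directly. Sorting on the suffix level (named n here, an arbitrary choice) before dropping it also makes the per-id order explicit, instead of relying on whatever row order wide_to_long happens to produce:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'other_id_1': [100, 200, 300],
    'other_id_2': [101, 201, 301],
    'other_id_3': [102, 202, 302],
})

# Columns matching 'other_id' + '_' + <digits> are stacked; the digits
# go into a new index level called 'n' (hypothetical name, dropped below)
long = pd.wide_to_long(df, stubnames='other_id', i='id', j='n', sep='_')

out = (long.reset_index()
           .sort_values(['id', 'n'])   # order by id, then by the column suffix
           .drop(columns='n')
           .reset_index(drop=True))
print(out)
```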
Or, with unstack:
df.set_index('id').unstack().reset_index().drop(columns='level_0').rename(columns={0: 'other_id'})
Out[43]:
id other_id
0 1 100
1 2 200
2 3 300
3 1 101
4 2 201
5 3 301
6 1 102
7 2 202
8 3 302
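The level_0 column appears because unstack on a frame with a flat index returns a Series indexed by (column name, id), and the column names end up in an unnamed level. The sketch below drops that level directly. It also groups the rows by id with a stable sort, which assumes that id-major order is the one you want:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'other_id_1': [100, 200, 300],
    'other_id_2': [101, 201, 301],
    'other_id_3': [102, 202, 302],
}).set_index('id')

s = df.unstack()   # Series indexed by (column name, id), column-major order

out = (s.droplevel(0)                    # discard the old column names
        .sort_index(kind='mergesort')    # stable: group by id, keep column order
        .rename('other_id')
        .reset_index())
print(out)
```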
If id isn't the index, set it first:
df = df.set_index('id')
df
other_id_1 other_id_2 other_id_3
id
1 100 101 102
2 200 201 202
3 300 301 302
Now, call the pd.DataFrame constructor. You'll have to repeat each index label once per column, using np.repeat.
import numpy as np
import pandas as pd
df_new = pd.DataFrame({'other_id': df.values.reshape(-1)},
                      index=np.repeat(df.index, len(df.columns)))
df_new
other_id
id
1 100
1 101
1 102
2 200
2 201
2 202
3 300
3 301
3 302
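Here is why np.repeat works and np.tile would not. reshape(-1) flattens in row-major order, so all of id 1's values come first, then id 2's, and so on. Each label therefore has to appear once per column, in consecutive positions. A small NumPy-only sketch of the alignment (the arrays mirror the sample data):

```python
import numpy as np

values = np.array([[100, 101, 102],
                   [200, 201, 202],
                   [300, 301, 302]])
ids = np.array([1, 2, 3])

flat = values.reshape(-1)      # row-major: 100, 101, 102, 200, ...
repeated = np.repeat(ids, 3)   # 1, 1, 1, 2, 2, 2, ... lines up with flat
tiled = np.tile(ids, 3)        # 1, 2, 3, 1, 2, 3, ... would misalign

print(list(zip(repeated.tolist(), flat.tolist())))
```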
One more (or rather two) :)
Option 1:
pd.melt(df, id_vars='id', value_vars=['other_id_1', 'other_id_2', 'other_id_3'], value_name='other_id')\
    .drop(columns='variable').sort_values(by='id', kind='mergesort').reset_index(drop=True)
Option 2:
df.set_index('id').stack().reset_index(1,drop = True).reset_index()\
.rename(columns = {0:'other_id'})
Both ways, you get:
id other_id
0 1 100
1 1 101
2 1 102
3 2 200
4 2 201
5 2 202
6 3 300
7 3 301
8 3 302
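Here is a quick sketch that confirms the two options agree. Option 1 below includes a stable sort and an index reset so that its row labels match Option 2's. pd.testing.assert_frame_equal raises an AssertionError if anything differs:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'other_id_1': [100, 200, 300],
    'other_id_2': [101, 201, 301],
    'other_id_3': [102, 202, 302],
})

# Option 1: melt every non-id column, then stable-sort by id
opt1 = (pd.melt(df, id_vars='id', value_name='other_id')
          .drop(columns='variable')
          .sort_values('id', kind='mergesort')
          .reset_index(drop=True))

# Option 2: stack the other_id_* columns under each id
opt2 = (df.set_index('id').stack()
          .reset_index(level=1, drop=True)
          .rename('other_id')
          .reset_index())

pd.testing.assert_frame_equal(opt1, opt2)
print(opt1)
```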