Flatten multiple columns in a DataFrame into a single column

Question

I have a DataFrame like this:

id    other_id_1    other_id_2    other_id_3
1     100           101           102
2     200           201           202
3     300           301           302

I want this:

id    other_id
1     100
1     101
1     102
2     200
2     201
2     202
3     300
3     301
3     302

I can easily get the output I want like this:

to_keep = {}
for idx in df.index:
    identifier = df.loc[idx, 'id']
    to_keep[identifier] = []
    for col in ['other_id_1', 'other_id_2', 'other_id_3']:
        row_val = df.loc[idx, col]
        to_keep[identifier].append(row_val)

This gives me:

{1: [100, 101, 102], 2: [200, 201, 202], 3: [300, 301, 302]}

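I can easily write that to a file. However, I'm struggling to do this in native pandas. I would have expected such a seemingly simple transformation to be more straightforward, but I'm stuck...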

Answer 1

Well, if you haven't already, set id as the index:

>>> df
   id  other_id_1  other_id_2  other_id_3
0   1         100         101         102
1   2         200         201         202
2   3         300         301         302
>>> df.set_index('id', inplace=True)
>>> df
    other_id_1  other_id_2  other_id_3
id
1          100         101         102
2          200         201         202
3          300         301         302

Then you can simply use pd.concat:

>>> df = pd.concat([df[col] for col in df])
>>> df
id
1    100
2    200
3    300
1    101
2    201
3    301
1    102
2    202
3    302
dtype: int64

If you need the values sorted:

>>> df.sort_values()
id
1    100
1    101
1    102
2    200
2    201
2    202
3    300
3    301
3    302
dtype: int64
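For reference, here is a self-contained sketch of the same pd.concat approach. Note that sort_values() orders by the values themselves, which only groups the rows by id here because the sample values happen to increase with id. Assuming you want to group by id regardless of the values, a stable sort on the index is one option. The names wide, flat and result are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_id_1": [100, 200, 300],
    "other_id_2": [101, 201, 301],
    "other_id_3": [102, 202, 302],
})

wide = df.set_index("id")

# Stack the columns end to end; every piece keeps "id" as its index.
flat = pd.concat([wide[col] for col in wide])

# mergesort is stable, so rows are grouped by id while the original
# column order (other_id_1, other_id_2, other_id_3) is kept within each id.
flat = flat.sort_index(kind="mergesort").rename("other_id")

# Turn the "id" index back into a regular column.
result = flat.reset_index()
print(result)
```

If the order within each id does not matter, a plain sort_index() should do as well.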

Answer 2

Using pd.wide_to_long:

pd.wide_to_long(df,'other_id_',i='id',j='drop').reset_index().drop('drop',axis=1).sort_values('id')
    Out[36]: 
       id  other_id_
    0   1        100
    3   1        101
    6   1        102
    1   2        200
    4   2        201
    7   2        202
    2   3        300
    5   3        301
    8   3        302

Or unstack:

df.set_index('id').unstack().reset_index().drop(columns='level_0').rename(columns={0:'other_id'})

Out[43]: 
   id  other_id
0   1       100
1   2       200
2   3       300
3   1       101
4   2       201
5   3       301
6   1       102
7   2       202
8   3       302
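As a possible variation on the wide_to_long call above, passing sep='_' lets the stub name be 'other_id' directly, so the value column already has the desired name. This is only a sketch; the helper column name n and the variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_id_1": [100, 200, 300],
    "other_id_2": [101, 201, 301],
    "other_id_3": [102, 202, 302],
})

# Columns are matched as "<stub><sep><numeric suffix>", i.e. other_id_1, other_id_2, ...
# The numeric suffix ends up in the index level named "n" (an arbitrary name).
long_df = pd.wide_to_long(df, stubnames="other_id", i="id", j="n", sep="_")

# Sort by (id, n), then discard the suffix level since it is not wanted in the output.
result = long_df.sort_index().reset_index().drop(columns="n")
print(result)
```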

Answer 3

If id is not the index, set it first:

df = df.set_index('id')

df

    other_id_1  other_id_2  other_id_3
id                                    
1          100         101         102
2          200         201         202
3          300         301         302

Now call the pd.DataFrame constructor. You will need np.repeat to repeat each index label so that it lines up with the flattened values.

df_new = pd.DataFrame({'other_id' : df.values.reshape(-1,)}, 
                         index=np.repeat(df.index, len(df.columns)))
df_new

    other_id
id          
1        100
1        101
1        102
2        200
2        201
2        202
3        300
3        301
3        302

Answer 4

A one-liner (or rather a two-liner) :)

pd.melt(df, id_vars='id', value_vars=['other_id_1', 'other_id_2', 'other_id_3'], value_name='other_id')\
.drop(columns='variable').sort_values(by='id')

Option 2:

df.set_index('id').stack().reset_index(1, drop=True).reset_index()\
.rename(columns={0: 'other_id'})

Either way you get

    id  other_id
0   1   100
1   1   101
2   1   102
3   2   200
4   2   201
5   2   202
6   3   300
7   3   301
8   3   302
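Here is a runnable sketch of the melt route, assuming a reasonably recent pandas (keyword-only drop(columns=...) and the ignore_index option of sort_values). It also shows one way to get back the dict from the question, in case that is the form you want to write out. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_id_1": [100, 200, 300],
    "other_id_2": [101, 201, 301],
    "other_id_3": [102, 202, 302],
})

# Every column other than "id" is melted into the "other_id" value column.
melted = df.melt(id_vars="id", value_name="other_id").drop(columns="variable")

# A stable sort groups rows by id without shuffling the values within an id.
result = melted.sort_values("id", kind="mergesort", ignore_index=True)

# Optional: rebuild the {id: [values]} mapping from the question.
grouped = result.groupby("id")["other_id"].apply(list).to_dict()

print(result)
print(grouped)
```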

4 answers

Answer 1
3  accepted  2017-09-26 20:37:34

Answer 2
3  2017-09-26 20:42:14

Answer 3
1  2017-09-26 20:38:33

Answer 4
1  2017-09-26 20:49:48