Reshape columns into rows with Pandas

Question

I have a DataFrame:

id    name   value
1      abc     10
1      qwe     23
1      zxc     12
2      sdf     10
2      wed     23
2      abc     12
2      mnb     11

I want to reshape this DataFrame into:

id    n1    n2    n3    n4
 1    abc   qwe   zxc    0
 2    sdf   wed   abc   mnb

There are 3 rows for id=1 and 4 rows for id=2. When an id has fewer rows than the largest group, the missing columns (here n4 for id=1) should be filled with 0.

This is a test DataFrame; in the real data an id might have only 1 or 2 rows.

It is something like what dcast does in R. How can we do this in pandas?
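The answers operate on a DataFrame named df that holds the sample data. A minimal way to build it for experimenting, with the column names taken from the question (the value column is not used by any of the answers):

```python
import pandas as pd

# Sample data from the question: 3 rows for id=1, 4 rows for id=2
df = pd.DataFrame({
    'id':    [1, 1, 1, 2, 2, 2, 2],
    'name':  ['abc', 'qwe', 'zxc', 'sdf', 'wed', 'abc', 'mnb'],
    'value': [10, 23, 12, 10, 23, 12, 11],
})

# Number of rows per id; the largest group decides how many n-columns appear
print(df.groupby('id').size())
```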

Answer 1

Possibly overkill (a NumPy approach; it assumes the rows of each id are contiguous, as in the sample):

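import numpy as np
import pandas as pd

# f: integer code of each row's id; u: the unique ids in order of appearance
f, u = pd.factorize(df.id.values)
b = np.bincount(f)                    # number of rows per id
n, m = u.size, b.max()                # n ids, at most m names per id
# column position of each row inside its id group:
# row index minus the index where its group starts (rows must be grouped by id)
c = np.arange(f.size) - np.repeat(np.cumsum(b) - b, b)

v = np.zeros((n, m), dtype=object)    # unused slots keep the 0 placeholder
v[f, c] = df.name.values

pd.DataFrame(
    v, pd.Index(u, name='id'),
    ['n{}'.format(i) for i in range(1, m + 1)]
).reset_index()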

   id   n1   n2   n3   n4
0   1  abc  qwe  zxc    0
1   2  sdf  wed  abc  mnb
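The key step in this approach is computing c, each row's column position inside its id group. The sketch below checks two ways of computing it on a hypothetical uneven example (one row for 'a', then four for 'b'). Subtracting k * (m - 1) for group k gives the right start offset for the question's data, where the first group has exactly m - 1 = 3 rows, but not in general; subtracting the total size of all earlier groups does, and agrees with groupby().cumcount().

```python
import numpy as np
import pandas as pd

# Hypothetical uneven example: 1 row for 'a', then 4 rows for 'b' (rows grouped by id)
ids = pd.Series(['a', 'b', 'b', 'b', 'b'])

f, u = pd.factorize(ids.values)
b = np.bincount(f)                  # group sizes: [1, 4]
m = b.max()

# Offset k * (m - 1): equals the true start of group k only for suitable group sizes
naive = np.arange(f.size) - np.arange(u.size).repeat(b) * (m - 1)

# Offset = number of rows in all earlier groups (cumulative sum of sizes)
c = np.arange(f.size) - np.repeat(np.cumsum(b) - b, b)

# pandas numbers the rows within each group directly
expected = ids.groupby(ids).cumcount().to_numpy()

print(naive)     # contains negative positions
print(c)
print(expected)
```

Negative positions are especially dangerous here because NumPy accepts them as indices from the end, so v[f, c] would silently write names into the wrong columns instead of raising an error.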

Answer 2

You could go the str route and use some regex replacement and splitting after the groupby. This only works if no name contains spaces, commas, brackets or quote characters.

df.groupby('id').name.apply(lambda x: str(list(x)))\
          .str.replace(r"[\[\],']", "", regex=True)\
          .str.split(expand=True).fillna(0)\
          .rename(columns = lambda x: 'n{}'.format(x + 1))

     n1   n2   n3   n4
id                    
1   abc  qwe  zxc    0
2   sdf  wed  abc  mnb
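Because the string route splits on whitespace, a name such as 'new york' would end up in two columns. A variant sketch (not part of the original answer, on hypothetical data) joins each group with a separator assumed never to occur in the names, here '|', and splits on that same separator, which also removes the need for the regex:

```python
import pandas as pd

# Hypothetical data where one name contains a space
df = pd.DataFrame({'id': [1, 1, 2], 'name': ['abc', 'new york', 'sdf']})

# Assumption: '|' never appears inside a name
out = (df.groupby('id')['name'].agg('|'.join)      # e.g. 'abc|new york'
         .str.split('|', expand=True)              # one column per name
         .fillna(0)                                # pad shorter groups with 0
         .rename(columns=lambda x: 'n{}'.format(x + 1)))
print(out)
```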

Answer 3

You can use set_index with cumcount, which numbers the rows within each group and so provides the new column names, then reshape with unstack and finally rename the columns:

out = (df.set_index(['id', df.groupby('id').cumcount()])['name']
        .unstack(fill_value=0)
        .rename(columns = lambda x: 'n{}'.format(x + 1))
        .reset_index())
print(out)
   id   n1   n2   n3   n4
0   1  abc  qwe  zxc    0
1   2  sdf  wed  abc  mnb
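The same idea can also be written with DataFrame.pivot, which is probably the closest built-in pandas counterpart to a simple R dcast. A sketch using the question's column names; the helper column name col is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2, 2],
    'name': ['abc', 'qwe', 'zxc', 'sdf', 'wed', 'abc', 'mnb'],
})

out = (df.assign(col=df.groupby('id').cumcount() + 1)      # 1, 2, 3, ... within each id
         .pivot(index='id', columns='col', values='name')  # one column per position
         .fillna(0)                                        # missing positions -> 0
         .add_prefix('n')                                  # 1 -> 'n1', 2 -> 'n2', ...
         .rename_axis(columns=None)                        # drop the 'col' axis name
         .reset_index())
print(out)
```

pivot raises an error if an (index, columns) pair occurs twice; numbering the rows with cumcount guarantees that every pair is unique.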

A solution with the DataFrame constructor. It requires that the original data contain no NaN values, because fillna would replace those as well:

df1 = df.groupby('id')['name'].apply(list)
print (df1)
id
1         [abc, qwe, zxc]
2    [sdf, wed, abc, mnb]
Name: name, dtype: object

out = (pd.DataFrame(df1.values.tolist(), index=df1.index)
        .fillna(0)
        .rename(columns = lambda x: 'n{}'.format(x + 1))
        .reset_index())
print(out)
   id   n1   n2   n3   n4
0   1  abc  qwe  zxc    0
1   2  sdf  wed  abc  mnb
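Why this solution needs NaN-free data: once the lists are expanded into columns, a genuinely missing name and a padding cell are both NaN, so fillna(0) replaces both. A small demonstration on hypothetical data in which id=1 has a missing name:

```python
import numpy as np
import pandas as pd

# Hypothetical data: id=1 has a genuinely missing name, id=2 has three names
df = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                   'name': ['abc', np.nan, 'sdf', 'wed', 'mnb']})

s = df.groupby('id')['name'].apply(list)
wide = pd.DataFrame(s.values.tolist(), index=s.index)
filled = wide.fillna(0)

print(wide)    # id=1: the real NaN (column 1) and the padding (column 2) are both missing
print(filled)  # after fillna(0) the two cases can no longer be told apart
```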

And a solution with GroupBy.apply and the Series constructor:

df1 = (df.groupby('id')['name'].apply(lambda x: pd.Series(x.values, index=range(1,len(x)+1)))
        .unstack(fill_value=0)
        .add_prefix('n')
        .reset_index())
print (df1)

   id   n1   n2   n3   n4
0   1  abc  qwe  zxc    0
1   2  sdf  wed  abc  mnb

Answer 4

With the dfply package you can do this much like R's dcast.

# for Python3 only
pip install dfply

Use dfply's spread function.

import pandas as pd
from io import StringIO
from dfply import *

csv = StringIO("""id,name,value
1,abc,10
1,qwe,23
1,zxc,12
2,sdf,10
2,wed,23
2,abc,12
2,mnb,11""")
df = pd.read_csv(csv)

df['sequence'] = df.groupby('id').cumcount()
df = df[["id", "sequence", "name"]] >> spread(X.sequence, X.name)
df = df.set_index("id").fillna(0).rename(columns = lambda x: 'n{}'.format(x + 1)).reset_index()
print(df)
#    id   n1   n2   n3   n4
# 0   1  abc  qwe  zxc    0
# 1   2  sdf  wed  abc  mnb

Answers (score, posting time):

Answer 1: score 2, 2017-09-26 08:03:35

Answer 2: score 2, 2017-09-26 08:29:36

Answer 3: score 1, accepted, 2017-09-26 07:41:53

Answer 4: score 1, 2017-09-26 08:37:19
