How to flatten every n rows in a pandas DataFrame

Question

I would like to flatten every n rows of a pandas DataFrame into a single row. For example, with n=2:

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
df.columns = ['a', 'b']
target_df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
target_df.columns = ['a1', 'b1', 'a2', 'b2']
print(df, '\n\n', target_df)

    a   b
0   1   2
1   3   4
2   5   6
3   7   8
4   9  10
5  11  12 

    a1  b1  a2  b2
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12

Is there a fast way to do this? Note that both the length of the DataFrame and n can be arbitrarily large, so hardcoding n is not a good option.

Answer 1

First we build boolean masks for the even and odd index numbers. Then we select the matching rows with loc and concat them along axis=1. This covers the n=2 case and assumes the default integer index:

grp1 = df.index % 2 == 0  # even index numbers (0, 2, 4, ...)
grp2 = df.index % 2 == 1  # odd index numbers (1, 3, 5, ...)

df = pd.concat([
    df.loc[grp1].reset_index(drop=True), df.loc[grp2].reset_index(drop=True)
], axis=1)

   a   b   a   b
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12
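
The same idea can be extended to an arbitrary n by slicing on row position instead of testing the index for even and odd values. The following is a minimal sketch, not part of the original answer: the helper name flatten_by_position is hypothetical, and the numeric column suffixes are added only to match the a1, b1, a2, b2 naming from the question.

```python
import pandas as pd


def flatten_by_position(df, n):
    """Place every group of n consecutive rows side by side (hypothetical helper)."""
    # iloc[i::n] picks rows i, i+n, i+2n, ... by position, so the result
    # does not depend on what the index labels happen to be.
    parts = [
        df.iloc[i::n].reset_index(drop=True).add_suffix(str(i + 1))
        for i in range(n)
    ]
    return pd.concat(parts, axis=1)


df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]],
    columns=["a", "b"],
)
result = flatten_by_position(df, 2)
print(result)
```

Because pd.concat aligns on the reset index, a frame whose length is not a multiple of n should end up with NaN in the trailing cells of the last row rather than raising an error.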

Answer 2

You can use numpy's hstack.

Simple solution:

import numpy as np

n = 2
np.hstack((df.values[::n], df.values[1::n]))

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

Convert the above to a DataFrame using:

n = 2
pd.DataFrame(np.hstack((df.values[::n],df.values[1::n])))

    0   1   2   3
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12

If you want to handle a variable n, try:

n = 3
# Take every n-th row starting at each offset 0..n-1,
# then stack the resulting blocks side by side.
parts = [df.values[i::n] for i in range(n)]

pd.DataFrame(np.hstack(parts))

Note: This still requires n to be a divisor of len(df).
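
One way around the divisibility requirement is to pad the array with NaN rows before slicing. This is only a sketch under that assumption: the helper name hstack_every_n is made up for illustration, and padding (rather than dropping the leftover rows) is a design choice the original answer does not make.

```python
import numpy as np
import pandas as pd


def hstack_every_n(df, n):
    """Flatten every n rows, padding the last group with NaN (hypothetical helper)."""
    # Convert to float so that NaN padding is representable.
    values = df.to_numpy(dtype=float)
    remainder = len(values) % n
    if remainder:
        # Append enough all-NaN rows to make the row count a multiple of n.
        pad = np.full((n - remainder, values.shape[1]), np.nan)
        values = np.vstack([values, pad])
    return pd.DataFrame(np.hstack([values[i::n] for i in range(n)]))


# Five rows with n = 2: the last output row is only half filled.
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], columns=["a", "b"])
padded = hstack_every_n(df, 2)
print(padded)
```

The cost of this approach is that integer data is converted to float, since NaN has no integer representation in a plain numpy array.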

Answer 3

Construct a MultiIndex, assign it as the index, and unstack:

n = 2
iix = pd.MultiIndex.from_arrays([np.arange(df.shape[0]) // n, 
                                (np.arange(df.shape[0]) % n)+1])

df1 = df.set_index(iix).unstack().sort_index(level=1, axis=1)

Out[211]:
   a   b   a   b
   1   1   2   2
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12

If you don't want MultiIndex columns, you can flatten them:

df1.columns = df1.columns.map('{0[0]}{0[1]}'.format)

Out[213]:
   a1  b1  a2  b2
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12

For a different n, just change the value of n:

n = 3
iix = pd.MultiIndex.from_arrays([np.arange(df.shape[0]) // n,
                               (np.arange(df.shape[0]) % n)+1])
df1 = df.set_index(iix).unstack().sort_index(level=1, axis=1)
df1.columns = df1.columns.map('{0[0]}{0[1]}'.format)

Out[215]:
   a1  b1  a2  b2  a3  b3
0   1   2   3   4   5   6
1   7   8   9  10  11  12
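
It may be worth seeing how this approach behaves when n does not divide the number of rows. The sketch below reuses the same steps with n = 4 on the six-row frame; the variable names pos and wide are introduced here for illustration only. Since the last group has only two rows, unstack leaves the missing positions as NaN instead of failing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]],
    columns=["a", "b"],
)

n = 4  # does not divide len(df) == 6
pos = np.arange(len(df))
# Level 0: which output row a source row belongs to.
# Level 1: its 1-based position inside that group.
iix = pd.MultiIndex.from_arrays([pos // n, pos % n + 1])

wide = df.set_index(iix).unstack().sort_index(level=1, axis=1)
wide.columns = wide.columns.map('{0[0]}{0[1]}'.format)
print(wide)
```

The second output row holds the two leftover source rows in a1, b1, a2, b2, and the a3, b3, a4, b4 cells are NaN.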

Answer 4

Just use the underlying numpy array directly:

import pandas as pd

df = pd.DataFrame(
    [[1,2], [3,4], [5,6], [7,8], [9,10], [11,12]],
    columns=["a", "b"]
)
df_2 = pd.DataFrame(
    df.values.reshape([-1, 4]),
    columns = ["a1", "b1", "a2", "b2"]
)

df is the same six-row frame as in the question. df_2 looks like:

   a1  b1  a2  b2
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12

For a generic solution:

def concat_rows(df, n):
    new_cols = [
        f"{col}{i}"
        for i in range(1, n+1)
        for col in df.columns
    ]
    n_cols = len(df.columns)
    new_df = pd.DataFrame(
        df.values.reshape([-1, n_cols*n]),
        columns=new_cols
    )
    return new_df

df_2 = concat_rows(df, 2)
df_3 = concat_rows(df, 3)

df_2 looks as before. df_3 looks like:

   a1  b1  a2  b2  a3  b3
0   1   2   3   4   5   6
1   7   8   9  10  11  12
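
The reshape works because the values are laid out row by row, so n consecutive rows are already adjacent in memory order. It does, however, raise a fairly opaque numpy error when len(df) is not a multiple of n. A possible variant with an explicit check is sketched below; the name concat_rows_checked and the error message are assumptions for illustration, not part of the original answer.

```python
import pandas as pd


def concat_rows_checked(df, n):
    """Reshape every n rows into one row, with an explicit length check (hypothetical helper)."""
    if len(df) % n != 0:
        raise ValueError(
            f"len(df)={len(df)} is not a multiple of n={n}; "
            "pad or truncate the frame first"
        )
    new_cols = [f"{col}{i}" for i in range(1, n + 1) for col in df.columns]
    # -1 lets numpy infer the number of output rows.
    return pd.DataFrame(
        df.to_numpy().reshape(-1, len(df.columns) * n),
        columns=new_cols,
    )


df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]],
    columns=["a", "b"],
)
df_3 = concat_rows_checked(df, 3)
print(df_3)
```

Calling concat_rows_checked(df, 4) on this frame raises the ValueError instead of failing inside reshape.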

4 answers

Answer 1
1 2019-10-22 23:32:18

Answer 2
1 2019-10-22 23:53:28

Answer 3
0 2019-10-22 23:38:03

Answer 4
0 accepted 2019-10-22 23:40:54
