Pandas DataFrame split numbered columns into rows by column title

I have a DataFrame that looks something like this:

import pandas as pd

df = pd.DataFrame({
    'A': [0, 1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'V1': [0.0, 0.1, 0.2, 0.3, 0.4],
    'V2': [1.0, 1.1, 1.2, 1.3, 1.4],
    'V3': [2.0, 2.1, 2.2, 2.3, 2.4],
    'X': ['alpha', 'beta', 'gamma', 'delta', 'epsilon'],
})
   A  B   V1   V2   V3        X
0  0  a  0.0  1.0  2.0    alpha
1  1  b  0.1  1.1  2.1     beta
2  2  c  0.2  1.2  2.2    gamma
3  3  d  0.3  1.3  2.3    delta
4  4  e  0.4  1.4  2.4  epsilon

I'd like to use the number in the V column labels to reshape this into a long-form table. The number in the column label (1 for V1, 2 for V2, etc.) would become the value in a new column named "V Number" (or similar), and each row would hold exactly one "V" value. Something like this (I've hidden the index here as I don't care about it):

A  B  V Number    V        X
0  a      1     0.0    alpha     # Old first row, V1 value
0  a      2     1.0    alpha     # Old first row, V2 value
0  a      3     2.0    alpha     # Old first row, V3 value
1  b      1     0.1     beta     # Old second row, V1 value
1  b      2     1.1     beta     # etc...
1  b      3     2.1     beta
2  c      1     0.2    gamma
2  c      2     1.2    gamma
2  c      3     2.2    gamma
3  d      1     0.3    delta
3  d      2     1.3    delta
3  d      3     2.3    delta
4  e      1     0.4  epsilon
4  e      2     1.4  epsilon
4  e      3     2.4  epsilon

In the real DataFrame there are over 40 "V" columns, over 100 other columns and several thousand rows, so a reasonably simple and fast method would be nice! In case it helps, the column names are easy to isolate (they're actually called e.g. 'Test Voltage (3)', but I shortened them for the purposes of the example) with something like [i for i in df.columns if 'Test Voltage' in i].

Has anyone got any ideas for a straightforward way to do this? I've tried searching for lots of methods, but I keep finding only ways to split columns that have lists in the cells.

Try with wide_to_long:

out = pd.wide_to_long(df, ['V'], i=['A', 'B', 'X'], j='number').reset_index()
print(out)
    A  B        X  number    V
0   0  a    alpha       1  0.0
1   0  a    alpha       2  1.0
2   0  a    alpha       3  2.0
3   1  b     beta       1  0.1
4   1  b     beta       2  1.1
5   1  b     beta       3  2.1
6   2  c    gamma       1  0.2
7   2  c    gamma       2  1.2
8   2  c    gamma       3  2.2
9   3  d    delta       1  0.3
10  3  d    delta       2  1.3
11  3  d    delta       3  2.3
12  4  e  epsilon       1  0.4
13  4  e  epsilon       2  1.4
14  4  e  epsilon       3  2.4
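
With 100+ other columns, listing them all in i gets unwieldy, and wide_to_long also needs the i columns to uniquely identify each row. A minimal sketch of one way around that, assuming it is acceptable to add a temporary row identifier (the name 'row_id' is made up for this example, and the small frame is a cut-down version of the one in the question):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [0, 1, 2],
    'B': ['a', 'b', 'c'],
    'V1': [0.0, 0.1, 0.2],
    'V2': [1.0, 1.1, 1.2],
    'X': ['alpha', 'beta', 'gamma'],
})

# 'row_id' is a hypothetical helper column: the original index, exposed as a
# regular column so it can serve as the unique id for wide_to_long.
wide = df.rename_axis('row_id').reset_index()

out = (
    pd.wide_to_long(wide, stubnames='V', i='row_id', j='V Number')
      .reset_index()
      .sort_values(['row_id', 'V Number'])  # keep each original row's values together
      .drop(columns='row_id')
      .reset_index(drop=True)
)
print(out)
```

The non-V columns are carried along without being named, so only the helper id has to be passed in i. The column order of the result may differ from the input, so select columns by name afterwards if the order matters.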

Use melt:

>>> df.melt(id_vars=['A', 'B', 'X'], var_name='V Number', value_name='V')
    A  B        X V Number    V
0   0  a    alpha       V1  0.0
1   1  b     beta       V1  0.1
2   2  c    gamma       V1  0.2
3   3  d    delta       V1  0.3
4   4  e  epsilon       V1  0.4
5   0  a    alpha       V2  1.0
6   1  b     beta       V2  1.1
7   2  c    gamma       V2  1.2
8   3  d    delta       V2  1.3
9   4  e  epsilon       V2  1.4
10  0  a    alpha       V3  2.0
11  1  b     beta       V3  2.1
12  2  c    gamma       V3  2.2
13  3  d    delta       V3  2.3
14  4  e  epsilon       V3  2.4
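
The melt output above keeps the full label (V1, V2, ...) and groups rows by V column rather than by original row. Below is a rough sketch of how this might be adapted to the real column names mentioned in the question ('Test Voltage (3)' and so on). The regex, the helper column name 'row' and the tiny sample frame are assumptions made for illustration, and ignore_index=False needs pandas 1.1 or later:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [0, 1],
    'B': ['a', 'b'],
    'Test Voltage (1)': [0.0, 0.1],
    'Test Voltage (2)': [1.0, 1.1],
    'X': ['alpha', 'beta'],
})

# Split the columns into the ones to unpivot and the ones to carry along.
v_cols = [c for c in df.columns if 'Test Voltage' in c]
id_cols = [c for c in df.columns if c not in v_cols]

long_df = (
    df.melt(id_vars=id_cols, value_vars=v_cols,
            var_name='V Number', value_name='V', ignore_index=False)
      .rename_axis('row')  # hypothetical name for the original row position
      .reset_index()
)

# Pull the digits out of the parentheses and store them as integers.
long_df['V Number'] = (
    long_df['V Number'].str.extract(r'\((\d+)\)', expand=False).astype(int)
)

# Put the rows back in original-row order, then by V number.
long_df = (
    long_df.sort_values(['row', 'V Number'])
           .drop(columns='row')
           .reset_index(drop=True)
)
print(long_df)
```

Building id_cols from the column list avoids typing out the 100+ other columns by hand.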

You can also use .stack(), as follows:

(df.set_index(['A', 'B', 'X'])
   .rename_axis(columns='V Number')
   .stack()
   .reset_index(name='V')
)

Result:

    A  B        X V Number    V
0   0  a    alpha       V1  0.0
1   0  a    alpha       V2  1.0
2   0  a    alpha       V3  2.0
3   1  b     beta       V1  0.1
4   1  b     beta       V2  1.1
5   1  b     beta       V3  2.1
6   2  c    gamma       V1  0.2
7   2  c    gamma       V2  1.2
8   2  c    gamma       V3  2.2
9   3  d    delta       V1  0.3
10  3  d    delta       V2  1.3
11  3  d    delta       V3  2.3
12  4  e  epsilon       V1  0.4
13  4  e  epsilon       V2  1.4
14  4  e  epsilon       V3  2.4

If you want the V Number column to have only the number, you can use:

df2 = (df.set_index(['A', 'B', 'X'])
         .rename_axis(columns='V Number')
         .stack()
         .reset_index(name='V')
      )

df2['V Number'] = df2['V Number'].str[1:]

Result:

print(df2)

    A  B        X V Number    V
0   0  a    alpha        1  0.0
1   0  a    alpha        2  1.0
2   0  a    alpha        3  2.0
3   1  b     beta        1  0.1
4   1  b     beta        2  1.1
5   1  b     beta        3  2.1
6   2  c    gamma        1  0.2
7   2  c    gamma        2  1.2
8   2  c    gamma        3  2.2
9   3  d    delta        1  0.3
10  3  d    delta        2  1.3
11  3  d    delta        3  2.3
12  4  e  epsilon        1  0.4
13  4  e  epsilon        2  1.4
14  4  e  epsilon        3  2.4
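
Note that .str[1:] leaves the values as strings, which matters once there are more than nine V columns because strings sort as text ('10' before '2'). A small sketch of converting them to integers, using a made-up three-row frame rather than the full example:

```python
import pandas as pd

# Made-up sample with a two-digit label to show the difference.
df2 = pd.DataFrame({'V Number': ['V1', 'V2', 'V10'], 'V': [0.0, 1.0, 9.0]})

as_text = df2['V Number'].str[1:]                           # still strings
df2['V Number'] = df2['V Number'].str[1:].astype(int)       # real integers

print(sorted(as_text))                          # text order
print(df2.sort_values('V Number')['V Number'].tolist())    # numeric order
```

With over 40 V columns, as in the question, the integer version is probably what you want for sorting or filtering.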
