Pandas DataFrame split numbered columns into rows by column title
I have a DataFrame that looks something like this:
import pandas as pd

df = pd.DataFrame({
'A': [0, 1, 2, 3, 4],
'B': ['a', 'b', 'c', 'd', 'e'],
'V1': [0.0, 0.1, 0.2, 0.3, 0.4],
'V2': [1.0, 1.1, 1.2, 1.3, 1.4],
'V3': [2.0, 2.1, 2.2, 2.3, 2.4],
'X': ['alpha', 'beta', 'gamma', 'delta', 'epsilon'],
})
A B V1 V2 V3 X
0 0 a 0.0 1.0 2.0 alpha
1 1 b 0.1 1.1 2.1 beta
2 2 c 0.2 1.2 2.2 gamma
3 3 d 0.3 1.3 2.3 delta
4 4 e 0.4 1.4 2.4 epsilon
I'd like to use the number in the V columns to spread this out into a long-form table. The number in the column label (1 for V1, 2 for V2, etc.) would become a value in a new column named "V Number" (or similar), and each row would then hold only that one "V" value. Something like this (I've hidden the index here as I don't care about it):
A B V Number V X
0 a 1 0.0 alpha # Old first row, V1 value
0 a 2 1.0 alpha # Old first row, V2 value
0 a 3 2.0 alpha # Old first row, V3 value
1 b 1 0.1 beta # Old second row, V1 value
1 b 2 1.1 beta # etc...
1 b 3 2.1 beta
2 c 1 0.2 gamma
2 c 2 1.2 gamma
2 c 3 2.2 gamma
3 d 1 0.3 delta
3 d 2 1.3 delta
3 d 3 2.3 delta
4 e 1 0.4 epsilon
4 e 2 1.4 epsilon
4 e 3 2.4 epsilon
In the real DataFrame there are over 40 "V" columns, over 100 other columns and several thousand rows, so a reasonably simple and fast method would be nice! In case it helps, the column names are easy to isolate (they're actually called e.g. 'Test Voltage (3)', but I shortened them for the purposes of the example) with something like [i for i in df.columns if 'Test Voltage' in i].
Does anyone have a straightforward way to do this? I've searched for a lot of methods, but I keep finding only ways to split columns that have lists in the cells.
Try with wide_to_long:
out = pd.wide_to_long(df,['V'],i=['A','B','X'],j='number').reset_index()
Out[23]:
A B X number V
0 0 a alpha 1 0.0
1 0 a alpha 2 1.0
2 0 a alpha 3 2.0
3 1 b beta 1 0.1
4 1 b beta 2 1.1
5 1 b beta 3 2.1
6 2 c gamma 1 0.2
7 2 c gamma 2 1.2
8 2 c gamma 3 2.2
9 3 d delta 1 0.3
10 3 d delta 2 1.3
11 3 d delta 3 2.3
12 4 e epsilon 1 0.4
13 4 e epsilon 2 1.4
14 4 e epsilon 3 2.4
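With over 100 other columns, listing them all in i is impractical, and wide_to_long also raises an error if the i columns don't uniquely identify each row. A minimal sketch of one way around both, assuming the original row position is acceptable as the identifier (row_id is a hypothetical helper column, not something from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [0, 1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'V1': [0.0, 0.1, 0.2, 0.3, 0.4],
    'V2': [1.0, 1.1, 1.2, 1.3, 1.4],
    'V3': [2.0, 2.1, 2.2, 2.3, 2.4],
    'X': ['alpha', 'beta', 'gamma', 'delta', 'epsilon'],
})

# Turn the existing index into a column so every row has a unique id.
# 'row_id' is a made-up name used only for this illustration.
with_id = df.rename_axis('row_id').reset_index()

long_df = (
    pd.wide_to_long(with_id, stubnames='V', i='row_id', j='V Number')
    .reset_index()
    # Sort explicitly so the three values of each original row stay together.
    .sort_values(['row_id', 'V Number'])
    .drop(columns='row_id')
    .reset_index(drop=True)
)

print(long_df)
```

Columns that don't match the stub (A, B and X here) are carried along without being named, and the numeric suffix is converted to an integer.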
Use melt:
>>> df.melt(id_vars=['A', 'B', 'X'], var_name='V Number', value_name='V')
A B X V Number V
0 0 a alpha V1 0.0
1 1 b beta V1 0.1
2 2 c gamma V1 0.2
3 3 d delta V1 0.3
4 4 e epsilon V1 0.4
5 0 a alpha V2 1.0
6 1 b beta V2 1.1
7 2 c gamma V2 1.2
8 3 d delta V2 1.3
9 4 e epsilon V2 1.4
10 0 a alpha V3 2.0
11 1 b beta V3 2.1
12 2 c gamma V3 2.2
13 3 d delta V3 2.3
14 4 e epsilon V3 2.4
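For a wide frame it may be easier to build id_vars and value_vars from the column names than to type them out. The sketch below does that and also regroups the output by original row, as in the desired table. It assumes the value columns can be recognised by their "V" prefix (in the real data the test would presumably be 'Test Voltage' in c) and that your pandas version supports melt's ignore_index argument (1.1 or later):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [0, 1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'V1': [0.0, 0.1, 0.2, 0.3, 0.4],
    'V2': [1.0, 1.1, 1.2, 1.3, 1.4],
    'V3': [2.0, 2.1, 2.2, 2.3, 2.4],
    'X': ['alpha', 'beta', 'gamma', 'delta', 'epsilon'],
})

# Assumption for this example: value columns are the ones starting with 'V'.
value_cols = [c for c in df.columns if c.startswith('V')]
id_cols = [c for c in df.columns if c not in value_cols]

melted = (
    df.melt(id_vars=id_cols, value_vars=value_cols,
            var_name='V Number', value_name='V',
            ignore_index=False)       # keep the original row labels
    .sort_index(kind='mergesort')     # stable sort: V1, V2, V3 order is kept per row
    .reset_index(drop=True)
)

print(melted)
```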
You can also use .stack(), as follows:
(df.set_index(['A', 'B', 'X'])
.rename_axis(columns='V Number')
.stack()
.reset_index(name='V')
)
Result:
A B X V Number V
0 0 a alpha V1 0.0
1 0 a alpha V2 1.0
2 0 a alpha V3 2.0
3 1 b beta V1 0.1
4 1 b beta V2 1.1
5 1 b beta V3 2.1
6 2 c gamma V1 0.2
7 2 c gamma V2 1.2
8 2 c gamma V3 2.2
9 3 d delta V1 0.3
10 3 d delta V2 1.3
11 3 d delta V3 2.3
12 4 e epsilon V1 0.4
13 4 e epsilon V2 1.4
14 4 e epsilon V3 2.4
If you want the V Number column to have only the number, you can use:
df2 = (df.set_index(['A', 'B', 'X'])
.rename_axis(columns='V Number')
.stack()
.reset_index(name='V')
)
df2['V Number'] = df2['V Number'].str[1:]
Result:
print(df2)
A B X V Number V
0 0 a alpha 1 0.0
1 0 a alpha 2 1.0
2 0 a alpha 3 2.0
3 1 b beta 1 0.1
4 1 b beta 2 1.1
5 1 b beta 3 2.1
6 2 c gamma 1 0.2
7 2 c gamma 2 1.2
8 2 c gamma 3 2.2
9 3 d delta 1 0.3
10 3 d delta 2 1.3
11 3 d delta 3 2.3
12 4 e epsilon 1 0.4
13 4 e epsilon 2 1.4
14 4 e epsilon 3 2.4
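Note that .str[1:] only drops a one-character prefix and leaves the numbers as strings. For names like the 'Test Voltage (3)' mentioned in the question, a regular expression is probably more robust. This is a rough sketch on a cut-down, made-up frame; the column names follow the pattern described in the question, but the data itself is invented for illustration:

```python
import pandas as pd

# Hypothetical miniature of the real data, using the naming pattern from the question.
df = pd.DataFrame({
    'A': [0, 1],
    'B': ['a', 'b'],
    'Test Voltage (1)': [0.0, 0.1],
    'Test Voltage (2)': [1.0, 1.1],
    'X': ['alpha', 'beta'],
})

voltage_cols = [c for c in df.columns if 'Test Voltage' in c]
id_cols = [c for c in df.columns if c not in voltage_cols]

df2 = (df.set_index(id_cols)
         .rename_axis(columns='V Number')
         .stack()
         .reset_index(name='V')
)

# Pull the digits out of the parentheses and store them as integers.
df2['V Number'] = (df2['V Number']
                   .str.extract(r'\((\d+)\)', expand=False)
                   .astype(int))

print(df2)
```

Putting every non-voltage column into the index works here, but with 100+ columns a melt-based approach may be lighter.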