Transpose multiple columns in a Pandas dataframe
I'm trying to reshape a dataframe, but I'm not able to get the results I need. The dataframe looks like this:
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26
I need to reshape the dataframe so it will look like this:
m r s p O W N p O W N p O W N
1 4 3 1 2.81 3.70 3.03 2 0.58 0.78 0.60 3 1.98 1.34 1.81
1 4 4 1 2.14 2.82 2.31 2 0.67 0.00 0.00 3 0.00 0.04 0.15
1 4 5 1 1.47 1.94 1.59 2 1.03 2.45 1.68 3 0.01 0.00 0.26
I tried to use the pivot_table function:
df.pivot_table(index=['m','r','s'], columns=['p'], values=['O','W','N'])
but I'm not able to get quite what I want. Does anyone know how to do this?
Even as someone who fancies himself pretty handy with pandas, I find the pivot_table and melt functions confusing. I prefer to stick with a well-defined and unique index and use the stack and unstack methods of the dataframe itself.
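To make the stack/unstack idea concrete, here is a minimal sketch on a made-up toy frame (the names toy, k, p and v are hypothetical and not part of the question's data). It assumes a reasonably recent pandas and only shows that unstack moves an index level into the columns and stack moves it back.

```python
import pandas as pd

# Hypothetical toy data: two key columns ("k", "p") and one value column ("v").
toy = pd.DataFrame({
    "k": ["a", "a", "b", "b"],
    "p": [1, 2, 1, 2],
    "v": [10.0, 20.0, 30.0, 40.0],
})

# A unique index is what makes unstack unambiguous.
indexed = toy.set_index(["k", "p"])
assert indexed.index.is_unique

# unstack moves the "p" index level into the columns (long -> wide).
wide = indexed.unstack(level="p")

# stack moves the "p" column level back into the index (wide -> long).
long_again = wide.stack(level="p")

print(wide)
print(long_again)
```

If the index were not unique, unstack would raise an error about duplicate entries, which is why the unique index comes first.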
First, do you really need to repeat the p column like that? I can sort of see its value when presenting data, but IMO pandas isn't really set up to work like that. We could shoehorn it in, but let's see if a simpler solution gets you what you need.
Here's what I would do:
from io import StringIO
import pandas
datatable = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26""")
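df = (
    pandas.read_table(datatable, sep=r'\s+')
    .set_index(['m', 'r', 's', 'p'])   # unique index on all four key columns
    .unstack(level='p')                # move p from the index into the columns
)
# Put p on the outer column level.
df.columns = df.columns.swaplevel(0, 1)
# DataFrame.sort was removed from pandas; sort_index replaces it. Sorting on
# the p level only keeps O, W, N in their original order within each p.
df = df.sort_index(axis=1, level='p', sort_remaining=False)
print(df)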
Which prints:
p         1                 2                 3
          O     W     N     O     W     N     O     W     N
m r s
1 4 3  2.81  3.70  3.03  0.58  0.78  0.60  1.98  1.34  1.81
    4  2.14  2.82  2.31  0.67  0.00  0.00  0.00  0.04  0.15
    5  1.47  1.94  1.59  1.03  2.45  1.68  0.01  0.00  0.26
So now the columns are a MultiIndex and you can access, for example, all of the values where p = 2 with df[2] or df.xs(2, level='p', axis=1), which gives me:
          O     W     N
m r s
1 4 3  0.58  0.78  0.60
    4  0.67  0.00  0.00
    5  1.03  2.45  1.68
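As a quick check that the two spellings select the same block, here is a self-contained sketch on a four-row subset of the question's data. The names data, wide, by_key and by_xs are only illustrative, and the columns are fully sorted here for simplicity, so they come out in N, O, W order.

```python
from io import StringIO
import pandas as pd

# A four-row subset of the question's data (p = 1 and p = 2 only).
data = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
""")

wide = (
    pd.read_csv(data, sep=r"\s+")
    .set_index(["m", "r", "s", "p"])
    .unstack(level="p")
)
wide.columns = wide.columns.swaplevel(0, 1)
wide = wide.sort_index(axis=1)  # full lexicographic sort of the columns

# Two ways to pull out every column where p == 2.
by_key = wide[2]
by_xs = wide.xs(2, level="p", axis=1)

print(by_key)
print(by_xs)
```

The xs form is a bit more explicit, since it names the level being matched instead of relying on p being the outermost one.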
Similarly, you can get all of the W columns with df.xs('W', level=1, axis=1) (we say level=1 because that column level does not have a name, so we use its position instead):
p         1     2     3
m r s
1 4 3  3.70  0.78  1.34
    4  2.82  0.00  0.04
    5  1.94  2.45  0.00
You can similarly query the rows (the index levels m, r and s) by using axis=0.
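For the axis=0 case, xs matches against the row index levels instead of the column levels. A minimal sketch, built by hand on a cut-down frame (two p values and only the O and W measurements; the names wide, rows_s4 and one_row are hypothetical):

```python
import pandas as pd

# Hand-built miniature of the reshaped frame: rows keyed by (m, r, s),
# columns keyed by (p, measurement). The inner column level has no name.
index = pd.MultiIndex.from_tuples(
    [(1, 4, 3), (1, 4, 4), (1, 4, 5)], names=["m", "r", "s"]
)
columns = pd.MultiIndex.from_product([[1, 2], ["O", "W"]], names=["p", None])
wide = pd.DataFrame(
    [[2.81, 3.70, 0.58, 0.78],
     [2.14, 2.82, 0.67, 0.00],
     [1.47, 1.94, 1.03, 2.45]],
    index=index,
    columns=columns,
)

# All rows where the index level s equals 4 (the s level is dropped).
rows_s4 = wide.xs(4, level="s", axis=0)

# One fully specified row comes back as a Series keyed by (p, measurement).
one_row = wide.loc[(1, 4, 5), :]

print(rows_s4)
print(one_row)
```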
If you really need the p values in a column, just add it there manually and reindex your columns:
# Add a constant 'p' column under each p group.
for p in df.columns.get_level_values('p').unique():
    df[p, 'p'] = p
# Reorder so that every group reads p, O, W, N.
cols = pandas.MultiIndex.from_product([[1,2,3], list('pOWN')])
df = df.reindex(columns=cols)
print(df)
       1                    2                    3
       p     O     W     N  p     O     W     N  p     O     W     N
m r s
1 4 3  1  2.81  3.70  3.03  2  0.58  0.78  0.60  3  1.98  1.34  1.81
    4  1  2.14  2.82  2.31  2  0.67  0.00  0.00  3  0.00  0.04  0.15
    5  1  1.47  1.94  1.59  2  1.03  2.45  1.68  3  0.01  0.00  0.26
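If the goal is literally the flat layout from the question (one header row with p, O, W, N repeated), one possible follow-up is to drop the outer column level. This is only a sketch on a hand-built two-row, two-group miniature; the names wide, flat and named are hypothetical, and the suffixed variant is my own suggestion, not something the question asked for.

```python
import pandas as pd

# Hand-built miniature of the final frame (two rows, p = 1 and p = 2).
index = pd.MultiIndex.from_tuples([(1, 4, 3), (1, 4, 4)], names=["m", "r", "s"])
columns = pd.MultiIndex.from_product([[1, 2], list("pOWN")])
wide = pd.DataFrame(
    [[1, 2.81, 3.70, 3.03, 2, 0.58, 0.78, 0.60],
     [1, 2.14, 2.82, 2.31, 2, 0.67, 0.00, 0.00]],
    index=index,
    columns=columns,
)

# Option 1: keep only the inner level, so labels repeat: p O W N p O W N.
flat = wide.copy()
flat.columns = flat.columns.get_level_values(1)
flat = flat.reset_index()
print(flat.to_string(index=False))

# Option 2: unique labels such as O_1, W_1, ..., which are easier to select.
named = wide.copy()
named.columns = [f"{name}_{p}" for p, name in wide.columns]
named = named.reset_index()
print(named.to_string(index=False))
```

pandas allows duplicate column labels, but selecting flat['O'] then returns every O column at once, so the suffixed names are usually easier to work with afterwards.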
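import csv

# Join the lowercased first field of every line in df_col2.csv into one
# comma-separated string, then write it to ss2.csv as a single cell.
sk = ''
with open('df_col2.csv', 'r') as ann:
    for col in ann:
        an = col.lower().strip('\n').split(',')
        sk += an[0] + ','
sk = sk[:-1]  # drop the trailing comma
with open('ss2.csv', 'w', newline='') as b:
    a = csv.writer(b)
    a.writerow([sk])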