Python Pandas: How To Set Columns as an Index?

I was wondering whether I'm missing an easy way to move a set of column names into the index of a data frame.

Below is the example code I set up, with my current (messy) solution:

import pandas as pd

df1 = pd.DataFrame({
'A' : ['a1', 'a1', 'a2', 'a3'],
'B' : ['b1', 'b2', 'b3', 'b4'],
'D1' : [1,0,0,0],
'D2' : [0,1,1,0],
'D3' : [0,0,1,1],
})

df1 = df1.set_index(['A','B'])
b = df1.unstack().unstack()        # Series indexed by (D, B, A)
c = b.reset_index()
c.columns = ['D','B','A','Value']
d = c.set_index(['A','B','D'])
final1 = d.unstack()               # rows (A, B), columns ('Value', D)

df2 = pd.DataFrame({
'A' : ['a1', 'a1', 'a2', 'a3'],
'B' : ['b1', 'b2', 'b3', 'b4'],
'D1' : [1,0,0,0],
'D2' : [0,0,0,0],
'D3' : [0,0,0,1],
})

df2 = df2.set_index(['A','B'])
b = df2.unstack().unstack()
c = b.reset_index()
c.columns = ['D','B','A','Value']
d = c.set_index(['A','B','D'])
final2 = d.unstack()

result = (final1*final2).dropna()

For more background, the actual problem I am trying to solve is this: I have N data frames (e.g. df1, df2) consisting of 1s and 0s, and I am trying to find a way to use pandas to multiply them all together on a 3-dimensional index in order to find their intersection (i.e. result).
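A minimal sketch of the multiplication step, assuming each data frame has already been turned into a 0/1 Series indexed by the three dimensions (A, B, D). The labels and the third Series `s3` are made up for illustration. Instead of multiplying pairwise and calling `dropna()`, this uses `pd.concat` with `join="inner"` to keep only the labels present in all N inputs, then takes the row-wise product:

```python
import pandas as pd

# Hypothetical inputs: three 0/1 Series that already share a 3-level (A, B, D) index.
names = ["A", "B", "D"]
s1 = pd.Series([1, 1, 1], index=pd.MultiIndex.from_tuples(
    [("a1", "b1", "D1"), ("a1", "b2", "D2"), ("a2", "b3", "D3")], names=names))
s2 = pd.Series([1, 0, 1], index=pd.MultiIndex.from_tuples(
    [("a1", "b1", "D1"), ("a2", "b3", "D3"), ("a3", "b4", "D3")], names=names))
s3 = pd.Series([1, 1], index=pd.MultiIndex.from_tuples(
    [("a1", "b1", "D1"), ("a2", "b3", "D3")], names=names))

# join="inner" keeps only the (A, B, D) labels present in every Series;
# prod(axis=1) then multiplies the N aligned values on each row.
aligned = pd.concat([s1, s2, s3], axis=1, join="inner")
result = aligned.prod(axis=1)
print(result)
```

Because the inner join discards any label that is missing from one of the inputs, the result covers exactly the intersection and has no NaN left to drop.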

To do this, I thought: why not convert the data set into pandas data frames and set the index to the 3 dimensions? Then, as shown above, it should just be an easy multiplication and pandas will take care of the rest.

However, the data comes in the format shown in df1/df2, so the code above is my messy attempt at converting it into a pandas data frame with a 3-level index. So, again, I was wondering whether there is an easier way to move a set of column names into the index.
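One possibly simpler route, offered as a sketch rather than the only option: after `set_index(['A', 'B'])`, `DataFrame.stack()` moves the remaining column labels (`D1`, `D2`, `D3`) into a new innermost index level, giving a Series indexed by (A, B, D). The level name `D` is assigned with `rename_axis` because the original columns are unnamed, and `to_long` is a hypothetical helper name introduced here:

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3', 'b4'],
    'D1': [1, 0, 0, 0],
    'D2': [0, 1, 1, 0],
    'D3': [0, 0, 1, 1],
})
df2 = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3', 'b4'],
    'D1': [1, 0, 0, 0],
    'D2': [0, 0, 0, 0],
    'D3': [0, 0, 0, 1],
})


def to_long(df):
    """Hypothetical helper: reshape wide 0/1 data into a Series indexed by (A, B, D)."""
    # Naming the column axis "D" makes the level created by stack() carry that name.
    return df.set_index(['A', 'B']).rename_axis(columns='D').stack()


s1, s2 = to_long(df1), to_long(df2)
product = s1 * s2  # aligned on the (A, B, D) index
print(product[product == 1])
```

The same helper could be applied to each of the N frames before multiplying. `DataFrame.melt` followed by `set_index(['A', 'B', 'D'])` is another way to reach the same long format.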

Thanks!

I think you can just put all of your frames in a list and reduce over it. The frames are aligned on their index at each step, and passing fill_value=1 means that where one frame has a value and the other has a NaN (a missing label), the existing value propagates through the multiplication, which I think is what you want.

from functools import reduce  # on Python 3, reduce must be imported from functools

list_of_dfs = [df1, df2]
print(reduce(lambda x, y: x.mul(y, fill_value=1), list_of_dfs[1:], list_of_dfs[0]))
       D1  D2  D3
A  B             
a1 b1   1   0   0
   b2   0   0   0
a2 b3   0   0   0
a3 b4   0   0   1
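To illustrate what `fill_value=1` changes when the frames do not share every row, here is a small sketch with two hypothetical frames `x` and `y` indexed by (A, B), where `y` has an extra row `(a4, b5)` and lacks `(a1, b2)`. A label present in only one operand is treated as 1 on the other side, so the existing value passes through; plain `*` yields NaN there instead:

```python
from functools import reduce

import pandas as pd

# Hypothetical frames indexed by (A, B) that only partly overlap.
x = pd.DataFrame(
    {'D1': [1, 0], 'D2': [0, 1]},
    index=pd.MultiIndex.from_tuples([('a1', 'b1'), ('a1', 'b2')], names=['A', 'B']),
)
y = pd.DataFrame(
    {'D1': [1, 1], 'D2': [1, 0]},
    index=pd.MultiIndex.from_tuples([('a1', 'b1'), ('a4', 'b5')], names=['A', 'B']),
)

frames = [x, y]
# With fill_value=1 a label present in only one operand keeps that operand's value.
filled = reduce(lambda acc, df: acc.mul(df, fill_value=1), frames[1:], frames[0])
# Plain multiplication leaves NaN wherever a label is missing from either side.
plain = x * y
print(filled)
print(plain)
```

This is why the reduce above behaves like a union with pass-through values, whereas the question's `(final1*final2).dropna()` keeps only the intersection.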

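Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM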