Pandas: Conditionally dropping columns based on same values throughout the column in MultiIndex dataframe

I have a dataframe with two-level (MultiIndex) columns:

import pandas as pd

data = {('5105', 'Open'): [1.99,1.98,1.99,2.05,2.15],
        ('5105', 'Adj Close'): [1.92,1.92,1.96,2.07,2.08],
        ('5229', 'Open'): [0.01]*5,
        ('5229', 'Adj Close'): [0.02]*5,
        ('7076', 'Open'): [1.02,1.01,1.01,1.06,1.06],
        ('7076', 'Adj Close'): [0.90,0.92,0.94,0.94,0.95]}

df = pd.DataFrame(data)

   5105            5229            7076          
   Open Adj Close  Open Adj Close  Open Adj Close
0  1.99      1.92  0.01      0.02  1.02      0.90
1  1.98      1.92  0.01      0.02  1.01      0.92
2  1.99      1.96  0.01      0.02  1.01      0.94
3  2.05      2.07  0.01      0.02  1.06      0.94
4  2.15      2.08  0.01      0.02  1.06      0.95
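A quick way to see which subcolumns are constant is DataFrame.nunique, which counts the distinct values in each column; on a frame with MultiIndex columns it returns a Series indexed by (level 0, level 1) tuples. A minimal sketch that rebuilds the example data and lists the constant columns (the names counts and constant are illustrative):

```python
import pandas as pd

data = {('5105', 'Open'): [1.99, 1.98, 1.99, 2.05, 2.15],
        ('5105', 'Adj Close'): [1.92, 1.92, 1.96, 2.07, 2.08],
        ('5229', 'Open'): [0.01] * 5,
        ('5229', 'Adj Close'): [0.02] * 5,
        ('7076', 'Open'): [1.02, 1.01, 1.01, 1.06, 1.06],
        ('7076', 'Adj Close'): [0.90, 0.92, 0.94, 0.94, 0.95]}
df = pd.DataFrame(data)

# Number of distinct values in each (level 0, level 1) column
counts = df.nunique()
# A count of 1 means the column holds the same value in every row
constant = counts[counts == 1].index.tolist()
print(constant)  # [('5229', 'Open'), ('5229', 'Adj Close')]
```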

In the dataframe above, both subcolumns of df['5229'] are constant: Open is 0.01 and Adj Close is 0.02 in every row. Since this column will not be useful in my analysis, I intend to drop it.

I have two questions:

  1. How do I drop a level-0 column (such as '5229') when each of its subcolumns holds the same value throughout?
  2. On the other hand, if only one subcolumn holds the same value throughout, how can I drop just that subcolumn?

As this is a condition-based drop, I was wondering whether df.drop still works in this case.

Applying my two questions to the example above: since both Open and Adj Close under '5229' are constant throughout, I would like to drop '5229' entirely.

The expected output is:

   5105            7076          
   Open Adj Close  Open Adj Close
0  1.99      1.92  1.02      0.90
1  1.98      1.92  1.01      0.92
2  1.99      1.96  1.01      0.94
3  2.05      2.07  1.06      0.94
4  2.15      2.08  1.06      0.95

Edit

Thank you very much to everyone who answered. To be more precise: my real dataframe has more than 200 columns, and I want to drop every column in which all the values are the same.

Try with nunique:

df = df.loc[:,~(df.nunique()==1).values]
Output:
   5105            7076          
   Open Adj Close  Open Adj Close
0  1.99      1.92  1.02      0.90
1  1.98      1.92  1.01      0.92
2  1.99      1.96  1.01      0.94
3  2.05      2.07  1.06      0.94
4  2.15      2.08  1.06      0.95
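The same mask also covers the second question: it is evaluated per (level 0, level 1) column, so a group with only one constant subcolumn loses just that subcolumn, and a group disappears only when all of its subcolumns are constant. A sketch wrapping it in a helper; the name drop_constant_columns and the toy frame are hypothetical:

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return df without the columns that hold a single distinct value."""
    # Boolean array with one entry per column; True means the column varies
    keep = (df.nunique() != 1).to_numpy()
    return df.loc[:, keep]


# Hypothetical frame: ('A', 'Open') is constant, and both B subcolumns are constant
df = pd.DataFrame({('A', 'Open'): [1.0, 1.0, 1.0],
                   ('A', 'Adj Close'): [0.9, 0.8, 0.7],
                   ('B', 'Open'): [2.0, 2.0, 2.0],
                   ('B', 'Adj Close'): [3.0, 3.0, 3.0]})
out = drop_constant_columns(df)
print(out.columns.tolist())  # [('A', 'Adj Close')]
```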

Try this (it drops '5229' by name, so the label has to be known in advance):

df.drop('5229',level=0,axis=1)

Output:

   5105            7076          
   Open Adj Close  Open Adj Close
0  1.99      1.92  1.02      0.90
1  1.98      1.92  1.01      0.92
2  1.99      1.96  1.01      0.94
3  2.05      2.07  1.06      0.94
4  2.15      2.08  1.06      0.95
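df.drop has no condition argument of its own, but the labels can be computed first and then passed to it. A minimal sketch for the first question, dropping a level-0 label only when every one of its subcolumns is constant (the variable names are illustrative). Unlike the nunique mask, this keeps a group whole if at least one of its subcolumns varies:

```python
import pandas as pd

data = {('5105', 'Open'): [1.99, 1.98, 1.99, 2.05, 2.15],
        ('5105', 'Adj Close'): [1.92, 1.92, 1.96, 2.07, 2.08],
        ('5229', 'Open'): [0.01] * 5,
        ('5229', 'Adj Close'): [0.02] * 5,
        ('7076', 'Open'): [1.02, 1.01, 1.01, 1.06, 1.06],
        ('7076', 'Adj Close'): [0.90, 0.92, 0.94, 0.94, 0.95]}
df = pd.DataFrame(data)

# True for each (ticker, field) column holding a single distinct value
is_constant = df.nunique().eq(1)
# Collapse to level 0: True only if all subcolumns of that ticker are constant
all_constant = is_constant.groupby(level=0).all()
to_drop = all_constant[all_constant].index.tolist()

# Pass the computed labels to drop, matching them against level 0
out = df.drop(columns=to_drop, level=0)
print(to_drop)  # ['5229']
```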

We could use unstack + groupby + nunique to get the number of unique values in each column, then use loc to keep only the columns whose count is not 1:

out = df[df.unstack().groupby(level=[0,1]).nunique().loc[lambda x: x!=1].index]

Output:

       5105            7076      
  Adj Close  Open Adj Close  Open
0      1.92  1.99      0.90  1.02
1      1.92  1.98      0.92  1.01
2      1.96  1.99      0.94  1.01
3      2.07  2.05      0.94  1.06
4      2.08  2.15      0.95  1.06
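Because groupby sorts its keys, the selected columns come back reordered (Adj Close before Open within each ticker). If the original order matters, one option, sketched here on the assumption that only the order needs restoring, is to turn the kept labels into a boolean mask over df.columns:

```python
import pandas as pd

data = {('5105', 'Open'): [1.99, 1.98, 1.99, 2.05, 2.15],
        ('5105', 'Adj Close'): [1.92, 1.92, 1.96, 2.07, 2.08],
        ('5229', 'Open'): [0.01] * 5,
        ('5229', 'Adj Close'): [0.02] * 5,
        ('7076', 'Open'): [1.02, 1.01, 1.01, 1.06, 1.06],
        ('7076', 'Adj Close'): [0.90, 0.92, 0.94, 0.94, 0.95]}
df = pd.DataFrame(data)

# Unique-value count per (ticker, field), computed as in the answer above
counts = df.unstack().groupby(level=[0, 1]).nunique()
keep = counts.loc[lambda x: x != 1].index

# isin yields a mask in df's own column order, not groupby's sorted order
out = df.loc[:, df.columns.isin(list(keep))]
print(out.columns.tolist())
```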

You can try this:

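for a, b in list(df.columns):
    # constant column: every value after the first is a repeat
    # (duplicated(keep=False) would also flag e.g. [1, 1, 2, 2])
    if df[(a, b)].duplicated().sum() == len(df) - 1:
        df.drop((a, b), axis=1, inplace=True)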

Result:

   5105            7076          
   Open Adj Close  Open Adj Close
0  1.99      1.92  1.02      0.90
1  1.98      1.92  1.01      0.92
2  1.99      1.96  1.01      0.94
3  2.05      2.07  1.06      0.94
4  2.15      2.08  1.06      0.95
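One caution about duplicate-based checks: duplicated(keep=False) marks every value that occurs more than once, so comparing its count with the column size also flags a column such as [1, 1, 2, 2], which is not constant. A small check with a hypothetical toy Series:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 2])

# Every value appears twice, so keep=False flags all four rows
all_flagged = s.duplicated(keep=False).sum() == s.size
# nunique counts distinct values: 2 here, so the column is not constant
is_constant = s.nunique() == 1

print(all_flagged, is_constant)  # True False
```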
