How to subset a pandas DataFrame by a list of two-column value pairs of arbitrary length

Question

I have tried different combinations of Boolean arrays and .isin constructions, but my pandas fu is not strong enough.

If I have the following example dataframe:

In[1]:  import pandas as pd
        exampledf = pd.DataFrame({ 'factor1' : ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
                                   'factor2' : ['e', 'e', 'e', 'e', 'f', 'f', 'f', 'f'],
                                   'numeric' : [1., 2., 3., 4., 5., 6., 7., 8.] })

I need to pass a list of (factor1, factor2) pairs of any length and get back the subset of the dataframe whose rows match one of those combinations of factors.

For example:

In[2]:  def factorfilter(df, factorlist):
           # code goes here
           # returns a dataframe

        factorfilter(exampledf, [['a', 'e'], ['c', 'f']])

Out[2]:   factor1 factor2  numeric
        0       a       e        1
        6       c       f        7

(If there's a better way to set this up than with lists, I'm all ears; it's just what occurred to me, and it is easy to produce and pass to a function.)

Answer 1

You can utilize a multi-index (an index built from more than one column). Two ways of building such an index from the example schema come to mind.

import pandas as pd
index = pd.MultiIndex.from_product([list('abcd'),list('ef')],
                                   names=['factor1','factor2'])

or

factor1 = list('abcdabcd')
factor2 = list('eeeeffff')
index = pd.MultiIndex.from_tuples(list(zip(factor1, factor2)),
                                 names=['factor1', 'factor2'])

From this, you can create a multi-index DataFrame:

numerics = list(range(1,9))
df = pd.DataFrame({'numeric': numerics}, index=index)

Printing df (built with the from_product index) gives:

                 numeric
factor1 factor2
a       e              1
        f              2
b       e              3
        f              4
c       e              5
        f              6
d       e              7
        f              8

[8 rows x 1 columns]
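
If the data already sits in a flat DataFrame like the question's exampledf, a comparable index can probably be built straight from the existing columns rather than by hand. A minimal sketch, assuming the column names from the question (the names indexed and value are only for illustration):

```python
import pandas as pd

exampledf = pd.DataFrame({
    'factor1': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'factor2': ['e', 'e', 'e', 'e', 'f', 'f', 'f', 'f'],
    'numeric': [1., 2., 3., 4., 5., 6., 7., 8.],
})

# Move the two factor columns into a two-level index. Sorting keeps the
# index lexsorted, which helps lookups on larger frames.
indexed = exampledf.set_index(['factor1', 'factor2']).sort_index()

# A full (factor1, factor2) key now addresses a single row.
value = indexed.loc[('c', 'f'), 'numeric']
print(value)
```

The numeric values here follow the question's data, so they differ from the from_product example, where the pairs are numbered in product order.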

Then you can retrieve a subset of rows by passing a list of tuples to the loc indexer (the older ix indexer was removed in pandas 1.0).

subdf = df.loc[[('a','e'), ('c','f')]]

Printing subdf gives:

                 numeric
factor1 factor2
a       e              1
c       f              6

[2 rows x 1 columns]
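
To tie this back to the factorfilter function sketched in the question, one possible version builds the multi-index on the fly and uses it only as a Boolean mask, so the original dataframe and its row labels are left alone. This is a sketch under assumptions, not the only way to do it: the columns parameter is something I added for illustration, and it defaults to the column names from the question.

```python
import pandas as pd

exampledf = pd.DataFrame({
    'factor1': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'factor2': ['e', 'e', 'e', 'e', 'f', 'f', 'f', 'f'],
    'numeric': [1., 2., 3., 4., 5., 6., 7., 8.],
})

def factorfilter(df, factorlist, columns=('factor1', 'factor2')):
    """Return the rows of df whose factor columns match a pair in factorlist."""
    # Build a MultiIndex from the factor columns without modifying df.
    row_keys = pd.MultiIndex.from_arrays([df[c] for c in columns])
    # MultiIndex.isin compares whole keys, so each pair must be a tuple.
    wanted = [tuple(pair) for pair in factorlist]
    # The mask is True where the row's pair is one of the wanted pairs.
    return df[row_keys.isin(wanted)]

result = factorfilter(exampledf, [['a', 'e'], ['c', 'f']])
print(result)
```

With the question's data this selects the rows labelled 0 and 6, which is the output the question asks for. Rows come back in their original order, not in the order of the pairs passed in.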
