简体   繁体   中英

Find value based on a combination of columns

Is there a way to find a value based on the combination of column values?

Example:

df = pd.DataFrame({
    'One' : [np.random.randint(1, 10) for i in range(10)],
    'Two' : [np.random.randint(1, 10) for i in range(10)],
    'Three' : [np.random.randint(1, 10) for i in range(10)],
    'Four' : [np.random.randint(1, 10) for i in range(10)],
})

.

In [6]: df
Out[6]:
   One  Two  Three  Four
0    8    1      7     5
1    6    3      3     3
2    4    7      5     2
3    4    2      6     9
4    1    7      1     9
5    9    8      3     8
6    4    8      4     4
7    1    9      7     1
8    4    2      6     4
9    1    3      7     7

I have the following df and I'm interested which of the columns would I need to combine to get number 9. This would be a column based combination, so each row would output a name of columns that were combined to find 9.

  • In example row 0 will be give the result: ['One', 'Two']

  • row 1: will give a combination of 'One' and all other three. ['One', 'Two'], ['One', 'Three'], ['One', 'Four']

  • row 2: False

  • row 3: ['Four']

and etc...

NOTE: DataFrame should stay immutabile.

Thanks for any future suggestions or help.

First get all the combinations of column names for the dataframe, you can use itertools.combinations for it, then create a function that will calculate the sum for each of the combination of column names, and store such combinations in temporary list if the sum equals the required sum, finally return the list of column combinations, and apply this function on dataframe, for axis=1

import itertools
cols = [j for i in [[list(col)
                     for col in (itertools.combinations(df.columns, i))]
                    for i in range(1, df.shape[1] + 1)]
        for j in i]
def getSubArray(row, sum_=9):
    result=[]
    for col in cols:
        if row.loc[col].sum()==sum_:
            result.append(col)
    return result 
result = df.apply(getSubArray, axis=1)

OUTPUT:

0                                                   [[One, Two]]
1    [[One, Two], [One, Three], [One, Four], [Two, Three, Four]]
2                                    [[One, Three], [Two, Four]]
3                                                       [[Four]]
4                                    [[Four], [One, Two, Three]]
5                                                        [[One]]
6                                                             []
7                                    [[Two], [One, Three, Four]]
8                                                             []
9                                                             []
dtype: object

You can pass any value for required sum to getSubArray from .apply call, like: df.apply(getSubArray, axis=1, sum_=24)

from itertools import combinations
def find_9(x, v=9):
    c = combinations(x.index, 2)
    columns = []
    for i,j in c:
        if (x[i]+x[j]) == v:
            columns.append([i,j])
    return columns if columns else False

df.apply(lambda x: find_9(x),axis= 1)

ouput:

0                               [[One, Two]]
1    [[One, Two], [One, Three], [One, Four]]
2                [[One, Three], [Two, Four]]
3                                      False
4                                      False
5                                      False
6                                      False
7                                      False
8                                      False
9                                      False

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM