Python itertools combinations on objects

Can Python's itertools.combinations be used on objects rather than lists?

For instance, how can I use it on the following data?

Rahul - 20,000 - Mumbai

Shivani - 30,000 - Mumbai

Akash - 40,000 - Bangalore

I want all possible combinations of the names, together with the combined salary for each combination.

How can I do this with combinations?
Assume the data is read with pd.read_csv and stored in a dataframe.

Code so far:

import pandas as pd
import itertools
df = pd.read_csv('stack.csv')

print (df)

for L in range(0, len(df)+1):
    for subset in itertools.combinations(df['Name'], L):
        print (subset)

Output:

      Name  Salary       City
0    Rahul   20000     Mumbai
1  Shivani   30000     Mumbai
2    Akash   40000  Bangalore
()
('Rahul',)
('Shivani',)
('Akash',)
('Rahul', 'Shivani')
('Rahul', 'Akash')
('Shivani', 'Akash')
('Rahul', 'Shivani', 'Akash')

Process finished with exit code 0

How do I add the salary to these combinations?

First, get your indices:

idx = [j for i in range(1, len(df) + 1) for j in list(itertools.combinations(df.index, i))]
# [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
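
The comprehension above is essentially the "powerset" recipe from the itertools documentation with the empty tuple left out. A minimal sketch of that idea as a reusable helper follows; the name `nonempty_subsets` is my own and is not part of itertools:

```python
from itertools import chain, combinations

def nonempty_subsets(items):
    """Return an iterator over every combination of `items` of size 1..len(items)."""
    pool = list(items)  # materialize so the same items can be reused for each size
    return chain.from_iterable(
        combinations(pool, r) for r in range(1, len(pool) + 1)
    )

# With a default RangeIndex of three rows, the index labels are 0, 1, 2.
idx = list(nonempty_subsets(range(3)))
print(idx)
# [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
```

Passing `df.index` instead of `range(3)` should give the same tuples for the example data.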

Get a sub-dataframe for each group:

dfs = [df.iloc[list(i)] for i in idx]

Finally, join the names and sum the salaries (the column names are Name and Salary, as in the question's output):

out = [(', '.join(i['Name']), int(i['Salary'].sum())) for i in dfs]

Output:

[('Rahul', 20000),
 ('Shivani', 30000),
 ('Akash', 40000),
 ('Rahul, Shivani', 50000),
 ('Rahul, Akash', 60000),
 ('Shivani, Akash', 70000),
 ('Rahul, Shivani, Akash', 90000)]

If you want this as a dataframe, it's quite simple:

df1 = pd.DataFrame(out, columns=['names', 'salaries'])

                   names  salaries
0                  Rahul     20000
1                Shivani     30000
2                  Akash     40000
3         Rahul, Shivani     50000
4           Rahul, Akash     60000
5         Shivani, Akash     70000
6  Rahul, Shivani, Akash     90000

To query this dataframe for the combination whose salary is closest to a given value, we can write a helper function:

def return_closest(val):
    # idxmin returns an index label, so look the row up with .loc
    return df1.loc[(df1.salaries - val).abs().idxmin()]


>>> return_closest(55000)
names       Rahul, Shivani
salaries             50000
Name: 3, dtype: object
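
If a dataframe is not needed, the same lookup can be done directly on the list of (names, salary) tuples with the built-in `min`. A small sketch, with the pairs hard-coded from the output shown earlier and `closest_pair` as a hypothetical helper name:

```python
# (names, combined salary) pairs, copied from the output shown earlier
out = [
    ('Rahul', 20000),
    ('Shivani', 30000),
    ('Akash', 40000),
    ('Rahul, Shivani', 50000),
    ('Rahul, Akash', 60000),
    ('Shivani, Akash', 70000),
    ('Rahul, Shivani, Akash', 90000),
]

def closest_pair(pairs, val):
    """Return the (names, salary) pair whose salary is nearest to `val`."""
    # On ties, min() keeps the first pair it encounters.
    return min(pairs, key=lambda pair: abs(pair[1] - val))

print(closest_pair(out, 55000))
# ('Rahul, Shivani', 50000)
```

55000 is equally far from 50000 and 60000; both this sketch and `idxmin` keep the first match, which is why the result is the Rahul, Shivani pair.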

I intentionally broke this down so you could see what is going on at each step. Once you understand it, you can combine the steps into a single expression that creates your dataframe:

pd.DataFrame(
    [(', '.join(d['Name']), d['Salary'].sum())
     for r in range(1, len(df) + 1)                  # every group size
     for idx in itertools.combinations(df.index, r)  # every group of that size
     for d in [df.iloc[list(idx)]]],                 # rows belonging to the group
    columns=['names', 'salaries']
)

You can use zip to iterate through both columns at the same time and use a list comprehension to generate the output dataframe. This works because both combinations calls walk the positions in the same order, so each tuple of names lines up with the matching tuple of salaries:

df_output = pd.DataFrame(
    [[', '.join(subset), sum(salaries)]
     for L in range(1, len(df) + 1)
     for subset, salaries in zip(itertools.combinations(df['Name'], L),
                                 itertools.combinations(df['Salary'], L))],
    columns=['Names', 'Sum Salaries'])

And you get:

                   Names  Sum Salaries
0                  Rahul         20000
1                Shivani         30000
2                  Akash         40000
3         Rahul, Shivani         50000
4           Rahul, Akash         60000
5         Shivani, Akash         70000
6  Rahul, Shivani, Akash         90000
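
Since `combinations` accepts any iterable, another option is to combine whole rows instead of two columns side by side, so the name and salary of each person travel together. A rough sketch, assuming the column names `Name` and `Salary` from the question and building the frame inline rather than reading `stack.csv`; the names `rows`, `records` and `df_rows` are illustrative only:

```python
import itertools
import pandas as pd

# Inline stand-in for the question's stack.csv
df = pd.DataFrame({
    'Name': ['Rahul', 'Shivani', 'Akash'],
    'Salary': [20000, 30000, 40000],
    'City': ['Mumbai', 'Mumbai', 'Bangalore'],
})

# itertuples yields one namedtuple per row, so each combination
# is a tuple of row objects with .Name, .Salary and .City attributes
rows = list(df.itertuples(index=False))

records = [
    (', '.join(row.Name for row in combo), sum(row.Salary for row in combo))
    for r in range(1, len(rows) + 1)
    for combo in itertools.combinations(rows, r)
]

df_rows = pd.DataFrame(records, columns=['Names', 'Sum Salaries'])
print(df_rows)
```

The same pattern should work for any sequence of objects (dataclass instances, dicts, and so on), as long as the fields are read in a way that suits the object type.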

How about this?

nameList = list()
sumList = list()
for L in range(0, len(df)+1):
    for x in itertools.combinations(df['Name'], L):
        nameList.append(x)
    for y in itertools.combinations(df['Salary'], L):
        sumList.append(sum(y))

newDf = pd.DataFrame()
newDf['Names'] = nameList
newDf['Salary Sum'] = sumList

Output:

                     Names  Salary Sum
0                       ()           0
1                 (Rahul,)       20000
2               (Shivani,)       30000
3                 (Akash,)       40000
4         (Rahul, Shivani)       50000
5           (Rahul, Akash)       60000
6         (Shivani, Akash)       70000
7  (Rahul, Shivani, Akash)       90000
