Python itertools combinations on objects
Can Python's itertools.combinations be used on objects rather than lists?
For instance, how can I use it on the following data?
Rahul - 20,000 - Mumbai
Shivani - 30,000 - Mumbai
Akash - 40,000 - Bangalore
I want all the possible combinations of the names, along with the combined salary for each combination.
How can I do this with combinations?
Assume the data is read using pd.read_csv and stored in a DataFrame.
Code so far:
import pandas as pd
import itertools
df = pd.read_csv('stack.csv')
print(df)
for L in range(0, len(df) + 1):
    for subset in itertools.combinations(df['Name'], L):
        print(subset)
Output:
Name Salary City
0 Rahul 20000 Mumbai
1 Shivani 30000 Mumbai
2 Akash 40000 Bangalore
()
('Rahul',)
('Shivani',)
('Akash',)
('Rahul', 'Shivani')
('Rahul', 'Akash')
('Shivani', 'Akash')
('Rahul', 'Shivani', 'Akash')
How do I add salary to these combinations?
First, get your indices:
idx = [j for i in range(1, len(df) + 1) for j in list(itertools.combinations(df.index, i))]
# [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
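As a quick check on this step, here is a minimal sketch that reproduces the index tuples without a CSV file. It assumes the frame has three rows and the default integer index, so range(3) stands in for df.index. It also drops the list() wrapper, since a comprehension can iterate over the combinations object directly.

```python
import itertools

# Assumption: three rows with the default RangeIndex, so range(3) mimics df.index.
index = range(3)

# Every non-empty combination of row positions, shortest combinations first.
idx = [combo
       for size in range(1, len(index) + 1)
       for combo in itertools.combinations(index, size)]

print(idx)
```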
Get your dataframes for each group:
dfs = [df.iloc[list(i)] for i in idx]
Finally, join and sum:
out = [(', '.join(i['Name'].values), sum(i['Salary'].values)) for i in dfs]
Output:
[('Rahul', 20000),
('Shivani', 30000),
('Akash', 40000),
('Rahul, Shivani', 50000),
('Rahul, Akash', 60000),
('Shivani, Akash', 70000),
('Rahul, Shivani, Akash', 90000)]
If you want this as a dataframe, it's quite simple:
df1 = pd.DataFrame(out, columns=['names', 'salaries'])
names salaries
0 Rahul 20000
1 Shivani 30000
2 Akash 40000
3 Rahul, Shivani 50000
4 Rahul, Akash 60000
5 Shivani, Akash 70000
6 Rahul, Shivani, Akash 90000
To query this dataframe to find the closest value to a given salary, we can write a helper function:
def return_closest(val):
    # idxmin() returns an index label, so look the row up with .loc
    return df1.loc[(df1.salaries - val).abs().idxmin()]
>>> return_closest(55000)
names Rahul, Shivani
salaries 50000
Name: 3, dtype: object
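Note that 55000 is equally far from 50000 and 60000, and idxmin() reports only the first match. If ties matter, one possible variant is sketched below. The function name closest_rows and its parameters are hypothetical, and the frame is rebuilt inline from the values shown above so the snippet runs on its own.

```python
import pandas as pd

def closest_rows(frame, target, column="salaries"):
    """Return every row whose value in `column` is nearest to `target`."""
    diff = (frame[column] - target).abs()
    # Boolean mask keeps all rows that share the minimum distance.
    return frame[diff == diff.min()]

# Assumption: same contents as df1 above, rebuilt here for a standalone example.
df1 = pd.DataFrame(
    [("Rahul", 20000), ("Shivani", 30000), ("Akash", 40000),
     ("Rahul, Shivani", 50000), ("Rahul, Akash", 60000),
     ("Shivani, Akash", 70000), ("Rahul, Shivani, Akash", 90000)],
    columns=["names", "salaries"],
)

ties = closest_rows(df1, 55000)
print(ties)
```

Passing the frame as an argument also avoids relying on a global df1.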
I intentionally broke this down so you could understand what was going on at each step. Once you do understand, you could combine this into a one-liner to create your dataframe:
pd.DataFrame(
    [(', '.join(d['Name'].values), sum(d['Salary'].values))
     for i in [j for i in range(1, len(df) + 1)
               for j in itertools.combinations(df.index, i)]
     for d in [df.iloc[list(i)]]],
    columns=['names', 'salaries']
)
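On the original question of whether combinations works on objects: it accepts any iterable, so the same join-and-sum pattern can be applied to plain Python objects without pandas. The sketch below assumes a hypothetical Employee dataclass that is not part of the question. It is only meant to illustrate the idea.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Employee:  # hypothetical class, introduced only for this illustration
    name: str
    salary: int
    city: str

staff = [
    Employee("Rahul", 20000, "Mumbai"),
    Employee("Shivani", 30000, "Mumbai"),
    Employee("Akash", 40000, "Bangalore"),
]

# combinations() only needs an iterable; each combo is a tuple of Employee objects.
out = [
    (", ".join(e.name for e in combo), sum(e.salary for e in combo))
    for size in range(1, len(staff) + 1)
    for combo in itertools.combinations(staff, size)
]

for names, total in out:
    print(names, total)
```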
You can use zip to iterate through both columns at the same time and use a list comprehension to generate the output dataframe, such as:
df_output = pd.DataFrame(
    [[', '.join(subset), sum(salaries)]
     for L in range(1, len(df) + 1)
     for subset, salaries in zip(itertools.combinations(df['Name'], L),
                                 itertools.combinations(df['Salary'], L))],
    columns=['Names', 'Sum Salaries'])
and you get:
Names Sum Salaries
0 Rahul 20000
1 Shivani 30000
2 Akash 40000
3 Rahul, Shivani 50000
4 Rahul, Akash 60000
5 Shivani, Akash 70000
6 Rahul, Shivani, Akash 90000
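The zip approach works because both combinations calls emit their subsets in the same positional order. An alternative that does not depend on keeping two iterators in step is to pair each name with its salary first and run combinations once over the pairs. This is a sketch under the assumption that the frame holds the three rows from the question. It is built inline here rather than read from stack.csv, and df_pairs is a hypothetical name.

```python
import itertools
import pandas as pd

# Assumption: the same three rows as in the question, built inline.
df = pd.DataFrame({
    "Name": ["Rahul", "Shivani", "Akash"],
    "Salary": [20000, 30000, 40000],
    "City": ["Mumbai", "Mumbai", "Bangalore"],
})

# Plain (name, salary) tuples, one per row.
rows = list(df[["Name", "Salary"]].itertuples(index=False, name=None))

records = []
for size in range(1, len(rows) + 1):
    for combo in itertools.combinations(rows, size):
        names, salaries = zip(*combo)  # unzip the pairs back into two tuples
        records.append((", ".join(names), sum(salaries)))

df_pairs = pd.DataFrame(records, columns=["Names", "Sum Salaries"])
print(df_pairs)
```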
How about like this?
nameList = list()
sumList = list()
for L in range(0, len(df) + 1):
    for x in itertools.combinations(df['Name'], L):
        nameList.append(x)
    for y in itertools.combinations(df['Salary'], L):
        sumList.append(sum(y))
newDf = pd.DataFrame()
newDf['Names'] = nameList
newDf['Salary Sum'] = sumList
Output:
Names Salary Sum
0 () 0
1 (Rahul,) 20000
2 (Shivani,) 30000
3 (Akash,) 40000
4 (Rahul, Shivani) 50000
5 (Rahul, Akash) 60000
6 (Shivani, Akash) 70000
7 (Rahul, Shivani, Akash) 90000
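This version has eight rows rather than seven because the loop starts at 0 and so includes the empty combination. More generally, the number of combinations over all sizes is 2**n for n rows. The short sketch below checks that arithmetic with math.comb (available from Python 3.8). The value n = 3 simply matches the example data.

```python
import math

n = 3  # number of rows in the example frame

# Number of combinations of each size, from 0 (the empty tuple) up to n.
counts = [math.comb(n, size) for size in range(n + 1)]
total = sum(counts)

print(counts, total)
```

Since the total doubles with every added row (20 rows already give 1,048,576 combinations), any of these approaches is only practical for small frames.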