Sum Fields From an Object to a Given Number to Solve For Maximum in Python

I've spent the last hour doing some data entry and have hit a brick wall in Python now.

Basically I have a set of data in JSON, where I want to pick entries whose price values add up to a certain value (14.0 in my case). The final result should maximise the sum of the return field. Here's an example of my dataset (there are more teams and fields):

[
  { "team": "England", "price": 7.0, "return": 2.21 },
  { "team": "Belgium", "price": 7.0, "return": 2.27 },
  { "team": "Spain", "price": 6.0, "return": 2.14 },
  { "team": "Slovakia", "price": 1.0, "return": 0.97 }
]

So in this case, there are 3 possible answers:

a) England, Belgium (4.48)

b) England, Spain, Slovakia (5.32)

c) Belgium, Spain, Slovakia (5.38)

With c) being the optimum because it has the biggest sum of return (5.38). I would like to use Python to implement the solution.

I've had a look at this question, but can't seem to figure out how to implement it in my case: Finding all possible combinations of numbers to reach a given sum

NOTE: This solution assumes that team names are unique in the data, because they are used to look up rows in the converted dataframe.

First, convert your JSON data to a pandas dataframe.

from itertools import combinations
import pandas as pd

data = [
    {"team": "England", "price": 7.0, "return": 2.21 },
    {"team": "Belgium", "price": 7.0, "return": 2.27 },
    {"team": "Spain", "price": 6.0, "return": 2.14 },
    { "team": "Slovakia", "price": 1.0, "return": 0.97 }
    ]

df = pd.DataFrame(data)

       team  price  return
0   England    7.0    2.21
1   Belgium    7.0    2.27
2     Spain    6.0    2.14
3  Slovakia    1.0    0.97

Convert the team column to a list:

teams = df['team'].tolist()
['England', 'Belgium', 'Spain', 'Slovakia']

Next, we generate all possible combinations from the teams list:

all_team_combinations = []

for i in range(1, len(teams) + 1):  # sizes 1..len(teams), so the full set is included too
  all_team_combinations.extend(combinations(teams, i))

Now, we check the price constraint:

price_threshold = 14
team_combinations_with_price_constraint = [c for c in all_team_combinations if df.loc[df['team'].isin(list(c)), 'price'].sum() == price_threshold]

print(team_combinations_with_price_constraint)

[('England', 'Belgium'), ('England', 'Spain', 'Slovakia'), ('Belgium', 'Spain', 'Slovakia')]
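One caveat on the `== price_threshold` check: it works here because 7.0, 6.0 and 1.0 are exactly representable as floats, but with prices like 0.1 or 2.35 an exact comparison can miss valid combinations. A minimal sketch of a tolerant comparison, where `price_matches` and the `abs_tol` value are hypothetical choices of mine and not part of the answer above:

```python
from math import isclose


def price_matches(prices, target, abs_tol=1e-9):
    """Return True if the prices add up to target, within a small tolerance."""
    return isclose(sum(prices), target, abs_tol=abs_tol)


print(sum([0.1, 0.2]) == 0.3)              # False: float rounding
print(price_matches([0.1, 0.2], 0.3))      # True
print(price_matches([7.0, 6.0, 1.0], 14))  # True
```

If your prices are always multiples of some unit (cents, half points), converting them to integers before summing is another option.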

Next, we calculate the sum of returns for the combinations that satisfy the constraint:

combinations_return_sum = [round(df.loc[df['team'].isin(list(c)), 'return'].sum(), 3) for c in team_combinations_with_price_constraint]

print(combinations_return_sum)
[4.48, 5.32, 5.38]

Finally, use the index of the maximum return sum to get the desired combination:

team_combinations_with_price_constraint[combinations_return_sum.index(max(combinations_return_sum))]

which yields

('Belgium', 'Spain', 'Slovakia')

To check the mapping from each combination to its return sum, you can create a dictionary like this:

combination_return_map = dict(zip(team_combinations_with_price_constraint, combinations_return_sum))

print(combination_return_map)

{('England', 'Belgium'): 4.48, ('England', 'Spain', 'Slovakia'): 5.32, ('Belgium', 'Spain', 'Slovakia'): 5.38}

Building on the previous SO solution of subset sum, and using pandas

I use pandas to handle indexing the data. I wasn't sure whether the same team (e.g. England) can be picked twice in your example; the code below does not allow the same row to appear twice in a solution. I solved it using pandas and itertools; pandas can be omitted.

import pandas as pd
from itertools import product, groupby

# i use pandas to handle indexing data
your_json = [
  { "team": "England", "price": 7.0, "return": 2.21 },
  { "team": "Belgium", "price": 7.0, "return": 2.27 },
  { "team": "Spain", "price": 6.0, "return": 2.14 },
  { "team": "Slovakia", "price": 1.0, "return": 0.97 }
]
your_data = pd.DataFrame(your_json)

# Iterator copied from the previous SO solution. It generates values that sum to your target.

def subset_sum(numbers, target, partial=[], partial_sum=0):
    if partial_sum == target:
        yield partial
    if partial_sum >= target:
        return
    for i, n in enumerate(numbers):
        remaining = numbers[i + 1:]
        yield from subset_sum(remaining, target, partial + [n], partial_sum + n)

# To get the indexes that match the price values, the solution values are iterated over:

soltion_indexes =[]
for solution_values in subset_sum( your_data.price, 14):
    possible_index= []
    for value in solution_values:
        #indexes that have the right value are added to list of possible indexes for this solution
        possible_index.append( your_data[your_data.price == value].index.tolist() )
    # in order to get all combinations, product from itertools is used
    listed_posible_indexes = list(product(*(possible_index)))
    # if the indexes are not already in the solutions, and they do not contain the same row twice, they are added to the solution indexes.
    for possible_indexes in  listed_posible_indexes:
        possible_solution_indexes = sorted(list(possible_indexes))
        if possible_solution_indexes not in soltion_indexes and not any(
            possible_solution_indexes.count(x) > 1 for x in possible_solution_indexes) :
            soltion_indexes.append(possible_solution_indexes)

# Then pull the rows for each index in the solution indexes, to create a dataframe with the full rows for your solutions, including return.

i=0 
all_solutions= pd.DataFrame()
for combinations in soltion_indexes:
    i+=1
    # copy the slice so that adding a column does not trigger SettingWithCopyWarning
    solution = your_data.iloc[combinations].copy()
    solution["solution_number"]= i 
    all_solutions = pd.concat([all_solutions,solution])

# Then the sum of return for each group is found:

ranked_groups_by_return = all_solutions.groupby("solution_number")['return'].sum().sort_values()

# The best group is found and printed:

best = all_solutions[all_solutions.solution_number == ranked_groups_by_return.index[-1]]
print(best)

       team  price  return  solution_number
1   Belgium    7.0    2.27                3
2     Spain    6.0    2.14                3
3  Slovakia    1.0    0.97                3
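Since pandas can be omitted, here is a rough sketch of how that might look. It is my own variation, not the code above: the recursion (the hypothetical `subset_sum_indexes`) yields row positions instead of price values, so the step that maps values back to indexes and filters out duplicates is not needed. Like the original iterator, it assumes all prices are positive.

```python
from math import isclose

rows = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97},
]


def subset_sum_indexes(prices, target, start=0, partial=(), partial_sum=0.0):
    """Yield tuples of row positions whose prices add up to target."""
    if partial and isclose(partial_sum, target, abs_tol=1e-9):
        yield partial
    if partial_sum >= target:
        return  # prices are assumed positive, so adding more cannot help
    for i in range(start, len(prices)):
        yield from subset_sum_indexes(
            prices, target, i + 1, partial + (i,), partial_sum + prices[i]
        )


prices = [row["price"] for row in rows]
solutions = list(subset_sum_indexes(prices, 14))
best = max(solutions, key=lambda idxs: sum(rows[i]["return"] for i in idxs))

print(solutions)                        # [(0, 1), (0, 2, 3), (1, 2, 3)]
print([rows[i]["team"] for i in best])  # ['Belgium', 'Spain', 'Slovakia']
```

Note that `max` raises a `ValueError` if no subset reaches the target, so you may want to check `solutions` first.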

We need to iterate over the combinations of every size n, from 1 up to the number of elements in the array, to capture all possible combinations. Then just apply your conditions to get the combination with maximum returns.

from itertools import combinations

data = [
  { "team": "England", "price": 7.0, "return": 2.21 },
  { "team": "Belgium", "price": 7.0, "return": 2.27 },
  { "team": "Spain", "price": 6.0, "return": 2.14 },
  { "team": "Slovakia", "price": 1.0, "return": 0.97 }
]

COMB_SUM = 14  # Desired combination sum

max_combi = None
max_sum_return = float('-inf')  # Lowest possible value as temporary maximum

for i in range(len(data), 0, -1):  # 4, 3, 2, 1
    for combi in combinations(data, i):  # Combinations of size i
        if sum(item['price'] for item in combi) == COMB_SUM:
            sum_return = sum(item['return'] for item in combi)
            if sum_return > max_sum_return:
                max_sum_return = sum_return
                max_combi = combi

print(max_combi)
print(max_sum_return)

Output

(
    {'team': 'Belgium', 'price': 7.0, 'return': 2.27},
    {'team': 'Spain', 'price': 6.0, 'return': 2.14},
    {'team': 'Slovakia', 'price': 1.0, 'return': 0.97}
)
5.38
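Checking every combination means looking at 2^n - 1 subsets, which is fine for 4 teams but grows quickly with the "more teams" mentioned in the question. If that becomes a problem, a knapsack-style dynamic programme is one possible alternative. This is a sketch of a different technique, not the approach above: `best_exact_sum` and the `scale` parameter are hypothetical names, and it assumes prices are non-negative and become whole numbers after multiplying by `scale`.

```python
data = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97},
]


def best_exact_sum(rows, target, scale=100):
    """Return (total_return, positions) of the best subset whose price is exactly target."""
    goal = round(target * scale)
    # best[p] = (highest total return, chosen positions) for scaled price p
    best = {0: (0.0, ())}
    for pos, row in enumerate(rows):
        cost = round(row["price"] * scale)
        # iterate over a snapshot so each row is used at most once
        for price_so_far, (ret, chosen) in list(best.items()):
            new_price = price_so_far + cost
            if new_price > goal:
                continue
            candidate = (ret + row["return"], chosen + (pos,))
            if new_price not in best or candidate[0] > best[new_price][0]:
                best[new_price] = candidate
    return best.get(goal)  # None if the target cannot be reached


result = best_exact_sum(data, 14)
if result is not None:
    total_return, positions = result
    print([data[i]["team"] for i in positions], round(total_return, 2))
    # ['Belgium', 'Spain', 'Slovakia'] 5.38
```

The work is proportional to the number of rows times the number of distinct reachable price totals, rather than to the number of subsets.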

Sure, I'd go with something like this:

import numpy.ma as ma
import numpy as np
import pandas as pd

df = pd.DataFrame([
  { "team": "England", "price": 7.0, "return": 2.21 },
  { "team": "Belgium", "price": 7.0, "return": 2.27 },
  { "team": "Spain", "price": 6.0, "return": 2.14 },
  { "team": "Slovakia", "price": 1.0, "return": 0.97 }
])

price_limit = 14
powers_of_two = np.array([1<<n for n in range(len(df))])
combinations = (np.arange(2**len(df))[:, None] & powers_of_two)[1:].astype(bool)

prices = ma.masked_array(np.tile(df.price, (len(combinations), 1)), mask=~combinations)
valid_combinations = (prices.sum(axis=-1) == price_limit)

returns = ma.masked_array(np.tile(df["return"], (len(combinations), 1)), mask=~(valid_combinations[:, None] & combinations))

best = np.argmax(returns.sum(axis=-1))

print(f"Best combination (price={df['price'][combinations[best]].sum():.0f}): {' + '.join(df.team[combinations[best]].to_list())} = {df['return'][combinations[best]].sum():.2f}")
# prints: Best combination (price=14): Belgium + Spain + Slovakia = 5.38

This is a bit liberal in terms of memory usage, but that can be improved by simply restriding df.price and df["return"] instead of tiling them.
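Another way to avoid the tiled arrays, sketched here as my own variation rather than the restriding mentioned above, is to keep only the 0/1 membership matrix and get each subset's price and return with a matrix product. The plain arrays below stand in for the dataframe columns, and `np.isclose` replaces the exact comparison. It still builds the full membership matrix, so it only saves the two tiled copies.

```python
import numpy as np

teams = ["England", "Belgium", "Spain", "Slovakia"]
prices = np.array([7.0, 7.0, 6.0, 1.0])
returns = np.array([2.21, 2.27, 2.14, 0.97])
price_limit = 14

n = len(teams)
# Row k-1 holds the binary digits of k: 1.0 where team j is part of subset k.
masks = ((np.arange(1, 2**n)[:, None] >> np.arange(n)) & 1).astype(float)

subset_prices = masks @ prices    # total price of every non-empty subset
subset_returns = masks @ returns  # total return of every non-empty subset

valid = np.isclose(subset_prices, price_limit)
# Invalid subsets get -inf so they can never win the argmax
# (if nothing is valid, argmax simply returns 0, so check valid.any() first).
scores = np.where(valid, subset_returns, -np.inf)
best = int(np.argmax(scores))

best_teams = [team for team, used in zip(teams, masks[best]) if used]
print(best_teams, round(float(scores[best]), 2))
# ['Belgium', 'Spain', 'Slovakia'] 5.38
```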
