I have a list of dictionaries, which looks like this:
_input = [{'cumulated_quantity': 30, 'price': 7000, 'quantity': 30},
          {'cumulated_quantity': 80, 'price': 7002, 'quantity': 50},
          {'cumulated_quantity': 130, 'price': 7010, 'quantity': 50},
          {'cumulated_quantity': 330, 'price': 7050, 'quantity': 200},
          {'cumulated_quantity': 400, 'price': 7065, 'quantity': 70}]
I would like to group the dictionaries into bins of quantity 100, where the price is calculated as a weighted average. The result should look like this:
result = [{'cumulated_quantity': 100, 'price': 7003, 'quantity': 100},
          {'cumulated_quantity': 200, 'price': 7038, 'quantity': 100},
          {'cumulated_quantity': 300, 'price': 7050, 'quantity': 100},
          {'cumulated_quantity': 400, 'price': 7060.5, 'quantity': 100}]
The weighted averages in the result are calculated as follows:
7003 = (30*7000+50*7002+20*7010)/100
7038 = (30*7010+70*7050)/100
7050 = 100*7050/100
7060.5 = (30*7050+70*7065)/100
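For reference, the arithmetic above can be checked directly (the piece lists below just transcribe the four sums):

```python
# Each bin as a list of (quantity, price) pieces, transcribed from the sums above.
bins = [
    [(30, 7000), (50, 7002), (20, 7010)],
    [(30, 7010), (70, 7050)],
    [(100, 7050)],
    [(30, 7050), (70, 7065)],
]
prices = [sum(q * p for q, p in parts) / 100 for parts in bins]
print(prices)  # [7003.0, 7038.0, 7050.0, 7060.5]
```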
I managed to produce this result using pandas DataFrames, but the performance is far too slow (about 0.5 seconds). Is there a faster way to do this in Python?
Without pandas, doing it yourself is nearly instantaneous:
result = []
cumulative_quantity = 0
bucket = {'price': 0.0, 'quantity': 0}
for dct in _input:
    dct_quantity = dct['quantity']  # enables non-destructive decrementing
    while dct_quantity > 0:
        if bucket['quantity'] == 100:  # flush a full bucket before refilling
            bucket['cumulative_quantity'] = cumulative_quantity
            result.append(bucket)
            bucket = {'price': 0.0, 'quantity': 0}
        added_quantity = min(dct_quantity, 100 - bucket['quantity'])
        # Update the running weighted average of the bucket price.
        bucket['price'] = (bucket['price'] * bucket['quantity']
                           + dct['price'] * added_quantity) / (bucket['quantity'] + added_quantity)
        dct_quantity -= added_quantity
        bucket['quantity'] += added_quantity
        cumulative_quantity += added_quantity
if bucket['quantity'] != 0:  # flush the final (possibly partial) bucket
    bucket['cumulative_quantity'] = cumulative_quantity
    result.append(bucket)
Gives
>>> result
[{'cumulative_quantity': 100, 'price': 7003.0, 'quantity': 100},
{'cumulative_quantity': 200, 'price': 7038.0, 'quantity': 100},
{'cumulative_quantity': 300, 'price': 7050.0, 'quantity': 100},
{'cumulative_quantity': 400, 'price': 7060.5, 'quantity': 100}]
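The same loop can be wrapped as a reusable function with a configurable bin size (the function name `rebin` and the `bin_size` parameter are my own additions, not part of the answer above):

```python
def rebin(records, bin_size=100):
    """Regroup records into fixed-size bins with volume-weighted average prices."""
    result = []
    cumulative = 0
    bucket = {'price': 0.0, 'quantity': 0}
    for rec in records:
        remaining = rec['quantity']
        while remaining > 0:
            if bucket['quantity'] == bin_size:  # flush a full bucket
                bucket['cumulative_quantity'] = cumulative
                result.append(bucket)
                bucket = {'price': 0.0, 'quantity': 0}
            take = min(remaining, bin_size - bucket['quantity'])
            # Running weighted average of the bucket price.
            bucket['price'] = (bucket['price'] * bucket['quantity']
                               + rec['price'] * take) / (bucket['quantity'] + take)
            remaining -= take
            bucket['quantity'] += take
            cumulative += take
    if bucket['quantity']:  # flush the final (possibly partial) bucket
        bucket['cumulative_quantity'] = cumulative
        result.append(bucket)
    return result
```

Calling `rebin(_input)` reproduces the result above, and a partial trailing bucket is kept with its own weighted price.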
This runs in linear time, O(p), where p is the number of parts the input is split into. Equivalently it is O(n * k), where k is the average number of pieces each dict must be split into (in your example k = 8/5 = 1.6).
BIN_SIZE = 100
cum_quantity = 0
bin_quantity = 0
bin_value = 0
results = []
for record in _input:
    price, quantity = record['price'], record['quantity']
    while quantity:
        prior_quantity = bin_quantity
        bin_quantity = min(BIN_SIZE, bin_quantity + quantity)
        quantity_delta = bin_quantity - prior_quantity
        bin_value += quantity_delta * price
        quantity -= quantity_delta
        if bin_quantity == BIN_SIZE:
            avg_price = bin_value / float(BIN_SIZE)
            cum_quantity += BIN_SIZE
            bin_quantity = bin_value = 0  # Reset bin values.
            results.append({'cumulated_quantity': cum_quantity,
                            'price': avg_price,
                            'quantity': BIN_SIZE})

# Add stub for anything left in the remaining bin (optional).
if bin_quantity:
    results.append({'cumulated_quantity': cum_quantity + bin_quantity,
                    'price': bin_value / float(bin_quantity),
                    'quantity': bin_quantity})
>>> results
[{'cumulated_quantity': 100, 'price': 7003.0, 'quantity': 100},
{'cumulated_quantity': 200, 'price': 7038.0, 'quantity': 100},
{'cumulated_quantity': 300, 'price': 7050.0, 'quantity': 100},
{'cumulated_quantity': 400, 'price': 7060.5, 'quantity': 100}]
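To support the speed claim, one way to measure the loop is to wrap it in a function and time it with `timeit` on a larger synthetic input (the wrapper function, the synthetic data, and the run counts here are my own illustration, not part of the answer):

```python
import random
import timeit

BIN_SIZE = 100

def rebin(records):
    # The binning loop from the answer above, wrapped for timing.
    cum_quantity = bin_quantity = bin_value = 0
    results = []
    for record in records:
        price, quantity = record['price'], record['quantity']
        while quantity:
            prior_quantity = bin_quantity
            bin_quantity = min(BIN_SIZE, bin_quantity + quantity)
            quantity_delta = bin_quantity - prior_quantity
            bin_value += quantity_delta * price
            quantity -= quantity_delta
            if bin_quantity == BIN_SIZE:
                cum_quantity += BIN_SIZE
                results.append({'cumulated_quantity': cum_quantity,
                                'price': bin_value / float(BIN_SIZE),
                                'quantity': BIN_SIZE})
                bin_quantity = bin_value = 0
    if bin_quantity:
        results.append({'cumulated_quantity': cum_quantity + bin_quantity,
                        'price': bin_value / float(bin_quantity),
                        'quantity': bin_quantity})
    return results

# Synthetic order book with 10,000 price levels.
random.seed(0)
levels = [{'price': 7000 + i, 'quantity': random.randint(1, 250)}
          for i in range(10000)]
elapsed = timeit.timeit(lambda: rebin(levels), number=10)
print('10 runs over 10,000 levels: %.3f s' % elapsed)
```

On typical hardware this is orders of magnitude under the 0.5 s reported for the pandas version, even at 10,000 input records.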