After processing the data, I have a batch of rows in the following format:
(u'378491520468_sale', {'price': 2100000, 'built': 3815})
(u'378491119.1537520468_sale', {'price': 2100000, 'built': 3815})
(u'1306084076.1535728358_rent', {'price': 1400, 'built': 1109})
(u'1303342766.1548320090_sale', {'price': 550, 'built': 1200})
(u'1890530682.1515660872_sale', {'price': 130000, 'built': 759})
(u'8212134.1548317851_rent', {'price': 2900, 'built': 1220})
(u'1170655463.1513653914_sale', {'price': 430000, 'built': 1142})
(u'58676746.1548308550_sale', {'price': 1700000, 'built': 3000})
(u'1162578480.1474216313_sale', {'price': 10000000, 'built': 3})
(u'1860145003.1546594155_rent', {'price': 4200, 'built': 839})
(u'1640943061.1489124089_sale', {'price': 710000, 'built': 1600})
(u'1008351255.1547539066_rent', {'price': 15000, 'built': 8400})
(u'903442891.1547795833_sale', {'price': 148000, 'built': 786})
where the first element of each tuple is the unique ID.
I know how to use a basic CombineFn class to group (key, value) pairs and compute the min, max and average in a fixed window. But with a dictionary as the value, I need some guidance on computing them in the following format:
("the_unique_id", {
    "price": {
        "min": 0,
        "max": 0,
        "average": 0
    },
    "built": {
        "min": 0,
        "max": 0,
        "average": 0
    }
}), ...
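For comparison with the pandas answer, here is a minimal sketch of the Beam-side approach the question asks about, assuming the Apache Beam Python SDK and that every value dict carries numeric `price` and `built` fields. The class name `FieldStatsFn` and the fixed field list are hypothetical. The accumulator keeps a running min, max, sum and count per field, which is what makes the combine mergeable across workers; the average is only computed at the end.

```python
# Sketch of a CombineFn over dict values (field names are assumptions).
try:
    import apache_beam as beam
    _CombineBase = beam.CombineFn
except ImportError:  # lets the sketch be exercised without Beam installed
    _CombineBase = object


class FieldStatsFn(_CombineBase):
    """Min, max and average for each numeric field of a dict value."""

    FIELDS = ('price', 'built')  # assumed from the sample rows

    def create_accumulator(self):
        # Per field: [running min, running max, running sum, count]
        return {f: [float('inf'), float('-inf'), 0, 0] for f in self.FIELDS}

    def add_input(self, accumulator, element):
        for f in self.FIELDS:
            acc, value = accumulator[f], element[f]
            acc[0] = min(acc[0], value)
            acc[1] = max(acc[1], value)
            acc[2] += value
            acc[3] += 1
        return accumulator

    def merge_accumulators(self, accumulators):
        # Combine partial results produced on different bundles/workers.
        merged = self.create_accumulator()
        for acc in accumulators:
            for f in self.FIELDS:
                m, a = merged[f], acc[f]
                m[0] = min(m[0], a[0])
                m[1] = max(m[1], a[1])
                m[2] += a[2]
                m[3] += a[3]
        return merged

    def extract_output(self, accumulator):
        # Emit the nested {"field": {"min", "max", "average"}} shape.
        out = {}
        for f in self.FIELDS:
            lo, hi, total, count = accumulator[f]
            out[f] = {
                'min': lo if count else None,
                'max': hi if count else None,
                'average': float(total) / count if count else None,
            }
        return out
```

In a pipeline this would be applied to a PCollection of `(id, dict)` pairs with `beam.CombinePerKey(FieldStatsFn())`, after whatever windowing the pipeline uses (for example `beam.WindowInto(beam.window.FixedWindows(60))`), which yields `(id, {...})` tuples in the requested shape.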
If you can get the data into the form below, here is a way to calculate the aggregate values with pandas:
import pandas as pd

# Column-oriented input: one list per field, aligned by position.
data = {'ID': [u'378491520468_sale', u'378491119.1537520468_sale', u'1306084076.1535728358_rent'],
        'price': [2100000, 2100000, 1400],
        'built': [3815, 3815, 1109]}
df = pd.DataFrame(data)

# Aggregations to compute for each field, per ID.
aggregates = {
    'price': ['min', 'max', 'mean'],
    'built': ['min', 'max', 'mean'],
}
df = df.groupby('ID').agg(aggregates)  # columns become a (field, stat) MultiIndex

# Build one (id, nested dict) tuple per group in the requested format.
res = []
for i in range(len(df)):
    row = df.iloc[i]
    res.append((row.name,
                {'price': {'min': row['price']['min'],
                           'max': row['price']['max'],
                           'average': row['price']['mean']},
                 'built': {'min': row['built']['min'],
                           'max': row['built']['max'],
                           'average': row['built']['mean']}}))
print(res)
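The answer assumes the rows are already in column form. A minimal sketch of that conversion from the `(id, dict)` tuples shown in the question, using only the standard library; the helper name `rows_to_columns` and the default field list are hypothetical. The result can be passed straight to `pd.DataFrame(...)`.

```python
def rows_to_columns(rows, fields=('price', 'built')):
    """Turn [(id, {'price': ..., 'built': ...}), ...] into {'ID': [...], 'price': [...], ...}."""
    columns = {'ID': []}
    for f in fields:
        columns[f] = []
    for row_id, values in rows:
        columns['ID'].append(row_id)
        for f in fields:
            columns[f].append(values[f])  # raises KeyError if a field is missing
    return columns


rows = [
    (u'378491520468_sale', {'price': 2100000, 'built': 3815}),
    (u'1306084076.1535728358_rent', {'price': 1400, 'built': 1109}),
]
print(rows_to_columns(rows))
```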