I am currently working on a d3 treemap, which requires nested JSON as input. I succeeded in organizing my df and generating the JSON, but some of my treemap rectangles are 30x bigger than others, so I decided to drop the rows that generate the small rectangles.
My function dropSmall() iterates over my columns and rows to check, for each groupby, whether the sum is 30x smaller than the max sum. I am struggling with updating the df, either with a drop or by reassigning the df to the rows that match. Here is my code:
def dropSmall(df):
    list = []
    for i in df.columns:  # b, c, z ..
        if i != 'valeur' and i != 'unite':
            list.append(i)
    # iterating on rows
    for j in range(df.groupby(list).sum().shape[0]):
        myMax = df.groupby(list).sum().iloc[:, 0].max() / 30
        myJ = df.groupby(list).sum().iloc[:, 0][j]
        myDf = df.groupby(list).sum().iloc[:, 0]
        if myJ <= myMax:
            df = df[myDf['value'] >= myMax]
My groupby result looks like this:
name b c z l sL value unit
3099 Myindicator 1 1 3 NA NA 129.74 kg
3100 1 44929.74 kg
3101 2 5174.74 kg
3110 3 1 3 1 NA 2497.66 kg
3156 2 NA 29.43 kg
3222 3 NA 304.81 kg
For example, for the first group (b=1, c=1, z=3, l=NA), I want to iterate over the 3 sL rows and check whether each value is more than 30x smaller than the max of the group. In this case that would drop the row where value = 129.74.
My function checks the condition, but I don't know how to drop the rows from my initial df rather than from df.groupby(list).sum().
Example of the ungrouped df for the first row
name Continent Region Country State City Borough Value Unit
1000 Myindicator 1 1 3 1 1 1 53.86 kg
[EDIT FROM HERE]
My cutoff multiplier here is 2. There is a max for each level of the hierarchy:
Value
name Continent Region Country State
Myindicator 1 1 1 7 50[MAX]
8 30
2 5 70[MAX]
6 30 *
3 1 50[MAX]
4 5 200[MAX]
6 150
5 1 300[MAX]
6 160
7 100*
8 50*
9 50*
2 4 9 100[MAX]
10 40 *
5 3 80[MAX]
11 20 *
6 2 10[MAX]
3 7 12 100[MAX]
In this example you won't drop region 2, country 6, state 2, because it's the only row for this region > country > state and is therefore also the max.
I hope this is clearer.
So I am not 100% clear on what your input looks like, or what you want back, but if I understand correctly I think the following would work.
EDITED FROM HERE
EDIT2 : Added stars (*) to indicate which rows are getting dropped.
EDIT3 : Changed function due to the way assignment and copies work with pandas.DataFrame
A function to do the process:
def drop_small(dfcop, cutoff_multiplier):
    # Create a copy of the dataframe so we don't alter the original
    df = dfcop.copy(deep=True)
    # Group on all columns except 'Value' and 'Unit'
    grp_cols = [i for i in df.columns if i not in ['Value', 'Unit']]
    groupers = [grp_cols[:i+1] for i in range(len(grp_cols))]
    print(groupers)
    # Loop through all hierarchical groupings
    for grp in groupers:
        print(f"Grouping on {grp}")
        # Add a column with the group sums to the dataframe
        df['gsum'] = df.groupby(grp)['Value'].transform('sum')
        # Compute the cutoff: the largest group sum within the parent group,
        # divided by the cutoff multiplier (overall max when grouping by a single field)
        if len(grp) > 1:
            df['gmax'] = df.groupby(grp[:-1])['gsum'].transform(lambda x: max(x)/cutoff_multiplier)
        else:
            df['gmax'] = df.gsum.max()/cutoff_multiplier
        print("Grouped sums and cutoffs for this hierarchy:")
        print(df)
        # Drop all rows whose group sum is below the cutoff
        idexs = df[df.gsum < df.gmax].index
        df = df[df.gsum >= df.gmax]
        print('Indexes dropped:')
        print(','.join([str(i) for i in idexs]))
        # Remove the helper columns (reassign instead of inplace to avoid working on a slice)
        df = df.drop(['gsum', 'gmax'], axis=1)
    return df
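The key step in the function is `groupby(...).transform(...)`, which returns one value per original row instead of one per group, so the resulting mask can be applied directly to the ungrouped df. Below is a minimal sketch of that idea on a made-up four-row frame (the data and the `cutoff` column name are illustrative assumptions, not part of the example above):

```python
import pandas as pd

# Hypothetical toy data: two countries, two states each
df = pd.DataFrame({
    "Country": [1, 1, 2, 2],
    "State": [7, 8, 5, 6],
    "Value": [50, 20, 70, 30],
})

# transform keeps the original row index, so the group sum lines up with df
df["gsum"] = df.groupby(["Country", "State"])["Value"].transform("sum")

# Cutoff per parent group: largest child sum divided by a multiplier of 2
df["cutoff"] = df.groupby("Country")["gsum"].transform("max") / 2

# Boolean mask on the ungrouped rows
kept = df[df["gsum"] >= df["cutoff"]]
print(kept)
```

Here states 8 and 6 fall below half of their country's largest state, so only the rows with index 0 and 2 remain.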
Here's how it works for the example table.
name Continent Region Country State Value Unit
0 Myindicator 1 1 3 1 50 kg
1 Myindicator 1 1 3 4 50 kg
2 Myindicator 1 1 2 5 20 kg
3 Myindicator 1 1 2 5 50 kg
4 Myindicator 1 1 2 6 30 kg
5 Myindicator 1 1 1 7 50 kg
6 Myindicator 1 1 1 8 20 kg
7 Myindicator 1 2 4 9 50 kg
8 Myindicator 1 2 4 9 50 kg
9 Myindicator 1 2 4 10 40 kg
10 Myindicator 1 2 5 11 20 kg
11 Myindicator 1 2 5 3 40 kg
12 Myindicator 1 2 5 3 40 kg
13 Myindicator 1 2 6 2 10 kg
14 Myindicator 1 3 7 12 50 kg
15 Myindicator 1 3 7 12 50 kg
16 Myindicator 1 3 8 14 15 kg
17 Myindicator 1 3 8 14 15 kg
18 Myindicator 1 3 8 13 15 kg
19 Myindicator 1 3 8 13 1 kg
20 Myindicator 1 4 9 15 10 kg
21 Myindicator 1 4 9 16 10 kg
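To reproduce the trace, the example table above can be rebuilt roughly as follows. This is only a sketch: the `rows` helper list is my own shorthand, and the values are copied from the table.

```python
import pandas as pd

# (Region, Country, State, Value) for each row of the example table
rows = [
    (1, 3, 1, 50), (1, 3, 4, 50), (1, 2, 5, 20), (1, 2, 5, 50),
    (1, 2, 6, 30), (1, 1, 7, 50), (1, 1, 8, 20), (2, 4, 9, 50),
    (2, 4, 9, 50), (2, 4, 10, 40), (2, 5, 11, 20), (2, 5, 3, 40),
    (2, 5, 3, 40), (2, 6, 2, 10), (3, 7, 12, 50), (3, 7, 12, 50),
    (3, 8, 14, 15), (3, 8, 14, 15), (3, 8, 13, 15), (3, 8, 13, 1),
    (4, 9, 15, 10), (4, 9, 16, 10),
]
df = pd.DataFrame(rows, columns=["Region", "Country", "State", "Value"])

# Constant columns, placed to match the column order of the table
df.insert(0, "Continent", 1)
df.insert(0, "name", "Myindicator")
df["Unit"] = "kg"

# Sanity check: per-region totals
print(df.groupby("Region")["Value"].sum())
```

With `drop_small` defined as above, calling `drop_small(df, 2)` on this frame should walk through the groupings shown in the output that follows.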
Grouping on ['name']
Grouped sums and cutoffs for this hierarchy:
name Continent Region Country State Value Unit gsum gmax
0 Myindicator 1 1 3 1 50 kg 686 343
1 Myindicator 1 1 3 4 50 kg 686 343
2 Myindicator 1 1 2 5 20 kg 686 343
3 Myindicator 1 1 2 5 50 kg 686 343
4 Myindicator 1 1 2 6 30 kg 686 343
5 Myindicator 1 1 1 7 50 kg 686 343
6 Myindicator 1 1 1 8 20 kg 686 343
7 Myindicator 1 2 4 9 50 kg 686 343
8 Myindicator 1 2 4 9 50 kg 686 343
9 Myindicator 1 2 4 10 40 kg 686 343
10 Myindicator 1 2 5 11 20 kg 686 343
11 Myindicator 1 2 5 3 40 kg 686 343
12 Myindicator 1 2 5 3 40 kg 686 343
13 Myindicator 1 2 6 2 10 kg 686 343
14 Myindicator 1 3 7 12 50 kg 686 343
15 Myindicator 1 3 7 12 50 kg 686 343
16 Myindicator 1 3 8 14 15 kg 686 343
17 Myindicator 1 3 8 14 15 kg 686 343
18 Myindicator 1 3 8 13 15 kg 686 343
19 Myindicator 1 3 8 13 1 kg 686 343
20 Myindicator 1 4 9 15 10 kg 686 343
21 Myindicator 1 4 9 16 10 kg 686 343
Indexes dropped: None
Grouping on ['name', 'Continent']
Grouped sums and cutoffs for this hierarchy:
name Continent Region Country State Value Unit gsum gmax
0 Myindicator 1 1 3 1 50 kg 686 343
1 Myindicator 1 1 3 4 50 kg 686 343
2 Myindicator 1 1 2 5 20 kg 686 343
3 Myindicator 1 1 2 5 50 kg 686 343
4 Myindicator 1 1 2 6 30 kg 686 343
5 Myindicator 1 1 1 7 50 kg 686 343
6 Myindicator 1 1 1 8 20 kg 686 343
7 Myindicator 1 2 4 9 50 kg 686 343
8 Myindicator 1 2 4 9 50 kg 686 343
9 Myindicator 1 2 4 10 40 kg 686 343
10 Myindicator 1 2 5 11 20 kg 686 343
11 Myindicator 1 2 5 3 40 kg 686 343
12 Myindicator 1 2 5 3 40 kg 686 343
13 Myindicator 1 2 6 2 10 kg 686 343
14 Myindicator 1 3 7 12 50 kg 686 343
15 Myindicator 1 3 7 12 50 kg 686 343
16 Myindicator 1 3 8 14 15 kg 686 343
17 Myindicator 1 3 8 14 15 kg 686 343
18 Myindicator 1 3 8 13 15 kg 686 343
19 Myindicator 1 3 8 13 1 kg 686 343
20 Myindicator 1 4 9 15 10 kg 686 343
21 Myindicator 1 4 9 16 10 kg 686 343
Indexes dropped: None
Grouping on ['name', 'Continent', 'Region']
Grouped sums and cutoffs for this hierarchy:
name Continent Region Country State Value Unit gsum gmax
0 Myindicator 1 1 3 1 50 kg 270 135
1 Myindicator 1 1 3 4 50 kg 270 135
2 Myindicator 1 1 2 5 20 kg 270 135
3 Myindicator 1 1 2 5 50 kg 270 135
4 Myindicator 1 1 2 6 30 kg 270 135
5 Myindicator 1 1 1 7 50 kg 270 135
6 Myindicator 1 1 1 8 20 kg 270 135
7 Myindicator 1 2 4 9 50 kg 250 135
8 Myindicator 1 2 4 9 50 kg 250 135
9 Myindicator 1 2 4 10 40 kg 250 135
10 Myindicator 1 2 5 11 20 kg 250 135
11 Myindicator 1 2 5 3 40 kg 250 135
12 Myindicator 1 2 5 3 40 kg 250 135
13 Myindicator 1 2 6 2 10 kg 250 135
14 Myindicator 1 3 7 12 50 kg 146 135
15 Myindicator 1 3 7 12 50 kg 146 135
16 Myindicator 1 3 8 14 15 kg 146 135
17 Myindicator 1 3 8 14 15 kg 146 135
18 Myindicator 1 3 8 13 15 kg 146 135
19 Myindicator 1 3 8 13 1 kg 146 135
20 Myindicator 1 4 9 15 10 kg 20 135 *
21 Myindicator 1 4 9 16 10 kg 20 135 *
Indexes dropped: 20,21
Grouping on ['name', 'Continent', 'Region', 'Country']
Grouped sums and cutoffs for this hierarchy:
name Continent Region Country State Value Unit gsum gmax
0 Myindicator 1 1 3 1 50 kg 100 50
1 Myindicator 1 1 3 4 50 kg 100 50
2 Myindicator 1 1 2 5 20 kg 100 50
3 Myindicator 1 1 2 5 50 kg 100 50
4 Myindicator 1 1 2 6 30 kg 100 50
5 Myindicator 1 1 1 7 50 kg 70 50
6 Myindicator 1 1 1 8 20 kg 70 50
7 Myindicator 1 2 4 9 50 kg 140 70
8 Myindicator 1 2 4 9 50 kg 140 70
9 Myindicator 1 2 4 10 40 kg 140 70
10 Myindicator 1 2 5 11 20 kg 100 70
11 Myindicator 1 2 5 3 40 kg 100 70
12 Myindicator 1 2 5 3 40 kg 100 70
13 Myindicator 1 2 6 2 10 kg 10 70 *
14 Myindicator 1 3 7 12 50 kg 100 50
15 Myindicator 1 3 7 12 50 kg 100 50
16 Myindicator 1 3 8 14 15 kg 46 50 *
17 Myindicator 1 3 8 14 15 kg 46 50 *
18 Myindicator 1 3 8 13 15 kg 46 50 *
19 Myindicator 1 3 8 13 1 kg 46 50 *
Indexes dropped: 13,16,17,18,19
Grouping on ['name', 'Continent', 'Region', 'Country', 'State']
Grouped sums and cutoffs for this hierarchy:
name Continent Region Country State Value Unit gsum gmax
0 Myindicator 1 1 3 1 50 kg 50 25
1 Myindicator 1 1 3 4 50 kg 50 25
2 Myindicator 1 1 2 5 20 kg 70 35
3 Myindicator 1 1 2 5 50 kg 70 35
4 Myindicator 1 1 2 6 30 kg 30 35 *
5 Myindicator 1 1 1 7 50 kg 50 25
6 Myindicator 1 1 1 8 20 kg 20 25 *
7 Myindicator 1 2 4 9 50 kg 100 50
8 Myindicator 1 2 4 9 50 kg 100 50
9 Myindicator 1 2 4 10 40 kg 40 50 *
10 Myindicator 1 2 5 11 20 kg 20 40 *
11 Myindicator 1 2 5 3 40 kg 80 40
12 Myindicator 1 2 5 3 40 kg 80 40
14 Myindicator 1 3 7 12 50 kg 100 50
15 Myindicator 1 3 7 12 50 kg 100 50
Indexes dropped: 4,6,9,10
Final table:
name Continent Region Country State Value Unit
0 Myindicator 1 1 3 1 50 kg
1 Myindicator 1 1 3 4 50 kg
2 Myindicator 1 1 2 5 20 kg
3 Myindicator 1 1 2 5 50 kg
5 Myindicator 1 1 1 7 50 kg
7 Myindicator 1 2 4 9 50 kg
8 Myindicator 1 2 4 9 50 kg
11 Myindicator 1 2 5 3 40 kg
12 Myindicator 1 2 5 3 40 kg
14 Myindicator 1 3 7 12 50 kg
15 Myindicator 1 3 7 12 50 kg
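One thing that may be worth checking: the function makes a single top-down pass, and dropping rows at a lower level changes the sums at the levels above. A small sketch that recomputes the Region sums on the final table (values copied from it, cutoff multiplier 2 as in the example; Continent is left out because it is constant):

```python
import pandas as pd

# Final table from the walk-through, keyed by the original index
rows = {
    0: (1, 3, 1, 50), 1: (1, 3, 4, 50), 2: (1, 2, 5, 20), 3: (1, 2, 5, 50),
    5: (1, 1, 7, 50), 7: (2, 4, 9, 50), 8: (2, 4, 9, 50), 11: (2, 5, 3, 40),
    12: (2, 5, 3, 40), 14: (3, 7, 12, 50), 15: (3, 7, 12, 50),
}
final = pd.DataFrame(
    list(rows.values()),
    index=list(rows.keys()),
    columns=["Region", "Country", "State", "Value"],
)

# Recompute the Region-level sums and cutoff on the filtered data
region_sum = final.groupby("Region")["Value"].sum()
cutoff = region_sum.max() / 2
below = region_sum[region_sum < cutoff]
print(region_sum)
print("cutoff:", cutoff)
print("regions now below the cutoff:", list(below.index))
```

After the lower-level drops, Region 3 sums to 100 against a cutoff of 110, so it would no longer pass the Region-level test. If the rule is meant to hold on the final table, one option might be to call `drop_small` repeatedly until the number of rows stops changing. Whether that is desirable depends on how the treemap should look.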