
How to drop specific rows from a DataFrame to generate a nested JSON

I am currently working on a d3 treemap, which requires a nested JSON as input. I succeeded in organizing my df and generating the JSON, but some of my treemap rectangles are 30x bigger than others, so I decided to drop the rows that generate the disproportionately small rectangles.

My function dropSmall() iterates over my columns and rows to check, for each groupby, whether the sum is 30x smaller than the max sum. I am struggling with updating the df, either by using a drop or by assigning to the values that match. Here is my code:

def dropSmall(df):
    list = []
    for i in df.columns: #b, c, z ..
        if i != 'valeur' and i!='unite':
            list.append(i)
            # iterating on rows
            for j in range(df.groupby(list).sum().shape[0]): 
                myMax = df.groupby(list).sum().iloc[:, 0].max() / 30
                myJ = df.groupby(list).sum().iloc[:, 0][j]
                myDf = df.groupby(list).sum().iloc[:, 0]
                if myJ <= myMax:
                    df = df[myDf['value']>=  myMax]

My groupby looks like this:


          name          b   c   z   l   sL  value       unit
3099    Myindicator     1   1   3   NA  NA  129.74      kg
3100                                    1   44929.74    kg
3101                                    2   5174.74     kg
3110                    3   1   3   1   NA  2497.66     kg
3156                                2   NA  29.43       kg
3222                                3   NA  304.81      kg


Take the first group as an example (b=1, c=1, z=3, l=NA). While iterating over its 3 sL rows, I want to check whether each value is more than 30x smaller than the max value in that group, and if so drop the row. In this case that means dropping the row where value = 129.74.

My function checks the condition, but I don't know how to drop the rows from my initial df rather than from df.groupby(list).sum().

Example of the ungrouped df for the first row:

        name        Continent  Region   Country   State   City    Borough  Value       Unit
1000    Myindicator     1        1        3        1      1         1      53.86      kg

[EDIT FROM HERE]

My cutoff multiplier here is 2. There is a max for each level of the hierarchy:

                                            Value
name        Continent Region Country State       
Myindicator 1         1      1       7         50[MAX]
                                     8         30 
                             2       5         70[MAX]
                                     6         30 *
                             3       1         50[MAX]
                             4       5        200[MAX]
                                     6        150 
                             5       1        300[MAX]
                                     6        160
                                     7        100*
                                     8         50*
                                     9         50*
                      2      4       9        100[MAX]
                                     10        40 *
                             5       3         80[MAX]
                                     11        20 *
                             6       2         10[MAX]
                      3      7       12       100[MAX]


In this example you won't drop region 2, country 6, state 2, because it's the only row for this region > country > state and is therefore also the max.

Hope this is clearer

So I am not 100% clear on what your input looks like, or what you want back, but if I understand correctly I think the following would work.

EDITED FROM HERE

EDIT2 : Added stars ( * ) to indicate which rows are getting dropped.

EDIT3 : Changed function due to the way assignment and copies work with pandas.DataFrame
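EDIT3 refers to how pandas handles assignment and copies. The snippet below is only a minimal sketch of that pattern on hypothetical toy data (the names original and work and the helper column gsum are illustrative, not taken from the question): work on an explicit copy, rebind the name to the filtered result, and drop helper columns by reassignment rather than in place.

```python
import pandas as pd

# Hypothetical toy frame standing in for the caller's data.
original = pd.DataFrame({"Value": [50, 20, 10]})

# Work on an explicit deep copy so the caller's frame is never modified.
work = original.copy(deep=True)
work["gsum"] = work["Value"] * 2  # illustrative helper column

# Boolean-mask filtering returns a new object; rebind the name to keep it.
work = work[work["Value"] >= 20]

# Dropping by reassignment avoids mutating a filtered slice in place.
work = work.drop(columns=["gsum"])

print(list(original.columns))  # the original still has only 'Value'
print(list(work.index))        # rows 0 and 1 survive
```

The original frame keeps all three rows and never gains the helper column, which is the behaviour the function relies on.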

A function to do the process:

def drop_small(dfcop, cutoff_multiplier):
    # Work on a deep copy so the caller's DataFrame is not altered
    df = dfcop.copy(deep=True)
    # Group on all columns except 'Value' and 'Unit'
    grp_cols = [c for c in df.columns if c not in ['Value', 'Unit']]
    # Cumulative prefixes: ['name'], ['name', 'Continent'], ...
    groupers = [grp_cols[:i + 1] for i in range(len(grp_cols))]
    print(groupers)
    # Loop through all hierarchical groupings, from coarsest to finest
    for grp in groupers:
        print(f"Grouping on {grp}")
        # Add a column with the group sums, broadcast to every row of the group
        df['gsum'] = df.groupby(grp)['Value'].transform('sum')
        # Cutoff = largest group sum within the parent group / multiplier;
        # with a single grouping field there is no parent, so use the overall max
        if len(grp) > 1:
            df['gmax'] = df.groupby(grp[:-1])['gsum'].transform('max') / cutoff_multiplier
        else:
            df['gmax'] = df['gsum'].max() / cutoff_multiplier
        print("Grouped sums and cutoffs for this hierarchy:")
        print(df)
        # Report the rows whose group sum is below the cutoff
        dropped = df[df['gsum'] < df['gmax']].index
        print('Indexes dropped:')
        print(','.join(str(i) for i in dropped) or 'None')
        # Keep the surviving rows and remove the helper columns
        df = df[df['gsum'] >= df['gmax']].drop(columns=['gsum', 'gmax'])
    return df
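The key step in the function is transform('sum'), which is what lets the filter act on the original rows instead of on the grouped result. As a rough illustration on hypothetical toy data (a single Region level and a cutoff multiplier of 2; the variable names are made up for this sketch):

```python
import pandas as pd

# Hypothetical toy data: three regions, the second much smaller than the others.
df = pd.DataFrame({
    "Region": [1, 1, 2, 2, 3],
    "Value": [100, 170, 10, 10, 146],
})

# sum() collapses to one row per group, so the original row labels are lost.
per_group = df.groupby("Region")["Value"].sum()

# transform('sum') returns one value per original row, aligned on df.index.
gsum = df.groupby("Region")["Value"].transform("sum")

cutoff = gsum.max() / 2      # 270 / 2 = 135
kept = df[gsum >= cutoff]    # the boolean mask applies directly to df

print(list(gsum))            # [270, 270, 20, 20, 146]
print(list(kept.index))      # [0, 1, 4]
```

Because gsum shares its index with df, no merge back from the grouped frame is needed; region 2 (sum 20) falls below the cutoff and its two rows are removed.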

Here's how it works for the example table.

           name  Continent  Region  Country  State  Value Unit
0   Myindicator          1       1        3      1     50   kg
1   Myindicator          1       1        3      4     50   kg
2   Myindicator          1       1        2      5     20   kg
3   Myindicator          1       1        2      5     50   kg
4   Myindicator          1       1        2      6     30   kg
5   Myindicator          1       1        1      7     50   kg
6   Myindicator          1       1        1      8     20   kg
7   Myindicator          1       2        4      9     50   kg
8   Myindicator          1       2        4      9     50   kg
9   Myindicator          1       2        4     10     40   kg
10  Myindicator          1       2        5     11     20   kg
11  Myindicator          1       2        5      3     40   kg
12  Myindicator          1       2        5      3     40   kg
13  Myindicator          1       2        6      2     10   kg
14  Myindicator          1       3        7     12     50   kg
15  Myindicator          1       3        7     12     50   kg
16  Myindicator          1       3        8     14     15   kg
17  Myindicator          1       3        8     14     15   kg
18  Myindicator          1       3        8     13     15   kg
19  Myindicator          1       3        8     13      1   kg
20  Myindicator          1       4        9     15     10   kg
21  Myindicator          1       4        9     16     10   kg
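To follow the walkthrough below on your own machine, the example table can be rebuilt roughly as follows. This is a sketch: drop_small_quiet is a hypothetical, print-free variant of the function above, written only for this check, and the cutoff multiplier of 2 is taken from the question.

```python
import pandas as pd

# (Region, Country, State, Value) for the 22 rows of the example table.
rows = [
    (1, 3, 1, 50), (1, 3, 4, 50), (1, 2, 5, 20), (1, 2, 5, 50),
    (1, 2, 6, 30), (1, 1, 7, 50), (1, 1, 8, 20),
    (2, 4, 9, 50), (2, 4, 9, 50), (2, 4, 10, 40), (2, 5, 11, 20),
    (2, 5, 3, 40), (2, 5, 3, 40), (2, 6, 2, 10),
    (3, 7, 12, 50), (3, 7, 12, 50), (3, 8, 14, 15), (3, 8, 14, 15),
    (3, 8, 13, 15), (3, 8, 13, 1),
    (4, 9, 15, 10), (4, 9, 16, 10),
]
df = pd.DataFrame(rows, columns=["Region", "Country", "State", "Value"])
df.insert(0, "Continent", 1)
df.insert(0, "name", "Myindicator")
df["Unit"] = "kg"


def drop_small_quiet(frame, cutoff_multiplier):
    """Hypothetical print-free variant of drop_small, for illustration only."""
    out = frame.copy()
    grp_cols = [c for c in out.columns if c not in ("Value", "Unit")]
    for depth in range(1, len(grp_cols) + 1):
        grp = grp_cols[:depth]
        # Group sum broadcast to every row of the group
        out = out.assign(gsum=out.groupby(grp)["Value"].transform("sum"))
        if depth > 1:
            # Largest group sum among siblings, i.e. within the parent group
            gmax = out.groupby(grp[:-1])["gsum"].transform("max")
        else:
            gmax = out["gsum"].max()
        out = out[out["gsum"] >= gmax / cutoff_multiplier].drop(columns="gsum")
    return out


result = drop_small_quiet(df, 2)
print(df["Value"].sum())    # 686
print(list(result.index))   # [0, 1, 2, 3, 5, 7, 8, 11, 12, 14, 15]
```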

Grouping on ['name']
Grouped sums and cutoffs for this hierarchy:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   686   343
1   Myindicator          1       1        3      4     50   kg   686   343
2   Myindicator          1       1        2      5     20   kg   686   343
3   Myindicator          1       1        2      5     50   kg   686   343
4   Myindicator          1       1        2      6     30   kg   686   343
5   Myindicator          1       1        1      7     50   kg   686   343
6   Myindicator          1       1        1      8     20   kg   686   343
7   Myindicator          1       2        4      9     50   kg   686   343
8   Myindicator          1       2        4      9     50   kg   686   343
9   Myindicator          1       2        4     10     40   kg   686   343
10  Myindicator          1       2        5     11     20   kg   686   343
11  Myindicator          1       2        5      3     40   kg   686   343
12  Myindicator          1       2        5      3     40   kg   686   343
13  Myindicator          1       2        6      2     10   kg   686   343
14  Myindicator          1       3        7     12     50   kg   686   343
15  Myindicator          1       3        7     12     50   kg   686   343
16  Myindicator          1       3        8     14     15   kg   686   343
17  Myindicator          1       3        8     14     15   kg   686   343
18  Myindicator          1       3        8     13     15   kg   686   343
19  Myindicator          1       3        8     13      1   kg   686   343
20  Myindicator          1       4        9     15     10   kg   686   343
21  Myindicator          1       4        9     16     10   kg   686   343

Indexes dropped: None

Grouping on ['name', 'Continent']
Grouped sums and cutoffs for this hierarchy:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   686   343
1   Myindicator          1       1        3      4     50   kg   686   343
2   Myindicator          1       1        2      5     20   kg   686   343
3   Myindicator          1       1        2      5     50   kg   686   343
4   Myindicator          1       1        2      6     30   kg   686   343
5   Myindicator          1       1        1      7     50   kg   686   343
6   Myindicator          1       1        1      8     20   kg   686   343
7   Myindicator          1       2        4      9     50   kg   686   343
8   Myindicator          1       2        4      9     50   kg   686   343
9   Myindicator          1       2        4     10     40   kg   686   343
10  Myindicator          1       2        5     11     20   kg   686   343
11  Myindicator          1       2        5      3     40   kg   686   343
12  Myindicator          1       2        5      3     40   kg   686   343
13  Myindicator          1       2        6      2     10   kg   686   343
14  Myindicator          1       3        7     12     50   kg   686   343
15  Myindicator          1       3        7     12     50   kg   686   343
16  Myindicator          1       3        8     14     15   kg   686   343
17  Myindicator          1       3        8     14     15   kg   686   343
18  Myindicator          1       3        8     13     15   kg   686   343
19  Myindicator          1       3        8     13      1   kg   686   343
20  Myindicator          1       4        9     15     10   kg   686   343
21  Myindicator          1       4        9     16     10   kg   686   343

Indexes dropped: None

Grouping on ['name', 'Continent', 'Region']
Grouped sums and cutoffs for this hierarchy:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   270   135
1   Myindicator          1       1        3      4     50   kg   270   135
2   Myindicator          1       1        2      5     20   kg   270   135
3   Myindicator          1       1        2      5     50   kg   270   135
4   Myindicator          1       1        2      6     30   kg   270   135
5   Myindicator          1       1        1      7     50   kg   270   135
6   Myindicator          1       1        1      8     20   kg   270   135
7   Myindicator          1       2        4      9     50   kg   250   135
8   Myindicator          1       2        4      9     50   kg   250   135
9   Myindicator          1       2        4     10     40   kg   250   135
10  Myindicator          1       2        5     11     20   kg   250   135
11  Myindicator          1       2        5      3     40   kg   250   135
12  Myindicator          1       2        5      3     40   kg   250   135
13  Myindicator          1       2        6      2     10   kg   250   135
14  Myindicator          1       3        7     12     50   kg   146   135
15  Myindicator          1       3        7     12     50   kg   146   135
16  Myindicator          1       3        8     14     15   kg   146   135
17  Myindicator          1       3        8     14     15   kg   146   135
18  Myindicator          1       3        8     13     15   kg   146   135
19  Myindicator          1       3        8     13      1   kg   146   135
20  Myindicator          1       4        9     15     10   kg    20   135 *
21  Myindicator          1       4        9     16     10   kg    20   135 *

Indexes dropped: 20,21

Grouping on ['name', 'Continent', 'Region', 'Country']
Grouped sums and cutoffs for this hierarchy:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   100    50
1   Myindicator          1       1        3      4     50   kg   100    50
2   Myindicator          1       1        2      5     20   kg   100    50
3   Myindicator          1       1        2      5     50   kg   100    50
4   Myindicator          1       1        2      6     30   kg   100    50
5   Myindicator          1       1        1      7     50   kg    70    50
6   Myindicator          1       1        1      8     20   kg    70    50
7   Myindicator          1       2        4      9     50   kg   140    70
8   Myindicator          1       2        4      9     50   kg   140    70
9   Myindicator          1       2        4     10     40   kg   140    70
10  Myindicator          1       2        5     11     20   kg   100    70
11  Myindicator          1       2        5      3     40   kg   100    70
12  Myindicator          1       2        5      3     40   kg   100    70
13  Myindicator          1       2        6      2     10   kg    10    70 *
14  Myindicator          1       3        7     12     50   kg   100    50
15  Myindicator          1       3        7     12     50   kg   100    50
16  Myindicator          1       3        8     14     15   kg    46    50 *
17  Myindicator          1       3        8     14     15   kg    46    50 *
18  Myindicator          1       3        8     13     15   kg    46    50 *
19  Myindicator          1       3        8     13      1   kg    46    50 *

Indexes dropped: 13,16,17,18,19

Grouping on ['name', 'Continent', 'Region', 'Country', 'State']
Grouped sums and cutoffs for this hierarchy:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg    50    25
1   Myindicator          1       1        3      4     50   kg    50    25
2   Myindicator          1       1        2      5     20   kg    70    35
3   Myindicator          1       1        2      5     50   kg    70    35
4   Myindicator          1       1        2      6     30   kg    30    35 *
5   Myindicator          1       1        1      7     50   kg    50    25
6   Myindicator          1       1        1      8     20   kg    20    25 *
7   Myindicator          1       2        4      9     50   kg   100    50
8   Myindicator          1       2        4      9     50   kg   100    50
9   Myindicator          1       2        4     10     40   kg    40    50 *
10  Myindicator          1       2        5     11     20   kg    20    40 *
11  Myindicator          1       2        5      3     40   kg    80    40
12  Myindicator          1       2        5      3     40   kg    80    40
14  Myindicator          1       3        7     12     50   kg   100    50
15  Myindicator          1       3        7     12     50   kg   100    50

Indexes dropped: 4,6,9,10

Final table:

           name  Continent  Region  Country  State  Value Unit
0   Myindicator          1       1        3      1     50   kg
1   Myindicator          1       1        3      4     50   kg
2   Myindicator          1       1        2      5     20   kg
3   Myindicator          1       1        2      5     50   kg
5   Myindicator          1       1        1      7     50   kg
7   Myindicator          1       2        4      9     50   kg
8   Myindicator          1       2        4      9     50   kg
11  Myindicator          1       2        5      3     40   kg
12  Myindicator          1       2        5      3     40   kg
14  Myindicator          1       3        7     12     50   kg
15  Myindicator          1       3        7     12     50   kg
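Since the goal is a nested JSON for a d3 treemap, here is one possible way to nest a filtered table. The asker's actual JSON schema is not shown, so the name/children/value layout is an assumption (it is the shape commonly passed to d3.hierarchy), and build_children is a hypothetical helper. Only a few rows of the final table are used to keep the sketch short.

```python
import json
import pandas as pd

# A few rows from the final table above (rows 0, 1, 7 and 8).
df = pd.DataFrame({
    "Region":  [1, 1, 2, 2],
    "Country": [3, 3, 4, 4],
    "State":   [1, 4, 9, 9],
    "Value":   [50, 50, 50, 50],
})


def build_children(frame, levels, value_col="Value"):
    """Hypothetical helper: recursively nest rows by the given level columns."""
    head, rest = levels[0], levels[1:]
    nodes = []
    for key, part in frame.groupby(head):
        if rest:
            # Inner node: recurse into the remaining hierarchy levels
            nodes.append({"name": str(key),
                          "children": build_children(part, rest, value_col)})
        else:
            # Leaf node: sum the values and convert to a plain Python number
            nodes.append({"name": str(key),
                          "value": part[value_col].sum().item()})
    return nodes


tree = {"name": "Myindicator",
        "children": build_children(df, ["Region", "Country", "State"])}
print(json.dumps(tree, indent=2))
```

The .item() call converts NumPy scalars to native Python numbers so that json.dumps can serialize them.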


 