Inserting default rows into Pandas Dataframe based on condition/missing data

I have a dataframe that looks like this:

import pandas as pd

data = {'TABLE_NM': ['TABLE_A', 'TABLE_A', 'TABLE_A', 'TABLE_A',
                     'TABLE_B', 'TABLE_B', 'TABLE_B',
                     'TABLE_C', 'TABLE_C', 'TABLE_C', 'TABLE_C'
                     ],
        'TEST_TABLE_NM': ['TEST_TABLE_A', 'TEST_TABLE_A', 'TEST_TABLE_A', 'TEST_TABLE_A',
                     'TEST_TABLE_B', 'TEST_TABLE_B', 'TEST_TABLE_B',
                     'TEST_TABLE_C', 'TEST_TABLE_C', 'TEST_TABLE_C', 'TEST_TABLE_C'],
        'TYPE': ['TEST1', 'TEST2', 'TEST3', 'TEST4', 'TEST1', 'TEST2', 'TEST3',
                 'TEST1', 'TEST2', 'TEST3', 'TEST4'],
        'RESULTS': [1005,560,2000,2000,1005,560,2000,1005,560,135,55]
        }

df = pd.DataFrame(data, columns=['TABLE_NM', 'TEST_TABLE_NM', 'TYPE', 'RESULTS'])

Which results in this:

   TABLE_NM TEST_TABLE_NM   TYPE  RESULTS
0   TABLE_A  TEST_TABLE_A  TEST1     1005
1   TABLE_A  TEST_TABLE_A  TEST2      560
2   TABLE_A  TEST_TABLE_A  TEST3     2000
3   TABLE_A  TEST_TABLE_A  TEST4     2000
4   TABLE_B  TEST_TABLE_B  TEST1     1005
5   TABLE_B  TEST_TABLE_B  TEST2      560
6   TABLE_B  TEST_TABLE_B  TEST3     2000
7   TABLE_C  TEST_TABLE_C  TEST1     1005
8   TABLE_C  TEST_TABLE_C  TEST2      560
9   TABLE_C  TEST_TABLE_C  TEST3      135
10  TABLE_C  TEST_TABLE_C  TEST4       55

In reality there are hundreds of TABLE_NM/TEST_TABLE_NM combinations, and each of them should be associated with 4 tests. Some, however, have only 3 tests, as you can see above with TABLE_B.
For every TABLE_NM and TEST_TABLE_NM combination that has no 'TEST4' row, I want to insert a dummy row after the 'TEST3' row, with 'TEST4' as the TYPE and 0 as the RESULTS. The dataframe above would then look like this:

   TABLE_NM TEST_TABLE_NM   TYPE  RESULTS
0   TABLE_A  TEST_TABLE_A  TEST1     1005
1   TABLE_A  TEST_TABLE_A  TEST2      560
2   TABLE_A  TEST_TABLE_A  TEST3     2000
3   TABLE_A  TEST_TABLE_A  TEST4     2000
4   TABLE_B  TEST_TABLE_B  TEST1     1005
5   TABLE_B  TEST_TABLE_B  TEST2      560
6   TABLE_B  TEST_TABLE_B  TEST3     2000
7   TABLE_B  TEST_TABLE_B  TEST4        0
8   TABLE_C  TEST_TABLE_C  TEST1     1005
9   TABLE_C  TEST_TABLE_C  TEST2      560
10  TABLE_C  TEST_TABLE_C  TEST3      135
11  TABLE_C  TEST_TABLE_C  TEST4       55

Any ideas on how this could be achieved?

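You can chain pivot_table to spread TYPE across columns, so that every table combination gets a slot for every test, then fillna to put 0 in the missing slots, stack to turn the columns back into rows, and reset_index to restore flat columns (skip that last step if you would rather keep a MultiIndex of TABLE_NM/TEST_TABLE_NM/TYPE).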

df=df.pivot_table(index=['TABLE_NM','TEST_TABLE_NM'], columns=['TYPE']).fillna(0).stack().reset_index()

    TABLE_NM    TEST_TABLE_NM   TYPE    RESULTS
0   TABLE_A     TEST_TABLE_A    TEST1   1005.0
1   TABLE_A     TEST_TABLE_A    TEST2   560.0
2   TABLE_A     TEST_TABLE_A    TEST3   2000.0
3   TABLE_A     TEST_TABLE_A    TEST4   2000.0
4   TABLE_B     TEST_TABLE_B    TEST1   1005.0
5   TABLE_B     TEST_TABLE_B    TEST2   560.0
6   TABLE_B     TEST_TABLE_B    TEST3   2000.0
7   TABLE_B     TEST_TABLE_B    TEST4   0.0
8   TABLE_C     TEST_TABLE_C    TEST1   1005.0
9   TABLE_C     TEST_TABLE_C    TEST2   560.0
10  TABLE_C     TEST_TABLE_C    TEST3   135.0
11  TABLE_C     TEST_TABLE_C    TEST4   55.0
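The RESULTS values in the output above are floats because the pivoted frame briefly holds NaN for the missing tests (and pivot_table aggregates with the mean by default). If integers are wanted, one option is to cast the column back at the end of the chain. This is a minimal sketch on a cut-down version of the data; the name `filled` is only illustrative, and it assumes every RESULTS value is a whole number.

```python
import pandas as pd

# Cut-down sample: TABLE_B is missing its TEST4 row.
df = pd.DataFrame({
    'TABLE_NM': ['TABLE_A', 'TABLE_A', 'TABLE_B'],
    'TEST_TABLE_NM': ['TEST_TABLE_A', 'TEST_TABLE_A', 'TEST_TABLE_B'],
    'TYPE': ['TEST3', 'TEST4', 'TEST3'],
    'RESULTS': [2000, 2000, 2000],
})

filled = (
    df.pivot_table(index=['TABLE_NM', 'TEST_TABLE_NM'], columns=['TYPE'])
      .fillna(0)
      .stack()
      .reset_index()
      .astype({'RESULTS': 'int64'})  # undo the float upcast caused by NaN
)
print(filled)
```

Note that if a combination ever contains the same TYPE twice, the default mean aggregation would merge those rows into one, so the cast is only safe when duplicates are not expected.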

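To see how it works, run each operation on its own and inspect the output after every step:

# One row per table combination, one column per TYPE; missing tests show up as NaN
df = df.pivot_table(index=['TABLE_NM', 'TEST_TABLE_NM'], columns=['TYPE'])

# Replace the NaN placeholders with 0
df = df.fillna(0)

# Move the TYPE column level back into the row index
df = df.stack()

# Turn the index levels back into ordinary columns
df = df.reset_index()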
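As an alternative to pivoting, the same result can probably be reached by building the full grid of expected rows and left-merging the real data onto it. The sketch below is not part of the original answer; the names `keys`, `pairs`, `types`, `full` and `completed` are illustrative, and `how='cross'` assumes pandas 1.2 or newer. It also assumes that every expected test type appears at least once somewhere in the data.

```python
import pandas as pd

data = {
    'TABLE_NM': ['TABLE_A'] * 4 + ['TABLE_B'] * 3 + ['TABLE_C'] * 4,
    'TEST_TABLE_NM': ['TEST_TABLE_A'] * 4 + ['TEST_TABLE_B'] * 3 + ['TEST_TABLE_C'] * 4,
    'TYPE': ['TEST1', 'TEST2', 'TEST3', 'TEST4', 'TEST1', 'TEST2', 'TEST3',
             'TEST1', 'TEST2', 'TEST3', 'TEST4'],
    'RESULTS': [1005, 560, 2000, 2000, 1005, 560, 2000, 1005, 560, 135, 55],
}
df = pd.DataFrame(data)

keys = ['TABLE_NM', 'TEST_TABLE_NM']

# Table combinations that actually occur, in their original order.
pairs = df[keys].drop_duplicates()

# Every test type seen anywhere in the data.
types = pd.DataFrame({'TYPE': sorted(df['TYPE'].unique())})

# Cross join: one row per (combination, test type).
full = pairs.merge(types, how='cross')

# Attach the real results; rows without a match get NaN, which becomes 0.
completed = (
    full.merge(df, on=keys + ['TYPE'], how='left')
        .fillna({'RESULTS': 0})
        .astype({'RESULTS': 'int64'})
)
print(completed)
```

Unlike the pivot_table chain, this version does not aggregate, so duplicate rows in the input would be kept as they are rather than averaged.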
