I have a dataframe that looks like this:
import pandas as pd

data = {
    'TABLE_NM': ['TABLE_A', 'TABLE_A', 'TABLE_A', 'TABLE_A',
                 'TABLE_B', 'TABLE_B', 'TABLE_B',
                 'TABLE_C', 'TABLE_C', 'TABLE_C', 'TABLE_C'],
    'TEST_TABLE_NM': ['TEST_TABLE_A', 'TEST_TABLE_A', 'TEST_TABLE_A', 'TEST_TABLE_A',
                      'TEST_TABLE_B', 'TEST_TABLE_B', 'TEST_TABLE_B',
                      'TEST_TABLE_C', 'TEST_TABLE_C', 'TEST_TABLE_C', 'TEST_TABLE_C'],
    'TYPE': ['TEST1', 'TEST2', 'TEST3', 'TEST4', 'TEST1', 'TEST2', 'TEST3',
             'TEST1', 'TEST2', 'TEST3', 'TEST4'],
    'RESULTS': [1005, 560, 2000, 2000, 1005, 560, 2000, 1005, 560, 135, 55],
}
df = pd.DataFrame(data, columns=['TABLE_NM', 'TEST_TABLE_NM', 'TYPE', 'RESULTS'])
Which results in this:
TABLE_NM TEST_TABLE_NM TYPE RESULTS
0 TABLE_A TEST_TABLE_A TEST1 1005
1 TABLE_A TEST_TABLE_A TEST2 560
2 TABLE_A TEST_TABLE_A TEST3 2000
3 TABLE_A TEST_TABLE_A TEST4 2000
4 TABLE_B TEST_TABLE_B TEST1 1005
5 TABLE_B TEST_TABLE_B TEST2 560
6 TABLE_B TEST_TABLE_B TEST3 2000
7 TABLE_C TEST_TABLE_C TEST1 1005
8 TABLE_C TEST_TABLE_C TEST2 560
9 TABLE_C TEST_TABLE_C TEST3 135
10 TABLE_C TEST_TABLE_C TEST4 55
In reality there are hundreds of TABLE_NM/TEST_TABLE_NM combinations, and each of them should be associated with 4 tests. Some, however, only have 3 tests associated with them, as you can see above with TABLE_B.
For every TABLE_NM and TEST_TABLE_NM combination that has no 'TEST4' row, I want to insert a dummy row into the dataframe after the 'TEST3' row, with 'TEST4' as the TYPE and 0 as the RESULTS value. The dataframe above would then look like this instead:
TABLE_NM TEST_TABLE_NM TYPE RESULTS
0 TABLE_A TEST_TABLE_A TEST1 1005
1 TABLE_A TEST_TABLE_A TEST2 560
2 TABLE_A TEST_TABLE_A TEST3 2000
3 TABLE_A TEST_TABLE_A TEST4 2000
4 TABLE_B TEST_TABLE_B TEST1 1005
5 TABLE_B TEST_TABLE_B TEST2 560
6 TABLE_B TEST_TABLE_B TEST3 2000
7 TABLE_B TEST_TABLE_B TEST4 0
8 TABLE_C TEST_TABLE_C TEST1 1005
9 TABLE_C TEST_TABLE_C TEST2 560
10 TABLE_C TEST_TABLE_C TEST3 135
11 TABLE_C TEST_TABLE_C TEST4 55
Any ideas on how this could be achieved?
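You can chain four operations: pivot_table to spread TYPE into columns, so that every TABLE_NM/TEST_TABLE_NM pair gets a cell for every test; fillna(0) to put zeros in the cells that had no data; stack to turn the columns back into rows; and reset_index to restore a flat index (skip that last step if you would rather keep a MultiIndex of TABLE_NM/TEST_TABLE_NM/TYPE). Note that RESULTS comes back as float, as the output shows.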
df=df.pivot_table(index=['TABLE_NM','TEST_TABLE_NM'], columns=['TYPE']).fillna(0).stack().reset_index()
TABLE_NM TEST_TABLE_NM TYPE RESULTS
0 TABLE_A TEST_TABLE_A TEST1 1005.0
1 TABLE_A TEST_TABLE_A TEST2 560.0
2 TABLE_A TEST_TABLE_A TEST3 2000.0
3 TABLE_A TEST_TABLE_A TEST4 2000.0
4 TABLE_B TEST_TABLE_B TEST1 1005.0
5 TABLE_B TEST_TABLE_B TEST2 560.0
6 TABLE_B TEST_TABLE_B TEST3 2000.0
7 TABLE_B TEST_TABLE_B TEST4 0.0
8 TABLE_C TEST_TABLE_C TEST1 1005.0
9 TABLE_C TEST_TABLE_C TEST2 560.0
10 TABLE_C TEST_TABLE_C TEST3 135.0
11 TABLE_C TEST_TABLE_C TEST4 55.0
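RESULTS is float in this output because pivot_table aggregates with the mean by default and the missing TEST4 cell is NaN until it is filled. If integers are wanted, one option is to cast the column back afterwards. A minimal sketch on a shortened copy of the data, assuming every value in RESULTS is a whole number (the name `filled` is only for illustration):

```python
import pandas as pd

# Shortened sample: TABLE_A has all four tests, TABLE_B is missing TEST4.
df = pd.DataFrame({
    'TABLE_NM': ['TABLE_A'] * 4 + ['TABLE_B'] * 3,
    'TEST_TABLE_NM': ['TEST_TABLE_A'] * 4 + ['TEST_TABLE_B'] * 3,
    'TYPE': ['TEST1', 'TEST2', 'TEST3', 'TEST4', 'TEST1', 'TEST2', 'TEST3'],
    'RESULTS': [1005, 560, 2000, 2000, 1005, 560, 2000],
})

filled = (
    df.pivot_table(index=['TABLE_NM', 'TEST_TABLE_NM'], columns=['TYPE'])
      .fillna(0)       # zero for the missing TABLE_B / TEST4 cell
      .stack()         # TYPE goes back from columns to rows
      .reset_index()
)

# Assumption: the values are whole numbers, so the cast loses nothing.
filled['RESULTS'] = filled['RESULTS'].astype(int)
print(filled)
```

The cast is only safe when the mean of any duplicated rows is itself a whole number; with one row per table/test pair, as in the question, that holds.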
If you want to see it in action, I would recommend doing each operation one at a time and viewing the output in between each step:
df=df.pivot_table(index=['TABLE_NM','TEST_TABLE_NM'], columns=['TYPE'])
df=df.fillna(0)
df=df.stack()
df=df.reset_index()
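A variation on the same idea, offered as a sketch rather than as part of the original answer: set_index plus unstack with fill_value=0 does the reshaping without any aggregation, so the integer dtype of RESULTS should survive. It assumes each TABLE_NM/TEST_TABLE_NM/TYPE combination appears at most once (unstack raises an error on duplicate index entries); the sample data and the name `out` are illustrative.

```python
import pandas as pd

# Shortened sample: TABLE_A has all four tests, TABLE_B is missing TEST4.
df = pd.DataFrame({
    'TABLE_NM': ['TABLE_A'] * 4 + ['TABLE_B'] * 3,
    'TEST_TABLE_NM': ['TEST_TABLE_A'] * 4 + ['TEST_TABLE_B'] * 3,
    'TYPE': ['TEST1', 'TEST2', 'TEST3', 'TEST4', 'TEST1', 'TEST2', 'TEST3'],
    'RESULTS': [1005, 560, 2000, 2000, 1005, 560, 2000],
})

out = (
    df.set_index(['TABLE_NM', 'TEST_TABLE_NM', 'TYPE'])['RESULTS']
      .unstack('TYPE', fill_value=0)    # one column per TYPE, gaps filled with 0
      .stack()                          # back to one row per TYPE
      .reset_index(name='RESULTS')
)
print(out)
```

Because the two table columns stay paired in the index, only combinations that actually occur in the data are completed; no TABLE_A/TEST_TABLE_B rows are invented.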