I have a DataFrame that I built with df4.append(df3, ignore_index=True). However, I am having trouble removing the repeated entries in my Gene_Symbol column while still keeping the values in Case1, Case2 and Case3. I have already tried df4.drop_duplicates(["Gene_Symbol"]) and various other methods, all of which delete the other rows and, with them, my data.
What I am getting is this:
X Case1 Case2 Case3 Gene_Symbol
8026 8025 0.5326718 0.0000000 0.0000000 GAPDHS;TMEM147
32531 32530 0.0000000 0.5416982 0.0000000 GAPDHS;TMEM147
57051 57050 0.0000000 0.0000000 0.4821592 GAPDHS;TMEM147
What I would like is the DataFrame below, with one row per gene and my actual values kept:
Case1 Case2 Case3 Gene_Symbol
0.5326718 0.5416982 0.4821592 GAPDHS;TMEM147
Thank you for your time!
You could try the following. If each Case column contains only one non-zero value per gene, this should work (assuming you don't have the X column, which looks like an index):
df.set_index('Gene_Symbol').stack()[lambda x: x != 0].unstack(level=1).reset_index()
# Gene_Symbol Case1 Case2 Case3
#0 GAPDHS;TMEM147 0.532672 0.541698 0.482159
Or, if the X column is present, drop it first:
df
# X Case1 Case2 Case3 Gene_Symbol
#8026 8025 0.532672 0.000000 0.000000 GAPDHS;TMEM147
#32531 32530 0.000000 0.541698 0.000000 GAPDHS;TMEM147
#57051 57050 0.000000 0.000000 0.482159 GAPDHS;TMEM147
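# Pass the column by keyword; the positional axis argument (df.drop('X', 1)) is not accepted by recent pandas versions.
df.drop(columns='X', inplace=True)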
df.set_index('Gene_Symbol').stack()[lambda x: x != 0].unstack(level=1).reset_index()
# Gene_Symbol Case1 Case2 Case3
#0 GAPDHS;TMEM147 0.532672 0.541698 0.482159
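The stack/unstack approach above relies on each gene having at most one non-zero value per Case column; otherwise unstack fails on duplicate index entries. Here is a minimal sketch that rebuilds the sample data from the question, checks that assumption, and then applies the same chain. The names case_cols, nonzero_counts, assumption_holds and wide are introduced only for illustration.

```python
import pandas as pd

# Sample data copied from the question (index values included).
df = pd.DataFrame(
    {
        "X": [8025, 32530, 57050],
        "Case1": [0.5326718, 0.0, 0.0],
        "Case2": [0.0, 0.5416982, 0.0],
        "Case3": [0.0, 0.0, 0.4821592],
        "Gene_Symbol": ["GAPDHS;TMEM147"] * 3,
    },
    index=[8026, 32531, 57051],
)

case_cols = ["Case1", "Case2", "Case3"]

# Count the non-zero entries per gene and per Case column.
nonzero_counts = df[case_cols].ne(0).groupby(df["Gene_Symbol"]).sum()

# The stack/unstack chain needs at most one non-zero value per gene/column pair.
assumption_holds = bool((nonzero_counts <= 1).all().all())

if assumption_holds:
    wide = (
        df.drop(columns="X")          # X looks like a leftover index
          .set_index("Gene_Symbol")
          .stack()                    # long format: (gene, case) -> value
          .loc[lambda s: s != 0]      # keep only the real measurements
          .unstack(level=1)           # back to one column per case
          .reset_index()
    )
    print(wide)
```

If the check fails for your real data, the groupby-based answer is likely the safer choice, since it does not require unique gene/case pairs.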
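How about grouping by Gene_Symbol and summing the Case columns? Since the other entries are zero, the sum keeps the actual values:
# Select several columns with a list (double brackets); the tuple form ['Case1', 'Case2', 'Case3'] is rejected by recent pandas versions.
df = df.groupby('Gene_Symbol')[['Case1', 'Case2', 'Case3']].sum().reset_index()
# Gene_Symbol Case1 Case2 Case3
#0 GAPDHS;TMEM147 0.532672 0.541698 0.482159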
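For reference, here is a self-contained sketch of the groupby approach, using the sample rows from the question. It assumes, as in the question, that every value other than the real measurement is exactly zero, so summing within a gene leaves the measurement unchanged. The names case_cols and collapsed are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "X": [8025, 32530, 57050],
        "Case1": [0.5326718, 0.0, 0.0],
        "Case2": [0.0, 0.5416982, 0.0],
        "Case3": [0.0, 0.0, 0.4821592],
        "Gene_Symbol": ["GAPDHS;TMEM147"] * 3,
    },
    index=[8026, 32531, 57051],
)

case_cols = ["Case1", "Case2", "Case3"]

# One row per gene; as_index=False keeps Gene_Symbol as a regular column.
# Only the Case columns are selected, so X is left out of the result.
collapsed = df.groupby("Gene_Symbol", as_index=False)[case_cols].sum()
print(collapsed)
```

If a gene could have more than one non-zero value in the same Case column, sum() would add them together; in that situation an aggregation such as max() or first() may match the intent better.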