简体   繁体   中英

remove specific rows in dataframe with pandas

i need some help from all of you I'm working with a data form from excel, so basically now i have something like this.

csr   id   ac  otc  tm  lease  maint 
 1    456  b    0    0    0      0
 1    543  a    0    1    1      0
 1    435  e    0    0    0      0
 2    123  w    1    1    1      1
 2    123  g    0    0    0      0
 3    987  j    0    0    0      0
 4    258  k    1    1    1      1 
 4    258  m    0    0    0      0

So i need to delete the rows with zero in 'otc' 'tm' 'lease' 'maint' columns. i do something like this

df = pd.read_excel(xlsx,'Sheet1')
df_zero = df[(df['OTC'] == 0) & (df['TM'] == 0) & (df['Lease'] == 0) & (df['Maint'] == 0) & (df['Support'] == 0) & (df['Other']  == 0)]

In this way i open the file and save in df_zero all the rows that contain zero in the specific column. Then

df1 = df_zero.loc[:, 'CSR']

Basically this save in df1 the CSR number for the rows with zeros in the specific columns, like this

csr
 1
 1
 2 
 3
 4

So i think ok i do this and problem resolved.

for n1 in df1:
    df = df[df.CSR != n1]

But the problem here is, as you can see in the CSR 1, we have 3 different rows, if i run that 'for', i will delete the 3 of them, i just need to remove the ones that have zeros in the specific columns ('otc' 'tm' 'lease' 'maint').

I think in a 'for' for be moving in the CSR and another one to be moving in 'otc' if the value that i found is zero move to 'tm'(in the same row) check for zero, then to 'lease' and 'maint' in the same row, if any of this columns is not zero, jump to the next CSR. In this example. We will remove the CSR 1, because all of them ('otc' 'tm' 'lease' 'maint') are zero, then jump to the next CSR, again 1, but in this case we have zero in 'otc' but 1 in 'tm', so we have to jump to the next CSR is again 1 but all of the columns ('otc' 'tm' 'lease' 'maint') are in zero so we remove the row, and continue in this way until the last CSR...

I think that could work but i'm having some problems to implement that, or maybe any of you have a better idea. Thanks and sorry for bad english

You can also extract the four columns that you are interested and count how many zeros it has for each row and create logical vector for indexing:

df[(df[['otc', 'tm', 'lease', 'maint']] == 0).sum(axis = 1) < 4]

#  csr   id  ac  otc    tm  lease   maint
# 1  1  543   a    0     1      1       0
# 3  2  123   w    1     1      1       1
# 6  4  258   k    1     1      1       1

try this:

In [35]: df.eval('otc == 0 and tm == 0 and lease == 0 and maint == 0')
Out[35]:
0     True
1    False
2     True
3    False
4     True
5     True
6    False
7     True
dtype: bool

In [36]: df[~df.eval('otc == 0 and tm == 0 and lease == 0 and maint == 0')]
Out[36]:
   csr   id ac  otc  tm  lease  maint
1    1  543  a    0   1      1      0
3    2  123  w    1   1      1      1
6    4  258  k    1   1      1      1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM