Python - Delete lines from dataframe (pandas)

I am trying to delete certain rows from a data frame, but the delete command (.drop) does not work as it should. Does anyone have an idea why?

My Code:

import pandas as pd

def join():

    open_momox_xlsx = "momox_ergebnisse.xlsx"
    open_rebuy_xlsx = "rebuy_ergebnisse.xlsx"

    rebuy_xlsx = pd.read_excel(open_rebuy_xlsx)
    momox_xlsx = pd.read_excel(open_momox_xlsx)

    rebuy_data = rebuy_xlsx[['ReBuy']]
    isbn_data = rebuy_xlsx[['ISBN']]
    momox_data = momox_xlsx[['Momox']]

    dataframe = pd.DataFrame =({'ISBN': isbn_data, 'Rebuy': rebuy_data, 'Momox': momox_data})
    data = pd.concat(dataframe,axis=1, ignore_index=True)

    c=0
    #print(data[0])
    while c < len(data):

        if data[1][c] and data[2][c] == '///':
            data.drop(index=c)
        elif data[1][c] and data[2][c] < '1':
            data.drop(index=c)
        elif data[1][c] or data[2][c] < '1' and data[1][c] or data[2][c] == '///' :
            data.drop(index=c)
        c=c+1
    print(data)

Output:

                0      1      2
0   9783630876672  12,35   2.62
1   9783423282789  11,67   6.07
2   9783833879500  17,25  12.40
3   9783898798822   6,91   1.16
4   9783453281417  12,93   2.84
5   9783630876672  12,35   4.08
6   9783423282789  11,67   6.07
7   9783833879500  17,25   9.94
8   9783898798822   6,91   2.96
9   9783453281417  12,93   2.68
10     3927905909    ///    ///
11     3872948210    ///   0.15
12  9783293003781    ///   0.15
13  9783423246842    ///    ///
14  9783423247146    ///    ///
15  9783423246934    ///    ///
16     387294116x    ///    ///
17  9783935597456   0,16   0.15
18  9783423204545    ///    ///

Wanted Output:

                0      1      2
0   9783630876672  12,35   2.62
1   9783423282789  11,67   6.07
2   9783833879500  17,25  12.40
3   9783898798822   6,91   1.16
4   9783453281417  12,93   2.84
5   9783630876672  12,35   4.08
6   9783423282789  11,67   6.07
7   9783833879500  17,25   9.94
8   9783898798822   6,91   2.96
9   9783453281417  12,93   2.68

The if statement seems to work properly, but data.drop does not do what it should.

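You should add inplace=True to the drop call. By default, drop returns a new DataFrame and leaves the original unchanged, so the result of data.drop(index=c) is simply discarded.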
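The difference can be seen on a tiny frame. This is a minimal sketch; the frame and the column name `price` are made up for illustration and are not taken from the question's spreadsheets.

```python
import pandas as pd

# Hypothetical three-row frame, only for demonstrating drop().
df = pd.DataFrame({"price": ["12,35", "///", "0,16"]})

result = df.drop(index=1)        # returns a new DataFrame; df is untouched
print(len(df), len(result))      # 3 2

df.drop(index=1, inplace=True)   # modifies df itself and returns None
print(len(df))                   # 2
```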

To explain the advice to use data.drop(index=c, inplace=True) a little better: you can think of it as assigning the result back to the same variable, data = data.drop(index=c). Although the approach with inplace=True is usually better, both should work.
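One caveat when dropping inside a loop such as `while c < len(data)`: each drop that takes effect shrinks `len(data)`, so the loop can stop before it reaches the last rows. Note also that `a and b == '///'` compares only `b` with `'///'`. A possible way around both, sketched here on a small made-up frame (the variable name `to_drop` is hypothetical), is to collect the labels first and drop them in one call:

```python
import pandas as pd

# Small stand-in for the question's frame; columns are labelled 1 and 2.
data = pd.DataFrame({1: ["12,35", "///", "///", "///"],
                     2: ["2.62", "///", "///", "///"]})

# Collect the labels first, comparing each column explicitly,
# so that the loop bound is not affected by the drops.
to_drop = [label for label in data.index
           if data.loc[label, 1] == "///" and data.loc[label, 2] == "///"]

data = data.drop(index=to_drop)  # a single drop, assigned back
print(data)
```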

Make a clean dataframe and keep values you want:

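import numpy as np  # needed for np.nan

# Column labels are the integers 1 and 2, because concat used ignore_index=True.
data[[1, 2]] = (data[[1, 2]].replace({"///": np.nan, ",": "."}, regex=True)
                            .astype(float))
data = data.loc[data[[1, 2]].ge(1.).all(axis="columns")]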
>>> data
               0      1      2
0  9783630876672  12.35   2.62
1  9783423282789  11.67   6.07
2  9783833879500  17.25  12.40
3  9783898798822   6.91   1.16
4  9783453281417  12.93   2.84
5  9783630876672  12.35   4.08
6  9783423282789  11.67   6.07
7  9783833879500  17.25   9.94
8  9783898798822   6.91   2.96
9  9783453281417  12.93   2.68

Comments:

1st statement:

  • The column selection picks the two price columns (ReBuy and Momox).
  • replace swaps existing values for new ones: '///' becomes NaN and ',' becomes '.'.
  • astype(float) converts the string columns to real numbers (float), which is possible once the values have been cleaned.

2nd statement:

  • data.loc[...] selects the rows for which the condition inside the brackets is true.
  • .ge(1.).all(axis="columns") checks, for each row, that the values in the price columns are greater than or equal to 1; all(axis="columns") requires this to hold for every selected column of the row. Rows containing NaN fail the comparison and are dropped as well.
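The same idea can be tried end to end on a few rows copied from the question's output. This is only a sketch under assumptions: it swaps in `pd.to_numeric(..., errors="coerce")` for the replace/astype step, assumes both price columns hold strings, and the names `prices` and `clean` are made up for illustration.

```python
import pandas as pd

# Four rows taken from the question's output; columns are labelled 0, 1, 2.
data = pd.DataFrame({
    0: ["9783630876672", "3872948210", "9783935597456", "9783423204545"],
    1: ["12,35", "///", "0,16", "///"],
    2: ["2.62", "0.15", "0.15", "///"],
})

# Turn decimal commas into points, then parse; anything unparseable
# (such as '///') becomes NaN.
prices = data[[1, 2]].apply(
    lambda col: pd.to_numeric(col.str.replace(",", ".", regex=False),
                              errors="coerce")
)

# Keep only rows where both prices are at least 1; NaN compares as False.
clean = data.loc[prices.ge(1.0).all(axis="columns")]
print(clean)
```

Unlike the version above, this keeps the original strings in `data` and uses the parsed numbers only for the filter.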
