I am trying to delete certain rows from a data frame, but the 'delete command' (.drop) does not work as it should. Does anyone have an idea?
My Code:
import pandas as pd

def join():
    open_momox_xlsx = "momox_ergebnisse.xlsx"
    open_rebuy_xlsx = "rebuy_ergebnisse.xlsx"
    rebuy_xlsx = pd.read_excel(open_rebuy_xlsx)
    momox_xlsx = pd.read_excel(open_momox_xlsx)
    rebuy_data = rebuy_xlsx[['ReBuy']]
    isbn_data = rebuy_xlsx[['ISBN']]
    momox_data = momox_xlsx[['Momox']]
    dataframe = pd.DataFrame =({'ISBN': isbn_data, 'Rebuy': rebuy_data, 'Momox': momox_data})
    data = pd.concat(dataframe,axis=1, ignore_index=True)
    c=0
    #print(data[0])
    while c < len(data):
        if data[1][c] and data[2][c] == '///':
            data.drop(index=c)
        elif data[1][c] and data[2][c] < '1':
            data.drop(index=c)
        elif data[1][c] or data[2][c] < '1' and data[1][c] or data[2][c] == '///' :
            data.drop(index=c)
        c=c+1
    print(data)
Output:
0 1 2
0 9783630876672 12,35 2.62
1 9783423282789 11,67 6.07
2 9783833879500 17,25 12.40
3 9783898798822 6,91 1.16
4 9783453281417 12,93 2.84
5 9783630876672 12,35 4.08
6 9783423282789 11,67 6.07
7 9783833879500 17,25 9.94
8 9783898798822 6,91 2.96
9 9783453281417 12,93 2.68
10 3927905909 /// ///
11 3872948210 /// 0.15
12 9783293003781 /// 0.15
13 9783423246842 /// ///
14 9783423247146 /// ///
15 9783423246934 /// ///
16 387294116x /// ///
17 9783935597456 0,16 0.15
18 9783423204545 /// ///
Wanted Output:
0 1 2
0 9783630876672 12,35 2.62
1 9783423282789 11,67 6.07
2 9783833879500 17,25 12.40
3 9783898798822 6,91 1.16
4 9783453281417 12,93 2.84
5 9783630876672 12,35 4.08
6 9783423282789 11,67 6.07
7 9783833879500 17,25 9.94
8 9783898798822 6,91 2.96
9 9783453281417 12,93 2.68
The if-statement seems to work properly, but data.drop does not do what it should.
You should pass inplace=True to the drop function. By default, drop returns a new DataFrame and leaves the original unchanged.
To explain the advice with data.drop(index=c, inplace=True) a little better: you could think of it as assigning the result back to the same variable, data = data.drop(index=c). Although the approach with inplace=True is usually better, both should work.
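Build a clean dataframe and keep only the rows you want (this needs import numpy as np and assumes the column labels are the strings '1' and '2'; with integer labels, use data[[1, 2]] instead):
data[['1', '2']] = data[['1', '2']].replace({"///": np.nan, ",": "."}, regex=True).astype(float)
data = data.loc[data[["1", "2"]].ge(1.).all(axis="columns")]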
>>> data
0 1 2
0 9783630876672 12.35 2.62
1 9783423282789 11.67 6.07
2 9783833879500 17.25 12.40
3 9783898798822 6.91 1.16
4 9783453281417 12.93 2.84
5 9783630876672 12.35 4.08
6 9783423282789 11.67 6.07
7 9783833879500 17.25 9.94
8 9783898798822 6.91 2.96
9 9783453281417 12.93 2.68
Comments:
1st line:
data[['1', '2']]: select the columns named '1' and '2'.
replace: change existing values ('///' and ',') to new ones (NaN and '.').
astype(float): convert your string columns to real numbers (float) now that your dataframe is cleaned.
2nd line:
data.loc[...]: locate something in your dataframe.
data[["1", "2"]].ge(1.).all(axis="columns"): in columns '1' and '2', search for values greater than or equal to 1; this must be true for all columns of the row.
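Putting the two lines together, a self-contained sketch might look like this. The sample rows are copied by hand from the question's output, the columns are labeled with the strings '0', '1' and '2' to match the code above (an assumption; the question's frame may use integer labels), and dtype=object is used to imitate text cells as they might come out of read_excel.

```python
import numpy as np
import pandas as pd

# Hand-copied sample rows; the string column labels are an assumption
data = pd.DataFrame(
    {
        "0": ["9783630876672", "9783898798822", "3927905909", "3872948210", "9783935597456"],
        "1": ["12,35", "6,91", "///", "///", "0,16"],
        "2": ["2.62", "1.16", "///", "0.15", "0.15"],
    },
    dtype=object,
)

cols = ["1", "2"]

# Turn '///' cells into NaN, swap decimal commas for dots, then convert to float
data[cols] = data[cols].replace({"///": np.nan, ",": "."}, regex=True).astype(float)

# Keep rows where both prices are >= 1; NaN compares as False, so those rows go too
data = data.loc[data[cols].ge(1.0).all(axis="columns")]

print(data.index.tolist())
# prints: [0, 1]
```

Because the filter is a single boolean mask, no loop or drop call is needed, and the original row labels of the kept rows are preserved.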