Floating point arithmetic and comparison causing undesirable output
Consider the pandas DataFrame df1:
df1 = pd.DataFrame({"Name":["Kevin","Peter","James","Jose","Matthew","Pattrick","Alexander"],"Number":[1,2,3,4,5,6,7],"Total":[495.2,432.5,'-',395.5,485.8,415,418.7],"Average_old":[86.57,83.97,'-',96.59,84.67,83.10,83.84],"Grade_old":['A','A','A','A+','A','A','A'],"Total_old":[432.8,419.8,'-',482.9,423.3,415,418.7]})
I computed Average and Grade using the formulas below:
df1["Average"] = df1["Total"].apply(lambda x: x/5 + 0.1 if x != "-" else "-")
df1["Grade"] = df1["Average"].apply((lambda x:'A+' if x!='-' and x>90 else 'A'))
So df1 becomes:
df1
Name Number Total Average_old Grade_old Total_old Average Grade
0 Kevin 1 495.2 86.57 A 432.8 99.14 A+
1 Peter 2 432.5 83.97 A 419.8 86.60 A
2 James 3 - - A - - A
3 Jose 4 395.5 96.59 A+ 482.9 79.20 A
4 Matthew 5 485.8 84.67 A 423.3 97.26 A+
5 Pattrick 6 415.0 83.10 A 415.0 83.10 A
6 Alexander 7 418.7 83.84 A 418.7 83.84 A
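df1 has the columns Total, Total_old, Grade, Grade_old, Average and Average_old. I am trying to check whether any value of Total has been modified relative to Total_old, whether any value of Grade has been modified relative to Grade_old, and whether any value of Average has been modified relative to Average_old. I am trying to create a new DataFrame, dfmod, that lists all the modified values of df1, using the following code: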
dfmod = pd.DataFrame()
columns =["Total","Average","Grade"]
for col in columns:
    dfmod = pd.concat([dfmod,df1[["Name","Number",col + '_old']][df1[col].ne(df1[col +'_old'])].dropna()],sort=False)
    dfmod.rename(columns={col + '_old':col},inplace=True)
dfmod = dfmod.groupby('Name',as_index = False,sort = False).first()
and got this output:
dfmod
Name Number Total Average Grade
0 Kevin 1 432.8 86.57 A
1 Peter 2 419.8 83.97 None
2 Jose 4 482.9 96.59 A+
3 Matthew 5 423.3 84.67 A
4 Alexander 7 NaN 83.84 None
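When Total is compared with Total_old, Average with Average_old, and Grade with Grade_old, none of Pattrick's values have changed, so Pattrick's entry is correctly dropped.
However, look at Alexander's Average: even though Total, Average and Grade are the same as Total_old, Average_old and Grade_old respectively, the Average value is wrongly added to dfmod as a modified value. This happens because floating point arithmetic does not behave like integer arithmetic, as described at the link below. https://www.geeksforgeeks.org/floating-point-error-in-python/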
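A small sketch of the effect described above, using only the standard library. The first pair is the classic 0.1 + 0.2 case; the second uses Alexander's Total and Average_old from df1. The variable names are chosen here purely for illustration.

```python
import math

# Classic case: the sum is not exactly 0.3 in binary floating point.
classic_exact = (0.1 + 0.2) == 0.3
classic_close = math.isclose(0.1 + 0.2, 0.3)

# Alexander's row: Total is 418.7 and Average_old is 83.84.
computed = 418.7 / 5 + 0.1
alexander_exact = computed == 83.84
alexander_close = math.isclose(computed, 83.84)

print(classic_exact, classic_close)
print(repr(computed), alexander_exact, alexander_close)
```

A tolerance-based comparison treats both pairs as equal, which is the behaviour wanted when deciding whether a value was really modified.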
So I tried to use the np.isclose function as follows:
for col in columns:
    if col == 'Grade':
        dfmod = pd.concat([dfmod,df1[["Name","Number",col + '_old']][df1[col].ne(df1[col +'_old'])].dropna()],sort=False)
        continue
    dfmod = pd.concat([dfmod,df1[["Name","Number",col + '_old']][~np.isclose(df1[col],df1[col+'_old'])].dropna()],sort=False)
But it throws this error message:
`Exception has occurred: TypeError ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''`
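The error seems to be caused by the "-" characters in the data. How can I solve this? Please help; I have been stuck on this problem for a while and have tried every resource I could find.
Expected output: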
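A minimal reproduction of that error, assuming a three-element column invented here for illustration. Because of the "-" placeholder the column has dtype object, and np.isclose cannot do floating point arithmetic on it.

```python
import numpy as np
import pandas as pd

# A column holding both floats and the "-" placeholder gets dtype object.
mixed = pd.Series([83.84, "-", 83.10])
print(mixed.dtype)

try:
    np.isclose(mixed.to_numpy(), mixed.to_numpy())
    raised = False
except TypeError as exc:
    # The exact message depends on the NumPy version.
    raised = True
    print("TypeError:", exc)
```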
Name Number Total Average Grade
0 Kevin 1 432.8 86.57 A
1 Peter 2 419.8 83.97 A
3 Jose 4 482.9 96.59 A+
4 Matthew 5 423.3 84.67 A
It should omit James, Pattrick and Alexander, because none of their values changed between Total and Total_old, Average and Average_old, or Grade and Grade_old.
Please check whether this is what you are looking for.
import pandas as pd
import numpy as np
def compute_grade(new_average, old_grade):
    # Fall back to the old grade when the average is not numeric (e.g. "-").
    try:
        grade = 'A+' if float(new_average) > 90 else 'A'
    except (TypeError, ValueError):
        grade = old_grade
    return grade
df1 = pd.DataFrame({"Name":["Kevin","Peter","James","Jose","Matthew","Pattrick","Alexander"],"Number":[1,2,3,4,5,6,7],"Total":[495.2,432.5,'-',395.5,485.8,415,418.7],"Average_old":[86.57,83.97,'-',96.59,84.67,83.10,83.84],"Grade_old":['A','A','A','A+','A','A','A'],"Total_old":[432.8,419.8,'-',482.9,423.3,415,418.7]})
df1["Average"] = df1["Total"].apply(lambda x: round((x/5) + 0.1, 2) if x != "-" else "-")
df1["Grade"] = df1.apply((lambda x: compute_grade(x['Average'], x['Grade_old'])), axis=1)
print(df1)
# import pdb; pdb.set_trace()
dfmod = df1[(df1['Total'] != df1['Total_old']) | (df1['Average'] != df1['Average_old']) | (df1['Grade'] != df1['Grade_old'])]
print(dfmod)
Output:
Name Number Total Average_old Grade_old Total_old Average Grade
0 Kevin 1 495.2 86.57 A 432.8 99.14 A+
1 Peter 2 432.5 83.97 A 419.8 86.6 A
2 James 3 - - A - - A
3 Jose 4 395.5 96.59 A+ 482.9 79.2 A
4 Matthew 5 485.8 84.67 A 423.3 97.26 A+
5 Pattrick 6 415 83.1 A 415 83.1 A
6 Alexander 7 418.7 83.84 A 418.7 83.84 A
Name Number Total Average_old Grade_old Total_old Average Grade
0 Kevin 1 495.2 86.57 A 432.8 99.14 A+
1 Peter 2 432.5 83.97 A 419.8 86.6 A
3 Jose 4 395.5 96.59 A+ 482.9 79.2 A
4 Matthew 5 485.8 84.67 A 423.3 97.26 A+
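Rounding the computed average to two decimal places is the key here.
Also, when computing the grade, if we encounter a non-numeric value such as "-", I return Grade_old.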
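The rounding idea can also be sketched in vectorized form. This assumes the columns are already numeric and uses a three-row subset of the data (Kevin, Pattrick, Alexander) chosen here for illustration. It works when the stored values have a known number of decimals; a tolerance check such as np.isclose is the more general tool.

```python
import pandas as pd

df = pd.DataFrame({
    "Total": [495.2, 415.0, 418.7],
    "Average_old": [86.57, 83.10, 83.84],
})

# Round the computed average to the same precision as the stored value.
df["Average"] = (df["Total"] / 5 + 0.1).round(2)

# After rounding, plain inequality flags only the genuinely changed row.
changed = df["Average"].ne(df["Average_old"])
print(changed.tolist())
```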
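As far as I can tell, the "-" characters are unnecessary: you can replace them with None in the columns where they appear and then make those columns numeric. This makes your preprocessing step cleaner and avoids the conditional statements that check whether a value is "-".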
df1 = df1.replace("-",None).astype({"Total":float, "Average_old":float, "Total_old":float})
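A hedged alternative for the same conversion step is pd.to_numeric with errors="coerce", which turns anything non-numeric into NaN and does not depend on how a given pandas version treats None in replace. The sketch below uses a small stand-in frame rather than the full df1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Total": [495.2, "-", 418.7],
    "Total_old": [432.8, "-", 418.7],
})

# Coerce each mixed column to float; "-" becomes NaN.
for col in ["Total", "Total_old"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# np.isclose now works; NaN is "not close" to NaN by default.
close = np.isclose(df["Total"].to_numpy(), df["Total_old"].to_numpy())
print(df.dtypes)
print(close)
```

Since NaN never compares as close by default, a row with missing values looks "modified" until it is dropped with dropna or np.isclose is called with equal_nan=True.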
Then you don't need .apply, or a check for whether a particular element is "-", when creating the new columns Average and Grade:
df1["Average"] = df1["Total"]/5 + 0.1
df1["Grade"] = ["A+" if x>90 else "A" for x in df1["Average"]]
Then you can use np.isclose in your condition, select the "_old" columns, drop any rows containing nulls, and rename the columns:
condition = ~(np.isclose(df1['Average'], df1['Average_old']) & np.isclose(df1['Total'], df1['Total_old']) & (df1['Grade_old'] == df1['Grade']))
cols = ['Name','Number'] + [col for col in df1.columns if "_old" in col]
df1.loc[condition, cols].dropna().rename(columns={'Total_old':'Total','Average_old':'Average','Grade_old':'Grade'})
Result:
Name Number Total Average Grade
0 Kevin 1 432.8 86.57 A
1 Peter 2 419.8 83.97 A
3 Jose 4 482.9 96.59 A+
4 Matthew 5 423.3 84.67 A
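For more than a handful of column pairs, the steps above could be wrapped in a helper. The following is only a sketch under stated assumptions: modified_rows and its parameters are hypothetical names introduced here, every compared column is assumed to have a matching "_old" twin, and equal_nan=True is used so that a value missing on both sides counts as unchanged instead of relying on dropna.

```python
import numpy as np
import pandas as pd

def modified_rows(df, numeric_cols, exact_cols, key_cols=("Name", "Number")):
    """Hypothetical helper: return the *_old values of every changed row."""
    unchanged = pd.Series(True, index=df.index)
    for col in numeric_cols:
        new = pd.to_numeric(df[col], errors="coerce").to_numpy()
        old = pd.to_numeric(df[col + "_old"], errors="coerce").to_numpy()
        # Tolerance-based comparison; NaN on both sides counts as unchanged.
        unchanged &= np.isclose(new, old, equal_nan=True)
    for col in exact_cols:
        unchanged &= df[col].eq(df[col + "_old"]).to_numpy()
    old_cols = [c + "_old" for c in list(numeric_cols) + list(exact_cols)]
    out = df.loc[~unchanged, list(key_cols) + old_cols]
    return out.rename(columns={c: c[: -len("_old")] for c in old_cols})

df1 = pd.DataFrame({
    "Name": ["Kevin", "Peter", "James", "Jose", "Matthew", "Pattrick", "Alexander"],
    "Number": [1, 2, 3, 4, 5, 6, 7],
    "Total": [495.2, 432.5, "-", 395.5, 485.8, 415, 418.7],
    "Average_old": [86.57, 83.97, "-", 96.59, 84.67, 83.10, 83.84],
    "Grade_old": ["A", "A", "A", "A+", "A", "A", "A"],
    "Total_old": [432.8, 419.8, "-", 482.9, 423.3, 415, 418.7],
})
df1["Average"] = pd.to_numeric(df1["Total"], errors="coerce") / 5 + 0.1
df1["Grade"] = np.where(df1["Average"] > 90, "A+", "A")

result = modified_rows(df1, ["Total", "Average"], ["Grade"])
print(result)
```

The helper leaves the original "-" placeholders in df1 untouched and coerces only the copies it compares.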