Applying function over all rows of dataframe through iteration - Python
I am working with this dataframe:
Detection_Location  Blast Hole  East Coordinate             North Coordinate               Tag Detector ID                Detection Start Time  Detection end time   Tags
CV22                105,100,99  16764.83,16752.74,16743.1   107347.67,107360.32,107362.96  385742468,385112050,385087366  2018-09-06 20:02:46   2018-09-06 20:49:21  3
CV23                63,64,61    16755.07,16745.42,16773.48  107387.68,107390.32,107382.6   385262370,385656531,385760755  2018-09-08 14:12:42   2018-09-08 14:24:19  3
CV22                5,35,19     16757.27,16747.75,16770.89  107452.4,107417.68,107420.83   385662254,385453358,385826979  2018-09-23 05:01:12   2018-09-23 05:52:54  3
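I am trying to pull the x coordinates from the "East Coordinate" column and the y coordinates from the "North Coordinate" column. I wrote a function that calculates the centroid of the 3 points in each row, then calculates the distance from the centroid to each point and validates it. It works perfectly for one row.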
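For reference, the calculation described above can be sketched for a single row as follows. This is only an illustration using the first sample row; the variable names are made up here, and the 16-unit threshold is taken from the validation step in the code further down.

```python
import math

# Coordinates of the first sample row, as comma-separated strings.
east = "16764.83,16752.74,16743.1"
north = "107347.67,107360.32,107362.96"

# Split each string into a list of floats.
xs = [float(v) for v in east.split(",")]
ys = [float(v) for v in north.split(",")]

# Centroid: the mean of the x values and the mean of the y values.
centroid = (sum(xs) / len(xs), sum(ys) / len(ys))

# Euclidean distance from each point to the centroid.
distances = [math.hypot(x - centroid[0], y - centroid[1]) for x, y in zip(xs, ys)]

# Validation: every point must lie within 16 units of the centroid.
is_valid = all(d <= 16.0 for d in distances)
print(centroid, distances, is_valid)
```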
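I want to apply the function to every row and return the answer for each one. I tried both df.iterrows and df.apply, but they give the same answer for all rows, so clearly it is not working.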
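As a point of comparison, here is a minimal sketch of how both df.iterrows and df.apply hand one row at a time to a function. The two-row frame and the helper first_x are hypothetical; the important details are that the helper reads from its row argument and that df.apply needs axis=1 to pass rows rather than columns.

```python
import pandas as pd

# Two-row toy frame with the same column names as the sample data.
df = pd.DataFrame({
    "East Coordinate": ["16764.83,16752.74,16743.1", "16755.07,16745.42,16773.48"],
    "North Coordinate": ["107347.67,107360.32,107362.96", "107387.68,107390.32,107382.6"],
})

def first_x(row):
    # Hypothetical helper: reads from the row it is given, not from df.
    return float(row["East Coordinate"].split(",")[0])

# iterrows yields (index, row) pairs, where row is a Series.
via_iterrows = [first_x(row) for _, row in df.iterrows()]

# apply with axis=1 calls the function once per row.
via_apply = df.apply(first_x, axis=1).tolist()
print(via_iterrows, via_apply)
```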
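Here is the code. (I am only concerned with the last lines, that is, how to apply the function to each row and how to get the results, since doing this alongside the other columns of the original dataset will be a challenge. The rest of the code is only there for understanding.)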
def Calculate_dispersion(row):
    # Picking up the column with x coordinates.
    df2 = df['East Coordinate'].tolist()
    # Picking up the column with y coordinates.
    df3 = df['North Coordinate'].tolist()
    # Splitting the list into separate x coordinates.
    df4 = pd.DataFrame([sub.split(",") for sub in df2])
    # Splitting the list into separate y coordinates.
    df5 = pd.DataFrame([sub.split(",") for sub in df3])
    # Creating a tuple with x coordinates.
    x1 = df4.iloc[0]
    x2 = x1.tolist()
    x3 = tuple(float(x) for x in x2)
    # Creating a tuple with y coordinates.
    y1 = df5.iloc[0]
    y2 = y1.tolist()
    y3 = tuple(float(x) for x in y2)
    # Creating the coordinate tuple for centroid calculation.
    c = (x3, y3)
    # Calculating centroid.
    centroid = (sum(c[0])/len(c[0]), sum(c[1])/len(c[1]))
    Centroid1 = (round(centroid[0], 2), round(centroid[1], 2))
    # Converting tuples to (x, y) form.
    a = (x3[0], y3[0])
    b = (x3[1], y3[1])
    c = (x3[2], y3[2])
    # Function for calculating distance from centroid.
    def get_distance(x1, x2, y1, y2):
        dist = math.sqrt((x2-x1)**2 + (y2-y1)**2)
        return dist
    # Assigning first coordinate point's values.
    x1 = a[0]
    x2 = Centroid1[0]
    y1 = a[1]
    y2 = Centroid1[1]
    # Calculating distance for first coordinate point and centroid.
    distance_a_centroid = get_distance(x1, x2, y1, y2)
    print(distance_a_centroid)
    # Assigning second coordinate point's values.
    x_1 = b[0]
    y_1 = b[1]
    # Calculating distance for second coordinate point and centroid.
    distance_b_centroid = get_distance(x_1, x2, y_1, y2)
    print(distance_b_centroid)
    # Assigning third coordinate point's values.
    x_2 = c[0]
    y_2 = c[1]
    # Calculating distance for third coordinate point and centroid.
    distance_c_centroid = get_distance(x_2, x2, y_2, y2)
    print(distance_c_centroid)
    # Calculate average dispersion.
    Average_dispersion = (distance_a_centroid + distance_b_centroid + distance_c_centroid)/3
    print(Average_dispersion)
    # Validation statement.
    if distance_a_centroid <= 16.00 and distance_b_centroid <= 16.00 and distance_c_centroid <= 16.00:
        print("True")
    else:
        print("False")

for index, row in df.iterrows():
    Final_PHD = Calculate_dispersion(row)
    print(Final_PHD)
Thanks in advance.
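It's simple: nothing inside the function depends on the row you call it with, which means the function will always return the same value. Note that the function's variables are local, so they are reset every time the function is called.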
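The effect can be reproduced with a toy case. The sketch below uses a plain Python list in place of the dataframe; table, ignores_argument and uses_argument are made-up names for illustration only.

```python
# Toy stand-in for the global df: a list of records (hypothetical data).
table = [
    {"East Coordinate": "1,2,3"},
    {"East Coordinate": "10,20,30"},
]

def ignores_argument(row):
    # Never reads `row`; always looks at the first record of the global table.
    first = table[0]["East Coordinate"]
    return sum(float(v) for v in first.split(","))

def uses_argument(row):
    # Reads the record it was called with.
    return sum(float(v) for v in row["East Coordinate"].split(","))

same_every_time = [ignores_argument(r) for r in table]
differs_per_row = [uses_argument(r) for r in table]
print(same_every_time)  # [6.0, 6.0]
print(differs_per_row)  # [6.0, 60.0]
```

The first function mirrors the original Calculate_dispersion, which reads df and df4.iloc[0] regardless of the row passed in.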
Try this:
import math
import pandas as pd

def Calculate_dispersion(row):
    # Pick up the x coordinates of this row (a comma-separated string).
    # df.loc[row, ...] assumes df has the default 0..n-1 integer index.
    df2 = df.loc[row, 'East Coordinate']
    # Pick up the y coordinates of this row.
    df3 = df.loc[row, 'North Coordinate']
    # Create a tuple with the x coordinates.
    x3 = tuple(float(x) for x in df2.split(","))
    # Create a tuple with the y coordinates.
    y3 = tuple(float(y) for y in df3.split(","))
    # Create the coordinate tuple for the centroid calculation.
    c = (x3, y3)
    # Calculate the centroid.
    centroid = (sum(c[0])/len(c[0]), sum(c[1])/len(c[1]))
    Centroid1 = (round(centroid[0], 2), round(centroid[1], 2))
    # Convert the tuples to (x, y) form.
    a = (x3[0], y3[0])
    b = (x3[1], y3[1])
    c = (x3[2], y3[2])
    # Function for calculating the distance from the centroid.
    def get_distance(x1, x2, y1, y2):
        dist = math.sqrt((x2-x1)**2 + (y2-y1)**2)
        return dist
    # Centroid coordinates, shared by all three distance calculations.
    x2 = Centroid1[0]
    y2 = Centroid1[1]
    # Distance between the first point and the centroid.
    distance_a_centroid = get_distance(a[0], x2, a[1], y2)
    print(distance_a_centroid)
    # Distance between the second point and the centroid.
    distance_b_centroid = get_distance(b[0], x2, b[1], y2)
    print(distance_b_centroid)
    # Distance between the third point and the centroid.
    distance_c_centroid = get_distance(c[0], x2, c[1], y2)
    print(distance_c_centroid)
    # Calculate the average dispersion.
    Average_dispersion = (distance_a_centroid + distance_b_centroid + distance_c_centroid)/3
    print(Average_dispersion)
    # Validation: return the result so the caller receives it instead of None.
    return distance_a_centroid <= 16.00 and distance_b_centroid <= 16.00 and distance_c_centroid <= 16.00

row = 0
while row < len(df.index):
    Final_PHD = Calculate_dispersion(row)
    print(Final_PHD)
    row += 1
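As an alternative sketch (not part of the answer above), the same calculation can be written against the row itself and run with df.apply(..., axis=1), so the results come back as new columns next to the original ones. The function name dispersion, its max_distance parameter and the output column names are assumptions made for this example, and the centroid is not rounded to two decimals here.

```python
import math
import pandas as pd

# Sample data from the question (only the columns the calculation needs).
df = pd.DataFrame({
    "Detection_Location": ["CV22", "CV23", "CV22"],
    "East Coordinate": [
        "16764.83,16752.74,16743.1",
        "16755.07,16745.42,16773.48",
        "16757.27,16747.75,16770.89",
    ],
    "North Coordinate": [
        "107347.67,107360.32,107362.96",
        "107387.68,107390.32,107382.6",
        "107452.4,107417.68,107420.83",
    ],
})

def dispersion(row, max_distance=16.0):
    # Parse the comma-separated strings of this row into floats.
    xs = [float(v) for v in row["East Coordinate"].split(",")]
    ys = [float(v) for v in row["North Coordinate"].split(",")]
    # Centroid of the points in this row.
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    # Distance from each point to the centroid.
    distances = [math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)]
    # Returning a Series makes apply build one output column per key.
    return pd.Series({
        "Average_dispersion": sum(distances) / len(distances),
        "Valid": all(d <= max_distance for d in distances),
    })

# axis=1 passes each row to the function; join keeps the original columns.
result = df.join(df.apply(dispersion, axis=1))
print(result[["Detection_Location", "Average_dispersion", "Valid"]])
```

Because the function only reads from its row argument, it does not depend on the index labels of df or on any global state.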