Applying function over all rows of dataframe through iteration - Python
I am working with this dataframe:
Detection_Location | Blast Hole | East Coordinate | North Coordinate | Tag Detector ID | Detection Start Time | Detection end time | Tags
CV22 | 105,100,99 | 16764.83,16752.74,16743.1 | 107347.67,107360.32,107362.96 | 385742468,385112050,385087366 | 2018-09-06 20:02:46 | 2018-09-06 20:49:21 | 3
CV23 | 63,64,61 | 16755.07,16745.42,16773.48 | 107387.68,107390.32,107382.6 | 385262370,385656531,385760755 | 2018-09-08 14:12:42 | 2018-09-08 14:24:19 | 3
CV22 | 5,35,19 | 16757.27,16747.75,16770.89 | 107452.4,107417.68,107420.83 | 385662254,385453358,385826979 | 2018-09-23 05:01:12 | 2018-09-23 05:52:54 | 3
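To make the problem reproducible, the sample rows above can be rebuilt as a DataFrame. This is a sketch under the assumption that every multi-value cell is a comma-separated string (as it appears in the table) and that the time columns are left unparsed; the real dataset may use other dtypes.

```python
import pandas as pd

# Sample rows from the question; multi-value cells stay comma-separated strings.
df = pd.DataFrame({
    "Detection_Location": ["CV22", "CV23", "CV22"],
    "Blast Hole": ["105,100,99", "63,64,61", "5,35,19"],
    "East Coordinate": ["16764.83,16752.74,16743.1",
                        "16755.07,16745.42,16773.48",
                        "16757.27,16747.75,16770.89"],
    "North Coordinate": ["107347.67,107360.32,107362.96",
                         "107387.68,107390.32,107382.6",
                         "107452.4,107417.68,107420.83"],
    "Tag Detector ID": ["385742468,385112050,385087366",
                        "385262370,385656531,385760755",
                        "385662254,385453358,385826979"],
    "Detection Start Time": ["2018-09-06 20:02:46", "2018-09-08 14:12:42",
                             "2018-09-23 05:01:12"],
    "Detection end time": ["2018-09-06 20:49:21", "2018-09-08 14:24:19",
                           "2018-09-23 05:52:54"],
    "Tags": [3, 3, 3],
})
print(df[["Detection_Location", "East Coordinate", "North Coordinate"]])
```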
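I am trying to pull the x coordinates from the "East Coordinate" column and the y coordinates from the "North Coordinate" column. I wrote a function that calculates the centroid of the 3 points in each row, then calculates the distance from the centroid to each point and validates it. It works perfectly for a single row.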
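The per-row geometry does not need any DataFrame access at all. A minimal sketch, assuming each cell holds the comma-separated values shown above: the hypothetical helper `point_dispersion` takes one row's two strings and returns the centroid (rounded to 2 decimals, as in the question's code), the distances, their average, and the 16.00 check (the hypothetical `limit` parameter). Because it only sees the strings it is given, it cannot accidentally read another row, and it works for any number of points, not just three.

```python
import math

def point_dispersion(east, north, limit=16.00):
    """Centroid, distances, average distance and validation for one row."""
    # Parse "x1,x2,x3" and "y1,y2,y3" into lists of floats.
    xs = [float(v) for v in east.split(",")]
    ys = [float(v) for v in north.split(",")]
    # Centroid = mean x and mean y, rounded to 2 decimals as in the question.
    cx = round(sum(xs) / len(xs), 2)
    cy = round(sum(ys) / len(ys), 2)
    # Euclidean distance from every point to the centroid.
    distances = [math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)]
    average = sum(distances) / len(distances)
    # Valid only if every point lies within `limit` of the centroid.
    valid = all(d <= limit for d in distances)
    return (cx, cy), distances, average, valid

centroid, distances, average, valid = point_dispersion(
    "16764.83,16752.74,16743.1", "107347.67,107360.32,107362.96")
print(centroid, [round(d, 2) for d in distances], round(average, 2), valid)
```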
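I want to apply the function to every row and get the results back. I also tried df.iterrows and df.apply, but both gave the same answer for all rows, so obviously it isn't working.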
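With `DataFrame.apply(..., axis=1)`, pandas calls the function once per row and passes that row as a Series, so the function has to read its values from that argument (`row["East Coordinate"]`) rather than from the global `df`. A minimal sketch, assuming only the two coordinate columns matter; the function name `row_dispersion` and the output columns `Average_dispersion` and `Valid` are hypothetical, and unlike the question's code the centroid is not rounded here. Returning a `pd.Series` makes `apply` produce one new column per key.

```python
import math
import pandas as pd

df = pd.DataFrame({
    "East Coordinate": ["16764.83,16752.74,16743.1",
                        "16755.07,16745.42,16773.48",
                        "16757.27,16747.75,16770.89"],
    "North Coordinate": ["107347.67,107360.32,107362.96",
                         "107387.68,107390.32,107382.6",
                         "107452.4,107417.68,107420.83"],
})

def row_dispersion(row):
    # Read the coordinates of THIS row from the Series pandas passes in.
    xs = [float(v) for v in row["East Coordinate"].split(",")]
    ys = [float(v) for v in row["North Coordinate"].split(",")]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    distances = [math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)]
    # Each key becomes one output column.
    return pd.Series({
        "Average_dispersion": sum(distances) / len(distances),
        "Valid": all(d <= 16.00 for d in distances),
    })

# axis=1 calls row_dispersion once per row; join the results as new columns.
df = df.join(df.apply(row_dispersion, axis=1))
print(df[["Average_dispersion", "Valid"]])
```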
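Here is the code. (My question is only about the last lines: how to apply the function to every row and how to get the results back, since the other columns in the original dataset would make that a challenge. The rest of the code is only there for context.)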
import math
import pandas as pd

def Calculate_dispersion(row):
    # Picking up the column with x coordinates.
    df2 = df['East Coordinate'].tolist()
    # Picking up the column with y coordinates.
    df3 = df['North Coordinate'].tolist()
    # Splitting the strings into separate x coordinates.
    df4 = pd.DataFrame([sub.split(",") for sub in df2])
    # Splitting the strings into separate y coordinates.
    df5 = pd.DataFrame([sub.split(",") for sub in df3])
    # Creating a tuple with x coordinates.
    x1 = df4.iloc[0]
    x2 = x1.tolist()
    x3 = tuple(float(x) for x in x2)
    # Creating a tuple with y coordinates.
    y1 = df5.iloc[0]
    y2 = y1.tolist()
    y3 = tuple(float(x) for x in y2)
    # Creating the coordinate tuple for the centroid calculation.
    c = (x3, y3)
    # Calculating the centroid.
    centroid = (sum(c[0]) / len(c[0]), sum(c[1]) / len(c[1]))
    Centroid1 = (round(centroid[0], 2), round(centroid[1], 2))
    # Converting the tuples into (x, y) points.
    a = (x3[0], y3[0])
    b = (x3[1], y3[1])
    c = (x3[2], y3[2])

    # Function for calculating the distance from the centroid.
    def get_distance(x1, x2, y1, y2):
        dist = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
        return dist

    # Assigning the first coordinate point's values.
    x1 = a[0]
    x2 = Centroid1[0]
    y1 = a[1]
    y2 = Centroid1[1]
    # Calculating the distance between the first point and the centroid.
    distance_a_centroid = get_distance(x1, x2, y1, y2)
    print(distance_a_centroid)
    # Assigning the second coordinate point's values.
    x_1 = b[0]
    y_1 = b[1]
    # Calculating the distance between the second point and the centroid.
    distance_b_centroid = get_distance(x_1, x2, y_1, y2)
    print(distance_b_centroid)
    # Assigning the third coordinate point's values.
    x_2 = c[0]
    y_2 = c[1]
    # Calculating the distance between the third point and the centroid.
    distance_c_centroid = get_distance(x_2, x2, y_2, y2)
    print(distance_c_centroid)
    # Calculating the average dispersion.
    Average_dispersion = (distance_a_centroid + distance_b_centroid + distance_c_centroid) / 3
    print(Average_dispersion)
    # Validation statement.
    if distance_a_centroid <= 16.00 and distance_b_centroid <= 16.00 and distance_c_centroid <= 16.00:
        print("True")
    else:
        print("False")

for index, row in df.iterrows():
    Final_PHD = Calculate_dispersion(row)
    print(Final_PHD)
Thanks in advance.
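It's simple: nothing inside your function depends on the row you call it with, so it always computes the same values. It reads the whole df and then always takes the first row (df4.iloc[0] and df5.iloc[0]). Note that the variables inside the function are local, so they are re-created on every call and cannot carry a different row from one call to the next. Also, the function only prints its results and never returns anything, so Final_PHD is always None.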
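The symptom is easy to reproduce on a toy frame. In this hypothetical example, `ignores_row` reads the global `df` and always picks its first row (like `df4.iloc[0]` in the question), so every call returns the same value; `uses_row` reads the row that `iterrows()` hands it.

```python
import pandas as pd

df = pd.DataFrame({"East Coordinate": ["1,2,3", "10,20,30"]})

def ignores_row(row):
    # Reads the whole column and always picks the first entry.
    first = df["East Coordinate"].tolist()[0]
    return sum(float(v) for v in first.split(","))

def uses_row(row):
    # Reads the value from the row passed in by iterrows().
    return sum(float(v) for v in row["East Coordinate"].split(","))

bad = [ignores_row(row) for _, row in df.iterrows()]
good = [uses_row(row) for _, row in df.iterrows()]
print(bad)   # [6.0, 6.0]
print(good)  # [6.0, 60.0]
```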
Try this:
import math
import pandas as pd

def get_distance(x1, x2, y1, y2):
    # Euclidean distance between (x1, y1) and (x2, y2).
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

def Calculate_dispersion(row):
    # Picking up the coordinate strings of THIS row only.
    # `row` is an index label; df.loc[row, col] returns a single string,
    # so there is no .tolist() to call on it.
    east = df.loc[row, 'East Coordinate']
    north = df.loc[row, 'North Coordinate']
    # Splitting the strings into tuples of x and y coordinates.
    x3 = tuple(float(x) for x in east.split(","))
    y3 = tuple(float(y) for y in north.split(","))
    # Calculating the centroid.
    centroid = (sum(x3) / len(x3), sum(y3) / len(y3))
    Centroid1 = (round(centroid[0], 2), round(centroid[1], 2))
    # Converting the tuples into (x, y) points.
    a = (x3[0], y3[0])
    b = (x3[1], y3[1])
    c = (x3[2], y3[2])
    # Calculating the distance between each point and the centroid.
    distance_a_centroid = get_distance(a[0], Centroid1[0], a[1], Centroid1[1])
    distance_b_centroid = get_distance(b[0], Centroid1[0], b[1], Centroid1[1])
    distance_c_centroid = get_distance(c[0], Centroid1[0], c[1], Centroid1[1])
    # Calculating the average dispersion.
    Average_dispersion = (distance_a_centroid + distance_b_centroid + distance_c_centroid) / 3
    # Validation: every point must lie within 16.00 of the centroid.
    is_valid = (distance_a_centroid <= 16.00
                and distance_b_centroid <= 16.00
                and distance_c_centroid <= 16.00)
    # Returning the results so the caller can collect them.
    return Average_dispersion, is_valid

# Loop over every index label and print each row's result.
for row in df.index:
    Final_PHD = Calculate_dispersion(row)
    print(Final_PHD)
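A different technique, not the one used in the answer above: the same computation can be done for all rows at once by expanding the comma-separated strings into numeric columns with `Series.str.split(",", expand=True)` and letting NumPy broadcast over the rows. A sketch, assuming every row has the same number of points (three in the sample); the output column names `Average_dispersion` and `Valid` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "East Coordinate": ["16764.83,16752.74,16743.1",
                        "16755.07,16745.42,16773.48",
                        "16757.27,16747.75,16770.89"],
    "North Coordinate": ["107347.67,107360.32,107362.96",
                         "107387.68,107390.32,107382.6",
                         "107452.4,107417.68,107420.83"],
})

# One column per point: arrays of shape (n_rows, n_points), as floats.
xs = df["East Coordinate"].str.split(",", expand=True).astype(float).to_numpy()
ys = df["North Coordinate"].str.split(",", expand=True).astype(float).to_numpy()

# Per-row centroid, kept as shape (n_rows, 1) for broadcasting,
# rounded to 2 decimals as in the question.
cx = xs.mean(axis=1, keepdims=True).round(2)
cy = ys.mean(axis=1, keepdims=True).round(2)

# Distance of every point to its own row's centroid, shape (n_rows, n_points).
dist = np.hypot(xs - cx, ys - cy)

df["Average_dispersion"] = dist.mean(axis=1)
df["Valid"] = (dist <= 16.00).all(axis=1)
print(df[["Average_dispersion", "Valid"]])
```

This avoids calling a Python function per row, which matters more as the number of rows grows; the per-row loop remains easier to read when the logic gets more complex.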