Applying function over all rows of dataframe through iteration - Python

I am working with this dataframe:

Detection_Location  Blast Hole  East Coordinate             North Coordinate               Tag Detector ID                Detection Start Time  Detection end time   Tags
CV22                105,100,99  16764.83,16752.74,16743.1   107347.67,107360.32,107362.96  385742468,385112050,385087366  2018-09-06 20:02:46   2018-09-06 20:49:21  3
CV23                63,64,61    16755.07,16745.42,16773.48  107387.68,107390.32,107382.6   385262370,385656531,385760755  2018-09-08 14:12:42   2018-09-08 14:24:19  3
CV22                5,35,19     16757.27,16747.75,16770.89  107452.4,107417.68,107420.83   385662254,385453358,385826979  2018-09-23 05:01:12   2018-09-23 05:52:54  3

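I am trying to pull the x coordinates from the 'East Coordinate' column and the y coordinates from the 'North Coordinate' column. I wrote a function that calculates the centroid of the 3 points in a row, then calculates the distance from the centroid to each point and validates it. It works perfectly for one row.

I want to apply the function to every row and return the answer. I also tried df.iterrows and df.apply, but both give the same answer for all rows, so clearly it is not working.

Here is the code. (The challenge is only in the last lines: how to apply the function to each row and how to get the results as another column in the original dataset. The rest of the code is just for context.)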

def Calculate_dispersion(row):
    #Picking up the columns with x coordinates.
    df2 = df['East Coordinate'].tolist()
    #Picking up the columns with y coordinates.
    df3 = df['North Coordinate'].tolist()


    #Splitting the list into separate x coordinates.
    df4 = pd.DataFrame([sub.split(",") for sub in df2])
    #Splitting the list into separate y coordinates.
    df5 = pd.DataFrame([sub.split(",") for sub in df3])


    #Creating a tuple with x coordinates
    x1 = df4.iloc[0]
    x2 = x1.tolist()
    x3 = tuple(float(x) for x in x2)
    #Creating a tuple with y coordinates
    y1 = df5.iloc[0]
    y2 = y1.tolist()
    y3 = tuple(float(x) for x in y2)


    #Creating the Coordinate tuple for centroid calculation.
    c = (x3,y3)

    #Calculating centroid.
    centroid = (sum(c[0])/len(c[0]),sum(c[1])/len(c[1]))
    Centroid1 = (round(centroid[0],2), round(centroid[1],2))

    #Converting tuple in (x,y) form.
    a = (x3[0], y3[0])
    b = (x3[1], y3[1])
    c = (x3[2], y3[2])

    #Function for calculating distance from centroid.
    def get_distance(x1, x2, y1, y2):
        dist = math.sqrt((x2-x1)**2 +(y2-y1)**2)
        return dist


    #Assigning first coordinate points value.
    x1 = a[0]
    x2 = Centroid1[0]
    y1 = a[1]
    y2 = Centroid1[1]
    #Calculating distance for first coordinate point and centroid.
    distance_a_centroid = get_distance(x1,x2,y1,y2)
    print(distance_a_centroid)

    #Assigning second coordinate points value.
    x_1 = b[0]
    y_1 = b[1]
    #Calculating distance for second coordinate point and centroid.
    distance_b_centroid = get_distance(x_1, x2, y_1, y2)
    print(distance_b_centroid)


    #Assigning third coordinate points value.
    x_2 = c[0]
    y_2 = c[1]
    #Calculating distance for third coordinate point and centroid.
    distance_c_centroid = get_distance(x_2,x2,y_2,y2)
    print(distance_c_centroid)

    #calculate average dispersion
    Average_dispersion = (distance_a_centroid+distance_b_centroid+distance_c_centroid)/3
    print(Average_dispersion)

    #Validation statement
    if distance_a_centroid <= 16.00 and distance_b_centroid <= 16.00 and distance_c_centroid <=16.00 :
        print ("True")
    else:
        print("False")





for index, row in df.iterrows():
    Final_PHD = Calculate_dispersion(row)
print(Final_PHD)

Thanks in advance.

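It is simple: nothing inside the function depends on the row you call it with, so the function always produces the same values. It reads the whole 'East Coordinate' and 'North Coordinate' columns from df and then takes df4.iloc[0] and df5.iloc[0], i.e. the first row, on every call. Note also that the variables in the function are local, so they are re-created each time the function is called.

Try this: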
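To illustrate the point, here is a minimal sketch on a hypothetical two-row frame. The names demo, ignores_row and uses_row are made up for this example and do not come from the question. The first function reads the whole column and takes its first entry, much as the question's code does; the second reads from the row it is given.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the point.
demo = pd.DataFrame({"East Coordinate": ["1,2,3", "10,20,30"]})

def ignores_row(row):
    # Reads the whole column and always picks the first entry,
    # so the argument `row` has no effect on the result.
    return demo["East Coordinate"].tolist()[0]

def uses_row(row):
    # Reads the value from the row that was passed in.
    return row["East Coordinate"]

same = [ignores_row(row) for _, row in demo.iterrows()]
different = [uses_row(row) for _, row in demo.iterrows()]
print(same)
print(different)
```

Only the second list changes from row to row, because only uses_row looks at its argument.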

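import math
import pandas as pd

def Calculate_dispersion(row):
    #Picking up the x coordinates of the given row only.
    #df.loc[row, column] returns a single comma-separated string, so it is wrapped in a list.
    df2 = [df.loc[row,'East Coordinate']]
    #Picking up the y coordinates of the given row only.
    df3 = [df.loc[row,'North Coordinate']]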


    #Splitting the list into separate x coordinates.
    df4 = pd.DataFrame([sub.split(",") for sub in df2])
    #Splitting the list into separate y coordinates.
    df5 = pd.DataFrame([sub.split(",") for sub in df3])


    #Creating a tuple with x coordinates
    x1 = df4.iloc[0]
    x2 = x1.tolist()
    x3 = tuple(float(x) for x in x2)
    #Creating a tuple with y coordinates
    y1 = df5.iloc[0]
    y2 = y1.tolist()
    y3 = tuple(float(x) for x in y2)


    #Creating the Coordinate tuple for centroid calculation.
    c = (x3,y3)

    #Calculating centroid.
    centroid = (sum(c[0])/len(c[0]),sum(c[1])/len(c[1]))
    Centroid1 = (round(centroid[0],2), round(centroid[1],2))

    #Converting tuple in (x,y) form.
    a = (x3[0], y3[0])
    b = (x3[1], y3[1])
    c = (x3[2], y3[2])

    #Function for calculating distance from centroid.
    def get_distance(x1, x2, y1, y2):
        dist = math.sqrt((x2-x1)**2 +(y2-y1)**2)
        return dist


    #Assigning first coordinate points value.
    x1 = a[0]
    x2 = Centroid1[0]
    y1 = a[1]
    y2 = Centroid1[1]
    #Calculating distance for first coordinate point and centroid.
    distance_a_centroid = get_distance(x1,x2,y1,y2)
    print(distance_a_centroid)

    #Assigning second coordinate points value.
    x_1 = b[0]
    y_1 = b[1]
    #Calculating distance for second coordinate point and centroid.
    distance_b_centroid = get_distance(x_1, x2, y_1, y2)
    print(distance_b_centroid)


    #Assigning third coordinate points value.
    x_2 = c[0]
    y_2 = c[1]
    #Calculating distance for third coordinate point and centroid.
    distance_c_centroid = get_distance(x_2,x2,y_2,y2)
    print(distance_c_centroid)

    #calculate average dispersion
    Average_dispersion = (distance_a_centroid+distance_b_centroid+distance_c_centroid)/3
    print(Average_dispersion)

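    #Validation statement
    if distance_a_centroid <= 16.00 and distance_b_centroid <= 16.00 and distance_c_centroid <=16.00 :
        print ("True")
    else:
        print("False")

    #Returning the result so that the caller receives a value instead of None.
    return Average_dispersion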





#Assumes df has the default integer index (0, 1, 2, ...), because df.loc looks rows up by label.
row=0
while row<len(df.index):
    Final_PHD = Calculate_dispersion(row)
    row+=1
    print(Final_PHD)
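As an alternative, here is a rough sketch that lets df.apply pass each row in (axis=1) and reads the coordinates from that row directly. The lowercase name calculate_dispersion, the max_distance parameter and the output column names Average_dispersion and Valid are assumptions made for this example, and the small df is rebuilt from the first two rows of the question's data. Unlike the code above, it does not round the centroid to 2 decimals, and it uses math.hypot in place of the explicit square-root formula.

```python
import math
import pandas as pd

# Small sample rebuilt from the first two rows of the question's dataframe.
df = pd.DataFrame({
    "East Coordinate": ["16764.83,16752.74,16743.1", "16755.07,16745.42,16773.48"],
    "North Coordinate": ["107347.67,107360.32,107362.96", "107387.68,107390.32,107382.6"],
})

def calculate_dispersion(row, max_distance=16.0):
    # Parse the comma-separated coordinates of this row only.
    xs = [float(v) for v in row["East Coordinate"].split(",")]
    ys = [float(v) for v in row["North Coordinate"].split(",")]
    # Centroid of the points in this row.
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    # Distance from each point to the centroid.
    distances = [math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)]
    return pd.Series({
        "Average_dispersion": sum(distances) / len(distances),
        "Valid": all(d <= max_distance for d in distances),
    })

# axis=1 calls the function once per row; the result has one row per input row.
result = df.apply(calculate_dispersion, axis=1)
print(result)
```

If the results are wanted as extra columns of the original dataframe, something like df.join(result) should work, since result keeps the same index as df.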
