
Applying function over all rows of dataframe through iteration - Python

I am working with this dataframe:

Dataframe

Detection_Location | Blast Hole | East Coordinate | North Coordinate | Tag Detector ID | Detection Start Time | Detection end time | Tags
CV22 | 105,100,99 | 16764.83,16752.74,16743.1 | 107347.67,107360.32,107362.96 | 385742468,385112050,385087366 | 2018-09-06 20:02:46 | 2018-09-06 20:49:21 | 3
CV23 | 63,64,61 | 16755.07,16745.42,16773.48 | 107387.68,107390.32,107382.6 | 385262370,385656531,385760755 | 2018-09-08 14:12:42 | 2018-09-08 14:24:19 | 3
CV22 | 5,35,19 | 16757.27,16747.75,16770.89 | 107452.4,107417.68,107420.83 | 385662254,385453358,385826979 | 2018-09-23 05:01:12 | 2018-09-23 05:52:54 | 3

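I am trying to pull the x coordinates from the "East Coordinate" column and the y coordinates from the "North Coordinate" column. I wrote a function that calculates the centroid of the 3 points in each row, then calculates the distance from the centroid to each point and validates it. It works perfectly for a single row.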
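
For reference, the calculation described above can be sketched for a single row as follows. This is only an illustration under stated assumptions: the helper name `centroid_and_distances` is hypothetical, the input strings are copied from the first row of the table, the 16.00 limit is taken from the validation step in the code below, and unlike that code the centroid is not rounded to 2 decimals.

```python
import math


def centroid_and_distances(east, north):
    """Return the centroid and the point-to-centroid distances for one row.

    `east` and `north` are comma-separated coordinate strings, as stored in
    the "East Coordinate" and "North Coordinate" columns.
    """
    xs = [float(v) for v in east.split(",")]
    ys = [float(v) for v in north.split(",")]
    # Centroid = mean of the x values and mean of the y values.
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    # Euclidean distance from every point to the centroid.
    distances = [math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)]
    return (cx, cy), distances


# Values from the first row of the table.
centroid, distances = centroid_and_distances(
    "16764.83,16752.74,16743.1", "107347.67,107360.32,107362.96"
)
# Validation: every point must lie within 16.00 of the centroid.
all_within_limit = all(d <= 16.0 for d in distances)
print(centroid, distances, all_within_limit)
```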

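I want to apply the function to every row and return the answer. I have tried both df.iterrows and df.apply, but they give the same answer for all rows, so clearly something is not working.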

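Here is the code: (I only need help with the last few lines, that is, how to apply the function to each row and how to get the results back, since the other columns in the original dataset will be a challenge. The rest of the code is only there for context.)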

def Calculate_dispersion(row):
    #Picking up the columns with x coordinates.
    df2 = df['East Coordinate'].tolist()
    #Picking up the columns with y coordinates.
    df3 = df['North Coordinate'].tolist()


    #Splitting the list into separate x coordinates.
    df4 = pd.DataFrame([sub.split(",") for sub in df2])
    #Splitting the list into separate y coordinates.
    df5 = pd.DataFrame([sub.split(",") for sub in df3])


    #Creating a tuple with x coordinates
    x1 = df4.iloc[0]
    x2 = x1.tolist()
    x3 = tuple(float(x) for x in x2)
    #Creating a tuple with y coordinates
    y1 = df5.iloc[0]
    y2 = y1.tolist()
    y3 = tuple(float(x) for x in y2)


    #Creating the Coordinate tuple for centroid calculation.
    c = (x3,y3)

    #Calculating centroid.
    centroid = (sum(c[0])/len(c[0]),sum(c[1])/len(c[1]))
    Centroid1 = (round(centroid[0],2), round(centroid[1],2))

    #Converting tuple in (x,y) form.
    a = (x3[0], y3[0])
    b = (x3[1], y3[1])
    c = (x3[2], y3[2])

    #Function for calculating distance from centroid.
    def get_distance(x1, x2, y1, y2):
        dist = math.sqrt((x2-x1)**2 +(y2-y1)**2)
        return dist


    #Assigning first coordinate points value.
    x1 = a[0]
    x2 = Centroid1[0]
    y1 = a[1]
    y2 = Centroid1[1]
    #Calculating distance for first coordinate point and centroid.
    distance_a_centroid = get_distance(x1,x2,y1,y2)
    print(distance_a_centroid)

    #Assigning second coordinate points value.
    x_1 = b[0]
    y_1 = b[1]
    #Calculating distance for second coordinate point and centroid.
    distance_b_centroid = get_distance(x_1, x2, y_1, y2)
    print(distance_b_centroid)


    #Assigning third coordinate points value.
    x_2 = c[0]
    y_2 = c[1]
    #Calculating distance for third coordinate point and centroid.
    distance_c_centroid = get_distance(x_2,x2,y_2,y2)
    print(distance_c_centroid)

    #calculate average dispersion
    Average_dispersion = (distance_a_centroid+distance_b_centroid+distance_c_centroid)/3
    print(Average_dispersion)

    #Validation statement
    if distance_a_centroid <= 16.00 and distance_b_centroid <= 16.00 and distance_c_centroid <=16.00 :
        print ("True")
    else:
        print("False")





for index, row in df.iterrows():
    Final_PHD = Calculate_dispersion(row)
print(Final_PHD)

Thanks in advance.

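It's simple: if nothing inside the function depends on the row you pass in, the function will always return the same value. Note also that the function's variables are local, so they are created afresh on every call.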
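
The effect can be reproduced on a tiny frame. The sketch below is an illustration only: `df_demo`, `ignores_row` and `uses_row` are hypothetical names, and the data is made up to keep the example short.

```python
import pandas as pd

# Made-up two-row frame, just large enough to show the difference.
df_demo = pd.DataFrame({"East Coordinate": ["1,2,3", "10,20,30"]})


def ignores_row(row):
    # Reads the whole column from the global frame and always takes the
    # first entry, so the `row` argument has no influence on the result.
    return df_demo["East Coordinate"].tolist()[0]


def uses_row(row):
    # Reads the value from the row that was actually passed in.
    return row["East Coordinate"]


same = [ignores_row(row) for _, row in df_demo.iterrows()]
different = [uses_row(row) for _, row in df_demo.iterrows()]
print(same)       # ['1,2,3', '1,2,3']
print(different)  # ['1,2,3', '10,20,30']
```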

Try this:

import math

# df is assumed to be the DataFrame shown in the question.


def get_distance(x1, x2, y1, y2):
    # Euclidean distance between the points (x1, y1) and (x2, y2).
    return math.sqrt((x2 - x1)**2 + (y2 - y1)**2)


def Calculate_dispersion(row):
    # Read the coordinate strings of the requested row only.
    # `row` is the 0-based position of the row in df.
    east = df['East Coordinate'].iloc[row]
    north = df['North Coordinate'].iloc[row]

    # Split the comma-separated strings into tuples of floats.
    x3 = tuple(float(x) for x in east.split(","))
    y3 = tuple(float(y) for y in north.split(","))

    # Centroid of the points, rounded to 2 decimals as in the original code.
    Centroid1 = (round(sum(x3) / len(x3), 2), round(sum(y3) / len(y3), 2))

    # Distance from each point to the centroid.
    distances = [get_distance(x, Centroid1[0], y, Centroid1[1])
                 for x, y in zip(x3, y3)]

    # Average dispersion.
    Average_dispersion = sum(distances) / len(distances)

    # Validation: every point must lie within 16.00 of the centroid.
    is_valid = all(d <= 16.00 for d in distances)

    # Return the results instead of only printing them,
    # so that the caller receives something other than None.
    return distances, Average_dispersion, is_valid


row = 0
while row < len(df.index):
    Final_PHD = Calculate_dispersion(row)
    row += 1
    print(Final_PHD)
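
Since the question also mentions df.apply, here is a sketch of the same idea with `DataFrame.apply(..., axis=1)`, where pandas passes each row to the function as a Series. It is an illustration rather than part of the answer above: `df_example`, `dispersion_from_row`, the `limit` parameter and the output column names are assumptions, the frame holds only the first two rows of the question's table, and the centroid is not rounded here.

```python
import math

import pandas as pd

# First two rows of the question's table, restricted to the needed columns.
df_example = pd.DataFrame({
    "East Coordinate": ["16764.83,16752.74,16743.1",
                        "16755.07,16745.42,16773.48"],
    "North Coordinate": ["107347.67,107360.32,107362.96",
                         "107387.68,107390.32,107382.6"],
})


def dispersion_from_row(row, limit=16.0):
    # `row` is the Series for one row, so every value below is row-specific.
    xs = [float(v) for v in row["East Coordinate"].split(",")]
    ys = [float(v) for v in row["North Coordinate"].split(",")]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    distances = [math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)]
    # Returning a Series makes apply() build one output column per key.
    return pd.Series({
        "Average_dispersion": sum(distances) / len(distances),
        "Valid": all(d <= limit for d in distances),
    })


# Keep the original columns and attach the computed ones next to them.
result = df_example.join(df_example.apply(dispersion_from_row, axis=1))
print(result[["Average_dispersion", "Valid"]])
```

Because the results are joined back onto the frame, any other columns of the original dataset stay attached to the row they belong to.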

