Python - How to construct a for loop across multiple data frames?

I have two data frames, as shown below.

import pandas as pd

# tunnel name -> [lat, lon]
df1 = {'Aberdeen Tunnel': [22.2620666, 114.1779123],
       'Lion Rock Tunnel': [22.35134, 114.1753917],
       'Shing Mun Tunnels': [22.3773149, 114.1513125],
       'Tseung Kwan O Tunnel': [22.3191321, 114.2440963],
       'Tsing Sha Highway': [22.343242, 114.141755],
       'Cross Harbour Tunnel': [22.2922422, 114.1796539],
       'Eastern Harbour Crossing': [22.2951813, 114.220724],
       'Western Harbour Crossing': [22.2973088, 114.1508622],
       "Tate's Cairn Tunnel": [22.3588556, 114.2079283],
       'Tai Lam Tunnel': [22.3917362, 114.0598441]}
df1 = pd.DataFrame(data=df1, index=['lat', 'lon'])
df1 = df1.T  # transpose: one row per tunnel
print(df1)

# a list (not a set) of (lat, lon) tuples, one row per point
df2 = [(22.250559, 114.170959), (22.281769, 114.180153), (22.336325, 114.178978)]
df2 = pd.DataFrame(data=df2, columns=['lat', 'lon'])
print(df2)

I want to construct a for loop to find out, for each coordinate in df2, which tunnel in df1 is the nearest.

I tried the code below to first calculate the respective distances, but it doesn't seem to produce the right output.

for i in df1:
  for j in df1:
    for h in df2:
      for k in df2:
        dist = math.hypot(i-h , j-k)
print (dist)

If your two data frames are not too large, you can use a cross join:

DataFrame 1
import pandas as pd
# assumption: `distance` used in the solution is geopy's distance module
# (the "km" suffix in the output matches it); install with `pip install geopy`
from geopy import distance

dat1 = {'Aberdeen Tunnel': [22.2620666, 114.1779123],
        'Lion Rock Tunnel': [22.35134, 114.1753917],
        'Shing Mun Tunnels': [22.3773149, 114.1513125],
        'Tseung Kwan O Tunnel': [22.3191321, 114.2440963],
        'Tsing Sha Highway': [22.343242, 114.141755],
        'Cross Harbour Tunnel': [22.2922422, 114.1796539],
        'Eastern Harbour Crossing': [22.2951813, 114.220724],
        'Western Harbour Crossing': [22.2973088, 114.1508622],
        "Tate's Cairn Tunnel": [22.3588556, 114.2079283],
        'Tai Lam Tunnel': [22.3917362, 114.0598441]}

df1 = pd.DataFrame(data=dat1, index=['lat', 'lon'])
df1 = df1.T  # transpose: one row per tunnel
# combine lat and lon into a single (lat, lon) tuple per tunnel
df1['coord'] = list(zip(df1.lat, df1.lon))
df1 = df1[["coord"]].copy()  # copy so the next assignment does not hit a view
# use tunnel as a column
df1['tunnel'] = df1.index

DataFrame 2
dat2 = [(22.250559, 114.170959),
        (22.281769, 114.180153),
        (22.336325, 114.178978)]
df2 = pd.DataFrame({"coord": dat2})
# each coordinate needs an identifier so the results can be grouped per point
df2["id"] = ["a", "b", "c"]
df2

Solution:

# Cross join between both data frames: every tunnel is paired with every point
# (how="cross" requires pandas 1.2 or newer)
df_merge = df1.merge(df2, how="cross")

# calculate the distance between each pair of coordinates
df_merge["distance"] = df_merge.apply(lambda row: distance.distance(row["coord_x"], row["coord_y"]), axis=1)

# find the minimum distance for each point, aligned row by row with df_merge
minimums = df_merge.groupby("id").distance.transform("min")

# return the tunnel - id pair with the minimum distance for each id
df_merge.loc[minimums == df_merge["distance"], ["id", "tunnel", "distance"]]
#       id  tunnel                  distance
# 0     a   Aberdeen Tunnel         1.4620087109472515 km
# 5     c   Lion Rock Tunnel        1.7032322759652727 km
# 16    b   Cross Harbour Tunnel    1.1608810339311775 km
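
Since the question asks for an explicit for loop, the same idea can also be sketched without pandas. This is only a minimal illustration under some assumptions: it uses the haversine great-circle formula with a mean Earth radius of 6371 km instead of the geodesic distance used above, so the distances may differ slightly from that output, and the names `tunnels`, `points`, `haversine_km` and `nearest` are made up for this example.

```python
import math

# Tunnel coordinates as (lat, lon) in degrees, copied from the question.
tunnels = {
    "Aberdeen Tunnel": (22.2620666, 114.1779123),
    "Lion Rock Tunnel": (22.35134, 114.1753917),
    "Shing Mun Tunnels": (22.3773149, 114.1513125),
    "Tseung Kwan O Tunnel": (22.3191321, 114.2440963),
    "Tsing Sha Highway": (22.343242, 114.141755),
    "Cross Harbour Tunnel": (22.2922422, 114.1796539),
    "Eastern Harbour Crossing": (22.2951813, 114.220724),
    "Western Harbour Crossing": (22.2973088, 114.1508622),
    "Tate's Cairn Tunnel": (22.3588556, 114.2079283),
    "Tai Lam Tunnel": (22.3917362, 114.0598441),
}

# Query points, using the same ids as the answer above.
points = {
    "a": (22.250559, 114.170959),
    "b": (22.281769, 114.180153),
    "c": (22.336325, 114.178978),
}


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


nearest = {}
for point_id, point in points.items():
    best_name, best_km = None, math.inf
    # inner loop: compare this point against every tunnel, keep the closest
    for name, coord in tunnels.items():
        km = haversine_km(point, coord)
        if km < best_km:
            best_name, best_km = name, km
    nearest[point_id] = (best_name, best_km)

for point_id, (name, km) in nearest.items():
    print(point_id, name, round(km, 2))
```

The two nested loops do the same work as the cross join (every point is compared with every tunnel), so this version is also only practical when both collections are small.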

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.