New Pandas DF with index from one DF and columns from another

I have two dataframes, df1 and df2, and I am comparing absolute distances between coordinate pairs from both. I want to populate a new dataframe that has one row for each df1 coordinate pair and one column for each df2 coordinate pair.

Each cell would then hold the absolute distance between one df1 pair and one df2 pair. This is my code so far; I'm struggling to figure out how to populate the new dataframe on each iteration.

df_new = pd.DataFrame(index=df1.index.copy())

for idx_crime, x_crime in enumerate(df2['X_COORD']):
    y_crime = df2['Y_COORD'].iloc[idx_crime]
    for idx_subway, x_subway in enumerate(df1['X_COORD']):
        y_subway = df1['Y_COORD'].iloc[idx_subway]
        dist = np.sqrt((x_crime - x_subway)**2 + (y_crime - y_subway)**2)
        append.df_new
return df_new

It isn't running. Any ideas of how to fill out this new dataframe?

EDIT: Sample data

DF2 Coordinates:

    X_COORD      Y_COORD 
0   1007314.0    241257.0
1   1043991.0    193406.0
2    999463.0    231690.0
3   1060183.0    177862.0
4    987606.0    208148.0

DF1 Coordinates:

    X_COORD      Y_COORD
0   1020671.0    248680.0
1   1019420.0    245867.0
2   1017558.0    245632.0

So df_new would look like this. Just the index numbers would work for column headings. I just wanted to show you how the data would look:

                 df2_coord0        df2_coord1        df2_coord2
    df1_coord0   13356.72213       23318.81485       21207.59944
    df1_coord1   12105.8096        24569.93244       19956.64481

append.df_new is not valid Python. If it is pseudocode, then what you need is to insert values into the DataFrame. There are two ways to do that: positional indexing or conditional indexing. The sample below adds new rows by index label with .loc.

Sample code:

import pandas as pd

lst = [
    {"a": 1, "b": 1},
    {"a": 2, "b": 2},
]

df = pd.DataFrame(lst)

df.loc[2] = [3, 3]             # 2 is the index label you want for the new row
df.loc[3] = {"a": 4, "b": 4}   # 3 is the index label you want for the new row

print(df)
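
The two approaches named in this answer, positional and conditional indexing, can be sketched as follows. This is only a minimal illustration on the same toy frame; the values 10 and 20 are arbitrary and chosen for the example.

```python
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": 1}, {"a": 2, "b": 2}])

# Positional indexing: write to row 0, column 1 (column "b") by integer position.
df.iloc[0, 1] = 10

# Conditional indexing: write to column "b" in every row where "a" equals 2.
df.loc[df["a"] == 2, "b"] = 20

print(df)
```

For the distance problem in the question, the positional form is the natural fit, because the loop already tracks integer positions through enumerate.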

I had to break df2 down into smaller dataframes to avoid a memory error. I changed the for loop to the following and it works; it just took a while to get there:

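import numpy as np
import pandas as pd

# One row per df1 point, one column per df2 point
df_new = pd.DataFrame(index=df1.index.copy(), columns=df2.index.copy(), dtype=float)

for idx_crime, x_crime in enumerate(df2['X_COORD']):
    y_crime = df2['Y_COORD'].iloc[idx_crime]
    for idx_subway, x_subway in enumerate(df1['X_COORD']):
        y_subway = df1['Y_COORD'].iloc[idx_subway]
        dist = np.sqrt((x_crime - x_subway)**2 + (y_crime - y_subway)**2)
        # Write the distance into the cell at (df1 row, df2 column)
        df_new.iloc[idx_subway, idx_crime] = dist

# df_new now holds the distances (add `return df_new` if this runs inside a function)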
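
The same table can probably be built without the nested Python loops by using NumPy broadcasting, which is usually much faster on large inputs. The sketch below is not from the original answers: the helper name pairwise_distances and its chunk_size parameter are hypothetical, and the data is the sample from the question. Processing df2 in blocks only limits the size of the temporary arrays; the full result must still fit in memory.

```python
import numpy as np
import pandas as pd

# Sample coordinates taken from the question.
df1 = pd.DataFrame({"X_COORD": [1020671.0, 1019420.0, 1017558.0],
                    "Y_COORD": [248680.0, 245867.0, 245632.0]})
df2 = pd.DataFrame({"X_COORD": [1007314.0, 1043991.0, 999463.0, 1060183.0, 987606.0],
                    "Y_COORD": [241257.0, 193406.0, 231690.0, 177862.0, 208148.0]})


def pairwise_distances(df1, df2, chunk_size=1000):
    """Hypothetical helper: distance from every df1 point to every df2 point."""
    xy1 = df1[["X_COORD", "Y_COORD"]].to_numpy()
    xy2 = df2[["X_COORD", "Y_COORD"]].to_numpy()
    out = np.empty((len(xy1), len(xy2)))
    # Walk through df2 in blocks of columns so the temporaries stay small.
    for start in range(0, len(xy2), chunk_size):
        block = xy2[start:start + chunk_size]
        # Broadcasting: (n1, 1, 2) - (1, m, 2) gives an (n1, m, 2) array of differences.
        diff = xy1[:, None, :] - block[None, :, :]
        out[:, start:start + chunk_size] = np.sqrt((diff ** 2).sum(axis=2))
    # Rows are labelled by df1's index, columns by df2's index.
    return pd.DataFrame(out, index=df1.index, columns=df2.index)


df_new = pairwise_distances(df1, df2)
print(df_new.shape)  # (3, 5)
```

A smaller chunk_size trades speed for lower peak memory; the result is the same either way.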
