New Pandas DF with index from one DF and columns from another

I have two dataframes, DF1 and DF2. I am comparing absolute distances between coordinate pairs from both. I want to populate a new dataframe that has a row for each df1 coordinate pair and a column for each df2 coordinate pair.

Each cell would then hold the absolute distance between one df1 pair and one df2 pair. This is my code so far, and I'm struggling to figure out how to populate the new dataframe on each iteration.

`df_new = pd.DataFrame(index=df1.index.copy())

for idx_crime, x_crime in enumerate(df2['X_COORD']):
    y_crime = df2['Y_COORD'].iloc[idx_crime]
    for idx_subway, x_subway in enumerate(df1['X_COORD']):
        y_subway = df1['Y_COORD'].iloc[idx_subway]
        dist = np.sqrt((x_crime - x_subway)**2 + (y_crime - y_subway)**2)
        append.df_new
return df_new`

It isn't running. Any ideas on how to fill out this new dataframe?

EDIT: Sample Data

DF2 Coordinates:

    X_COORD      Y_COORD 
0   1007314.0    241257.0
1   1043991.0    193406.0
2    999463.0    231690.0
3   1060183.0    177862.0
4    987606.0    208148.0

DF1 Coordinates:

    X_COORD      Y_COORD
0   1020671.0    248680.0
1   1019420.0    245867.0
2   1017558.0    245632.0

So df_new would look like this. Just the index numbers would work for column headings; I just wanted to show you how the data would look:

                 df2_coord0        df2_coord1        df2_coord2
    df1_coord0   13356.72213       23318.81485       21207.59944
    df1_coord1   12105.8096        24569.93244       19956.64481

Apparently, append.df_new is wrong. If that's your pseudocode, then you need to insert cells into a DataFrame. Here are two ways: using position indexing or using conditional indexing.

Sample code:

import pandas as pd

lst = [
    {"a": 1, "b": 1},
    {"a": 2, "b": 2}
]

df = pd.DataFrame(lst)

# Assigning to a label that does not exist yet appends a new row.
df.loc[2] = [3, 3]             # 2 here should be your desired index
df.loc[3] = {"a": 4, "b": 4}   # 3 here should be your desired index

print(df)
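
The two approaches named in this answer, setting a cell by position and setting cells by condition, could be sketched as follows. This is only a minimal illustration on the same toy frame: the column names a and b come from the sample above, and the assigned values 10 and 20 are arbitrary.

```python
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": 1}, {"a": 2, "b": 2}])

# Position indexing: set the cell at row position 0, column position 1.
df.iloc[0, 1] = 10

# Conditional indexing: set column "b" in every row where column "a" equals 2.
df.loc[df["a"] == 2, "b"] = 20

print(df)
```

For the distance problem in the question, the position-based form is probably the closer fit, since the loop counters from enumerate are positions rather than index labels.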

I had to break df2 down into smaller dataframes to avoid a memory error. I changed the for loop to this and it works; it just took a while to get there:

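import numpy as np
import pandas as pd

# Pre-allocate the result: one row per df1 point, one column per df2 point.
df_new = pd.DataFrame(index=df1.index.copy(), columns=df2.index.copy())

for idx_crime, x_crime in enumerate(df2['X_COORD']):
    y_crime = df2['Y_COORD'].iloc[idx_crime]
    for idx_subway, x_subway in enumerate(df1['X_COORD']):
        y_subway = df1['Y_COORD'].iloc[idx_subway]
        # Euclidean distance between this df2 point and this df1 point.
        dist = np.sqrt((x_crime - x_subway)**2 + (y_crime - y_subway)**2)
        # Write into the cell at (df1 row position, df2 column position).
        df_new.iloc[idx_subway, idx_crime] = dist
# The return only makes sense if this snippet is the body of a function.
return df_new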
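
As a possible alternative to the nested loops, the same table could be built with NumPy broadcasting, processing df2 in column chunks to limit the size of the temporary arrays. This is a sketch, not the poster's method: the function name distance_matrix and the chunk_size parameter are hypothetical, and the frames are rebuilt from the sample data in the question.

```python
import numpy as np
import pandas as pd


def distance_matrix(df1, df2, chunk_size=1000):
    """Euclidean distances; rows follow df1's index, columns follow df2's index."""
    # Column vectors of shape (len(df1), 1) so they broadcast against rows of df2.
    x1 = df1["X_COORD"].to_numpy()[:, None]
    y1 = df1["Y_COORD"].to_numpy()[:, None]
    pieces = []
    for start in range(0, len(df2), chunk_size):
        chunk = df2.iloc[start:start + chunk_size]
        dx = x1 - chunk["X_COORD"].to_numpy()[None, :]
        dy = y1 - chunk["Y_COORD"].to_numpy()[None, :]
        # np.hypot(dx, dy) computes sqrt(dx**2 + dy**2) element-wise.
        pieces.append(
            pd.DataFrame(np.hypot(dx, dy), index=df1.index, columns=chunk.index)
        )
    if not pieces:  # df2 has no rows
        return pd.DataFrame(index=df1.index, columns=df2.index)
    return pd.concat(pieces, axis=1)


# Sample data from the question.
df2 = pd.DataFrame({
    "X_COORD": [1007314.0, 1043991.0, 999463.0, 1060183.0, 987606.0],
    "Y_COORD": [241257.0, 193406.0, 231690.0, 177862.0, 208148.0],
})
df1 = pd.DataFrame({
    "X_COORD": [1020671.0, 1019420.0, 1017558.0],
    "Y_COORD": [248680.0, 245867.0, 245632.0],
})

result = distance_matrix(df1, df2, chunk_size=2)
print(result)
```

Note that the concatenated result still has to fit in memory. If it does not, each piece would have to be reduced or written out separately instead of being collected in a list.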
