New Pandas DF with index from one DF and columns from another
I have two dataframes, DF1 and DF2. I am comparing absolute distances between coordinate pairs from both. I want to populate a new dataframe that has a row for each df1 coordinate pair and a column for each df2 coordinate pair.
Each cell would then hold the absolute distance between one df1 pair and one df2 pair. This is my code so far, and I'm struggling to figure out how to populate the new dataframe on each iteration.
`df_new = pd.DataFrame(index=df1.index.copy())
for idx_crime, x_crime in enumerate(df2['X_COORD']):
    y_crime = df2['Y_COORD'].iloc[idx_crime]
    for idx_subway, x_subway in enumerate(df1['X_COORD']):
        y_subway = df1['Y_COORD'].iloc[idx_subway]
        dist = np.sqrt((x_crime - x_subway)**2 + (y_crime - y_subway)**2)
        append.df_new
return df_new`
It isn't running. Any ideas on how to fill out this new dataframe?
EDIT Sample Data
DF2 Coordinates:
X_COORD Y_COORD
0 1007314.0 241257.0
1 1043991.0 193406.0
2 999463.0 231690.0
3 1060183.0 177862.0
4 987606.0 208148.0
DF1 Coordinates:
X_COORD Y_COORD
0 1020671.0 248680.0
1 1019420.0 245867.0
2 1017558.0 245632.0
So df_new would look like this. Just the index numbers would work for column headings; I just wanted to show you how the data would look:
df2_coord0 df2_coord1 df2_coord2
df1_coord0 13356.72213 23318.81485 21207.59944
df1_coord1 12105.8096 24569.93244 19956.64481
Apparently, append.df_new is wrong. If that's your pseudocode, then you need to insert cells into a DataFrame. Here are two ways: using position indexing or using conditional indexing.
Sample code:
import pandas as pd
lst = [
{"a":1,"b":1},
{"a":2,"b":2}
]
df = pd.DataFrame(lst)
df.loc[2] = [3, 3]  # 2 here should be your desired index
df.loc[3] = {"a": 4, "b": 4}  # 3 here should be your desired index
print(df)
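The two approaches named in this answer, position indexing and conditional indexing, can be sketched as follows. This is a minimal illustration on the same toy frame, not the asker's data; the values 10 and 20 are arbitrary placeholders chosen for the example.

```python
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": 1}, {"a": 2, "b": 2}])

# Position indexing: write to row 0, column 1 (column "b") by integer position.
df.iloc[0, 1] = 10

# Conditional indexing: write to column "b" in every row where "a" equals 2.
df.loc[df["a"] == 2, "b"] = 20

print(df)
```

Both forms assign into an existing cell, which is what the inner loop in the question needs to do once the result frame has been created with the right rows and columns.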
I had to break df2 down into smaller dataframes to avoid a memory error. I changed the for loop to this and it works; it just took a while to get there:
df_new = pd.DataFrame(index=df1.index.copy(), columns=df2.index.copy())
for idx_crime, x_crime in enumerate(df2['X_COORD']):
    y_crime = df2['Y_COORD'].iloc[idx_crime]
    for idx_subway, x_subway in enumerate(df1['X_COORD']):
        y_subway = df1['Y_COORD'].iloc[idx_subway]
        dist = np.sqrt((x_crime - x_subway)**2 + (y_crime - y_subway)**2)
        df_new.iloc[idx_subway, idx_crime] = dist
return df_new
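As an alternative to the nested loop, the same table can probably be built in one step with NumPy broadcasting. The sketch below is not the approach used above; it assumes both frames have `X_COORD` and `Y_COORD` columns as in the sample data, and `pairwise_distances` is a hypothetical helper name introduced only for this example.

```python
import numpy as np
import pandas as pd

# Sample coordinates copied from the question.
df1 = pd.DataFrame({"X_COORD": [1020671.0, 1019420.0, 1017558.0],
                    "Y_COORD": [248680.0, 245867.0, 245632.0]})
df2 = pd.DataFrame({"X_COORD": [1007314.0, 1043991.0, 999463.0, 1060183.0, 987606.0],
                    "Y_COORD": [241257.0, 193406.0, 231690.0, 177862.0, 208148.0]})


def pairwise_distances(a, b):
    """Hypothetical helper: Euclidean distance from every row of a to every row of b."""
    # Shape (len(a), 1) minus shape (1, len(b)) broadcasts to (len(a), len(b)).
    dx = a["X_COORD"].to_numpy()[:, None] - b["X_COORD"].to_numpy()[None, :]
    dy = a["Y_COORD"].to_numpy()[:, None] - b["Y_COORD"].to_numpy()[None, :]
    # Rows are labelled by a's index, columns by b's index.
    return pd.DataFrame(np.hypot(dx, dy), index=a.index, columns=b.index)


df_new = pairwise_distances(df1, df2)
print(df_new.shape)
```

The intermediate arrays are `len(df1)` by `len(df2)`, so with a very large df2 the memory problem mentioned above may still appear. In that case the helper could be called on slices of df2 and the pieces joined with `pd.concat(..., axis=1)`.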