How can I use two pandas dataframes to create a new dataframe with specific rows from one dataframe?
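I am currently working with two sets of dataframes. Each set contains 60 dataframes. They are ordered so that they map onto each other (for example, set1 df1 corresponds to set2 df1). The dataframes in the first set are roughly 27 rows x 2 columns; those in the second set are over 25000 rows x 8 columns. I want to create new dataframes containing rows from the second dataframe, selected according to the values in the first dataframe.
For simplicity, I have created a shortened example of the first df from each set to illustrate. I want to use 797 to take the first 796 rows (indices 0 - 795) from df2 and add them to a new dataframe, then take rows 796 to 930 and filter them into a second new dataframe. Any suggestions on how I could do this for all 60 pairs of dataframes?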
0 1
0 797.0 930.0
1 1650.0 1760.0
2 2500.0 2570.0
3 3250.0 3333.0
4 3897.0 3967.0
0 -1 -2 -1 -3 -2 -1 2 0
1 0 0 0 -2 0 -1 0 0
2 -3 0 0 -1 -2 -1 -1 -1
3 0 1 -1 -1 -3 -2 -1 0
4 0 -3 -3 0 0 0 -4 -2
Edited to add:
import pandas as pd
df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (3, 0, 2, 3, 1, 0, 1, 2),
(4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2),
(7, 0, 2, 3, 1, 0, 1, 2), (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2),
(10, 0, 2, 3, 1, 0, 1, 2), (11, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2),
(13, 0, 2, 3, 1, 0, 1, 2), (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])
#expected output will be two dataframes containing rows from df2
output1 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2),
(7, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), (13, 0, 2, 3, 1, 0, 1, 2),
(14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])
output2 = pd.DataFrame([(3, 0, 2, 3, 1, 0, 1, 2), (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2),
(8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), (10, 0, 2, 3, 1, 0, 1, 2),
(11, 0, 2, 3, 1, 0, 1, 2)])
You can use a list comprehension with flattening to build the row positions:
rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
print (rng)
[2, 3, 4, 7, 8, 9, 10]
Then filter with DataFrame.iloc and Index.difference:
output1 = df2.iloc[df2.index.difference(rng)]
print (output1)
0 1 2 3 4 5 6 7
0 1 0.0 2 3 1.0 0 1 2
1 2 0.5 1 3 1.0 0 1 2
5 6 0.0 2 3 1.0 0 1 2
6 7 0.0 2 3 1.0 0 1 2
11 12 0.0 2 3 1.0 0 1 2
12 13 0.0 2 3 1.0 0 1 2
13 14 0.0 0 1 2.0 5 2 3
14 15 0.5 1 3 1.5 2 3 1
output2 = df2.iloc[rng]
print (output2)
0 1 2 3 4 5 6 7
2 3 0.0 2 3 1.0 0 1 2
3 4 0.0 2 3 1.0 0 1 2
4 5 0.0 2 3 1.0 0 1 2
7 8 0.0 2 3 1.0 0 1 2
8 9 0.0 2 3 1.0 0 1 2
9 10 0.0 2 3 1.0 0 1 2
10 11 0.0 2 3 1.0 0 1 2
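Note that `df2.iloc[df2.index.difference(rng)]` mixes index labels with positions, so it relies on `df2` having the default 0..n-1 index. Below is a minimal sketch of a purely positional variant, assuming the same 1-based inclusive ranges as in the question. The boolean-mask approach and the small sample data (including the letter index and the `id` column) are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Stand-in data: df1 holds 1-based, inclusive (start, end) row ranges.
df1 = pd.DataFrame([(3, 5), (8, 11)])
# 15 rows with a non-default index, to show the selection is position-based.
df2 = pd.DataFrame({"id": range(1, 16)}, index=list("abcdefghijklmno"))

# Flatten the ranges into 0-based row positions.
rng = [x for a, b in df1.to_numpy() for x in range(int(a) - 1, int(b))]

# Boolean mask over positions: True for rows inside any range.
mask = np.zeros(len(df2), dtype=bool)
mask[rng] = True

output2 = df2[mask]    # rows inside the ranges
output1 = df2[~mask]   # all remaining rows

print(output2["id"].tolist())
print(output1["id"].tolist())
```

Because the mask is built from positions only, this should behave the same whether or not the index of `df2` has been reset.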
Edit:
# lists of DataFrames, paired by position
L1 = [df11, df21, df31]
L2 = [df12, df22, df32]
# output lists, if needed
out1 = []
out2 = []
# loop over the zipped lists and apply the solution to each pair
for df1, df2 in zip(L1, L2):
    print(df1)
    print(df2)
    rng = [x for a, b in df1.values for x in range(int(a) - 1, int(b))]
    output1 = df2.iloc[df2.index.difference(rng)]
    output2 = df2.iloc[rng]
    # append the output DataFrames to the lists, if needed
    out1.append(output1)
    out2.append(output2)
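The loop above can also be wrapped in a small helper so that it is easy to try on toy data before running it on all 60 pairs. This is only a sketch: `split_by_ranges` is a hypothetical name, the sample frames are made up, and it assumes each data frame has the default 0..n-1 index, as the answer's `Index.difference` step does.

```python
import pandas as pd

def split_by_ranges(ranges_df, data_df):
    """Split data_df into (outside, inside) rows for 1-based inclusive ranges.

    Assumes data_df has the default RangeIndex, so labels equal positions.
    """
    rng = [x for a, b in ranges_df.to_numpy() for x in range(int(a) - 1, int(b))]
    inside = data_df.iloc[rng]
    outside = data_df.iloc[data_df.index.difference(rng)]
    return outside, inside

# Made-up stand-ins for the two lists of DataFrames.
L1 = [pd.DataFrame([(2, 3)]), pd.DataFrame([(1, 2), (4, 4)])]
L2 = [pd.DataFrame({"v": [10, 20, 30, 40]}), pd.DataFrame({"v": [1, 2, 3, 4, 5]})]

results = [split_by_ranges(d1, d2) for d1, d2 in zip(L1, L2)]
out1 = [outside for outside, _ in results]
out2 = [inside for _, inside in results]

for outside, inside in results:
    print(outside["v"].tolist(), inside["v"].tolist())
```

Keeping the split in one function means the pairing logic (`zip`) and the row selection can be checked separately.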
This may not be very efficient, but it produces the result you want:
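import pandas as pd
import numpy as np
# generate the second dataframe: take each 1-based inclusive slice of df2
# (DataFrame.append was removed in pandas 2.0, so the slices are combined with pd.concat)
parts = []
for x, y in np.array(df1):
    parts.append(df2.iloc[int(x) - 1:int(y)])
df_out2 = pd.concat(parts, ignore_index=True)
# get the difference: rows that appear only once after stacking are the ones not in df_out2
df_out1 = pd.concat([df_out2, df2]).drop_duplicates(keep=False)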
Compare the results with yours:
np.array_equal(df_out1.values,output1.values)
np.array_equal(df_out2.values,output2.values)
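One caveat worth checking before relying on this: `drop_duplicates(keep=False)` compares row values, so the difference step assumes every row of `df2` is unique. A small sketch with made-up data, showing what happens when two rows are identical (the position-based `drop` shown for comparison is my own addition, not part of the answer):

```python
import pandas as pd

# Rows 0 and 1 are identical on purpose.
df2 = pd.DataFrame({"a": [1, 1, 2, 3], "b": [0, 0, 0, 0]})
df_out2 = df2.iloc[2:3]  # pretend only position 2 was selected

# Value-based difference: the identical rows 0 and 1 are dropped as well.
by_values = pd.concat([df_out2, df2]).drop_duplicates(keep=False)

# Position-based difference: only the selected row is removed.
by_position = df2.drop(index=df2.index[2:3])

print(len(by_values))
print(len(by_position))
```

If `df2` can contain repeated rows, a position-based selection is probably the safer choice.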