How can I use two pandas DataFrames to create a new DataFrame containing specific rows from one of them?

Question

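I am currently working with two sets of DataFrames. Each set contains 60 DataFrames, ordered so that they map onto each other (for example, set1 df1 corresponds to set2 df1). The DataFrames in the first set are roughly 27 rows x 2 columns; those in the second set are over 25,000 rows x 8 columns. I want to create new DataFrames containing rows from the second DataFrame, selected according to the values in the first DataFrame.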

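For simplicity, I have created a short example of the first df from each set to illustrate. Using 797, I want to take the first 796 rows (index 0 - 795) from df2 and add them to a new DataFrame, then take rows 796 to 930 and filter them into a second new DataFrame. Do you have any suggestions for doing this for all 60 pairs of DataFrames?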

          0        1
0     797.0    930.0
1    1650.0   1760.0
2    2500.0   2570.0
3    3250.0   3333.0
4    3897.0   3967.0


0        -1    -2    -1    -3    -2    -1     2     0
1         0     0     0    -2     0    -1     0     0
2        -3     0     0    -1    -2    -1    -1    -1
3         0     1    -1    -1    -3    -2    -1     0
4         0    -3    -3     0     0     0    -4    -2

Edited to add:

import pandas as pd

df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (3, 0, 2, 3, 1, 0, 1, 2), 
                    (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2), 
                    (7, 0, 2, 3, 1, 0, 1, 2), (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), 
                    (10, 0, 2, 3, 1, 0, 1, 2), (11, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), 
                    (13, 0, 2, 3, 1, 0, 1, 2), (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])


#expected output will be two dataframes containing rows from df2
output1 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2), 
                    (7, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), (13, 0, 2, 3, 1, 0, 1, 2), 
                    (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])
output2 = pd.DataFrame([(3, 0, 2, 3, 1, 0, 1, 2), (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), 
                    (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), (10, 0, 2, 3, 1, 0, 1, 2), 
                    (11, 0, 2, 3, 1, 0, 1, 2)])

Answer 1

You can build the row positions with a flattening list comprehension:

#each (a, b) row of df1 is a 1-based inclusive range, so shift the start by one
rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
print (rng)
[2, 3, 4, 7, 8, 9, 10]
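To see what the comprehension does on its own, here is a minimal sketch without pandas. The helper name `expand_ranges` is made up for illustration. It assumes each pair is a 1-based inclusive (start, end) range, which is how the expected output in the question reads.

```python
# Hypothetical helper: expand 1-based inclusive (start, end) pairs
# into one flat list of 0-based row positions.
def expand_ranges(pairs):
    return [x for a, b in pairs for x in range(int(a) - 1, int(b))]

positions = expand_ranges([(3, 5), (8, 11)])
print(positions)  # [2, 3, 4, 7, 8, 9, 10]

# Float bounds such as 797.0 and 930.0 are cast to int first
big = expand_ranges([(797.0, 930.0)])
print(big[0], big[-1])  # 796 929
```

The `int()` cast matters because the real df1 holds floats, and `range` only accepts integers.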

Then filter with DataFrame.iloc and Index.difference:

output1 = df2.iloc[df2.index.difference(rng)]
print (output1)
     0    1  2  3    4  5  6  7
0    1  0.0  2  3  1.0  0  1  2
1    2  0.5  1  3  1.0  0  1  2
5    6  0.0  2  3  1.0  0  1  2
6    7  0.0  2  3  1.0  0  1  2
11  12  0.0  2  3  1.0  0  1  2
12  13  0.0  2  3  1.0  0  1  2
13  14  0.0  0  1  2.0  5  2  3
14  15  0.5  1  3  1.5  2  3  1

output2 = df2.iloc[rng]
print (output2)
     0    1  2  3    4  5  6  7
2    3  0.0  2  3  1.0  0  1  2
3    4  0.0  2  3  1.0  0  1  2
4    5  0.0  2  3  1.0  0  1  2
7    8  0.0  2  3  1.0  0  1  2
8    9  0.0  2  3  1.0  0  1  2
9   10  0.0  2  3  1.0  0  1  2
10  11  0.0  2  3  1.0  0  1  2
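Note that `df2.iloc[df2.index.difference(rng)]` gives the right rows only when df2 has the default 0..n-1 index, because `Index.difference` returns labels while `iloc` expects positions. If that assumption might not hold, a boolean mask is one possible alternative. The sketch below is not part of the original answer; `split_by_ranges` and the two-column stand-in for df2 are made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([(3, 5), (8, 11)])
# Small stand-in for df2 (made-up columns, 15 rows)
df2 = pd.DataFrame({"id": range(1, 16), "value": range(101, 116)})

def split_by_ranges(bounds, data):
    """Return (outside, inside) row subsets of data, selected by position."""
    mask = np.zeros(len(data), dtype=bool)
    for a, b in bounds.to_numpy():
        # 1-based inclusive bounds -> 0-based half-open slice
        mask[int(a) - 1:int(b)] = True
    return data[~mask], data[mask]

outside, inside = split_by_ranges(df1, df2)
print(inside["id"].tolist())   # [3, 4, 5, 8, 9, 10, 11]
print(outside["id"].tolist())  # [1, 2, 6, 7, 12, 13, 14, 15]
```

The mask is built once per pair and both outputs come from it, so the two results cannot disagree about which rows were selected.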

Edit:

#list of DataFrames
L1 = [df11, df21, df31]
L2 = [df12, df22, df32]

#if necessary output lists
out1 = []
out2 = []
#loop with zipped lists and apply solution
for df1, df2 in zip(L1, L2):
    print (df1)
    print (df2)

    rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
    output1 = df2.iloc[df2.index.difference(rng)]
    output2 = df2.iloc[rng]

    #if necessary append output df to lists
    out1.append(output1)
    out2.append(output2)
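The loop above can also be wrapped in a small function so that each pair is handled the same way. This is only a sketch under assumptions: `split_pair` is a hypothetical name, and the two short pairs below are made-up stand-ins for the 60 real ones. Since `zip` silently stops at the shorter list, the sketch checks the lengths first.

```python
import pandas as pd

def split_pair(bounds, data):
    """Split data into (outside, inside) rows using 1-based inclusive bounds."""
    rng = [x for a, b in bounds.to_numpy() for x in range(int(a) - 1, int(b))]
    selected = set(rng)
    rest = [i for i in range(len(data)) if i not in selected]
    return data.iloc[rest], data.iloc[rng]

# Made-up stand-ins for the real lists of DataFrames
L1 = [pd.DataFrame([(2, 3)]), pd.DataFrame([(1, 2), (5, 6)])]
L2 = [pd.DataFrame({"v": range(10, 15)}), pd.DataFrame({"v": range(20, 27)})]

if len(L1) != len(L2):
    raise ValueError("both lists must contain the same number of DataFrames")

out1, out2 = [], []
for bounds, data in zip(L1, L2):
    outside, inside = split_pair(bounds, data)
    out1.append(outside)
    out2.append(inside)

print(out2[1]["v"].tolist())  # [20, 21, 24, 25]
print(out1[1]["v"].tolist())  # [22, 23, 26]
```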

Answer 2

This may not be efficient, but it produces the result you want:

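import pandas as pd
import numpy as np

#generate the second dataframe from the slices selected by each (start, end) pair
#(DataFrame.append was removed in pandas 2.0, so collect the slices and concat once)
parts = [df2.iloc[int(x)-1:int(y)] for x, y in np.array(df1)]
df_out2 = pd.concat(parts, ignore_index=True)
#get the difference: rows present in both frames are dropped
#(this compares row values, so rows duplicated within df2 itself are dropped too)
df_out1 = pd.concat([df_out2, df2]).drop_duplicates(keep=False)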

Compare the results with yours:

np.array_equal(df_out1.values,output1.values)
np.array_equal(df_out2.values,output2.values)
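One caveat with the `drop_duplicates(keep=False)` difference is that it compares row values, not positions. If df2 contains identical rows, those rows disappear from the result as well. The toy data below is made up to show the effect. It contrasts the value-based difference with a position-based one via `DataFrame.drop`, which assumes the selected slice still carries its original index labels.

```python
import pandas as pd

df2 = pd.DataFrame({"v": [1, 1, 2, 3, 4]})  # rows 0 and 1 are identical
inside = df2.iloc[2:4]                      # positions 2 and 3, labels kept

# Value-based difference: the duplicated rows of df2 vanish as well
by_value = pd.concat([inside, df2]).drop_duplicates(keep=False)

# Label-based difference: only the selected rows are removed
by_index = df2.drop(index=inside.index)

print(by_value["v"].tolist())  # [4]
print(by_index["v"].tolist())  # [1, 1, 4]
```

For the question's example data the two approaches agree, because every row of df2 is distinct.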
