
How can I use two pandas dataframes to create a new dataframe with specific rows from one dataframe?

I am working with two sets of dataframes, each containing 60 dataframes. The sets are sorted so that they line up for mapping (e.g. set1 df1 corresponds to set2 df1). Each dataframe in the first set is about 27 rows x 2 columns; each dataframe in the second set is over 25000 rows x 8 columns. I want to create new dataframes that contain rows from the second dataframe, selected according to the values in the first dataframe.

For simplicity I've created a shortened example of the first df of each set to illustrate. I want to use the 797 to take the first 796 rows (indexes 0 - 795) from df2 and add them to a new dataframe, and then take rows 796 to 930 and put them in a second new dataframe. Any suggestions on how I could do that for all 60 pairs of dataframes?

          0        1
0     797.0    930.0
1    1650.0   1760.0
2    2500.0   2570.0
3    3250.0   3333.0
4    3897.0   3967.0


0        -1    -2    -1    -3    -2    -1     2     0
1         0     0     0    -2     0    -1     0     0
2        -3     0     0    -1    -2    -1    -1    -1
3         0     1    -1    -1    -3    -2    -1     0
4         0    -3    -3     0     0     0    -4    -2

edit to add:

import pandas as pd

df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (3, 0, 2, 3, 1, 0, 1, 2), 
                    (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2), 
                    (7, 0, 2, 3, 1, 0, 1, 2), (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), 
                    (10, 0, 2, 3, 1, 0, 1, 2), (11, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), 
                    (13, 0, 2, 3, 1, 0, 1, 2), (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])


#expected output will be two dataframes containing rows from df2
output1 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2), 
                    (7, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), (13, 0, 2, 3, 1, 0, 1, 2), 
                    (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])
output2 = pd.DataFrame([(3, 0, 2, 3, 1, 0, 1, 2), (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), 
                    (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), (10, 0, 2, 3, 1, 0, 1, 2), 
                    (11, 0, 2, 3, 1, 0, 1, 2)])

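You can use a nested list comprehension to build a flat list of row positions. The pairs in df1 are 1-based and inclusive, so each start value is shifted down by one:

rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]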
print (rng)
[2, 3, 4, 7, 8, 9, 10]

Then select the rows with DataFrame.iloc, using Index.difference to get the complement:

output1 = df2.iloc[df2.index.difference(rng)]
print (output1)
     0    1  2  3    4  5  6  7
0    1  0.0  2  3  1.0  0  1  2
1    2  0.5  1  3  1.0  0  1  2
5    6  0.0  2  3  1.0  0  1  2
6    7  0.0  2  3  1.0  0  1  2
11  12  0.0  2  3  1.0  0  1  2
12  13  0.0  2  3  1.0  0  1  2
13  14  0.0  0  1  2.0  5  2  3
14  15  0.5  1  3  1.5  2  3  1

output2 = df2.iloc[rng]
print (output2)
     0    1  2  3    4  5  6  7
2    3  0.0  2  3  1.0  0  1  2
3    4  0.0  2  3  1.0  0  1  2
4    5  0.0  2  3  1.0  0  1  2
7    8  0.0  2  3  1.0  0  1  2
8    9  0.0  2  3  1.0  0  1  2
9   10  0.0  2  3  1.0  0  1  2
10  11  0.0  2  3  1.0  0  1  2
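
As an alternative to collecting positions in a list, the same split can be sketched with a boolean mask. This is only an illustration, not part of the answer above: the column names `id` and `value` are made up for the example, and the (start, end) pairs are assumed to be 1-based and inclusive, as in the question.

```python
import numpy as np
import pandas as pd

# Toy data: df1 holds 1-based, inclusive (start, end) pairs, as in the question.
# The column names "id" and "value" are hypothetical.
df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame({"id": range(1, 16), "value": range(101, 116)})

# Start with an all-False mask, one entry per row of df2.
mask = np.zeros(len(df2), dtype=bool)
for start, end in df1.to_numpy():
    # Convert the 1-based inclusive pair to a 0-based half-open slice.
    mask[int(start) - 1:int(end)] = True

output2 = df2[mask]    # rows inside any (start, end) range
output1 = df2[~mask]   # all remaining rows

print(output2["id"].tolist())  # [3, 4, 5, 8, 9, 10, 11]
print(output1["id"].tolist())  # [1, 2, 6, 7, 12, 13, 14, 15]
```

Because the mask works on row positions, it does not depend on df2 having the default 0..n-1 index.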

EDIT:

#list of DataFrames
L1 = [df11, df21, df31]
L2 = [df12, df22, df32]

#if necessary output lists
out1 = []
out2 = []
#loop with zipped lists and apply solution
for df1, df2 in zip(L1, L2):
    print (df1)
    print (df2)

    rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
    output1 = df2.iloc[df2.index.difference(rng)]
    output2 = df2.iloc[rng]

    #if necessary append output df to lists
    out1.append(output1)
    out2.append(output2)
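
The loop can also be wrapped in a small helper so it is easy to try on toy data first. The following is a minimal, self-contained sketch: `split_by_ranges` is a hypothetical name, the two small pairs of frames stand in for the 60 real pairs, and it assumes each data frame has unique index labels.

```python
import pandas as pd

def split_by_ranges(ranges_df, data_df):
    """Return (outside, inside) row subsets of data_df.

    ranges_df holds 1-based, inclusive (start, end) pairs, as in the question.
    Assumes data_df has unique index labels.
    """
    positions = [
        pos
        for start, end in ranges_df.to_numpy()
        for pos in range(int(start) - 1, int(end))
    ]
    inside = data_df.iloc[positions]
    outside = data_df.drop(index=inside.index)
    return outside, inside

# Tiny stand-ins for the real pairs (hypothetical data).
L1 = [pd.DataFrame([(2.0, 3.0)]), pd.DataFrame([(1.0, 2.0), (5.0, 5.0)])]
L2 = [pd.DataFrame({"v": range(10, 15)}), pd.DataFrame({"v": range(20, 26)})]

results = [split_by_ranges(r, d) for r, d in zip(L1, L2)]
out1 = [outside for outside, _ in results]
out2 = [inside for _, inside in results]

for outside, inside in results:
    print(outside["v"].tolist(), inside["v"].tolist())
# [10, 13, 14] [11, 12]
# [22, 23, 25] [20, 21, 24]
```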

This might not be efficient, but it generates your desired results:

import pandas as pd
import numpy as np

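# generate the second dataframe: collect each 1-based, inclusive (start, end) slice of df2
# (DataFrame.append was removed in pandas 2.0, so the slices are combined with pd.concat)
parts = [df2.iloc[int(x) - 1:int(y)] for x, y in np.array(df1)]
df_out2 = pd.concat(parts, ignore_index=True)
# get the difference: rows of df2 that are not in df_out2
# (this compares row values, so it assumes df2 has no duplicate rows)
df_out1 = pd.concat([df_out2, df2]).drop_duplicates(keep=False)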

To compare the results with yours:

np.array_equal(df_out1.values,output1.values)
np.array_equal(df_out2.values,output2.values)
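
One caveat with the drop_duplicates approach: it compares row values, so rows of df2 that happen to be identical get dropped even if they were never selected. The sketch below uses made-up data to show the effect next to a label-based difference with DataFrame.drop, which assumes unique index labels.

```python
import pandas as pd

# Hypothetical data: the rows at positions 0 and 3 are identical.
df2 = pd.DataFrame({"a": [1, 2, 3, 1], "b": [0, 0, 0, 0]})
selected = df2.iloc[1:3]  # rows at positions 1 and 2

# Value-based difference: the duplicated (1, 0) rows vanish as well.
by_values = pd.concat([selected, df2]).drop_duplicates(keep=False)

# Label-based difference: only the selected rows are removed.
by_labels = df2.drop(index=selected.index)

print(len(by_values))  # 0
print(len(by_labels))  # 2
```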
