I am currently working with two sets of dataframes. Each set contains 60 dataframes, sorted so that they line up for mapping (e.g. set1 df1 corresponds to set2 df1). Each dataframe in the first set is about 27 rows x 2 columns; each dataframe in the second set is over 25000 rows x 8 columns. I want to create new dataframes that contain rows from the second dataframe, selected according to the values in the first dataframe.
For simplicity, I've created a shortened example of the first df of each set to illustrate. I want to use the 797 to take the first 796 rows (indexes 0 - 795) from df2 and add them to a new dataframe, and then take rows 796 to 930 and put them in a second new dataframe. Any suggestions on how I could do that for all 60 pairs of dataframes?
        0       1
0   797.0   930.0
1  1650.0  1760.0
2  2500.0  2570.0
3  3250.0  3333.0
4  3897.0  3967.0

0  -1  -2  -1  -3  -2  -1   2   0
1   0   0   0  -2   0  -1   0   0
2  -3   0   0  -1  -2  -1  -1  -1
3   0   1  -1  -1  -3  -2  -1   0
4   0  -3  -3   0   0   0  -4  -2
edit to add:
import pandas as pd
df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (3, 0, 2, 3, 1, 0, 1, 2),
(4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2),
(7, 0, 2, 3, 1, 0, 1, 2), (8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2),
(10, 0, 2, 3, 1, 0, 1, 2), (11, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2),
(13, 0, 2, 3, 1, 0, 1, 2), (14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])
#expected output will be two dataframes containing rows from df2
output1 = pd.DataFrame([(1, 0, 2, 3, 1, 0, 1, 2), (2, 0.5, 1, 3, 1, 0, 1, 2), (6, 0, 2, 3, 1, 0, 1, 2),
(7, 0, 2, 3, 1, 0, 1, 2), (12, 0, 2, 3, 1, 0, 1, 2), (13, 0, 2, 3, 1, 0, 1, 2),
(14, 0, 0, 1, 2, 5, 2, 3), (15, 0.5, 1, 3, 1.5, 2, 3, 1)])
output2 = pd.DataFrame([(3, 0, 2, 3, 1, 0, 1, 2), (4, 0, 2, 3, 1, 0, 1, 2), (5, 0, 2, 3, 1, 0, 1, 2),
(8, 0, 2, 3, 1, 0, 1, 2), (9, 0, 2, 3, 1, 0, 1, 2), (10, 0, 2, 3, 1, 0, 1, 2),
(11, 0, 2, 3, 1, 0, 1, 2)])
You can use a list comprehension to flatten the ranges into a single list of positional indices:
rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
print (rng)
[2, 3, 4, 7, 8, 9, 10]
And then filter by DataFrame.iloc and Index.difference:
output1 = df2.iloc[df2.index.difference(rng)]
print (output1)
0 1 2 3 4 5 6 7
0 1 0.0 2 3 1.0 0 1 2
1 2 0.5 1 3 1.0 0 1 2
5 6 0.0 2 3 1.0 0 1 2
6 7 0.0 2 3 1.0 0 1 2
11 12 0.0 2 3 1.0 0 1 2
12 13 0.0 2 3 1.0 0 1 2
13 14 0.0 0 1 2.0 5 2 3
14 15 0.5 1 3 1.5 2 3 1
output2 = df2.iloc[rng]
print (output2)
0 1 2 3 4 5 6 7
2 3 0.0 2 3 1.0 0 1 2
3 4 0.0 2 3 1.0 0 1 2
4 5 0.0 2 3 1.0 0 1 2
7 8 0.0 2 3 1.0 0 1 2
8 9 0.0 2 3 1.0 0 1 2
9 10 0.0 2 3 1.0 0 1 2
10 11 0.0 2 3 1.0 0 1 2
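The same split can also be expressed with a boolean mask instead of a list of positions. The following is only a sketch of that alternative, not the method used above: the helper name `split_by_ranges` and the two-column demo frame are made up for illustration, and it assumes the values in df1 are 1-based, inclusive row numbers (possibly stored as floats such as 797.0).

```python
import numpy as np
import pandas as pd

def split_by_ranges(ranges, data):
    """Hypothetical helper: return (outside, inside) rows of `data`.

    `ranges` holds 1-based inclusive (start, stop) pairs, one pair per row.
    """
    mask = np.zeros(len(data), dtype=bool)
    for start, stop in ranges.to_numpy():
        # start/stop may be floats such as 797.0, so cast before slicing
        mask[int(start) - 1:int(stop)] = True
    # A boolean array of the same length as the frame selects rows
    return data[~mask], data[mask]

# Small demo data (illustrative only, not the real 25000-row frames)
df1 = pd.DataFrame([(3, 5), (8, 11)])
df2 = pd.DataFrame({"id": range(1, 16), "value": range(101, 116)})

outside, inside = split_by_ranges(df1, df2)
print(inside["id"].tolist())
print(outside["id"].tolist())
```

Because the mask is positional, this does not depend on df2 having a default integer index.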
EDIT:
#list of DataFrames
L1 = [df11, df21, df31]
L2 = [df12, df22, df32]
#if necessary output lists
out1 = []
out2 = []
#loop with zipped lists and apply solution
for df1, df2 in zip(L1, L2):
    print (df1)
    print (df2)
    rng = [x for a, b in df1.values for x in range(int(a)-1, int(b))]
    output1 = df2.iloc[df2.index.difference(rng)]
    output2 = df2.iloc[rng]
    #if necessary append output df to lists
    out1.append(output1)
    out2.append(output2)
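For 60 pairs it may be convenient to wrap the per-pair logic in a function and check that both lists have the same length, since zip silently stops at the shorter one. This is a minimal sketch under stated assumptions: the name `split_pair` and the tiny demo frames are hypothetical, and each df2 is assumed to have the default 0..n-1 index (as Index.difference with positions requires).

```python
import pandas as pd

def split_pair(df1, df2):
    """Hypothetical helper: return (rest, selected) for one pair of frames."""
    # Flatten the 1-based inclusive (start, stop) pairs into 0-based positions
    rng = [x for a, b in df1.to_numpy() for x in range(int(a) - 1, int(b))]
    # Assumes df2 has a default RangeIndex, so labels equal positions
    rest = df2.iloc[df2.index.difference(rng)]
    selected = df2.iloc[rng]
    return rest, selected

# Tiny stand-ins for the two lists of 60 DataFrames (illustrative only)
L1 = [pd.DataFrame([(1, 2)]), pd.DataFrame([(2, 3), (5, 5)])]
L2 = [pd.DataFrame({"id": range(1, 5)}), pd.DataFrame({"id": range(1, 7)})]

# zip truncates to the shorter list, so fail loudly on a mismatch
if len(L1) != len(L2):
    raise ValueError("both lists must contain the same number of DataFrames")

results = [split_pair(a, b) for a, b in zip(L1, L2)]
out1 = [rest for rest, _ in results]
out2 = [selected for _, selected in results]
```

Here out1[i] and out2[i] belong to the i-th pair, so the pairing between the two sets is preserved.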
This might not be efficient, but it generates your desired results:
import pandas as pd
import numpy as np
#generate the second dataframe
#(DataFrame.append was removed in pandas 2.0, so collect the slices and concatenate them)
df_out2 = pd.concat([df2.iloc[int(x)-1:int(y)] for x, y in np.array(df1)], ignore_index=True)
#get the difference
df_out1 = pd.concat([df_out2,df2]).drop_duplicates(keep=False)
To compare the results with yours:
np.array_equal(df_out1.values,output1.values)
np.array_equal(df_out2.values,output2.values)
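One thing worth checking before relying on a value-based difference: drop_duplicates(keep=False) compares row contents, so rows that are identical within df2 itself are removed as well, even if they were never selected. The sketch below illustrates this with a made-up four-row frame; the positional variant using DataFrame.drop is shown only for contrast and assumes a unique index.

```python
import pandas as pd

# Illustrative data only: rows 0 and 1 have identical contents
df2 = pd.DataFrame({"a": [1, 1, 2, 3], "b": [0, 0, 0, 0]})
picked = df2.iloc[2:3]  # pretend row 2 was selected by a range

# Value-based difference: the identical rows 0 and 1 cancel each other out
by_values = pd.concat([picked, df2]).drop_duplicates(keep=False)

# Position-based difference: keeps every row that was not picked
by_position = df2.drop(index=picked.index)

print(len(by_values))
print(len(by_position))
```

In the example data from the question every row is unique because of the first column, so both approaches agree there.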