简体   繁体   中英

Extracting rows from a data frame with respect to the bin value from other data frame(without using column names)

I am working with 3 data frames, out of which 2 data frames contains additional bin number assigned to column data based on the range in which they belong (in separate columns).

df_1

A B

5 6

8 1

6 7

4 9

1 3

9 2

2 5

df_2

A B A_bin B_bin

5 6 2 2

8 1 1 1

6 7 3 2

4 9 3 3

1 3 1 1

9 2 1 1

2 5 2 2

df_3

A B C D A_bin B_bin C_bin D_bin

5 6 2 6 2 2 1 2

8 1 6 4 1 1 2 2

6 7 3 1 3 2 1 1

4 9 1 9 3 3 1 3

1 3 8 7 1 1 3 3

9 2 4 8 1 1 2 3

2 5 9 2 2 2 3 1

df_1 contain just two columns, df_2 have additional column which contain the bin assigned to column A and B according to the range in which the belong, similarly, df_3 contains columns with values and additional column with bin number assigned.

I want to extract the rows from df_3 such that it only extract data where df_2 columns have bin value "2" for each column respectively in a separate data frame.

The Main problem i am facing is to do it WITHOUT mentioning the column names anywhere in the code.

expected output

df_output_1 (where bin values for column 'A' in df_2 is 2)

A B C D

5 6 2 6

2 5 9 2

df_output_2 (where bin values for column 'B' in df_2 is 2)

A B C D

5 6 2 6

6 7 3 1

2 5 9 2

using merging (right or left) we can filter the data fame.

    for bin_name in (column_name + "_bin" for column_name in df_1_columns):
      print(bin_name)
      df_3_joined = pd.merge(df_3[df_3_op_columns], df_2[df_2[bin_name] == 2][df_1_columns], how='right', on=df_1_columns, suffixes=['_l', ''])
      print(df_3_joined)

Complete example is

import pandas as pd

df_1 = pd.DataFrame(columns = ['A', 'B'])
df_1.loc[len(df_1)] = [5,6]
df_1.loc[len(df_1)] = [8, 1]
df_1.loc[len(df_1)] = [6, 7]
df_1.loc[len(df_1)] = [4, 9]
df_1.loc[len(df_1)] = [1, 3]
df_1.loc[len(df_1)] = [9, 2]
df_1.loc[len(df_1)] = [2, 5]

df_2 = pd.DataFrame(columns = ['A', 'B', 'A_bin', 'B_bin'])
df_2.loc[len(df_2)] = [5, 6, 2, 2]
df_2.loc[len(df_2)] = [8, 1, 1, 1]
df_2.loc[len(df_2)] = [6, 7, 3, 2]
df_2.loc[len(df_2)] = [4, 9, 3, 3]
df_2.loc[len(df_2)] = [1, 3, 1, 1]
df_2.loc[len(df_2)] = [9, 2, 1, 1]
df_2.loc[len(df_2)] = [2, 5, 2, 2]

df_3 = pd.DataFrame(columns = ['A', 'B', 'C', 'D', 'A_bin', 'B_bin', 'C_bin', 'D_bin'])
df_3.loc[len(df_3)] = [5, 6, 2, 6, 2, 2, 1, 2]
df_3.loc[len(df_3)] = [8, 1, 6, 4, 1, 1, 2, 2]
df_3.loc[len(df_3)] = [6, 7, 3, 1, 3, 2, 1, 1]
df_3.loc[len(df_3)] = [4, 9, 1, 9, 3, 3, 1, 3]
df_3.loc[len(df_3)] = [1, 3, 8, 7, 1, 1, 3, 3]
df_3.loc[len(df_3)] = [9, 2, 4, 8, 1, 1, 2, 3]
df_3.loc[len(df_3)] = [2, 5, 9, 2, 2, 2, 3, 1]

results = {}
df_1_columns = list(df_1.columns)
df_3_op_columns = [cname for cname in list(df_3.columns) if not cname.endswith("_bin")]
for bin_name in (column_name + "_bin" for column_name in df_1_columns):
    df_3_joined = pd.merge(df_3[df_3_op_columns], df_2[df_2[bin_name] == 2][df_1_columns], how='right', on=df_1_columns)
    results[bin_name] = df_3_joined

for binName, result in results.iteritems():
    print(binName)
    print(result)

If you know the bin names, then retrieve the result as follows.

A_bin_df = results['A_bin']
print(A_bin_df)
B_bin_df = results['B_bin']
print(B_bin_df)

Use df.columns and column index to prevent using column names .

You can use all_cols = df_2.columns to get a list of column names. Then, use all_cols[i] to get column names.

For example, you can get column B with df_2[all_cols[1]] and get column B_bin with df_2[all_cols[1 + len(all_cols) / 2]] . If you want to get another column and its corresponding _bin column, just change the "1" to other dataframe column index.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM