How to select rows in a DataFrame based on rows in another DataFrame using Python

I have two DataFrames. df1 looks as follows:

id  year    CalendarWeek    DayName interval    counts
1   2014    1   sun 10:30   3
1   2014    1   sun 11:30   4
1   2014    2   wed 12:00   5
1   2014    2   fri 9:00    2
2   2014    1   sun 13:00   3
2   2014    1   sun 14:30   1
2   2014    1   mon 10:30   2
2   2014    2   wed 14:00   3
2   2014    2   fri 15:00   5
3   2014    1   thu 16:30   2
3   2014    1   thu 17:00   1
3   2014    2   sat 12:00   2
3   2014    2   sat 13:30   3

And df2 looks as follows:

id  year    CalendarWeek    DayName interval    NewCounts
1   2014    1   sun 10:00   2
1   2014    1   sun 10:30   4
1   2014    1   sun 11:30   5
1   2014    2   wed 10:30   6
1   2014    2   wed 12:00   3
1   2014    2   fri 8:30    1
1   2014    2   fri 9:00    2
2   2014    1   sun 12:30   3
2   2014    1   sun 13:00   4
2   2014    1   sun 14:30   4
2   2014    1   mon 9:00    35
2   2014    1   mon 10:30   1
2   2014    2   wed 12:30   23
2   2014    2   wed 14:00   4
2   2014    2   fri 15:00   3
3   2014    1   thu 14:30   1
3   2014    1   thu 15:00   3
3   2014    1   thu 16:30   34
3   2014    1   thu 17:00   5
3   2014    2   sat 12:00   3
3   2014    2   sat 13:30   4
3   2014    2   sat 14:00   2

I want to pick out all rows in df2 that match df1 on the columns id, year, CalendarWeek, DayName and interval. The result I want should look as follows:

id  year    CalendarWeek    DayName interval    NewCounts
1   2014    1   sun 10:30   4
1   2014    1   sun 11:30   5
1   2014    2   wed 12:00   3
1   2014    2   fri 9:00    2
2   2014    1   sun 13:00   4
2   2014    1   sun 14:30   4
2   2014    1   mon 10:30   1
2   2014    2   wed 14:00   4
2   2014    2   fri 15:00   3
3   2014    1   thu 16:30   34
3   2014    1   thu 17:00   5
3   2014    2   sat 12:00   3
3   2014    2   sat 13:30   4

In Python, how do I select these specific rows in a DataFrame based on columns in another DataFrame?

Thank you!

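Perform a merge and pass the list of columns to the on parameter. The default merge type is 'inner', which only matches where the values exist in both DataFrames. In the code below, df holds the question's df1 (the frame with counts) and df1 holds the question's df2 (the frame with NewCounts), as the column order of the output shows: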

In [2]:

df.merge(df1, on=['id','year','CalendarWeek','DayName','interval'])
Out[2]:
    id  year  CalendarWeek DayName interval  counts  NewCounts
0    1  2014             1     sun    10:30       3          4
1    1  2014             1     sun    11:30       4          5
2    1  2014             2     wed    12:00       5          3
3    1  2014             2     fri     9:00       2          2
4    2  2014             1     sun    13:00       3          4
5    2  2014             1     sun    14:30       1          4
6    2  2014             1     mon    10:30       2          1
7    2  2014             2     wed    14:00       3          4
8    2  2014             2     fri    15:00       5          3
9    3  2014             1     thu    16:30       2         34
10   3  2014             1     thu    17:00       1          5
11   3  2014             2     sat    12:00       2          3
12   3  2014             2     sat    13:30       3          4
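
The merged result above also carries the counts column, whereas the question asks only for the df2 rows. A minimal sketch of one way to get just those rows, assuming the question's naming (df1 holds counts, df2 holds NewCounts) and using a small subset of the data: merge df2 against only the key columns of df1. The names keys and selected are illustrative, not from the original answer.

```python
import pandas as pd

# Columns that identify a row in both frames.
keys = ["id", "year", "CalendarWeek", "DayName", "interval"]

# A small subset of the question's df1 (illustrative, not the full data).
df1 = pd.DataFrame({
    "id": [1, 1, 2],
    "year": [2014, 2014, 2014],
    "CalendarWeek": [1, 2, 1],
    "DayName": ["sun", "wed", "mon"],
    "interval": ["10:30", "12:00", "10:30"],
    "counts": [3, 5, 2],
})

# A small subset of the question's df2.
df2 = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "year": [2014] * 5,
    "CalendarWeek": [1, 1, 2, 1, 1],
    "DayName": ["sun", "sun", "wed", "mon", "mon"],
    "interval": ["10:00", "10:30", "12:00", "9:00", "10:30"],
    "NewCounts": [2, 4, 3, 35, 1],
})

# Keep only the key columns of df1 so no extra columns leak into the result;
# drop_duplicates guards against repeated keys multiplying rows of df2.
selected = df2.merge(df1[keys].drop_duplicates(), on=keys, how="inner")
print(selected)
```

Because only the key columns of df1 take part in the merge, selected has exactly the columns of df2.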

If your 'id' column is the index, you have to reset the index on both DataFrames so that it becomes a regular column. This is because the inner join produces an incorrect result if you specify the on list of columns and also specify left_index=True and right_index=True. In the output below, every row is paired with every row of the other DataFrame that shares its id, so the on columns were effectively ignored (recent pandas versions are expected to reject this combination with a MergeError instead of returning this result):

In [4]:

df.merge(df1, on=['year','CalendarWeek','DayName','interval'], left_index=True, right_index=True)
Out[4]:
    year  CalendarWeek DayName interval  counts  NewCounts
id                                                        
1   2014             1     sun    10:30       3          2
1   2014             1     sun    10:30       3          4
1   2014             1     sun    10:30       3          5
1   2014             1     sun    10:30       3          6
1   2014             1     sun    10:30       3          3
1   2014             1     sun    10:30       3          1
1   2014             1     sun    10:30       3          2
1   2014             1     sun    11:30       4          2
1   2014             1     sun    11:30       4          4
1   2014             1     sun    11:30       4          5
1   2014             1     sun    11:30       4          6
1   2014             1     sun    11:30       4          3
1   2014             1     sun    11:30       4          1
1   2014             1     sun    11:30       4          2
1   2014             2     wed    12:00       5          2
1   2014             2     wed    12:00       5          4
1   2014             2     wed    12:00       5          5
1   2014             2     wed    12:00       5          6
1   2014             2     wed    12:00       5          3
1   2014             2     wed    12:00       5          1
1   2014             2     wed    12:00       5          2
1   2014             2     fri     9:00       2          2
1   2014             2     fri     9:00       2          4
1   2014             2     fri     9:00       2          5
1   2014             2     fri     9:00       2          6
1   2014             2     fri     9:00       2          3
1   2014             2     fri     9:00       2          1
1   2014             2     fri     9:00       2          2
2   2014             1     sun    13:00       3          3
2   2014             1     sun    13:00       3          4
..   ...           ...     ...      ...     ...        ...
2   2014             2     fri    15:00       5          4
2   2014             2     fri    15:00       5          3
3   2014             1     thu    16:30       2          1
3   2014             1     thu    16:30       2          3
3   2014             1     thu    16:30       2         34
3   2014             1     thu    16:30       2          5
3   2014             1     thu    16:30       2          3
3   2014             1     thu    16:30       2          4
3   2014             1     thu    16:30       2          2
3   2014             1     thu    17:00       1          1
3   2014             1     thu    17:00       1          3
3   2014             1     thu    17:00       1         34
3   2014             1     thu    17:00       1          5
3   2014             1     thu    17:00       1          3
3   2014             1     thu    17:00       1          4
3   2014             1     thu    17:00       1          2
3   2014             2     sat    12:00       2          1
3   2014             2     sat    12:00       2          3
3   2014             2     sat    12:00       2         34
3   2014             2     sat    12:00       2          5
3   2014             2     sat    12:00       2          3
3   2014             2     sat    12:00       2          4
3   2014             2     sat    12:00       2          2
3   2014             2     sat    13:30       3          1
3   2014             2     sat    13:30       3          3
3   2014             2     sat    13:30       3         34
3   2014             2     sat    13:30       3          5
3   2014             2     sat    13:30       3          3
3   2014             2     sat    13:30       3          4
3   2014             2     sat    13:30       3          2

[96 rows x 6 columns]
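
The 96 rows can be accounted for by treating the result as a join on the id index alone: each id contributes (rows in one frame) x (rows in the other). A small sketch that checks this arithmetic, using only the per-id row counts read off the question's tables; the names ids_counts, ids_newcounts, left and right are illustrative.

```python
import pandas as pd

# Number of rows per id in the two frames of the question:
# the counts frame has 4, 5, 4 rows and the NewCounts frame has 7, 8, 7 rows.
ids_counts = [1] * 4 + [2] * 5 + [3] * 4
ids_newcounts = [1] * 7 + [2] * 8 + [3] * 7

# Rows produced per id when joining on id only: 4*7, 5*8, 4*7.
pairs_per_id = pd.Series(ids_counts).value_counts() * pd.Series(ids_newcounts).value_counts()
total_rows = int(pairs_per_id.sum())
print(total_rows)  # 96

# The same count from an actual index-only merge on placeholder values.
left = pd.DataFrame({"counts": range(13)}, index=pd.Index(ids_counts, name="id"))
right = pd.DataFrame({"NewCounts": range(22)}, index=pd.Index(ids_newcounts, name="id"))
joined = left.merge(right, left_index=True, right_index=True)
print(len(joined))  # 96
```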

So to reset the index, just do df = df.reset_index(0) and likewise for the other DataFrame. After merging you can then set the index back to id:

merged = df.merge(df1, on=['id','year','CalendarWeek','DayName','interval'])
merged = merged.set_index('id')
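
The full round trip, with id starting out as the index, might look like the following sketch. It uses a small illustrative subset of the data, and the names left, right and keys are assumptions for the example rather than part of the original answer.

```python
import pandas as pd

keys = ["id", "year", "CalendarWeek", "DayName", "interval"]

# Both frames start with id as their index, as in the scenario described above.
left = pd.DataFrame({
    "id": [1, 1, 2],
    "year": [2014] * 3,
    "CalendarWeek": [1, 1, 1],
    "DayName": ["sun", "sun", "sun"],
    "interval": ["10:30", "11:30", "13:00"],
    "counts": [3, 4, 3],
}).set_index("id")

right = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "year": [2014] * 5,
    "CalendarWeek": [1] * 5,
    "DayName": ["sun"] * 5,
    "interval": ["10:00", "10:30", "11:30", "12:30", "13:00"],
    "NewCounts": [2, 4, 5, 3, 4],
}).set_index("id")

# Turn id back into a column, merge on all key columns, then restore id as the index.
merged = (
    left.reset_index()
        .merge(right.reset_index(), on=keys)
        .set_index("id")
)
print(merged)
```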

