How to select rows in a DataFrame based on rows in another DataFrame using Python
I have two DataFrames. df1 looks like this:
id  year  CalendarWeek  DayName  interval  counts
1   2014  1             sun      10:30     3
1   2014  1             sun      11:30     4
1   2014  2             wed      12:00     5
1   2014  2             fri      9:00      2
2   2014  1             sun      13:00     3
2   2014  1             sun      14:30     1
2   2014  1             mon      10:30     2
2   2014  2             wed      14:00     3
2   2014  2             fri      15:00     5
3   2014  1             thu      16:30     2
3   2014  1             thu      17:00     1
3   2014  2             sat      12:00     2
3   2014  2             sat      13:30     3
df2 looks like this:
id  year  CalendarWeek  DayName  interval  NewCounts
1   2014  1             sun      10:00     2
1   2014  1             sun      10:30     4
1   2014  1             sun      11:30     5
1   2014  2             wed      10:30     6
1   2014  2             wed      12:00     3
1   2014  2             fri      8:30      1
1   2014  2             fri      9:00      2
2   2014  1             sun      12:30     3
2   2014  1             sun      13:00     4
2   2014  1             sun      14:30     4
2   2014  1             mon      9:00      35
2   2014  1             mon      10:30     1
2   2014  2             wed      12:30     23
2   2014  2             wed      14:00     4
2   2014  2             fri      15:00     3
3   2014  1             thu      14:30     1
3   2014  1             thu      15:00     3
3   2014  1             thu      16:30     34
3   2014  1             thu      17:00     5
3   2014  2             sat      12:00     3
3   2014  2             sat      13:30     4
3   2014  2             sat      14:00     2
I want to pick all the rows in df2 that match df1 on the columns id, year, CalendarWeek, DayName and interval. The result I want should look like this:
id  year  CalendarWeek  DayName  interval  NewCounts
1   2014  1             sun      10:30     4
1   2014  1             sun      11:30     5
1   2014  2             wed      12:00     3
1   2014  2             fri      9:00      2
2   2014  1             sun      13:00     4
2   2014  1             sun      14:30     4
2   2014  1             mon      10:30     1
2   2014  2             wed      14:00     4
2   2014  2             fri      15:00     3
3   2014  1             thu      16:30     34
3   2014  1             thu      17:00     5
3   2014  2             sat      12:00     3
3   2014  2             sat      13:30     4
In Python, how can I select these specific rows of a DataFrame based on the columns of another DataFrame?
Thanks!
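Perform a merge and pass the list of columns to the on parameter. The default merge type is 'inner', which keeps only the rows whose key values exist in both DataFrames: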
In [2]:
df1.merge(df2, on=['id','year','CalendarWeek','DayName','interval'])
Out[2]:
    id  year  CalendarWeek DayName interval  counts  NewCounts
0    1  2014             1     sun    10:30       3          4
1    1  2014             1     sun    11:30       4          5
2    1  2014             2     wed    12:00       5          3
3    1  2014             2     fri     9:00       2          2
4    2  2014             1     sun    13:00       3          4
5    2  2014             1     sun    14:30       1          4
6    2  2014             1     mon    10:30       2          1
7    2  2014             2     wed    14:00       3          4
8    2  2014             2     fri    15:00       5          3
9    3  2014             1     thu    16:30       2         34
10   3  2014             1     thu    17:00       1          5
11   3  2014             2     sat    12:00       2          3
12   3  2014             2     sat    13:30       3          4
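The merged frame above also carries the counts column from df1. If only the df2 columns are wanted, as in the expected output of the question, one option is to merge against just the key columns of df1. What follows is a minimal sketch on a cut-down sample of the data. The names keys and wanted are introduced here for illustration, and the drop_duplicates call is there on the assumption that df1 could contain repeated key combinations.

```python
import pandas as pd

keys = ['id', 'year', 'CalendarWeek', 'DayName', 'interval']

# Cut-down versions of the two frames from the question.
df1 = pd.DataFrame({
    'id': [1, 1, 2],
    'year': [2014, 2014, 2014],
    'CalendarWeek': [1, 2, 1],
    'DayName': ['sun', 'fri', 'mon'],
    'interval': ['10:30', '9:00', '10:30'],
    'counts': [3, 2, 2],
})
df2 = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'year': [2014] * 5,
    'CalendarWeek': [1, 1, 2, 1, 1],
    'DayName': ['sun', 'sun', 'fri', 'mon', 'mon'],
    'interval': ['10:00', '10:30', '9:00', '9:00', '10:30'],
    'NewCounts': [2, 4, 2, 35, 1],
})

# Keep only the key columns of df1 so no extra columns reach the result.
# drop_duplicates() guards against repeated keys multiplying df2 rows.
wanted = df1[keys].drop_duplicates()

# Inner merge (the default) keeps the df2 rows whose keys appear in df1.
result = df2.merge(wanted, on=keys)
print(result)
```

The result has the same columns as df2 and contains only the rows whose five key values also occur in df1.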
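If your 'id' column is the index, you must reset the index on both DataFrames so that id becomes a regular column. Otherwise, if you pass a list of columns to on and also specify left_index=True and right_index=True, the inner join produces the wrong result: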
In [4]:
df1.merge(df2, on=['year','CalendarWeek','DayName','interval'], left_index=True, right_index=True)
Out[4]:
    year  CalendarWeek DayName interval  counts  NewCounts
id
1   2014             1     sun    10:30       3          2
1   2014             1     sun    10:30       3          4
1   2014             1     sun    10:30       3          5
1   2014             1     sun    10:30       3          6
1   2014             1     sun    10:30       3          3
1   2014             1     sun    10:30       3          1
1   2014             1     sun    10:30       3          2
1   2014             1     sun    11:30       4          2
1   2014             1     sun    11:30       4          4
1   2014             1     sun    11:30       4          5
1   2014             1     sun    11:30       4          6
1   2014             1     sun    11:30       4          3
1   2014             1     sun    11:30       4          1
1   2014             1     sun    11:30       4          2
1   2014             2     wed    12:00       5          2
1   2014             2     wed    12:00       5          4
1   2014             2     wed    12:00       5          5
1   2014             2     wed    12:00       5          6
1   2014             2     wed    12:00       5          3
1   2014             2     wed    12:00       5          1
1   2014             2     wed    12:00       5          2
1   2014             2     fri     9:00       2          2
1   2014             2     fri     9:00       2          4
1   2014             2     fri     9:00       2          5
1   2014             2     fri     9:00       2          6
1   2014             2     fri     9:00       2          3
1   2014             2     fri     9:00       2          1
1   2014             2     fri     9:00       2          2
2   2014             1     sun    13:00       3          3
2   2014             1     sun    13:00       3          4
..   ...           ...     ...      ...     ...        ...
2   2014             2     fri    15:00       5          4
2   2014             2     fri    15:00       5          3
3   2014             1     thu    16:30       2          1
3   2014             1     thu    16:30       2          3
3   2014             1     thu    16:30       2         34
3   2014             1     thu    16:30       2          5
3   2014             1     thu    16:30       2          3
3   2014             1     thu    16:30       2          4
3   2014             1     thu    16:30       2          2
3   2014             1     thu    17:00       1          1
3   2014             1     thu    17:00       1          3
3   2014             1     thu    17:00       1         34
3   2014             1     thu    17:00       1          5
3   2014             1     thu    17:00       1          3
3   2014             1     thu    17:00       1          4
3   2014             1     thu    17:00       1          2
3   2014             2     sat    12:00       2          1
3   2014             2     sat    12:00       2          3
3   2014             2     sat    12:00       2         34
3   2014             2     sat    12:00       2          5
3   2014             2     sat    12:00       2          3
3   2014             2     sat    12:00       2          4
3   2014             2     sat    12:00       2          2
3   2014             2     sat    13:30       3          1
3   2014             2     sat    13:30       3          3
3   2014             2     sat    13:30       3         34
3   2014             2     sat    13:30       3          5
3   2014             2     sat    13:30       3          3
3   2014             2     sat    13:30       3          4
3   2014             2     sat    13:30       3          2
[96 rows x 6 columns]
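The row count follows from matching on the index alone: every row of one frame is paired with every row of the other that has the same id, and the on columns play no part (4 x 7 + 5 x 8 + 4 x 7 = 96 for the data in the question). The sketch below reproduces that effect on a few invented rows. It passes only the index flags, because newer pandas releases may refuse the combination of on with left_index/right_index and raise an error instead of returning this output. The names left, right and pairs are illustrative.

```python
import pandas as pd

# Two small frames that both use a non-unique 'id' index.
left = pd.DataFrame(
    {'interval': ['10:30', '11:30', '13:00'], 'counts': [3, 4, 3]},
    index=pd.Index([1, 1, 2], name='id'),
)
right = pd.DataFrame(
    {'interval': ['10:00', '10:30', '11:30', '13:00'],
     'NewCounts': [2, 4, 5, 4]},
    index=pd.Index([1, 1, 1, 2], name='id'),
)

# Index-only merge: rows are paired purely on id, so each id contributes
# (rows in left) x (rows in right) pairs: 2 x 3 for id 1, 1 x 1 for id 2.
pairs = left.merge(right, left_index=True, right_index=True)
print(len(pairs))  # 7

# Only some of those pairs actually agree on interval.
matching = pairs[pairs['interval_x'] == pairs['interval_y']]
print(len(matching))  # 3
```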
So to reset the index, just run df1 = df1.reset_index(0), and do the same for the other DataFrame. After merging, you can set the index back to id like this:
merged = df1.merge(df2, on=['id','year','CalendarWeek','DayName','interval'])
merged = merged.set_index('id')
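Putting the steps together for frames whose index is id, here is a minimal sketch with invented sample values: reset_index turns id into an ordinary column, the merge runs on all five keys, and set_index('id') restores the index afterwards. The variable names are illustrative.

```python
import pandas as pd

keys = ['id', 'year', 'CalendarWeek', 'DayName', 'interval']

# Both frames start out with 'id' as their index.
df1 = pd.DataFrame(
    {'year': [2014, 2014], 'CalendarWeek': [1, 1],
     'DayName': ['sun', 'sun'], 'interval': ['10:30', '13:00'],
     'counts': [3, 3]},
    index=pd.Index([1, 2], name='id'),
)
df2 = pd.DataFrame(
    {'year': [2014] * 4, 'CalendarWeek': [1] * 4,
     'DayName': ['sun'] * 4,
     'interval': ['10:00', '10:30', '12:30', '13:00'],
     'NewCounts': [2, 4, 3, 4]},
    index=pd.Index([1, 1, 2, 2], name='id'),
)

# Move 'id' out of the index so it can take part in the merge keys,
# merge on all five columns, then put 'id' back as the index.
merged = (
    df1.reset_index()
       .merge(df2.reset_index(), on=keys)
       .set_index('id')
)
print(merged)
```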