
How to split a pandas DataFrame by multiple criteria

I have ~150,000 rows of data detailing email bounces by domain, email template, bounce type and the count of each by day. It is formatted like this:

+--------+-------------+-----------------+-------+---------+-------+
|   t    | bounce_type |    source_ip    |  tid  |  emld   | count |
+--------+-------------+-----------------+-------+---------+-------+
| 1/1/15 | hard        | 199.122.255.142 | 10033 | aol.com |     4 |
+--------+-------------+-----------------+-------+---------+-------+

What is the easiest way to select only the rows with an emld of "aol.com" and a bounce_type of "hard", across all source IPs and all tids? Is this something I would create a function for and pass the DataFrame through, or is there a simpler operation to filter the data by these criteria?

An easy way is to use a boolean mask. Supposing your DataFrame is called df, it would look something like this:

masked = (df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')
# keep only the rows where both conditions are True
df[masked]
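Here is a self-contained sketch of the same mask, run against a tiny DataFrame in the question's layout. Only the first row comes from the question; the other two rows (including the 192.0.2.1 address and tid 10034) are made up purely to show rows being filtered out.

```python
import pandas as pd

# Sample data in the question's layout. Row 0 is from the question;
# rows 1 and 2 are made-up examples that should be excluded.
df = pd.DataFrame({
    't': ['1/1/15', '1/1/15', '1/2/15'],
    'bounce_type': ['hard', 'soft', 'hard'],
    'source_ip': ['199.122.255.142', '199.122.255.142', '192.0.2.1'],
    'tid': [10033, 10033, 10034],
    'emld': ['aol.com', 'aol.com', 'gmail.com'],
    'count': [4, 2, 7],
})

# Each comparison yields a boolean Series; & combines them element-wise.
masked = (df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')

# Indexing with the boolean Series keeps only the rows where it is True.
result = df[masked]
print(result)
```

Only row 0 survives, since it is the only row where both comparisons are True.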

Shorthand version in one line (the parentheses are required because & binds more tightly than == in Python):

df[(df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')]
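If you prefer to write the conditions as a string, DataFrame.query is an alternative to building the mask by hand. It is a different technique from the mask above, though it selects the same rows. The sample frame below is made up for illustration:

```python
import pandas as pd

# Made-up sample rows; only row 0 matches both conditions.
df = pd.DataFrame({
    'bounce_type': ['hard', 'soft', 'hard'],
    'emld': ['aol.com', 'aol.com', 'gmail.com'],
    'count': [4, 2, 7],
})

# DataFrame.query evaluates a boolean expression against the column names,
# so there is no need to repeat df[...] or wrap each comparison in parentheses.
result = df.query("emld == 'aol.com' and bounce_type == 'hard'")

# Values can also come from local Python variables via the @ prefix.
domain, kind = 'aol.com', 'hard'
same = df.query("emld == @domain and bounce_type == @kind")
print(result)
```

The @ form is handy when the domain and bounce type come from elsewhere in your code rather than being hard-coded.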

To return just the source_ip and tid columns:

df[masked][['source_ip', 'tid']]
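Chained indexing like df[masked][[...]] works for reading, but doing the row and column selection in a single .loc call is generally the more idiomatic pandas style, and adding .copy() gives you an independent frame, so modifying it later will not trigger SettingWithCopyWarning. A sketch with made-up sample rows:

```python
import pandas as pd

# Made-up sample rows in the question's layout.
df = pd.DataFrame({
    'bounce_type': ['hard', 'soft', 'hard'],
    'source_ip': ['199.122.255.142', '199.122.255.142', '192.0.2.1'],
    'tid': [10033, 10033, 10034],
    'emld': ['aol.com', 'aol.com', 'gmail.com'],
})

masked = (df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')

# .loc[row_selector, column_list] selects rows and columns in one step;
# .copy() makes the result an explicit, independent DataFrame.
subset = df.loc[masked, ['source_ip', 'tid']].copy()
print(subset)
```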

Or, all in one line:

df[(df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')][['source_ip', 'tid']]
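On the question of whether to write a function: for two fixed conditions the one-liner is enough, but if the criteria change often, a small helper that builds the mask from keyword arguments can save repetition. filter_by below is a hypothetical name of my own, not a pandas API, and the sample data is made up:

```python
import operator
from functools import reduce

import pandas as pd


def filter_by(df: pd.DataFrame, **criteria) -> pd.DataFrame:
    """Hypothetical helper: keep rows where every given column matches.

    A scalar value must match exactly; a list, tuple or set matches any
    of its elements (via Series.isin).
    """
    masks = []
    for column, value in criteria.items():
        if isinstance(value, (list, tuple, set)):
            masks.append(df[column].isin(value))
        else:
            masks.append(df[column] == value)
    if not masks:
        return df  # no criteria given: nothing to filter
    # AND all the individual boolean masks together, element-wise.
    combined = reduce(operator.and_, masks)
    return df[combined]


# Made-up sample rows.
df = pd.DataFrame({
    'bounce_type': ['hard', 'soft', 'hard', 'hard'],
    'tid': [10033, 10033, 10034, 10035],
    'emld': ['aol.com', 'aol.com', 'gmail.com', 'aol.com'],
})

aol_hard = filter_by(df, emld='aol.com', bounce_type='hard')
some_tids = filter_by(df, emld='aol.com', tid=[10033, 10035])
print(aol_hard)
print(some_tids)
```

Keyword arguments only work for column names that are valid Python identifiers; for other names you could pass a dict instead, e.g. filter_by(df, **{'some col': 1}).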

Hope this helps.
