I have ~150,000 rows of data detailing email bounces by domain, email template, bounce type and the count of each by day. It is formatted like the below:
+--------+-------------+-----------------+-------+---------+-------+
| t | bounce_type | source_ip | tid | emld | count |
+--------+-------------+-----------------+-------+---------+-------+
| 1/1/15 | hard | 199.122.255.142 | 10033 | aol.com | 4 |
+--------+-------------+-----------------+-------+---------+-------+
What is the easiest way to select only rows with an emld of "aol.com", bounce type of "hard", from all source ips and all tids? Is this something I would create a function for and pass the dataframe through, or is there a simpler operation to filter the data by these criteria?
An easy way is to build a boolean mask. Assuming your DataFrame is called df, it would look something like this:
masked = (df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')
# then the result will be
df[masked]
The shorthand version, in one line:
df[(df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')]
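To try this end to end, here is a minimal sketch of the mask approach on a tiny frame. Only the first row mirrors the one shown in the question; the other rows are invented sample data, and the name hard_aol is just illustrative.

```python
import pandas as pd

# Hypothetical sample rows; only the first matches the row in the question.
df = pd.DataFrame({
    "t": ["1/1/15", "1/1/15", "1/2/15", "1/2/15"],
    "bounce_type": ["hard", "soft", "hard", "hard"],
    "source_ip": ["199.122.255.142", "199.122.255.142",
                  "199.122.255.143", "199.122.255.142"],
    "tid": [10033, 10033, 10034, 10035],
    "emld": ["aol.com", "aol.com", "gmail.com", "aol.com"],
    "count": [4, 2, 7, 1],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than ==.
mask = (df["emld"] == "aol.com") & (df["bounce_type"] == "hard")

# Indexing with the mask keeps only the rows where it is True.
hard_aol = df[mask]
print(hard_aol)
```

No helper function is needed for this; the mask is an ordinary Series, so it can be stored, reused, or combined with further conditions.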
To return just the source_ip and tid columns (the column is named tid in your data, not tids):
df[masked][['source_ip', 'tid']]
Or,
df[(df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')][['source_ip', 'tid']]
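As an alternative sketch, DataFrame.query lets you write the same filter as a string, and .loc does the row filter and the column selection in one step instead of two chained [] lookups. The sample frame and the names hard_aol and subset are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the same layout as the question.
df = pd.DataFrame({
    "t": ["1/1/15", "1/1/15", "1/2/15"],
    "bounce_type": ["hard", "soft", "hard"],
    "source_ip": ["199.122.255.142", "199.122.255.142", "199.122.255.143"],
    "tid": [10033, 10033, 10034],
    "emld": ["aol.com", "aol.com", "gmail.com"],
    "count": [4, 2, 7],
})

# Same filter expressed as a query string.
hard_aol = df.query("emld == 'aol.com' and bounce_type == 'hard'")

# Row mask and column list passed to .loc together.
mask = (df["emld"] == "aol.com") & (df["bounce_type"] == "hard")
subset = df.loc[mask, ["source_ip", "tid"]]
print(subset)
```

Which form to use is mostly a matter of taste; .loc is the safer choice if you later want to assign to the filtered rows.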
Hope this helps.