Count occurrences of integers starting with 4 in a dataframe

I have a dataframe in the following form:

                             client_ip  http_response_code
index
2022-07-23 05:10:10+00:00   172.19.0.1                 300
2022-07-23 06:13:26+00:00  192.168.0.1                 400
...                                ...                 ...

I need to group by client_ip and count the number of 4xx codes in the http_response_code column, i.e. how many of the integers start with 4.

What I have tried is the following:

df.groupby('client_ip')['http_response_code'].apply(lambda x: (str(x).startswith(str(4))).sum())

But I get the following error:

AttributeError: 'bool' object has no attribute 'sum'
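A minimal sketch of where the error comes from, using a small hypothetical Series s in place of one group passed to the lambda: str(s) renders the whole Series as one block of text, so startswith sees a single string and returns a single Python bool. Note that the text even begins with the index label, not the first value.

```python
import pandas as pd

# Hypothetical stand-in for one group passed to the lambda.
s = pd.Series([400, 301], name="http_response_code")

text = str(s)                # the whole Series rendered as one string
flag = text.startswith("4")  # a single Python bool, not a Series
print(type(text).__name__, type(flag).__name__)

# The rendered text starts with the index label "0", not with the value 400.
print(repr(text[:8]))

# A plain bool has no .sum(), which is the AttributeError shown above.
print(hasattr(flag, "sum"))
```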

However, if I only need to count occurrences of 400, the following works without any error, even though it also produces booleans:

df.groupby('client_ip')['http_response_code'].apply(lambda x: (x==400).sum())
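By contrast, x == 400 is evaluated element-wise and returns a boolean Series, and summing that Series counts the True values because each True is treated as 1. A small sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical group of status codes.
s = pd.Series([400, 301, 400], name="http_response_code")

mask = s == 400    # element-wise comparison -> boolean Series
print(mask.dtype)  # bool
print(mask.sum())  # 2 (each True counts as 1)
```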

Any idea of what is wrong here?

Your function gets a Series as input. Comparing it against a value gives a Series of booleans, which can be summed. Calling str() on it, however, turns the whole Series into one string, and str.startswith then returns a single bool, which has no .sum. Use .astype(str) to convert each value to a string rather than the whole Series, and then use the .str accessor, for example:

import pandas as pd
df = pd.DataFrame({"User":["A","A","B"],"Status":[400,301,302]})
grouped = df.groupby("User")["Status"].apply(lambda x:x.astype(str).str.startswith("4").sum())
print(grouped)

Output:

User
A    1
B    0
Name: Status, dtype: int64
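Since the status codes are integers, the string round-trip can also be skipped. A sketch on the same example frame, assuming three-digit integer status codes: integer division by 100 yields the hundreds digit, so 4xx codes give 4. Unlike startswith("4"), this would not match values such as 4 or 4000, which is fine for HTTP codes but worth knowing for other data.

```python
import pandas as pd

df = pd.DataFrame({"User": ["A", "A", "B"], "Status": [400, 301, 302]})

# x // 100 is the hundreds digit for three-digit codes (assumption),
# so the comparison is True exactly for 400-499.
grouped = df.groupby("User")["Status"].apply(lambda x: (x // 100 == 4).sum())
print(grouped)
```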

IIUC (if I understand correctly), this should work for you:

import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'client_id': np.random.choice([1, 2, 3], size=10, replace=True, p=None), 'http_response_code': np.random.choice([300, 400], size=10, replace=True, p=None)})
print(df[df.http_response_code.apply(lambda x: (str(x).startswith(str(4))))].groupby('client_id').count())

Dataframe:

   client_id  http_response_code
0          3                 300
1          2                 400
2          3                 300
3          3                 400
4          1                 300
5          3                 400
6          3                 400
7          2                 300
8          3                 300
9          2                 300

Result:

           http_response_code
client_id                    
2                           1
3                           3
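Filtering before grouping drops clients that have no 4xx codes at all: client 1 appears in the dataframe above but not in the result. If such clients should be listed with a count of 0, one option is to build the boolean flag over the whole column and sum it per client. The sketch below writes out the printed dataframe explicitly instead of regenerating it from the random seed:

```python
import pandas as pd

# The dataframe printed above, written out explicitly.
df = pd.DataFrame({
    "client_id": [3, 2, 3, 3, 1, 3, 3, 2, 3, 2],
    "http_response_code": [300, 400, 300, 400, 300, 400, 400, 300, 300, 300],
})

# Per-row flag over the whole column: True where the code starts with "4".
is_4xx = df["http_response_code"].astype(str).str.startswith("4")

# Summing the flag per client keeps clients whose count is 0 (client 1 here).
counts = is_4xx.groupby(df["client_id"]).sum()
print(counts)
```

This matches the counts above for clients 2 and 3 and additionally reports 0 for client 1.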

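Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM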