Count based on user and IP address
I have a file like this:
USER_ID,IP_ADDRESS
XXXXXX24,10.12.6.54
XXXXXX24,10.12.6.54
XXXXXX24,10.12.6.54
XXXXXX24,10.12.6.54
XXXXXX24,10.12.6.54
XXXXXX25,10.12.6.55
XXXXXX25,10.12.6.55
XXXXXX25,10.12.6.55
XXXXXX25,10.12.6.55
XXXXXX25,10.12.6.55
XXXXXX21,10.12.6.51
XXXXXX21,10.12.6.51
XXXXXX21,10.12.6.51
XXXXXX21,10.12.6.51
I need output that shows the count per user for each IP address:
          10.12.6.51  10.12.6.55  10.12.6.54
XXXXXX21           4
XXXXXX25                       4
XXXXXX24                                   4
Here is my code. It works and gives the output shown below, but I need more detail in the output.
#!/bin/python3.6
import csv
import collections
from collections import Counter

datafile=open('conn.csv','r')
usefuldata=[]
for line in datafile:
    usefuldata.append(line)  # each line keeps its trailing newline
outfile1=Counter(usefuldata)  # count identical lines
print(outfile1)
Finally, with Barmer's help, I came up with the following output:
Counter({'XXXXXX24,10.12.6.54\n': 5, 'XXXXXX25,10.12.6.55\n': 5, 'XXXXXX21,10.12.6.51\n': 4, 'XXXXXX24,10.12.6.56\n': 3, 'USER_ID,IP_ADDRESS\n': 1})
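The Counter above is keyed on whole lines, so the header row and the trailing newlines end up in the keys. As a minimal sketch of the user-by-IP layout the question asks for, using only the standard library: the inline `SAMPLE` data and the helper names `count_by_user_and_ip` and `format_table` are assumptions made for illustration, not part of the original code.

```python
import csv
import io
from collections import Counter, defaultdict

# Inline sample standing in for conn.csv (same two columns as the question's file).
SAMPLE = """USER_ID,IP_ADDRESS
XXXXXX24,10.12.6.54
XXXXXX24,10.12.6.54
XXXXXX25,10.12.6.55
XXXXXX21,10.12.6.51
XXXXXX21,10.12.6.51
"""

def count_by_user_and_ip(lines):
    """Return {user: Counter({ip: count})} from CSV lines with a header row."""
    counts = defaultdict(Counter)
    # DictReader consumes the header and strips line endings for us.
    for row in csv.DictReader(lines):
        counts[row["USER_ID"]][row["IP_ADDRESS"]] += 1
    return counts

def format_table(counts):
    """Render the nested counts as a plain-text table: users as rows, IPs as columns."""
    ips = sorted({ip for per_user in counts.values() for ip in per_user})
    rows = ["{:<10}".format("USER_ID") + "".join("{:>12}".format(ip) for ip in ips)]
    for user in sorted(counts):
        # A Counter returns 0 for an IP this user never appeared with.
        cells = "".join("{:>12}".format(counts[user][ip]) for ip in ips)
        rows.append("{:<10}".format(user) + cells)
    return "\n".join(rows)

counts = count_by_user_and_ip(io.StringIO(SAMPLE))
print(format_table(counts))
```

To run this against the real file, pass an open file object instead of the `io.StringIO` wrapper, for example `count_by_user_and_ip(open('conn.csv', newline=''))`.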
You can also use pandas together with collections.Counter. For example:
import collections
import pandas as pd
from tabulate import tabulate

with open("data_file.csv") as file:
    next(file, None)  # skip the header
    counter = collections.Counter([line.strip() for line in file])

# Regroup the "user,ip" counts as {ip: {user: count}}
output = collections.defaultdict(dict)
for user_and_ip, ip_to_user_count in counter.items():
    user, ip = user_and_ip.split(",")
    output[ip].update({user: ip_to_user_count})

df = pd.DataFrame(output).fillna("")
print(tabulate(df, headers="keys"))
df.to_csv("user_to_ip.csv")
Output:
            10.12.6.54    10.12.6.55    10.12.6.51
--------  ------------  ------------  ------------
XXXXXX24           5.0
XXXXXX25                         5.0
XXXXXX21                                       4.0
The same table is also written to the .csv file user_to_ip.csv.
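The counts in the table above appear as 5.0 rather than 5 because the missing user/IP combinations become NaN when the DataFrame is built, which turns those columns into floats. One possible alternative, sketched here under the assumption that pandas is installed, is `pd.crosstab`, which tallies the pairs directly and keeps integer counts. The inline `SAMPLE` data is a stand-in for the real file.

```python
import io

import pandas as pd

# Inline sample standing in for data_file.csv.
SAMPLE = """USER_ID,IP_ADDRESS
XXXXXX24,10.12.6.54
XXXXXX24,10.12.6.54
XXXXXX25,10.12.6.55
XXXXXX21,10.12.6.51
XXXXXX21,10.12.6.51
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Rows are users, columns are IP addresses, cells are occurrence counts.
# Combinations that never occur are filled with 0 instead of NaN.
table = pd.crosstab(df["USER_ID"], df["IP_ADDRESS"])
print(table)
```

Unlike the `fillna("")` version, unused combinations show up as 0 here, so whether this reads better depends on how sparse the real data is.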
#!/bin/python3.6
import csv
import collections
from collections import Counter

datafile=open('conn.csv','r')
usefuldata=[]
for line in datafile:
    usefuldata.append(line)  # each line keeps its trailing newline
outfile1=Counter(usefuldata)
#print(outfile1.most_common())
# Print each distinct line with its count, most frequent first
for value,count in outfile1.most_common():
    print(value,count)
With the code above I was able to get what I wanted:
[root@lhqsb1db2db01 Scripts]# ./conn.py
XXXXXX24,10.12.6.54
5
XXXXXX25,10.12.6.55
5
XXXXXX21,10.12.6.51
4
XXXXXX24,10.12.6.56
3
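In the run above, each count lands on its own line because every record still carries its trailing newline when it is printed, and the header row is counted like any other line. A small sketch of one way to tidy this up, stripping each line and skipping the header: the inline `SAMPLE` data and the helper name `count_lines` are assumptions for illustration.

```python
import io
from collections import Counter

# Inline sample standing in for conn.csv.
SAMPLE = """USER_ID,IP_ADDRESS
XXXXXX24,10.12.6.54
XXXXXX24,10.12.6.54
XXXXXX24,10.12.6.54
XXXXXX25,10.12.6.55
XXXXXX25,10.12.6.55
XXXXXX24,10.12.6.56
"""

def count_lines(lines):
    """Count identical 'user,ip' records, ignoring the header and blank lines."""
    records = iter(lines)
    next(records, None)  # skip the USER_ID,IP_ADDRESS header
    return Counter(line.strip() for line in records if line.strip())

counts = count_lines(io.StringIO(SAMPLE))

# Each record and its count now print on a single line.
for record, count in counts.most_common():
    print(record, count)
```

For the real file, `count_lines(open('conn.csv'))` would take the place of the `io.StringIO` call.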