How to use Python regex to extract IP addresses from server log files?

I am just getting started with Python. I have a server log file for the pages I visited over a period of time.

How do I write a Python program to find out which IP address was visited most? Will I have to use a dictionary? I have written the code below, but I am not sure how to use regex to fetch the IP addresses.

import re

openFile = open('text.txt', "r")

readLines = openFile.read()
# pat = re.compile("^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$")
wordfreq = {}

for word in readLines.split():
    if word not in wordfreq:
        wordfreq[word] = 1
    else:
        wordfreq[word] += 1

print(wordfreq)

# wordList = [(v,k) for k,v in wordfreq.items()]
# wordList.sort(reverse=True)
# 
# print(wordList)

PS: I don't want to use Counter from the collections module. I am trying to work out how to do this with a dictionary.

Using regex and Counter

Demo:

import re
from collections import Counter
s = """www.google.com : 255.111.111.111

-some random stuff-

www.facebook.com : 255.222.222.222

-some random stuff-

www.google.com : 255.111.111.111

-some random stuff-

www.google.com : 255.111.111.111

-some random stuff-
"""

# Raw string avoids invalid-escape warnings; [A-Za-z] fixes the original A-za-z range typo
ips = re.findall(r"www\.[A-Za-z]+\.[a-z]+\s+:\s+(.*)$", s, flags=re.MULTILINE)
print(Counter(ips).most_common(1))

Output:

[('255.111.111.111', 3)]

Another way to extract the IP addresses, in addition to what Rakesh posted earlier:

import re

# Raw string so the backslashes reach the regex engine unchanged
pattern = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
finalIP = re.findall(pattern, s)

For the counting part, refer to his answer. I just posted a different regex!
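
Since the question asks for a plain dictionary rather than Counter, the pattern above could be combined with a dict roughly as follows. This is only a minimal sketch: the helper name `count_ips`, the constant `IP_PATTERN`, and the sample text `sample_log` are made up for illustration, and the `\b` word boundaries are an addition that is not part of the pattern posted above.

```python
import re

# Same dotted-quad pattern as above, with \b boundaries added (an assumption,
# meant to avoid matching digits that are part of a longer number)
IP_PATTERN = re.compile(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b')

def count_ips(text):
    """Return a dict mapping each IP-like string in text to how often it occurs."""
    counts = {}
    for ip in IP_PATTERN.findall(text):
        # dict.get supplies 0 the first time an address is seen
        counts[ip] = counts.get(ip, 0) + 1
    return counts

# Made-up sample data standing in for the contents of the log file
sample_log = """www.google.com : 255.111.111.111
www.facebook.com : 255.222.222.222
www.google.com : 255.111.111.111
"""

counts = count_ips(sample_log)
print(counts)
```

Note that this pattern does not check that each octet lies in the range 0 to 255, so a string such as 999.999.999.999 would be counted as well.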

The following counts how often each IP address appears and prints the results in descending order of count, so the first IP printed is the most visited one. It assumes the IP address is the first space-separated field on each line of the log.

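import operator

# Read all lines; the with-block closes the file afterwards
with open("text.txt", "r") as f:
    lines = f.readlines()

wordfreq = {}

for line in lines:
    # Assumes the IP address is the first space-separated field on the line
    ipAddr = line.split(" ")[0].strip()
    if not ipAddr:
        continue  # skip blank lines
    if ipAddr not in wordfreq:
        wordfreq[ipAddr] = 0

    wordfreq[ipAddr] += 1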

#print(wordfreq)

# sorting the dict
sorted_ips = dict(
    sorted(wordfreq.items(), key=operator.itemgetter(1), reverse=True))


for ipAddr, count in sorted_ips.items():
    print("{} : {}".format(ipAddr, count))
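
If only the most visited address is needed, sorting the whole dictionary is not strictly necessary. A minimal sketch, assuming a `wordfreq` dictionary shaped like the one built above (the counts here are made up for illustration):

```python
# Made-up counts standing in for the wordfreq dictionary built from the log
wordfreq = {"255.111.111.111": 3, "255.222.222.222": 1, "10.0.0.7": 2}

# max() with key=wordfreq.get returns the key that has the largest value
top_ip = max(wordfreq, key=wordfreq.get)
print("{} : {}".format(top_ip, wordfreq[top_ip]))

# The two most visited addresses as (ip, count) pairs, highest count first
top_two = sorted(wordfreq.items(), key=lambda item: item[1], reverse=True)[:2]
print(top_two)
```

If several addresses share the highest count, `max()` returns whichever of them it encounters first while iterating over the dictionary.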

