
Python find last occurrence in a file

I have a file containing different IP addresses.

192.168.11.2
192.1268.11.3
192.168.11.3
192.168.11.3
192.168.11.2
192.168.11.5

This is my code so far. It prints each IP and the number of times it occurs, but how can I find out on which line the last occurrence of each IP was? Is there a simple way to do so?

liste = []

dit = {}
file = open('ip.txt','r')

file = file.readlines()

for line in file:
        liste.append(line.strip())

for element in liste:
        if element in dit:
                dit[element] +=1
        else:
                dit[element] = 1

for key,value in dit.items():
        print "%s occurs %s times, last occurence at line"  %(key,value)

Output:

192.1268.11.3 occurs 1 times, last occurence at line
192.168.11.3 occurs 2 times, last occurence at line
192.168.11.2 occurs 2 times, last occurence at line
192.168.11.5 occurs 1 times, last occurence at line

Try this:

dit = {}

# Read the file and strip the trailing newline from every line
with open('ip.txt', 'r') as f:
        liste = [line.strip() for line in f]

# dit maps each IP to [count, line number of the last occurrence]
for i, element in enumerate(liste, 1):
        if element in dit:
                dit[element][0] += 1
                dit[element][1] = i
        else:
                dit[element] = [1, i]

for key, value in dit.items():
        print("%s occurs %d times, last occurence at line %d" % (key, value[0], value[1]))

Here is a solution:

from collections import Counter

with open('ip.txt') as input_file:
    lines = input_file.read().splitlines()

# Find last occurrence, count
last_line = dict((ip, line_number) for line_number, ip in enumerate(lines, 1))
ip_count = Counter(lines)

# Print the stat, sorted by last occurrence
for ip in sorted(last_line, key=lambda k: last_line[k]):
    print('{} occurs {} times, last occurence at line {}'.format(
        ip, ip_count[ip], last_line[ip]))

Discussion

  • I use the enumerate function to generate line numbers (starting at line 1)
  • With a sequence of (ip, line_number) pairs, it's easy to generate the dictionary last_line, where the key is the IP address and the value is the last line on which it occurs
  • To count the number of occurrences, I use the Counter class, which is very simple
  • If you want the report sorted by IP address, use sorted(last_line)
  • This solution has a performance implication: it scans the list of IPs twice, once to calculate last_line and once to calculate ip_count. That means this solution might not be ideal if the file is large
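
The second bullet relies on the fact that when a dictionary is built from pairs with repeated keys, the later pair overwrites the earlier one. A minimal sketch of that behaviour, assuming the six sample lines from the question are held in an in-memory list (the list literal stands in for the file and is only for illustration):

```python
from collections import Counter

# Sample data copied from the question (including the malformed second entry)
lines = [
    "192.168.11.2",
    "192.1268.11.3",
    "192.168.11.3",
    "192.168.11.3",
    "192.168.11.2",
    "192.168.11.5",
]

# Later pairs overwrite earlier ones with the same key,
# so each IP ends up mapped to its last line number.
last_line = {ip: n for n, ip in enumerate(lines, 1)}
ip_count = Counter(lines)

# Report sorted by IP; note this is plain string order, not numeric order
for ip in sorted(last_line):
    print("{} occurs {} times, last occurrence at line {}".format(
        ip, ip_count[ip], last_line[ip]))
```

Because sorted(last_line) compares the addresses as strings, the ordering is lexicographic rather than numeric per octet.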

# Assumes `liste` holds the stripped lines, as in the question's code
dit = {}
last_line_occurrence = {}
for line_number, element in enumerate(liste, 1):
     if element in dit:
            dit[element] += 1
     else:
            dit[element] = 1
     # Overwritten on every occurrence, so the final value is the last line
     last_line_occurrence[element] = line_number

for key, value in dit.items():
     print("%s occurs %s times, last occurence at line %s" % (key, value, last_line_occurrence[key]))

This can easily be done in a single pass without reading the whole file into memory:

from collections import defaultdict
d = defaultdict(lambda: {"ind":0,"count":0})

with open("in.txt") as f:
    for ind, line in enumerate(f,1):
        ip = line.rstrip()
        d[ip]["ind"] = ind
        d[ip]["count"]  += 1

for ip, v in d.items():
    print("IP {}  appears {} time(s) and the last occurrence is at line {}".format(ip, v["count"], v["ind"]))

Output:

IP 192.1268.11.3  appears 1 time(s) and the last occurrence is at line 2
IP 192.168.11.3  appears 2 time(s) and the last occurrence is at line 4
IP 192.168.11.2  appears 2 time(s) and the last occurrence is at line 5
IP 192.168.11.5  appears 1 time(s) and the last occurrence is at line 6

If you want the IPs in the order in which they are first encountered, use an OrderedDict:

from collections import OrderedDict
od = OrderedDict()
with open("in.txt") as f:
    for ind, line in enumerate(f,1):
        ip = line.rstrip()
        od.setdefault(ip, {"ind": 0,"count":0})
        od[ip]["ind"] = ind
        od[ip]["count"] += 1

for ip, v in od.items():
    print("IP {}  appears {} time(s) and the last occurrence is at line {}".format(ip, v["count"], v["ind"]))

Output:

IP 192.168.11.2  appears 2 time(s) and the last occurrence is at line 5
IP 192.1268.11.3  appears 1 time(s) and the last occurrence is at line 2
IP 192.168.11.3  appears 2 time(s) and the last occurrence is at line 4
IP 192.168.11.5  appears 1 time(s) and the last occurrence is at line 6
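
On Python 3.7 and later, a plain dict also preserves insertion order, so the same first-seen ordering may be obtainable without OrderedDict. A rough sketch under that assumption; the helper name last_occurrences and the in-memory io.StringIO sample are illustrative and not part of the answer above:

```python
import io

def last_occurrences(lines):
    """Return {ip: [count, last_line_number]} in first-seen order (Python 3.7+)."""
    stats = {}
    for ind, line in enumerate(lines, 1):
        ip = line.rstrip()
        # setdefault returns the stored list, so updating it updates the dict
        entry = stats.setdefault(ip, [0, 0])
        entry[0] += 1
        entry[1] = ind
    return stats

# In-memory stand-in for the file, using the data from the question
sample = io.StringIO(
    "192.168.11.2\n192.1268.11.3\n192.168.11.3\n"
    "192.168.11.3\n192.168.11.2\n192.168.11.5\n"
)
stats = last_occurrences(sample)
for ip, (count, last) in stats.items():
    print("IP {} appears {} time(s) and the last occurrence is at line {}".format(ip, count, last))
```

Any iterable of lines works here, so an open file object could be passed in place of the StringIO sample.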

You can use another dictionary. In it you store, for each distinct line (that is, each IP), the line number of its last occurrence, overwriting the value every time you find another occurrence. At the end, this dictionary holds, for each IP, the line number of its last occurrence.

Obviously you will need to increment a counter for each line you read in order to know which line you are currently on.
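
The approach described above could be sketched roughly as follows. The function name count_and_last_line and the variable names are hypothetical, and the sample list stands in for the lines read from the file:

```python
def count_and_last_line(lines):
    counts = {}      # IP -> number of occurrences
    last_seen = {}   # IP -> line number of the most recent occurrence
    line_number = 0  # manual counter, incremented for each line read
    for line in lines:
        line_number += 1
        ip = line.strip()
        counts[ip] = counts.get(ip, 0) + 1
        # Overwrite on every occurrence; the final value is the last line
        last_seen[ip] = line_number
    return counts, last_seen

# Sample data copied from the question
sample = [
    "192.168.11.2\n", "192.1268.11.3\n", "192.168.11.3\n",
    "192.168.11.3\n", "192.168.11.2\n", "192.168.11.5\n",
]
counts, last_seen = count_and_last_line(sample)
for ip in counts:
    print("%s occurs %d times, last occurrence at line %d" % (ip, counts[ip], last_seen[ip]))
```

In practice enumerate(lines, 1) can replace the manual counter, as the other answers do.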
