What is a better way to get the difference of two lists?

I have a directory where new files, such as log files, are generated all the time.

My goal is to count how many files are generated in each 10-minute interval, in near real time. The data would look like this:

00:00 ~ 00:10        10 files

00:10 ~ 00:20        23 files

...

23:50 ~ 23:59        12 files

So my idea is to run a statistics script every 10 minutes as a crontab task on a Linux system. The logic: the first time the script runs, it gets the current file list with glob.glob("*").

Call that list A. When the script runs again (10 minutes later), it calls glob again to get the current file list B. I need the entries that are in B but not in A, so that I can count them. How can I do this? If you know a better way, please share.

You want to look into sets. You can do something like:

setA = set(listA)
setB = set(listB)
new_list = list(setB - setA)

You can also do additional set logic to identify files that are deleted and such.
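As a minimal sketch of that additional set logic: the helper name diff_listings and the sample file names below are made up for illustration and are not part of the original answer.

```python
def diff_listings(old_names, new_names):
    """Compare two directory listings using set operations."""
    old, new = set(old_names), set(new_names)
    return {
        "added": new - old,      # in the new listing only
        "removed": old - new,    # in the old listing only
        "unchanged": old & new,  # present in both
    }

# Hypothetical listings from two consecutive runs
listA = ["a.log", "b.log", "c.log"]
listB = ["b.log", "c.log", "d.log", "e.log"]

changes = diff_listings(listA, listB)
print(len(changes["added"]), "new files:", sorted(changes["added"]))
# prints: 2 new files: ['d.log', 'e.log']
```

The count the question asks for is then just len(changes["added"]).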

As I commented on @tcaswell's answer, using Python's built-in set class is an excellent way to solve a problem like this. Here's some sample code loosely based on Tim Golden's Python Stuff article Watch a Directory for Changes:

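import os

STATE_FILE = 'filelist.txt'  # snapshot written by the previous run
path_to_watch = '.'

def list_dir(path):
    # Skip the snapshot file itself so it is never reported as "added".
    return set(os.listdir(path)) - {STATE_FILE}

try:
    with open(STATE_FILE, 'rt') as filelist:
        # Ignore blank lines (an empty snapshot is stored as a single newline).
        before = set(line.strip() for line in filelist if line.strip())
    first_time = False
except IOError:
    # No snapshot yet: this is the first run.
    before = list_dir(path_to_watch)
    first_time = True

if first_time:
    after = before
else:
    after = list_dir(path_to_watch)
    added = after - before
    removed = before - after
    if added:
        print('Added: ', ', '.join(sorted(added)))
    if removed:
        print('Removed: ', ', '.join(sorted(removed)))

# replace/create filelist
with open(STATE_FILE, 'wt') as filelist:
    filelist.write('\n'.join(sorted(after)) + '\n')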
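Since the question also asks for other approaches, here is a hedged alternative that avoids keeping a snapshot: count the files whose modification time falls within the last 10 minutes. This is a different technique from the set difference above, and it rests on an assumption that may not hold for every setup, namely that a file's modification time is close to its creation time (a file rewritten inside the window is counted again). The function name count_recent_files and its parameters are illustrative only.

```python
import os
import time

def count_recent_files(path, window_seconds=600, now=None):
    """Count regular files in `path` modified within the last `window_seconds`."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # st_mtime is the last-modification time, not the creation time
            if entry.is_file() and entry.stat().st_mtime >= cutoff:
                count += 1
    return count

if __name__ == "__main__":
    print(count_recent_files("."), "files modified in the last 10 minutes")
```

Run from cron every 10 minutes, this yields one count per interval without a state file. The snapshot approach is still the safer choice if files can be modified after they are created.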
