Check physical existence of files referenced in DB table

We have one rather large table containing document info, together with file paths pointing to files on the file system. After a couple of years we noticed that we have files on disk that are not referenced in the DB table, and vice versa.

Since I'm currently learning Clojure, I thought it would be nice to write a small utility that finds the diff between the DB and the file system. Naturally, since I'm a beginner, I got stuck: there are more than 600 000 documents, and obviously I need a more performant and less memory-consuming solution :)

My first idea was to generate a flattened list of all files in the filesystem tree and compare it with the list from the DB: if a file referenced in the DB doesn't exist, put it in a separate "non-existing" list, and if a file exists on the HDD but not in the DB, move it to some dump directory.

Any ideas?

As a sketch, here's how you could check the filesystem against the database, in chunks of whatever size you're happy with:

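(->> (file-seq (java.io.File. "/"))
     (remove (memfn isDirectory))   ; keep regular files only
     (partition-all 20)             ; chunks of 20; unlike partition, keeps a smaller final chunk
     (map (fn [files] (printf "Checking %d files against db...\n" (count files))))
     (take 2))                      ; only the first two chunks, for demonstration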

(Checking 20 files against db...
Checking 20 files against db...
nil nil)

Instead of calling printf, run some kind of database check against each chunk of files.
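As a rough sketch of what such a check could look like, the function below replaces the printf with a lookup per chunk and lazily returns the paths the database does not know about. The names `known-paths` and `files-missing-from-db` are made up for illustration, and the in-memory set stands in for a real query (for example a `SELECT ... WHERE filepath IN (...)` over each chunk), which is an assumption about your schema.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical stand-in for a real DB query: given a chunk of path
;; strings, return the set of those that the database knows about.
;; Here db-paths is just an in-memory set of strings.
(defn known-paths [db-paths chunk]
  (set (filter db-paths chunk)))

(defn files-missing-from-db
  "Lazily returns the paths of regular files under root that db-lookup
  does not report. db-lookup takes a chunk of paths and returns a set."
  [root db-lookup chunk-size]
  (->> (file-seq (io/file root))
       (remove #(.isDirectory ^java.io.File %))
       (map #(.getPath ^java.io.File %))
       (partition-all chunk-size)          ; one lookup per chunk
       (mapcat (fn [chunk]
                 (let [known (db-lookup chunk)]
                   (remove known chunk))))))
```

Only one chunk of paths needs to be held at a time, so memory use stays bounded by the chunk size as long as the result is consumed lazily. The opposite direction (DB rows whose file is gone) can be checked row by row with `(.exists (io/file path))`.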

I would suggest one of three options depending on your preference for performance vs. memory:

  1. Memory intensive: Use a recursive method calling File.listFiles to put all the files into one list. Then compare that list against your DB.

  2. IO intensive: Recursively check each file, one at a time, against the DB.

  3. Intermediate: Read all the files in one dir and compare them against the DB. Recurse into any sub-dirs and repeat. This makes the same number of IO calls as option 1 but only holds one branch plus one dir's worth of file paths in memory at any one time.
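Option 3 could be sketched roughly like this. `check-dir` is a made-up name, and `db-lookup` is assumed to be a function you supply that takes the file paths of one directory and returns the set of those present in the DB; how it queries the database is left open.

```clojure
(require '[clojure.java.io :as io])

(defn check-dir
  "Checks the files of one directory against db-lookup, then recurses
  into its subdirectories. Returns a vector of paths unknown to the DB."
  [^java.io.File dir db-lookup]
  (let [entries (or (seq (.listFiles dir)) [])   ; listFiles returns nil on I/O error
        files   (->> entries
                     (remove #(.isDirectory ^java.io.File %))
                     (map #(.getPath ^java.io.File %)))
        subdirs (filter #(.isDirectory ^java.io.File %) entries)
        known   (db-lookup files)                ; one DB round trip per directory
        orphans (vec (remove known files))]
    ;; Recursion depth equals directory depth, so only one branch of the
    ;; tree is on the stack at any time.
    (into orphans (mapcat #(check-dir % db-lookup) subdirs))))
```

A call would look like `(check-dir (io/file "/some/root") my-lookup)`, where both the root and `my-lookup` are placeholders. The sketch follows symbolic links like any other directory, so a tree containing link cycles would need an extra guard.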
