I am trying to write a Python program that calls a bash script to run on new data in a directory.
I have several hundred sub-directories in my directory. Every hour a few new sub-directories are generated. I am trying to go into these new sub-directories and run my script on the data inside them.
Let's say the path to my directory is /data1/realtime.
In the directory 'realtime' new sub-directories are generated every hour. How can I find out which sub-directories were generated in the last hour and go into each of them one by one?
Many Thanks!!
yash
You can use os.listdir and sets to compare:
import os
path = "."
# snapshot of the sub-directories that exist right now
prev = {d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))}
os.mkdir(os.path.join(path, "foo"))  # stands in for a newly generated sub-directory
# second snapshot, taken later
curr = {d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))}
new = curr.difference(prev)
for d in new:
    print(d)
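Since the program runs once an hour rather than staying in memory, the previous snapshot has to survive between runs. Here is a minimal sketch of one way to do that, assuming the names seen so far are kept in a JSON file. The function names and the state file are hypothetical and not part of the question:

```python
import json
import os


def list_subdirs(root):
    """Return the names of the immediate sub-directories of root."""
    with os.scandir(root) as entries:
        return {entry.name for entry in entries if entry.is_dir()}


def new_subdirs(root, state_file):
    """Return sub-directories not seen on earlier runs, then record them.

    state_file is a hypothetical JSON file holding the names seen so far.
    """
    try:
        with open(state_file) as fh:
            seen = set(json.load(fh))
    except FileNotFoundError:
        seen = set()  # first run: every sub-directory counts as new

    current = list_subdirs(root)
    with open(state_file, "w") as fh:
        json.dump(sorted(current), fh)
    return sorted(current - seen)
```

Note that this sketch records a directory as seen before anything has been run on it. If your bash script can fail, it may be safer to update the state file only after the script has succeeded.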
Use the find command (in your shell):
find /data1/realtime -mmin -60 -type d
It will print all directories that have been created, or that have had files or subdirectories added, removed or renamed, in the last 60 minutes.
You can of course call this from Python's subprocess module if needed, but since you are using bash already, maybe you can use it in the bash script directly?
Here's how to call find using subprocess:
import subprocess
# check_output returns bytes on Python 3, so decode before splitting into lines
directories = subprocess.check_output(
    ['find', '/data1/realtime', '-type', 'd', '-mmin', '-60']
).decode().splitlines()
# directories content: ['/data1/realtime/dir1000', ...]
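To then run something on each directory in that list, a loop around subprocess.run is probably enough. This is only a sketch: run_on_each is a hypothetical helper, and /path/to/process.sh is a placeholder for your own bash script, which is assumed here to take the directory as its last argument.

```python
import os
import subprocess


def run_on_each(directories, command):
    """Run command once per directory, passing the directory as the last argument.

    Returns a list of (directory, exit_code) pairs so failures can be inspected.
    """
    results = []
    for directory in directories:
        # find's output may arrive as bytes; convert to str for consistency
        if isinstance(directory, bytes):
            directory = os.fsdecode(directory)
        completed = subprocess.run(list(command) + [directory])
        results.append((directory, completed.returncode))
    return results


# Hypothetical usage; process.sh stands in for your own script:
# run_on_each(directories, ["bash", "/path/to/process.sh"])
```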
The -mmin -60 test might catch directories that are still in the process of being created, as msw said in the comments, so if you want to find directories that were created in the last hour but not more recently than 5 minutes ago, you can add another test to find:
find /data1/realtime -mmin -60 -mmin +5 -type d
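If you would rather stay in Python, the same modification-time window can be approximated with os.scandir. This is a rough sketch and not an exact equivalent: dirs_modified_between is a hypothetical name, it only looks at the immediate sub-directories (find recurses), and find's handling of fractional minutes may differ slightly.

```python
import os
import time


def dirs_modified_between(root, min_age_minutes=5, max_age_minutes=60, now=None):
    """Return immediate sub-directories of root whose mtime falls in the window.

    A directory matches if it was last modified more than min_age_minutes
    and less than max_age_minutes before now.
    """
    now = time.time() if now is None else now
    newest = now - min_age_minutes * 60
    oldest = now - max_age_minutes * 60
    matches = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir() and oldest < entry.stat().st_mtime < newest:
                matches.append(entry.path)
    return sorted(matches)
```

The now parameter exists only so that the window can be pinned to a fixed time, for example when trying the function out on directories with artificial timestamps.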
Just to see how this works, here's a bash session:
$ find --version
find (GNU findutils) 4.4.2
...
$ mkdir /tmp/test
$ cd /tmp/test
$ date
Mon Feb 9 21:27:00 CET 2015
$ touch a
$ touch -t 02092100 b # 27 minutes ago
$ touch -t 02082100 c # yesterday
$ ls -alh
total 0
drwxr-xr-x 2 andre andre 100 Feb 9 21:27 .
drwxrwxrwt 24 root root 520 Feb 9 21:26 ..
-rw-r--r-- 1 andre andre 0 Feb 9 21:27 a
-rw-r--r-- 1 andre andre 0 Feb 9 21:00 b
-rw-r--r-- 1 andre andre 0 Feb 8 21:00 c
$ find . -mmin -60 -mmin +5
./b
As expected, the newly created file (a) and the file from yesterday (c) are excluded, but the file that was updated 27 minutes ago (b) is included. This should work if you are