
Does Hadoop support file-level locks?

I wonder whether Hadoop supports file-level locks or not.

I am facing an issue in a production batch job setup.

The scenario is described below:

My batch job has to refer to enterprise data that is refreshed by another external application over which I have no control. My initial set of jobs runs for around 6 hours, and then a few sequential jobs start executing. After this, I have another Hadoop job that again refers to the same enterprise data, in the same location that was used by the first set of jobs. This job runs for more than 4 hours, so the total window is about 10 hours, and within that window there is a possibility that the refresh job runs again; it first deletes the file and then recreates it. If my second job is still executing during this second refresh, there is a high probability that it will fail, because it will not find the file that the refresh job has deleted.

So is there a way to control this using some file-locking mechanism in Hadoop, like what we have with other RDBMSs?

We had a similar requirement. We created a table in MySQL to hold the locks for a folder. Any job starting out needs to acquire the locks: it checks the table and fails or reschedules if any of the folders it operates on already has a lock. But as such, there is no locking mechanism in Hadoop itself.
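
A minimal sketch of how such a lock table could be used, assuming a MySQL table named folder_locks with the folder path as its primary key (the table name, columns, and JDBC URL are illustrative assumptions, not the schema from the answer above):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Illustrative folder-lock helper backed by a MySQL table.
 * Assumed schema (not from the original post):
 *   CREATE TABLE folder_locks (
 *     folder_path VARCHAR(512) PRIMARY KEY,
 *     owner       VARCHAR(128) NOT NULL,
 *     acquired_at TIMESTAMP    DEFAULT CURRENT_TIMESTAMP
 *   );
 */
public class FolderLock {
    private final String jdbcUrl;   // e.g. "jdbc:mysql://lockdb:3306/jobs" (assumed)
    private final String user;
    private final String password;

    public FolderLock(String jdbcUrl, String user, String password) {
        this.jdbcUrl = jdbcUrl;
        this.user = user;
        this.password = password;
    }

    /** Try to acquire a lock on a folder; returns false if another job already holds it. */
    public boolean acquire(String folderPath, String owner) throws SQLException {
        String sql = "INSERT INTO folder_locks (folder_path, owner) VALUES (?, ?)";
        try (Connection c = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setString(1, folderPath);
            ps.setString(2, owner);
            ps.executeUpdate();
            return true;                      // row inserted -> lock acquired
        } catch (SQLException e) {
            // SQLState 23000 = integrity constraint violation, i.e. the row
            // already exists, so another job holds the lock.
            if ("23000".equals(e.getSQLState())) {
                return false;
            }
            throw e;
        }
    }

    /** Release the lock when the job finishes (call this in a finally block). */
    public void release(String folderPath, String owner) throws SQLException {
        String sql = "DELETE FROM folder_locks WHERE folder_path = ? AND owner = ?";
        try (Connection c = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setString(1, folderPath);
            ps.setString(2, owner);
            ps.executeUpdate();
        }
    }
}
```

In this pattern the refresh job and the reading jobs would both call acquire() on the shared data folder before touching it, and either fail fast or reschedule themselves if acquire() returns false, releasing the lock once they finish.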
