
Force HDFS globStatus to skip directories it doesn't have permissions to

I need to collect a very large number of directories from HDFS, each containing subdirectories, and I would like to use globStatus to do it. My Path pattern essentially looks like this:

"/directory/*/{opt1,opt2}/{opt1,opt2,opt3}*"

Unfortunately, for some of the directories captured by the *, I don't have execute permission (so I can't view their contents), but the glob still tries to look inside them, which throws an exception. Is there any way to ask the glob to simply skip directories it doesn't have permissions for, rather than failing completely?

I am aware that there are other ways to achieve the same goal, but as far as I can tell they would be more complex and, I think, require more requests to HDFS than a simple glob.

Answering this in case anyone else comes across this question...

The filtering for globStatus is done client-side, in the FileSystem / Globber classes. Under the hood it really just issues a series of listStatus calls and filters the return value(s). Getting the behavior you describe will require some custom logic, but it need not be any less efficient than the globStatus API.
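As a rough illustration of what that custom logic might look like, here is a minimal sketch that expands the pattern one path component at a time and skips any directory whose listing is denied. It deliberately avoids the Hadoop classes so it compiles on its own: `DirLister` and `SkippingGlob` are hypothetical names invented for this example, name matching is done with the JDK's `PathMatcher` glob syntax rather than Hadoop's own glob matcher, and a denied listing is signalled with `java.nio.file.AccessDeniedException`. It also assumes an absolute pattern whose `{a,b}` groups contain no `/`, and a lister that returns an empty list for anything that is not a directory.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical abstraction: return the child names of a directory. */
interface DirLister {
    // Expected to throw AccessDeniedException when the caller may not list
    // the directory, and to return an empty list for non-directories.
    List<String> list(String dir) throws IOException;
}

/** Sketch only: a glob that skips directories it is not allowed to list. */
final class SkippingGlob {
    static List<String> glob(DirLister lister, String pattern) throws IOException {
        List<String> current = new ArrayList<>();
        current.add("/"); // assumption: the pattern is absolute
        for (String component : pattern.split("/")) {
            if (component.isEmpty()) {
                continue; // leading or doubled slash
            }
            PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + component);
            List<String> next = new ArrayList<>();
            for (String dir : current) {
                List<String> children;
                try {
                    children = lister.list(dir); // one listing per candidate
                } catch (AccessDeniedException e) {
                    continue; // no permission here: skip instead of failing
                }
                for (String name : children) {
                    if (matcher.matches(Paths.get(name))) {
                        next.add(dir.endsWith("/") ? dir + name : dir + "/" + name);
                    }
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a file system; "/directory/b" is unreadable.
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/", Arrays.asList("directory"));
        tree.put("/directory", Arrays.asList("a", "b"));
        tree.put("/directory/a", Arrays.asList("opt1"));
        tree.put("/directory/a/opt1", Arrays.asList("opt3-data", "other"));
        DirLister lister = dir -> {
            if (dir.equals("/directory/b")) {
                throw new AccessDeniedException(dir);
            }
            return tree.getOrDefault(dir, Collections.<String>emptyList());
        };
        System.out.println(
            glob(lister, "/directory/*/{opt1,opt2}/{opt1,opt2,opt3}*"));
    }
}
```

Against a real cluster, a `DirLister` could presumably wrap `FileSystem.listStatus(Path)` and translate `org.apache.hadoop.security.AccessControlException` into the exception caught above. Note that this sketch lists every level, including literal components such as `directory`; a more careful version could append literal components without listing their parent, which would save requests and avoid needing read access on those parents.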
