
Azure Data Lake Store concurrency

I've been experimenting with Azure Data Lake Store, and in the documentation Microsoft claims that the system is optimized for low-latency small writes to files. To test this, I tried performing a large number of writes from parallel tasks to a single file, but this approach fails in most cases, returning a Bad Request. This link https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf shows that HDFS isn't designed to handle concurrent appends on a single file, so I tried a second time using the ConcurrentAppendAsync method found in the API. Although that method doesn't crash, my file is never modified on the store.
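As an illustration of the single-file constraint described above, here is a minimal Python sketch of one common workaround: funnel records from many parallel producers through a single writer, so the file only ever sees one appender at a time. This is a sketch under stated assumptions, not the Azure Data Lake Store API. It appends to a local file as a stand-in for the store, and the names `run_serialized_appends` and `count_lines` are hypothetical.

```python
import queue
import threading


def run_serialized_appends(path, n_producers, records_per_producer):
    """Append records from many producer threads via one writer thread.

    Assumption: a local file stands in for the remote store. No Azure
    Data Lake Store call is made here.
    """
    pending = queue.Queue()
    stop = object()  # sentinel telling the writer to finish

    def writer():
        # The only code path that touches the file, so appends never overlap.
        with open(path, "ab") as f:
            while True:
                item = pending.get()
                if item is stop:
                    break
                f.write(item)

    def producer(pid):
        # Producers never open the file; they only enqueue whole records.
        for i in range(records_per_producer):
            pending.put(f"producer={pid} record={i}\n".encode("utf-8"))

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()

    producers = [
        threading.Thread(target=producer, args=(pid,))
        for pid in range(n_producers)
    ]
    for t in producers:
        t.start()
    for t in producers:
        t.join()

    # All records are queued; signal the writer and wait for it to drain.
    pending.put(stop)
    writer_thread.join()


def count_lines(path):
    """Return the number of newline-terminated records in the file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```

Calling `run_serialized_appends(path, 8, 50)` on an empty file and then `count_lines(path)` should report every queued record. The trade-off is that throughput is bounded by the single writer, which is the price of never having two appenders on the same file.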

What you have found out about how parallel writes work is correct. I am assuming you have already read the documentation for ConcurrentAppendAsync.

So, in your case, did you use the same file for the WebHDFS write test and the ConcurrentAppendAsync test? If that's the case, then ConcurrentAppendAsync will not work, as mentioned in the documentation. But you should have received an error in that case.

In any case, let us know what happened and we can investigate further.

Thanks,

Sachin Sheth

Program Manager - Azure Data Lake
