C# parallel writes to an Azure Data Lake file
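In our Azure Data Lake we have daily files that record events, along with the coordinates of those events. We need to take those coordinates and look up the state, county, township, and section each one falls in. I have tried several versions of this code.
This is where I'm stuck: I can't reliably write rows to a file in ADLS. Here is the code I have right now: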
```csharp
public static void WriteGeocodedOutput(string Contents, String outputFileName, ILogger log) {
    AdlsClient client = AdlsClient.CreateClient(ADlSAccountName, adlCreds);
    //if the file doesn't exist write the header first
    try {
        if (!client.CheckExists(outputFileName)) {
            using (var stream = client.CreateFile(outputFileName, IfExists.Fail)) {
                byte[] headerByteArray = Encoding.UTF8.GetBytes("EventDate, Longitude, Latitude, RadarSiteID, CellID, RangeNauticalMiles, Azimuth, SevereProbability, Probability, MaxSizeinInchesInUS, StateCode, CountyCode, TownshipCode, RangeCode\r\n");
                //stream.Write(headerByteArray, 0, headerByteArray.Length);
                client.ConcurrentAppend(outputFileName, true, headerByteArray, 0, headerByteArray.Length);
            }
        }
    } catch (Exception e) {
        log.LogInformation("multiple attempts to create the file. Ignoring this error, since the file was created.");
    }
    //then write the data
    byte[] textByteArray = Encoding.UTF8.GetBytes(Contents);
    for (int attempt = 0; attempt < 5; attempt++) {
        try {
            log.LogInformation("prior to write, the outputfile size is: " + client.GetDirectoryEntry(outputFileName).Length);
            var offset = client.GetDirectoryEntry(outputFileName).Length;
            client.ConcurrentAppend(outputFileName, false, textByteArray, 0, textByteArray.Length);
            log.LogInformation("AFTER write, the outputfile size is: " + client.GetDirectoryEntry(outputFileName).Length);
            //if successful, stop trying to write this row
            attempt = 6;
        }
        catch (Exception e){
            log.LogInformation($"exception on adls write: {e}");
        }
        Random rnd = new Random();
        Thread.Sleep(rnd.Next(attempt * 60));
    }
}
```
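The file does get created when needed, but I see several messages in the log showing that multiple threads are trying to create it, and the header row is not always written.
I also no longer get any data rows at all, only this error: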
```
BadRequest ( IllegalArgumentException concurrentappend failed with error 0xffffffff83090a6f
(Bad request. The target file does not support this particular type of append operation.
If the concurrent append operation has been used with this file in the past, you need to append to this file using the concurrent append operation.
If the append operation with offset has been used in the past, you need to append to this file using the append operation with offset.
On the same file, it is not possible to use both of these operations.). []
```
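I feel like I'm missing some basic design idea here. The code should try to write a row to the file. If the file doesn't exist yet, it should create the file and write the header row first, and then write the row.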
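One common way to get "create with header, then append rows" without every thread racing on the file is to funnel all rows through a single writer. This pattern is not from the original thread; the sketch below is a minimal in-process illustration, and `SingleWriterCsv` and its `sink` callback are hypothetical names (the sink stands in for whatever storage call you actually use, for example one ADLS append). It assumes the writer owns a new output file, so the header belongs at the top, and it only serializes writes inside one process: separate Azure Functions instances would each need their own writer and their own output file.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical single-writer: many producer threads enqueue rows, and one
// consumer task writes the header once and then every row in queue order.
public sealed class SingleWriterCsv
{
    // Default BlockingCollection is backed by a ConcurrentQueue (FIFO).
    private readonly BlockingCollection<string> _rows = new BlockingCollection<string>();
    private readonly Task _consumer;

    // 'sink' is an assumption: it stands in for the real storage append call.
    // Only the consumer task ever calls it, so it need not be thread-safe.
    public SingleWriterCsv(string header, Action<string> sink)
    {
        _consumer = Task.Run(() =>
        {
            sink(header); // header is written exactly once, before any row
            foreach (string row in _rows.GetConsumingEnumerable())
            {
                sink(row); // blocks until a row arrives or adding is completed
            }
        });
    }

    // Safe to call from many threads at once.
    public void Add(string row) => _rows.Add(row);

    // Signals that no more rows will arrive and waits for the queue to drain.
    public void Complete()
    {
        _rows.CompleteAdding();
        _consumer.Wait();
    }
}
```

Producers call `Add` from any thread; after the last row, `Complete` waits until every queued row has reached the sink. Because ordering is decided by one consumer rather than by concurrent appends, the header cannot be overtaken by a data row.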
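What is the best-practice way to implement this kind of write scenario?
Are there any other suggestions for handling a parallel write workload like this in ADLS?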
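I'm a bit late to this, but I suspect one of the problems is that you're using both `Create` and `ConcurrentAppend` on the same file stream. The ADLS documentation mentions that the two can't be used on the same file. Try replacing the `Create` call with `ConcurrentAppend`, since the latter can create the file when it doesn't exist yet.
Also, if you found a better approach, please post your solution here.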
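Following that suggestion, here is a minimal sketch that never calls `CreateFile` and sends every write, header included, through `ConcurrentAppend` with `autoCreate` set to `true`, so the file only ever sees the concurrent-append mode named in the error message. `IConcurrentAppender`, `GeocodedOutputWriter`, and `InMemoryAppender` are hypothetical names: the interface mirrors the `ConcurrentAppend(path, autoCreate, bytes, offset, length)` call used in the question, and a real version would forward it to `AdlsClient`. The header is written at most once per writer instance, which assumes one shared writer per output file and that the file does not exist yet; that guarantee does not extend to other processes or Function instances appending to the same path.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;

// Hypothetical abstraction mirroring the ConcurrentAppend call from the question.
public interface IConcurrentAppender
{
    void ConcurrentAppend(string path, bool autoCreate, byte[] data, int offset, int length);
}

// Writes only through ConcurrentAppend and never through CreateFile,
// so the two append modes are never mixed on the same file.
public sealed class GeocodedOutputWriter
{
    private readonly IConcurrentAppender _client;
    private readonly string _path;
    private readonly Lazy<bool> _headerWritten;

    public GeocodedOutputWriter(IConcurrentAppender client, string path, string header)
    {
        _client = client;
        _path = path;
        // ExecutionAndPublication: the factory runs at most once, and other
        // threads block on .Value until the header append has returned.
        _headerWritten = new Lazy<bool>(() =>
        {
            Append(header);
            return true;
        }, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public void WriteRow(string row)
    {
        _ = _headerWritten.Value; // header goes out first (within this process)
        Append(row);
    }

    private void Append(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        // Second argument is autoCreate: true creates the file if it is missing.
        _client.ConcurrentAppend(_path, true, bytes, 0, bytes.Length);
    }
}

// In-memory stand-in for the storage account, for local testing only.
public sealed class InMemoryAppender : IConcurrentAppender
{
    private readonly object _gate = new object();
    public List<string> Chunks { get; } = new List<string>();

    public void ConcurrentAppend(string path, bool autoCreate, byte[] data, int offset, int length)
    {
        lock (_gate)
        {
            Chunks.Add(Encoding.UTF8.GetString(data, offset, length));
        }
    }
}
```

One design caveat: `Lazy<T>` in `ExecutionAndPublication` mode caches an exception thrown by its factory, so if the header append fails, every later `WriteRow` rethrows that same exception; a production version would need its own retry around the header write.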