C# parallel writes to Azure Data Lake file
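In our Azure Data Lake we have daily files that record events along with the coordinates of those events. We need to take these coordinates and look up which state, county, township, and section they fall into. I have tried several versions of this code.
This is where I ran into problems: I cannot reliably write rows to the file in ADLS. Here is the code I have at the moment.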
    public static void WriteGeocodedOutput(string Contents, String outputFileName, ILogger log) {
        AdlsClient client = AdlsClient.CreateClient(ADlSAccountName, adlCreds);
        //if the file doesn't exist write the header first
        try {
            if (!client.CheckExists(outputFileName)) {
                using (var stream = client.CreateFile(outputFileName, IfExists.Fail)) {
                    byte[] headerByteArray = Encoding.UTF8.GetBytes("EventDate, Longitude, Latitude, RadarSiteID, CellID, RangeNauticalMiles, Azimuth, SevereProbability, Probability, MaxSizeinInchesInUS, StateCode, CountyCode, TownshipCode, RangeCode\r\n");
                    //stream.Write(headerByteArray, 0, headerByteArray.Length);
                    client.ConcurrentAppend(outputFileName, true, headerByteArray, 0, headerByteArray.Length);
                }
            }
        } catch (Exception e) {
            log.LogInformation("multiple attempts to create the file. Ignoring this error, since the file was created.");
        }
        //then write the data
        byte[] textByteArray = Encoding.UTF8.GetBytes(Contents);
        for (int attempt = 0; attempt < 5; attempt++) {
            try {
                log.LogInformation("prior to write, the outputfile size is: " + client.GetDirectoryEntry(outputFileName).Length);
                var offset = client.GetDirectoryEntry(outputFileName).Length;
                client.ConcurrentAppend(outputFileName, false, textByteArray, 0, textByteArray.Length);
                log.LogInformation("AFTER write, the outputfile size is: " + client.GetDirectoryEntry(outputFileName).Length);
                //if successful, stop trying to write this row
                attempt = 6;
            }
            catch (Exception e) {
                log.LogInformation($"exception on adls write: {e}");
            }
            Random rnd = new Random();
            Thread.Sleep(rnd.Next(attempt * 60));
        }
    }
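As a side note on the retry loop in the code above: it ends the loop by setting `attempt = 6`, still sleeps after a successful write, and builds a new `Random` on every pass. A minimal sketch of the same idea as a reusable helper might look like the following. `RetryHelper`, `TryWithBackoff`, and their parameters are hypothetical names introduced for illustration; they are not part of the ADLS SDK or of the original post.

```csharp
using System;
using System.Threading;

public static class RetryHelper
{
    // One shared Random instead of a new instance per iteration.
    // Random is not thread-safe, so access is guarded by a lock.
    private static readonly Random Jitter = new Random();
    private static readonly object JitterLock = new object();

    // Runs `action` up to maxAttempts times.
    // Returns true on the first success, false if every attempt threw.
    // Sleeps a random, growing delay only between failed attempts.
    public static bool TryWithBackoff(
        Action action, int maxAttempts, int baseDelayMs, Action<Exception> onError = null)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                action();
                return true; // success: stop retrying, no trailing sleep
            }
            catch (Exception e)
            {
                onError?.Invoke(e);
            }

            if (attempt < maxAttempts)
            {
                int delay;
                lock (JitterLock)
                {
                    // Upper bound grows linearly with the attempt number.
                    delay = Jitter.Next(attempt * baseDelayMs + 1);
                }
                Thread.Sleep(delay);
            }
        }
        return false;
    }
}
```

Under these assumptions, the data write could be wrapped as `RetryHelper.TryWithBackoff(() => client.ConcurrentAppend(outputFileName, false, textByteArray, 0, textByteArray.Length), 5, 60, e => log.LogInformation($"exception on adls write: {e}"))`. Retrying only helps with transient failures; it does not address an error that repeats on every attempt.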
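The file does get created when needed, but my log shows several messages where multiple threads tried to create it. The header row is not always written.
I am also no longer getting any data rows, only this error: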
"BadRequest ( IllegalArgumentException concurrentappend failed with error 0xffffffff83090a6f
(Bad request. The target file does not support this particular type of append operation.
If the concurrent append operation has been used with this file in the past, you need to append to this file using the concurrent append operation.
If the append operation with offset has been used in the past, you need to append to this file using the append operation with offset.
On the same file, it is not possible to use both of these operations.). []
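I feel like I am missing some fundamental design idea here. The code should try to write a row to the file. If the file does not exist yet, it should create the file and put the header row in, then write the row.
What is the best-practice way to accomplish this kind of write scenario?
Are there any other suggestions for handling this kind of parallel-write workload in ADLS?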
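I'm a bit late to this, but I think one of the problems may be the use of both "Create" and "ConcurrentAppend" on the same file. The ADLS documentation mentions that they cannot be used on the same file. Maybe try changing the "Create" call to "ConcurrentAppend", since the latter can create the file if it does not exist.
Also, if you found a better way to do this, please post your solution here.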
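The suggestion above (let ConcurrentAppend both create the file and write to it, with no separate CreateFile call) can be sketched roughly as follows. This is a minimal sketch, not a tested solution: `GeocodedWriter`, `ConcurrentAppendFn`, and `WriteRow` are hypothetical names introduced here. The delegate mirrors the parameter shape of `AdlsClient.ConcurrentAppend(path, autoCreate, dataBytes, offset, length)` as it is called in the question, so the sketch can be tried without an ADLS account.

```csharp
using System;
using System.Text;

public static class GeocodedWriter
{
    // Hypothetical delegate with the same parameter shape as the
    // AdlsClient.ConcurrentAppend call used in the question.
    public delegate void ConcurrentAppendFn(
        string path, bool autoCreate, byte[] data, int offset, int length);

    // Appends one line (header or data row) using only concurrent append.
    // autoCreate is true, so the append itself creates the file if it is
    // missing and no separate CreateFile call touches the file.
    public static void WriteRow(ConcurrentAppendFn concurrentAppend, string path, string row)
    {
        if (concurrentAppend == null) throw new ArgumentNullException(nameof(concurrentAppend));
        byte[] bytes = Encoding.UTF8.GetBytes(row);
        concurrentAppend(path, true, bytes, 0, bytes.Length);
    }
}
```

With the real client this might be called as `GeocodedWriter.WriteRow((p, c, d, o, n) => client.ConcurrentAppend(p, c, d, o, n), outputFileName, Contents)`. Two caveats, both assumptions rather than anything confirmed in this thread: going by the error message quoted in the question, a file that was already created another way would presumably still reject concurrent appends, so a fresh output path is assumed; and the sketch does not decide which writer emits the header row, which still needs a single owner (for example, one step that appends the header before the parallel work starts).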