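Performance issue when deserializing a large JSON file into objects and adding them to the database
I'm working on an ASP.NET Core 7 MVC project and want to seed a list of countries into the database from a locally stored JSON file (more than 600,000 lines).
Here is a sample of the JSON file: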
[{
"id": 1,
"name": "Afghanistan",
"iso3": "AFG",
"iso2": "AF",
"cities": [
{
"id": 141,
"name": "‘Alāqahdārī Dīshū"
},
{
"id": 53,
"name": "Aībak"
},
{
"id": 50,
"name": "Andkhoy"
},
{
"id": 136,
"name": "Āqchah"
},
{
"id": 137,
"name": "Ārt Khwājah"
},
{
"id": 51,
"name": "Asadabad"
},
{
"id": 52,
"name": "Ashkāsham"
},
{
"id": 138,
"name": "Āsmār"
},
{
"id": 54,
"name": "Baghlān"
},
{
"id": 55,
"name": "Balkh"
}
]
}
]
Here is what I tried.
Country model:
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class Country
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int Id { get; set; }
    public string Name { get; set; }
    public string ISO2 { get; set; }
    public string ISO3 { get; set; }
    public ICollection<City> Cities { get; set; }
}
City model:
public class City
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int Id { get; set; }
    public string Name { get; set; }
    public int CountryId { get; set; }
    public Country Country { get; set; }
}
Deserialization code:
public static async Task SeedCountries(AppDbContext context, IWebHostEnvironment web)
{
    if (!(context.Countries.Count() > 0) || !(context.Cities.Count() > 0))
    {
        string json = await System.IO.File.ReadAllTextAsync(Path.Combine(web.ContentRootPath, "countries.json"));
        var jsonObject = JArray.Parse(json);
        IList<Country> countries = new List<Country>();
        foreach (var item in jsonObject)
        {
            Country country = item.ToObject<Country>();
            var citites = item["cities"] as JArray;
            var citis = new City();
            countries.Add(country);
            foreach (var city in citites)
            {
                City cities = city.ToObject<City>();
                cities.CountryId = country.Id;
                context.Cities.Add(cities);
                context.Entry(cities).State = Microsoft.EntityFrameworkCore.EntityState.Detached;
            }
        }
        await context.Countries.AddRangeAsync(countries);
        await context.SaveChangesAsync();
    }
}
The problem is performance, as shown in the screenshot attached to the original question.
You have two problems here:
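1. You are loading the entire 600,000-line JSON file into a single json string. This string will be far larger than 85,000 bytes, so it will be allocated on the large object heap, causing the problems described in Why Large Object Heap and why do we care?.
2. You then parse that huge string into a JArray, which takes up even more memory.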
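I suspect you are creating the intermediate JSON string because you want to read the file asynchronously, and Json.NET's serializer does not support async deserialization. However, JsonTextReader does support asynchronous reading via JsonTextReader.ReadAsync(), and JToken does support asynchronous loading via JToken.LoadAsync(). Putting them together, you can asynchronously iterate through a huge JSON array, asynchronously load each item into a JToken, and then deserialize that token into your final array item (here Country), using only a bounded amount of memory.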
In fact, this answer to Deserializing to AsyncEnumerable using Newtonsoft.Json has an extension method that does exactly that:
public static partial class JsonExtensions
{
    /// <summary>
    /// Asynchronously load and synchronously deserialize values from a stream containing a JSON array. The root object of the JSON stream must in fact be an array, or an exception is thrown
    /// </summary>
    public static async IAsyncEnumerable<T?> DeserializeAsyncEnumerable<T>(Stream stream, JsonSerializerSettings? settings = default, [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // See https://stackoverflow.com/a/72502371/3744182 for the body of this method
    }
}
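The body of that method is referenced rather than reproduced here. The following is a minimal sketch of the same technique: JsonTextReader.ReadAsync() to walk the array, then JToken.LoadAsync() to buffer one element at a time. It is not necessarily identical to the linked answer.

The sketch makes these assumptions:
- It requires a Newtonsoft.Json version with the async reader APIs (10.0 or later).
- It expects a UTF-8 stream that contains no JSON comments.
- The names NewtonsoftArrayStreaming and ReadArrayItemsAsync are hypothetical. They were chosen so they do not clash with the linked JsonExtensions class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper; a sketch of the JsonTextReader.ReadAsync + JToken.LoadAsync technique.
public static class NewtonsoftArrayStreaming
{
    public static async IAsyncEnumerable<T?> ReadArrayItemsAsync<T>(
        Stream stream,
        JsonSerializerSettings? settings = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var serializer = JsonSerializer.CreateDefault(settings);
        // leaveOpen: true so the caller keeps ownership of (and disposes) the stream.
        using var textReader = new StreamReader(stream, Encoding.UTF8, true, 4096, leaveOpen: true);
        using var reader = new JsonTextReader(textReader);

        // The root token must be the start of an array. Comments are not skipped (assumption).
        if (!await reader.ReadAsync(cancellationToken).ConfigureAwait(false)
            || reader.TokenType != JsonToken.StartArray)
            throw new JsonSerializationException("The root JSON value must be an array.");

        while (true)
        {
            // Move to the next element, or to the closing bracket of the array.
            if (!await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
                throw new JsonSerializationException("Unexpected end of input inside the JSON array.");
            if (reader.TokenType == JsonToken.EndArray)
                yield break;

            // Asynchronously buffer only this element, then deserialize it synchronously.
            var token = await JToken.LoadAsync(reader, cancellationToken).ConfigureAwait(false);
            yield return token.ToObject<T>(serializer);
        }
    }
}
```

Only one element is held as a JToken at a time. Memory therefore scales with the largest single country rather than with the whole file. Unlike a loop that simply stops at the end of input, this sketch reports a truncated array as an error.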
So, grab all the code of the JsonExtensions extension class from that answer, and you will then be able to build your countries list as follows:
var fileName = Path.Combine(web.ContentRootPath, "countries.json");
var countries = new List<Country>();
await using var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 4096, useAsync: true);
await foreach (var country in JsonExtensions.DeserializeAsyncEnumerable<Country>(stream))
{
    if (country == null)
        continue;
    if (country.Cities != null)
        foreach (var city in country.Cities)
        {
            // Do you need another instance of City? If so, clone the current one into a new local
            // (a foreach iteration variable cannot be reassigned):
            // var cityCopy = JToken.FromObject(city).ToObject<City>();
            city.CountryId = country.Id;
            context.Cities.Add(city);
            context.Entry(city).State = Microsoft.EntityFrameworkCore.EntityState.Detached;
        }
    countries.Add(country);
}
Notes:
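In your sample code you seem to be deserializing Cities twice. item.ToObject<Country>() already deserializes the Country.Cities list, but you then deserialize the cities again manually in the foreach (var city in citites) loop. It's not clear why you are doing this, since the list of cities is already present under each country, so I removed the duplication from my sample code.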
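If for some reason you really do need two instances of each City, you could clone them inside the foreach (var city in country.Cities) loop in my sample code (e.g. by round-tripping them through the JSON serializer).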
Demo fiddle #1 here.
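The cloning option from the second note can be wrapped in a small generic helper that round-trips an object through System.Text.Json.

This is a sketch with the following assumptions:
- JsonClone and DeepClone are hypothetical names.
- The City class here is a trimmed-down stand-in without the Country navigation property.
- The helper only works while the object graph has no cycles. With the real model, City.Country must be null (as it is straight after deserialization) or marked [JsonIgnore]. Otherwise serialization throws a JsonException.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for the question's City model, without the Country navigation.
public class City
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int CountryId { get; set; }
}

public static class JsonClone
{
    // Serializes the value to UTF-8 JSON and deserializes it again, producing a new,
    // independent instance. Only members that System.Text.Json serializes are copied.
    public static T? DeepClone<T>(T value, JsonSerializerOptions? options = null) =>
        JsonSerializer.Deserialize<T>(JsonSerializer.SerializeToUtf8Bytes(value, options), options);
}
```

A serializer round trip is slower than a hand-written copy constructor. Its advantage is that it keeps working when properties are added to the model.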
As an alternative, as CodeCaster suggested in the comments, you could switch to System.Text.Json, which has built-in support for asynchronously deserializing huge JSON arrays via JsonSerializer.DeserializeAsyncEnumerable():
using System.Text.Json;
using System.Text.Json.Serialization;

var fileName = Path.Combine(web.ContentRootPath, "countries.json");
var countries = new List<Country>();
await using var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 4096, useAsync: true);
var options = new JsonSerializerOptions
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    // PropertyNameCaseInsensitive is required to deserialize "iso2" and "iso3" correctly
    PropertyNameCaseInsensitive = true,
};
await foreach (var country in JsonSerializer.DeserializeAsyncEnumerable<Country>(stream, options))
{
    if (country == null)
        continue;
    if (country.Cities != null)
        foreach (var city in country.Cities)
        {
            // Do you need another instance of City? If so, clone the current one into a new local
            // (a foreach iteration variable cannot be reassigned):
            // var cityCopy = JsonSerializer.Deserialize<City>(JsonSerializer.SerializeToUtf8Bytes(city));
            city.CountryId = country.Id;
            context.Cities.Add(city);
            context.Entry(city).State = Microsoft.EntityFrameworkCore.EntityState.Detached;
        }
    countries.Add(country);
}
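If you would rather not rely on case-insensitive matching, one option is to pin the JSON names of ISO2 and ISO3 with [JsonPropertyName]. This is a suggestion of mine, not part of the answer above. An explicit attribute name takes precedence over the naming policy, and the camel-case policy does not turn ISO2 into exactly "iso2".

The sketch makes these assumptions:
- It uses trimmed-down stand-ins for the models, without the EF attributes or the City.Country navigation.
- CountryStreaming and ReadCountriesAsync are hypothetical names.
- It reads from any Stream, so it can be exercised with a MemoryStream instead of the real file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

// Hypothetical stand-ins for the question's models (no EF attributes or navigation back to Country).
public class City
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int CountryId { get; set; }
}

public class Country
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    // Explicit names, so no case-insensitive matching is needed for "iso2"/"iso3".
    [JsonPropertyName("iso2")] public string ISO2 { get; set; } = "";
    [JsonPropertyName("iso3")] public string ISO3 { get; set; } = "";
    public ICollection<City> Cities { get; set; } = new List<City>();
}

public static class CountryStreaming
{
    // Deserializes the top-level array one Country at a time and links each city to its country.
    public static async Task<List<Country>> ReadCountriesAsync(Stream utf8Json)
    {
        var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
        var result = new List<Country>();
        await foreach (var country in JsonSerializer.DeserializeAsyncEnumerable<Country>(utf8Json, options))
        {
            if (country is null)
                continue;
            foreach (var city in country.Cities)
                city.CountryId = country.Id;
            result.Add(country);
        }
        return result;
    }
}
```

Collecting everything into a list, as the answer's code also does, still keeps every country in memory. The streaming benefit is that no giant intermediate string or JArray is built.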
Demo fiddle #2 here.