
Performance issue when deserializing a large JSON file into objects and adding them to the database

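I am working on an ASP.NET Core 7 MVC project and want to seed a list of countries into the database from a locally stored JSON file (more than 600,000 lines).

Here is a sample of the JSON file: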

[{
    "id": 1,
    "name": "Afghanistan",
    "iso3": "AFG",
    "iso2": "AF",
    "cities": [
      {
        "id": 141,
        "name": "‘Alāqahdārī Dīshū"
      },
      {
        "id": 53,
        "name": "Aībak"
      },
      {
        "id": 50,
        "name": "Andkhoy"
      },
      {
        "id": 136,
        "name": "Āqchah"
      },
      {
        "id": 137,
        "name": "Ārt Khwājah"
      },
      {
        "id": 51,
        "name": "Asadabad"
      },
      {
        "id": 52,
        "name": "Ashkāsham"
      },
      {
        "id": 138,
        "name": "Āsmār"
      },
      {
        "id": 54,
        "name": "Baghlān"
      },
      {
        "id": 55,
        "name": "Balkh"
      }
    ]
  }
]

This is what I tried.

Country model:

public class Country
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int Id { get; set; }
    public string Name { get; set; }
    public string ISO2 { get; set; }
    public string ISO3 { get; set; }

    public ICollection<City> Cities { get; set; }
}

City model:

public class City
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int Id { get; set; }
    public string Name { get; set; }
    public int CountryId { get; set; }
    public Country Country { get; set; }
}

Deserialization code:

public static async Task SeedCountries(AppDbContext context, IWebHostEnvironment web)
{
    if (!(context.Countries.Count() > 0) || !(context.Cities.Count() > 0))
    {
        string json = await System.IO.File.ReadAllTextAsync(Path.Combine(web.ContentRootPath, "countries.json"));
        var jsonObject = JArray.Parse(json);

        IList<Country> countries = new List<Country>();
        foreach (var item in jsonObject)
        {
            Country country = item.ToObject<Country>();
            var citites = item["cities"] as JArray;
            var citis = new City();
            countries.Add(country);
            foreach (var city in citites)
            {
                City cities = city.ToObject<City>();
                cities.CountryId = country.Id;
                context.Cities.Add(cities);
                context.Entry(cities).State = Microsoft.EntityFrameworkCore.EntityState.Detached;
            }
        }
        await context.Countries.AddRangeAsync(countries);
        await context.SaveChangesAsync();
    }
}

The problem is performance, as shown in the image below:

[Image]

You have two problems here:

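  1. You are loading the entire 600,000-line JSON file into a single json string. This string will be far larger than 85,000 bytes, so it will be allocated on the large object heap, which leads to the problems described in "Why Large Object Heap and why do we care?".

  2. You then parse that huge string into a JArray, which takes up even more memory.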
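To make the first point concrete, here is a minimal sketch (not from the original answer) that asks the runtime where a small and a large array end up. It assumes the default .NET threshold of roughly 85,000 bytes. A string uses 2 bytes per character, so any text longer than about 42,500 characters crosses that threshold.

```csharp
using System;

// An array well below the ~85,000-byte threshold is allocated on the
// normal small object heap and typically starts in generation 0.
var small = new byte[1_000];

// An array above the threshold is allocated directly on the Large Object
// Heap, which the runtime reports as generation 2.
var large = new byte[100_000];

Console.WriteLine($"small: generation {GC.GetGeneration(small)}");
Console.WriteLine($"large: generation {GC.GetGeneration(large)}");
```

Large object heap allocations are only collected together with generation 2, which is why reading a whole multi-megabyte file into one string is costly.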

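I reckon you are creating the intermediate JSON string because you need to read the file asynchronously, and Json.NET's serializer does not support async deserialization. However, JsonTextReader does support asynchronous reading via JsonTextReader.ReadAsync(), and JToken does support asynchronous loading via JToken.LoadAsync(). Putting the two together, you can iterate asynchronously through a huge JSON array, load each item asynchronously into a JToken, and then deserialize that token to your final array item type (here Country), all with bounded memory use.

In fact, the answer to "Deserializing to AsyncEnumerable using Newtonsoft.Json" has an extension method that does exactly that: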
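The combination of JsonTextReader.ReadAsync() and JToken.LoadAsync() described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked answer: the class name JsonStreamingSketch and the method name ReadArrayItemsAsync are hypothetical, and it assumes UTF-8 input and the Newtonsoft.Json package.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonStreamingSketch
{
    // Hypothetical helper: streams the items of a root-level JSON array one at a time.
    public static async IAsyncEnumerable<T?> ReadArrayItemsAsync<T>(
        Stream stream,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var serializer = JsonSerializer.CreateDefault();

        // leaveOpen: true, so the caller keeps ownership of the stream.
        using var textReader = new StreamReader(stream, Encoding.UTF8, true, 4096, leaveOpen: true);
        using var reader = new JsonTextReader(textReader);

        // Advance to the first token, which must open an array.
        if (!await reader.ReadAsync(cancellationToken).ConfigureAwait(false)
            || reader.TokenType != JsonToken.StartArray)
            throw new JsonSerializationException("Expected the JSON root to be an array.");

        // Each ReadAsync() lands on the start of the next item or on EndArray.
        while (await reader.ReadAsync(cancellationToken).ConfigureAwait(false)
               && reader.TokenType != JsonToken.EndArray)
        {
            if (reader.TokenType == JsonToken.Comment)
                continue;

            // Only the current item is held in memory as a JToken;
            // the conversion to T is synchronous.
            var token = await JToken.LoadAsync(reader, cancellationToken).ConfigureAwait(false);
            yield return token.ToObject<T>(serializer);
        }
    }
}
```

Because only one array item is materialized at a time, peak memory depends on the largest single country rather than on the size of the whole file.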

public static partial class JsonExtensions
{
    /// <summary>
    /// Asynchronously load and synchronously deserialize values from a stream containing a JSON array.  The root object of the JSON stream must in fact be an array, or an exception is thrown
    /// </summary>
    public static async IAsyncEnumerable<T?> DeserializeAsyncEnumerable<T>(Stream stream, JsonSerializerSettings? settings = default, [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // See https://stackoverflow.com/a/72502371/3744182 for the body of this method

So grab all the code from the JsonExtensions extension class in that answer, and you will be able to build your countries list as follows:

var fileName = Path.Combine(web.ContentRootPath, "countries.json");

var countries = new List<Country>();

await using var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 4096, useAsync: true);
await foreach (var country in JsonExtensions.DeserializeAsyncEnumerable<Country>(stream))
{
    if (country == null)
        continue;
    if (country.Cities != null)
        foreach (var city in country.Cities)
        {
            // Do you need another instance of City?  If so you may clone the current one as follows:
            // city = JToken.FromObject(city).ToObject<City>();
            city.CountryId = country.Id;
            context.Cities.Add(city);
            context.Entry(city).State = Microsoft.EntityFrameworkCore.EntityState.Detached;
        }
    countries.Add(country);
}

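Notes:

  • In your sample code you appear to deserialize the Cities twice. item.ToObject<Country>(); already deserializes the Country.Cities list, but you then deserialize the cities manually again in the foreach (var city in citites) loop. It is unclear why you would do that, since the list of cities already exists under the country, so I removed the duplication from my sample code.

    If for some reason you really do need two instances of each City, you may need to clone them inside the foreach (var city in country.Cities) loop in my sample code (e.g. by round-tripping them through the JSON serializer).

Demo fiddle #1 here.

As an alternative, as suggested by CodeCaster in the comments, you could switch to System.Text.Json, which has built-in support for asynchronously deserializing huge JSON arrays via JsonSerializer.DeserializeAsyncEnumerable():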

using System.Text.Json;
using System.Text.Json.Serialization;

var fileName = Path.Combine(web.ContentRootPath, "countries.json");

var countries = new List<Country>();

await using var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 4096, useAsync: true);
var options = new JsonSerializerOptions
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    //PropertyNameCaseInsensitive is required to deserialize "iso2" and "iso3" correctly
    PropertyNameCaseInsensitive = true,
};
await foreach (var country in JsonSerializer.DeserializeAsyncEnumerable<Country>(stream, options))
{
    if (country == null)
        continue;
    if (country.Cities != null)
        foreach (var city in country.Cities)
        {
            // Do you need another instance of City?  If so you may clone the current one as follows:
            // city = JsonSerializer.Deserialize<City>(JsonSerializer.SerializeToUtf8Bytes(city));
            city.CountryId = country.Id;
            context.Cities.Add(city);
            context.Entry(city).State = Microsoft.EntityFrameworkCore.EntityState.Detached;
        }
    countries.Add(country);
}

Demo fiddle #2 here.
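To try the System.Text.Json approach without a file or a database, the following self-contained sketch may help. It reads from any stream and relies on PropertyNameCaseInsensitive so that the lowercase "iso2" and "iso3" keys bind to the ISO2 and ISO3 properties. The CountryDto and CityDto types and the CaseInsensitiveSketch class are hypothetical, trimmed-down stand-ins for the real models, without the EF Core attributes.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical stand-ins for the City and Country entities.
public class CityDto
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class CountryDto
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string ISO2 { get; set; } = "";
    public string ISO3 { get; set; } = "";
    public List<CityDto> Cities { get; set; } = new();
}

public static class CaseInsensitiveSketch
{
    // Streams a root-level JSON array of countries from UTF-8 input.
    public static async Task<List<CountryDto>> LoadAsync(Stream utf8Json)
    {
        // Case-insensitive matching lets "iso2" bind to ISO2, "cities" to Cities, etc.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

        var countries = new List<CountryDto>();
        await foreach (var country in JsonSerializer.DeserializeAsyncEnumerable<CountryDto>(utf8Json, options))
        {
            if (country != null)
                countries.Add(country);
        }
        return countries;
    }
}
```

Passing a MemoryStream over a short JSON array is enough to check the property binding before pointing the same method at a FileStream over countries.json.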
