Performance issue when deserializing a large JSON file into objects and adding them to a database

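I am working on an ASP.NET Core 7 MVC project and want to seed a list of countries into the database from a locally stored JSON file (more than 600,000 lines).

Here is a sample of the JSON file: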

[{
    "id": 1,
    "name": "Afghanistan",
    "iso3": "AFG",
    "iso2": "AF",
    "cities": [
      {
        "id": 141,
        "name": "‘Alāqahdārī Dīshū"
      },
      {
        "id": 53,
        "name": "Aībak"
      },
      {
        "id": 50,
        "name": "Andkhoy"
      },
      {
        "id": 136,
        "name": "Āqchah"
      },
      {
        "id": 137,
        "name": "Ārt Khwājah"
      },
      {
        "id": 51,
        "name": "Asadabad"
      },
      {
        "id": 52,
        "name": "Ashkāsham"
      },
      {
        "id": 138,
        "name": "Āsmār"
      },
      {
        "id": 54,
        "name": "Baghlān"
      },
      {
        "id": 55,
        "name": "Balkh"
      }
    ]
  }
]

This is what I tried.

Country model:

public class Country
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int Id { get; set; }
    public string Name { get; set; }
    public string ISO2 { get; set; }
    public string ISO3 { get; set; }

    public ICollection<City> Cities { get; set; }
}

City model:

public class City
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int Id { get; set; }
    public string Name { get; set; }
    public int CountryId { get; set; }
    public Country Country { get; set; }
}

Deserialization code:

public static async Task SeedCountries(AppDbContext context, IWebHostEnvironment web)
{
    if (!(context.Countries.Count() > 0) || !(context.Cities.Count() > 0))
    {
        string json = await System.IO.File.ReadAllTextAsync(Path.Combine(web.ContentRootPath, "countries.json"));
        var jsonObject = JArray.Parse(json);

        IList<Country> countries = new List<Country>();
        foreach (var item in jsonObject)
        {
            Country country = item.ToObject<Country>();
            var citites = item["cities"] as JArray;
            var citis = new City();
            countries.Add(country);
            foreach (var city in citites)
            {
                City cities = city.ToObject<City>();
                cities.CountryId = country.Id;
                context.Cities.Add(cities);
                context.Entry(cities).State = Microsoft.EntityFrameworkCore.EntityState.Detached;
            }
        }
        await context.Countries.AddRangeAsync(countries);
        await context.SaveChangesAsync();
    }
}

The problem is performance, as shown in the image below:

[Image]

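You have two problems here:

  1. You are loading the entire 600,000-line JSON file into a single json string. This string will be far larger than 85,000 bytes, so it will be allocated on the large object heap, which leads to the problems described in "Why Large Object Heap and why do we care?".

  2. You then parse that huge string into a JArray, which takes up even more memory.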
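To make the 85,000-byte threshold concrete, here is a minimal sketch. The class and method names are hypothetical and not part of the original code. It relies on two facts: a .NET string stores 2 bytes per UTF-16 character, and objects on the large object heap are reported by GC.GetGeneration() as belonging to the highest generation. Treat the second check as a heuristic that only makes sense for freshly allocated objects.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class LohSketch
{
    // Rough payload size of a string: 2 bytes per UTF-16 char
    // (object header and length field are ignored).
    public static long ApproximateStringBytes(int charCount) => 2L * charCount;

    // Heuristic: a freshly allocated object that is already reported in the
    // highest generation was allocated on the large object heap, because
    // small objects start out in generation 0.
    public static bool LooksLikeLargeObject(object freshlyAllocated) =>
        GC.GetGeneration(freshlyAllocated) == GC.MaxGeneration;
}
```

For example, LohSketch.LooksLikeLargeObject(new string('x', 50_000)) checks a string of roughly 100,000 bytes, which is already over the threshold. A 600,000-line JSON file read with ReadAllTextAsync() will be far larger than that.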

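I assume you are creating the intermediate JSON string because you need asynchronous file reading, and Json.NET's serializer does not support async deserialization. However, JsonTextReader does support asynchronous reading via JsonTextReader.ReadAsync(), and JToken supports asynchronous loading via JToken.LoadAsync(). Putting the two together, you can iterate through a huge JSON array asynchronously, load each item asynchronously into a JToken, and then deserialize that token to your final array item (here Country), all with bounded memory use.

In fact, this answer to "Deserializing to AsyncEnumerable using Newtonsoft.Json" has an extension method that does exactly this: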
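The approach just described can be sketched roughly as follows. This is my own minimal illustration and not the code from any linked answer. The class name StreamingJsonSketch and method name ReadArrayItemsAsync are hypothetical, and error handling is kept to a bare minimum. It assumes Newtonsoft.Json 10 or later (for ReadAsync() and LoadAsync()) and C# 8 or later (for async iterators).

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper for illustration only.
public static class StreamingJsonSketch
{
    public static async IAsyncEnumerable<T?> ReadArrayItemsAsync<T>(
        Stream stream,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var serializer = JsonSerializer.CreateDefault();

        // Leave the stream open so that the caller keeps ownership of it.
        using var textReader = new StreamReader(stream, Encoding.UTF8,
            detectEncodingFromByteOrderMarks: true, bufferSize: 4096, leaveOpen: true);
        using var reader = new JsonTextReader(textReader) { CloseInput = false };

        // Move to the first real token, skipping any leading comments.
        while (await reader.ReadAsync(cancellationToken).ConfigureAwait(false)
               && reader.TokenType == JsonToken.Comment)
        {
        }
        if (reader.TokenType != JsonToken.StartArray)
            throw new JsonSerializationException("Expected a JSON array at the root.");

        // Read one array item at a time until the closing bracket.
        while (await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
        {
            if (reader.TokenType == JsonToken.EndArray)
                yield break;
            if (reader.TokenType == JsonToken.Comment)
                continue;

            // Asynchronously load only the current item into memory...
            var token = await JToken.LoadAsync(reader, cancellationToken).ConfigureAwait(false);
            // ...then synchronously convert that small token to the target type.
            yield return token.ToObject<T>(serializer);
        }
    }
}
```

Only one array item (one country with its cities) is held as a JToken at any time, so peak memory depends on the largest single item rather than on the size of the whole file.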

public static partial class JsonExtensions
{
    /// <summary>
    /// Asynchronously load and synchronously deserialize values from a stream containing a JSON array.  The root object of the JSON stream must in fact be an array, or an exception is thrown
    /// </summary>
    public static async IAsyncEnumerable<T?> DeserializeAsyncEnumerable<T>(Stream stream, JsonSerializerSettings? settings = default, [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // See https://stackoverflow.com/a/72502371/3744182 for the body of this method

So grab all the code from the JsonExtensions extension class in that answer, and you will be able to create your countries list as follows:

var fileName = Path.Combine(web.ContentRootPath, "countries.json");

var countries = new List<Country>();

await using var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 4096, useAsync: true);
await foreach (var country in JsonExtensions.DeserializeAsyncEnumerable<Country>(stream))
{
    if (country == null)
        continue;
    if (country.Cities != null)
        foreach (var city in country.Cities)
        {
            // Do you need another instance of City?  If so you may clone the current one as follows:
            // city = JToken.FromObject(city).ToObject<City>();
            city.CountryId = country.Id;
            context.Cities.Add(city);
            context.Entry(city).State = Microsoft.EntityFrameworkCore.EntityState.Detached;
        }
    countries.Add(country);
}

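Notes:

  • In your sample code you appear to deserialize the Cities twice. item.ToObject<Country>(); will deserialize the Country.Cities list, but you then manually deserialize them again in the foreach (var city in citites) loop. It is unclear why you are doing this, since the list of cities already exists under the country, so I removed the duplication from my sample code.

    If for some reason you really do need two instances of each City, you may clone them inside the foreach (var city in country.Cities) loop in my sample code (for example, by round-tripping them through the JSON serializer).

Demo fiddle #1 here.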

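As an alternative, as suggested by CodeCaster in the comments, you could switch to System.Text.Json, which has built-in support for asynchronous deserialization of huge JSON arrays via JsonSerializer.DeserializeAsyncEnumerable():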

using System.Text.Json;
using System.Text.Json.Serialization;

var fileName = Path.Combine(web.ContentRootPath, "countries.json");

var countries = new List<Country>();

await using var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 4096, useAsync: true);
var options = new JsonSerializerOptions
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    //PropertyNameCaseInsensitive is required to deserialize "iso2" and "iso3" correctly
    PropertyNameCaseInsensitive = true,
};
await foreach (var country in JsonSerializer.DeserializeAsyncEnumerable<Country>(stream, options))
{
    if (country == null)
        continue;
    if (country.Cities != null)
        foreach (var city in country.Cities)
        {
            // Do you need another instance of City?  If so you may clone the current one as follows:
            // city = JsonSerializer.Deserialize<City>(JsonSerializer.SerializeToUtf8Bytes(city));
            city.CountryId = country.Id;
            context.Cities.Add(city);
            context.Entry(city).State = Microsoft.EntityFrameworkCore.EntityState.Detached;
        }
    countries.Add(country);
}

Demo fiddle #2 here.
