
How to convert JSON to BSON using Json.NET as a stream

I would like to do a streaming conversion of a JSON file to a BSON file. Is this possible using JsonTextReader and BsonDataWriter?

Here is the code:

using (StreamReader textReader = File.OpenText(@"k:\BrokeredMessage_Alarmhub-Infra-Prd-Sbn_08-06-2019 11-13-34.json"))
using (JsonTextReader jsonTextReader = new JsonTextReader(textReader))
using (FileStream oFileStream = new FileStream(@"k:\output.bson", FileMode.CreateNew))
using (BsonDataWriter datawriter = new BsonDataWriter(oFileStream))
{
   ...
}

I do not want to deserialize the full content of the JSON file, because I want to read the JSON and write the BSON with minimal memory use. Is this possible using streams?

BsonDataWriter inherits from JsonWriter, so you can use JsonWriter.WriteToken(JsonReader) to copy from a JSON stream to a BSON stream (and vice versa using BsonDataReader):

using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;

public static partial class JsonExtensions
{
    public static void CopyToBson(string inputPath, string outputPath, FileMode fileMode = FileMode.CreateNew)
    {
        using ( var textReader = File.OpenText(inputPath) )
        using ( var jsonReader = new JsonTextReader( textReader ))
        using ( var oFileStream = new FileStream( outputPath, fileMode ) )
        using ( var dataWriter = new BsonDataWriter(oFileStream) )
        {
            dataWriter.WriteToken(jsonReader);
        }
    }
}

Notes:

  1. You might want to add error handling to delete a partially created output file in the event of an error.

  2. The root token of a BSON document must be an object or array, so JSON input consisting only of a primitive value will cause this method to throw an error.

  3. According to the BSON specification, an array is a normal BSON document with integer values for the keys, starting with 0 and continuing sequentially. The root document carries no type tag, so if you convert JSON whose root is an array to BSON and then load the BSON into a JToken (or dynamic), you will get an object with numeric keys instead of an array.

  4. BSON support was moved into its own package, Newtonsoft.Json.Bson, in Json.NET 10.0.1. In earlier versions use BsonWriter.

  5. Even though you are working with streams, you may not get the memory performance you hope for, as explained in this answer to OutOfMemory Exception with Streams and BsonWriter in Json.Net:

    According to the BSON specification, every object or array - called documents in the standard - must contain at the beginning a count of the total number of bytes comprising the document...

    Newtonsoft's BsonDataWriter and underlying BsonBinaryWriter implement this by caching all tokens to be written in a tree, then when the contents of the root token have been finalized, recursively calculating the sizes before writing the tree out.
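
To make notes 2 and 3 concrete, here is a minimal in-memory sketch, assuming the Newtonsoft.Json.Bson package is referenced. The class and method names (BsonRootArrayDemo, ToBson, FromBson) are made up for illustration. It copies JSON text to BSON with WriteToken, then reads the bytes back; setting BsonDataReader.ReadRootValueAsArray asks the reader to treat the root document as an array rather than an object.

```csharp
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;
using Newtonsoft.Json.Linq;

// Hypothetical helper names, for illustration only.
public static class BsonRootArrayDemo
{
    // Copies JSON text to BSON bytes token by token.
    public static byte[] ToBson(string json)
    {
        var stream = new MemoryStream();
        using (var jsonReader = new JsonTextReader(new StringReader(json)))
        using (var writer = new BsonDataWriter(stream))
        {
            writer.WriteToken(jsonReader);
        }
        // MemoryStream.ToArray() still works after the writer has closed the stream.
        return stream.ToArray();
    }

    // Loads BSON bytes into a JToken. A BSON root carries no type tag, so the
    // caller has to say whether it should be read back as an array.
    public static JToken FromBson(byte[] bson, bool rootIsArray)
    {
        using (var stream = new MemoryStream(bson))
        using (var reader = new BsonDataReader(stream) { ReadRootValueAsArray = rootIsArray })
        {
            return JToken.ReadFrom(reader);
        }
    }
}
```

With input such as [{"a":1},{"b":2}], FromBson(bytes, false) should yield a JObject keyed "0" and "1", while FromBson(bytes, true) should yield a JArray.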

Demo fiddle #1 here.
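
The error handling suggested in note 1 could look roughly like the sketch below. The wrapper name CopyToBsonOrCleanUp is hypothetical, and the cleanup policy (delete the output only if this call created it, then rethrow) is an assumption about what you would want rather than anything prescribed by Json.NET.

```csharp
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;

// Hypothetical wrapper name, for illustration only.
public static class SafeBsonCopy
{
    public static void CopyToBsonOrCleanUp(string inputPath, string outputPath)
    {
        var created = false; // True only once this call has created the output file.
        try
        {
            using (var outputStream = new FileStream(outputPath, FileMode.CreateNew))
            {
                created = true;
                using (var textReader = File.OpenText(inputPath))
                using (var jsonReader = new JsonTextReader(textReader))
                using (var dataWriter = new BsonDataWriter(outputStream))
                {
                    dataWriter.WriteToken(jsonReader);
                }
            }
        }
        catch
        {
            // All streams are disposed by now, so the partial file can be removed.
            // If FileMode.CreateNew failed because the file already existed,
            // 'created' is still false and that file is left untouched.
            if (created)
                File.Delete(outputPath);
            throw;
        }
    }
}
```

Tracking the created flag matters with FileMode.CreateNew: without it, a failure caused by an existing output file would delete that existing file.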


If the token cache created by BsonDataWriter exceeds your system's memory, you will need to manually implement an algorithm that streams from a JsonReader to a BSON stream, seeking back in the output stream to write out the final object size(s) once completed.
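
The seek-back idea can be seen in isolation in the following sketch, which uses only the base class library and no Json.NET types. The name LengthPrefixSketch is hypothetical, and the document it writes is deliberately trivial (null elements keyed "0", "1", ...): a placeholder size is written first, the elements are streamed out, and the real size is patched in at the end.

```csharp
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical name; a sketch of the size back-patching idea only.
public static class LengthPrefixSketch
{
    // Writes a BSON document holding 'count' null elements keyed "0", "1", ...
    public static byte[] WriteNullArray(int count)
    {
        using (var stream = new MemoryStream())
        {
            long start = stream.Position;
            WriteInt32(stream, 0);                 // Placeholder for the total size.

            for (int i = 0; i < count; i++)
            {
                stream.WriteByte(0x0A);            // Element type: null.
                byte[] key = Encoding.ASCII.GetBytes(i.ToString(CultureInfo.InvariantCulture));
                stream.Write(key, 0, key.Length);  // Key as a C string...
                stream.WriteByte(0);               // ...with its terminating zero.
            }
            stream.WriteByte(0);                   // Document terminator.

            long end = stream.Position;
            stream.Position = start;               // Seek back to the placeholder.
            WriteInt32(stream, checked((int)(end - start)));
            stream.Position = end;                 // Resume at the end of the document.

            return stream.ToArray();
        }
    }

    // BSON int32 values are little-endian.
    private static void WriteInt32(Stream stream, int value)
    {
        stream.WriteByte(unchecked((byte)value));
        stream.WriteByte(unchecked((byte)(value >> 8)));
        stream.WriteByte(unchecked((byte)(value >> 16)));
        stream.WriteByte(unchecked((byte)(value >> 24)));
    }
}
```

WriteNullArray(2) produces the 11 bytes 0B 00 00 00 0A 30 00 0A 31 00 00: the size prefix (11), two null elements, and the terminator. Only the four size bytes are ever rewritten, so memory use does not grow with the number of elements.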

For instance, say your root JSON container is an array of JSON objects. Then the following method will serialize the array incrementally, then seek back in the stream to write the total size:

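using System;
using System.Globalization;
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;

public static partial class BsonExtensions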
{
    public static void CopyJsonToBson(string inputPath, string outputPath, FileMode fileMode)
    {
        using ( var textReader = File.OpenText(inputPath) )
        using ( var jsonReader = new JsonTextReader( textReader ))
        using ( var oFileStream = new FileStream( outputPath, fileMode ) )
        {
            CopyJsonToBson(jsonReader, oFileStream);
        }
    }

    public static void CopyJsonToBson(JsonReader jsonReader, Stream stream)
    {
        var rootTokenType = jsonReader.ReadToContentAndAssert().TokenType;
        if (!stream.CanSeek || rootTokenType != JsonToken.StartArray)
        {
            using ( var dataWriter = new BsonDataWriter(stream) { CloseOutput = false } )
            {
                dataWriter.WriteToken(jsonReader, true); // Always copy child tokens, even when the stream cannot seek.
            }
        }
        else
        {
            stream.Flush(); // Just in case.

            var initialPosition = stream.Position;
            var buffer = new byte[256];

            WriteInt(stream, 0, buffer); // Placeholder; the actual size is written once it is known.

            ulong index = 0;

            while (jsonReader.ReadToContentAndAssert().TokenType != JsonToken.EndArray)
            {
                var bsonType = GetBsonType(jsonReader.TokenType, jsonReader.ValueType);
                stream.WriteByte(unchecked((byte)bsonType));
                WriteString(stream, index.ToString(NumberFormatInfo.InvariantInfo), buffer);
                using (var dataWriter = new BsonDataWriter(stream) { CloseOutput = false })
                {
                    dataWriter.WriteToken(jsonReader);
                }
                index++;
            }

            stream.WriteByte((byte)0);
            stream.Flush();

            var finalPosition = stream.Position;
            stream.Position = initialPosition;

            var size = checked((int)(finalPosition - initialPosition));
            WriteInt(stream, size, buffer); // Overwrite the placeholder with the actual size.

            stream.Position = finalPosition;
        }
    }

    private static readonly Encoding Encoding = new UTF8Encoding(false);

    private static void WriteString(Stream stream, string s, byte[] buffer)
    {
        if (s != null)
        {
            if (s.Length < buffer.Length / Encoding.GetMaxByteCount(1))
            {
                var byteCount = Encoding.GetBytes(s, 0, s.Length, buffer, 0);
                stream.Write(buffer, 0, byteCount);
            }
            else
            {
                byte[] bytes = Encoding.GetBytes(s);
                stream.Write(bytes, 0, bytes.Length);
            }
        }

        stream.WriteByte((byte)0);
    }       

    private static void WriteInt(Stream stream, int value, byte[] buffer)
    {
        unchecked
        {
            buffer[0] = (byte) value;
            buffer[1] = (byte) (value >> 8);
            buffer[2] = (byte) (value >> 16);
            buffer[3] = (byte) (value >> 24);
        }
        stream.Write(buffer, 0, 4);
    }

    private static BsonType GetBsonType(JsonToken jsonType, Type valueType)
    {
        switch (jsonType)
        {
            case JsonToken.StartArray:
                return BsonType.Array;

            case JsonToken.StartObject:
                return BsonType.Object;

            case JsonToken.Null:
                return BsonType.Null;

            // Add primitives as required.

            default:
                throw new JsonWriterException(string.Format("BsonType for {0} not implemented.", jsonType));
        }
    }

    //Copied from: https://github.com/JamesNK/Newtonsoft.Json.Bson/blob/master/Src/Newtonsoft.Json.Bson/BsonType.cs
    //Original source: http://bsonspec.org/spec.html
    enum BsonType : sbyte
    {
        Number = 1,
        String = 2,
        Object = 3,
        Array = 4,
        Binary = 5,
        Undefined = 6,
        Oid = 7,
        Boolean = 8,
        Date = 9,
        Null = 10,
        Regex = 11,
        Reference = 12,
        Code = 13,
        Symbol = 14,
        CodeWScope = 15,
        Integer = 16,
        TimeStamp = 17,
        Long = 18,
        MinKey = -1,
        MaxKey = 127
    }       
}

public static partial class JsonExtensions
{
    public static JsonReader ReadToContentAndAssert(this JsonReader reader)
    {
        return reader.ReadAndAssert().MoveToContentAndAssert();
    }

    public static JsonReader MoveToContentAndAssert(this JsonReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException(nameof(reader));
        if (reader.TokenType == JsonToken.None)       // Skip past beginning of stream.
            reader.ReadAndAssert();
        while (reader.TokenType == JsonToken.Comment) // Skip past comments.
            reader.ReadAndAssert();
        return reader;
    }

    public static JsonReader ReadAndAssert(this JsonReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException(nameof(reader));
        if (!reader.Read())
            throw new JsonReaderException("Unexpected end of JSON stream.");
        return reader;
    }
}

Then use it as follows:

var inputPath = @"k:\BrokeredMessage_Alarmhub-Infra-Prd-Sbn_08-06-2019 11-13-34.json";
var outputPath = @"k:\output.bson";

BsonExtensions.CopyJsonToBson(inputPath, outputPath, FileMode.Create);

Notes:

  1. I implemented streaming+seeking specifically for the case of an array because that seems to be the most common scenario for huge JSON files.

  2. That being said, it could be extended to stream JSON objects by following the document specification in the standard, and could be extended to handle primitive values by enhancing BsonExtensions.GetBsonType() and formatting them as appropriate.

  3. Having done so, it would be possible for the routine to call itself recursively, which might be useful when a root object contains a very large array as a member. (Though, at this point, you've basically written your own version of BsonDataWriter.)

    However, doing so might result in a substantial number of seeks within the output stream which could greatly impact performance.
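
As a rough idea of what "formatting them as appropriate" in note 2 involves, the sketch below writes a single primitive element by hand using only the base class library. BsonPrimitiveSketch and WriteElement are hypothetical names, not part of Json.NET. The type bytes match the BsonType enum above, and the supported value types (string, long, double, bool, null) are an assumption based on what JsonTextReader returns by default for strings, integers, floats, and Booleans. Dates, big integers, and keys containing a zero character are not handled.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Json.NET.
public static class BsonPrimitiveSketch
{
    private static readonly Encoding Utf8 = new UTF8Encoding(false);

    // Writes one element: type byte, key as a C string, then the value payload.
    public static void WriteElement(Stream stream, string key, object value)
    {
        // BinaryWriter always writes little-endian; leaveOpen keeps the stream usable.
        using (var writer = new BinaryWriter(stream, Utf8, leaveOpen: true))
        {
            writer.Write(GetTypeByte(value));
            writer.Write(Utf8.GetBytes(key));
            writer.Write((byte)0);

            if (value is string)
            {
                byte[] bytes = Utf8.GetBytes((string)value);
                writer.Write(bytes.Length + 1); // int32 byte count, including the terminator.
                writer.Write(bytes);
                writer.Write((byte)0);
            }
            else if (value is long) writer.Write((long)value);     // 8 bytes.
            else if (value is double) writer.Write((double)value); // 8 bytes.
            else if (value is bool) writer.Write((bool)value);     // 1 byte, 0 or 1.
            // null has no payload.
        }
    }

    private static byte GetTypeByte(object value)
    {
        if (value == null) return 0x0A;   // Null
        if (value is string) return 0x02; // String
        if (value is long) return 0x12;   // Long (64-bit integer)
        if (value is double) return 0x01; // Number (double)
        if (value is bool) return 0x08;   // Boolean
        throw new NotSupportedException("No BSON mapping sketched for " + value.GetType());
    }
}
```

Inside the array loop, such a helper would be called with the current index as the key and jsonReader.Value as the value whenever the token is a primitive, in place of the per-element BsonDataWriter.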

Demo fiddle #2 here.
