
Storing 2D Array with Protobuf (C#)

I have a large multi-dimensional array that needs to be stored with protobuf. The array could have up to 5120*5120 = 26,214,400 items in it. Protobuf does not support storing multi-dimensional arrays, unfortunately.

As a test, I wrote two functions and an extra class. The class stores an x,y pair that points to a location inside the array (array[x, y]), plus a "value" that holds the data from array[x, y]. I use a List to store these objects.

When I generate a fairly small array (1024*1024) I get an output file that is over 169MB. From my testing, it loads and generates the file extremely fast so there's no issue there. However, the file size is huge - I definitely need to cut down on size.

Is this a normal file size, or do I need to rethink my entire process? Should I compress the data before saving it (zipping the file takes it from 169MB to 6MB)? If so, what's the fastest/easiest way to zip a file in C#?

This is pseudo code that is based on my real code.

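using System.Collections.Generic;
using ProtoBuf;

[ProtoContract]
public class Example
{
    [ProtoIgnore]
    public string[,] MyArray { get; set; }

    // protobuf-net field numbers must be positive, so this starts at 1.
    [ProtoMember(1)]
    private List<MultiArray> Storage { get; set; } = new List<MultiArray>();

    // Copies every cell of MyArray into Storage before serializing.
    public void MoveToList()
    {
        Storage.Clear();
        for (int x = 0; x < MyArray.GetLength(0); x++)
        {
            for (int y = 0; y < MyArray.GetLength(1); y++)
            {
                Storage.Add(new MultiArray
                {
                    _x = x,
                    _y = y,
                    _value = MyArray[x, y]
                });
            }
        }
    }

    // Rebuilds MyArray from Storage after deserializing.
    public void MoveToArray()
    {
        MyArray = new string[1024, 1024];
        for (int i = 0; i < Storage.Count; i++)
        {
            MyArray[Storage[i]._x, Storage[i]._y] = Storage[i]._value;
        }
    }
}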

[ProtoContract]
public class MultiArray
{
    [ProtoMember(1)]
    public int _y { get; set; }
    [ProtoMember(2)]
    public int _x { get; set; }
    [ProtoMember(3)]
    public string _value { get; set; }
}

Note: each value must end up at its original x/y position in the array.

I appreciate any suggestions.

I can't speak to the storage format, but this is probably not the right way to do it.
The way you are doing it, you are creating a MultiArray object (with its own x and y) for every cell of your array.
A simpler and more efficient solution would be to flatten it into a one-dimensional array:

int width = 1024;
int height = 1024;
string[] Storage = new string[width * height];
for (int x = 0; x < width; x++)
{
    for (int y = 0; y < height; y++)
    {
        // Row-major index: each step in x skips one full run of height cells.
        Storage[x * height + y] = MyArray[x, y];
    }
}
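To get the 2D array back after loading, the dimensions have to be kept alongside the flat array. A minimal sketch of the round trip, assuming row-major order; `GridFlattener`, `Flatten`, and `Unflatten` are hypothetical names used for illustration, not part of protobuf-net:

```csharp
using System;

public static class GridFlattener
{
    // Copies a 2D array into a 1D array in row-major order:
    // cell [x, y] lands at index x * height + y.
    public static string[] Flatten(string[,] source)
    {
        int width = source.GetLength(0);
        int height = source.GetLength(1);
        var flat = new string[width * height];
        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                flat[x * height + y] = source[x, y];
            }
        }
        return flat;
    }

    // Rebuilds the 2D array. The caller must supply the original
    // dimensions, so they need to be stored with the data.
    public static string[,] Unflatten(string[] flat, int width, int height)
    {
        if (flat.Length != width * height)
        {
            throw new ArgumentException("Length does not match dimensions.", nameof(flat));
        }
        var result = new string[width, height];
        for (int i = 0; i < flat.Length; i++)
        {
            result[i / height, i % height] = flat[i];
        }
        return result;
    }
}
```

Each value returns to its original x/y position because the same index formula is used in both directions.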

Ultimately, the protobuf format doesn't have a concept of arrays of higher dimension than one.

At the library level, since you're using protobuf-net, we could have the library do some magic here, essentially treating it as:

message Array_T {
    repeated int32 dimensions = 1;
    repeated T items = 2; // packed when possible
}

(noting that .proto doesn't actually support generics, but that doesn't really matter at the library level)
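The same shape can be approximated by hand with a small wrapper type. This is only a sketch, not how the library itself treats arrays: it assumes protobuf-net's `ProtoContract`/`ProtoMember` attributes and the `Serializer.Serialize`/`Serializer.Deserialize` methods, and `StringGrid` and `StringGridFile` are hypothetical names. It also assumes a non-empty grid with no null cells.

```csharp
using System;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class StringGrid
{
    // Length of each dimension, e.g. { 1024, 1024 }.
    [ProtoMember(1)]
    public int[] Dimensions { get; set; }

    // Cells in row-major order: cell [x, y] is at x * Dimensions[1] + y.
    [ProtoMember(2)]
    public string[] Items { get; set; }

    public static StringGrid FromArray(string[,] source)
    {
        int width = source.GetLength(0);
        int height = source.GetLength(1);
        var items = new string[width * height];
        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                items[x * height + y] = source[x, y];
            }
        }
        return new StringGrid { Dimensions = new[] { width, height }, Items = items };
    }

    public string[,] ToArray()
    {
        int width = Dimensions[0];
        int height = Dimensions[1];
        var result = new string[width, height];
        for (int i = 0; i < Items.Length; i++)
        {
            result[i / height, i % height] = Items[i];
        }
        return result;
    }
}

public static class StringGridFile
{
    // Hypothetical helpers: write and read one grid per file.
    public static void Save(string path, string[,] data)
    {
        using (var stream = File.Create(path))
        {
            Serializer.Serialize(stream, StringGrid.FromArray(data));
        }
    }

    public static string[,] Load(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            return Serializer.Deserialize<StringGrid>(stream).ToArray();
        }
    }
}
```

Compared with one object per cell, this drops the per-item x and y fields and the nested-message overhead, but every string is still written in full.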

However, this would be a little awkward from a cross-platform perspective.

But to test whether this would help, you could linearize your 2D array, and see what space it takes.

In your case, I suspect the real problem (regarding the size) is the quantity of strings. Protobuf writes string contents every time, without any attempt at lookup tables. It may also be worth checking what the sum total of string lengths (in UTF-8 bytes) is for your array contents.
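That check is quick to run. A minimal sketch, using `Encoding.UTF8.GetByteCount` from the base class library; `StringStats` and `Measure` are hypothetical names. It also counts distinct values, as a rough hint of how repetitive the contents are:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StringStats
{
    // Returns the total UTF-8 size of all cells and the number of
    // distinct values. Null cells are skipped.
    public static (long TotalUtf8Bytes, int DistinctValues) Measure(string[,] data)
    {
        long total = 0;
        var distinct = new HashSet<string>();
        foreach (string cell in data) // visits every cell of the 2D array
        {
            if (cell == null)
            {
                continue;
            }
            total += Encoding.UTF8.GetByteCount(cell);
            distinct.Add(cell);
        }
        return (total, distinct.Count);
    }
}
```

If the total is close to the file size, the string contents themselves probably dominate, and changing the array layout alone is unlikely to shrink the file much.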


 