Can you represent CSV data in Google's Protocol Buffers format?

Question

I've recently found out about protocol buffers and was wondering whether they could be applied to my specific problem.

Basically, I have some CSV data that I need to convert to a more compact format for storage, as some of the files are several gigabytes.

Each field in the CSV has a header, and there are only two types: strings and decimals (because sometimes there are a lot of significant digits and I need to handle all numbers the same way). But each file will have different column names for its fields.

As well as capturing the original CSV data, I need to be able to add extra information to the file before saving. I was also hoping to make this future-proof by handling different file versions.

So, is it possible to use protocol buffers to capture an arbitrary number of arbitrarily named columns of data, like a CSV file?

Answer 1

Well, it's certainly representable. Something like:

message CsvFile {
    repeated CsvHeader header = 1;
    repeated CsvRow row = 2;
}

message CsvHeader {
    required string name = 1;
    required ColumnType type = 2;
}

enum ColumnType {
    DECIMAL = 1;
    STRING = 2;
}

message CsvRow {
    repeated CsvValue value = 1;
}

// Note that the column is implicit based on position within row    
message CsvValue {
    optional string string_value = 1;
    optional Decimal decimal_value = 2;
}

message Decimal {
    // However you want to represent it (there are various options here)
}
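The Decimal message above is deliberately left open. As one possible way to fill it in, here is a minimal sketch that stores the digits as a string plus a scale, so that values with many significant digits are not limited to 64 bits. The field names (unscaled_digits, scale, negative) are assumptions made for illustration and are not part of the original answer.

```protobuf
// Hypothetical layout, not from the original answer:
//   value = (negative ? -1 : 1) * unscaled_digits * 10^(-scale)
message Decimal {
    // Decimal digits of the unscaled integer, e.g. "314159" for 3.14159.
    // A string avoids the 64-bit limit when there are many significant digits.
    required string unscaled_digits = 1;

    // Number of digits to the right of the decimal point (5 for 3.14159).
    optional uint32 scale = 2 [default = 0];

    // Sign flag; unset means the value is non-negative.
    optional bool negative = 3 [default = false];
}
```

If every value is known to fit in 18 digits or so, swapping the string for an sint64 would likely be more compact; that trade-off depends on the data.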

I'm not sure how much benefit this schema will provide, mind you. You can certainly add more information (by adding to the CsvFile message), and future-proofing works in the "normal PB way": only add optional fields, etc.
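To make the "only add optional fields" point concrete, here is a sketch of what a later version of CsvFile might look like. The two added fields (source_filename, imported_at_unix_seconds) are hypothetical names chosen for illustration; the idea is only that new data gets new tag numbers and is marked optional.

```protobuf
message CsvFile {
    // Original fields: tag numbers and types stay unchanged.
    repeated CsvHeader header = 1;
    repeated CsvRow row = 2;

    // Hypothetical additions in a later file version.
    // They use fresh tag numbers and are optional, so files written
    // before these fields existed still parse (the fields are simply unset).
    optional string source_filename = 3;
    optional int64 imported_at_unix_seconds = 4;
}
```

Readers built against the older definition should skip tags 3 and 4 when they encounter them, which is what lets old and new files coexist.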

Answer 2

Well, protobuf-net (my version) is based on regular .NET types, so no (since it won't cope with different schemas all the time). But Jon's version might allow dynamic types. Personally, I'd just use CSV and run it through GZipStream; I expect that will be fine for the purpose.

Edit: actually, I forgot: protobuf-net does support extensible objects, but you need to be a bit careful... it would depend on the full context, I expect.

Plus, Jon's approach of nested data would probably work too.

Answer 1: 4 votes, accepted, 2008-12-16 14:38:07

Answer 2: 1 vote, 2008-12-16 14:32:19
