C# BinaryWriter length prefix - UTF7 encoding

I've got a project using memory mapped files to let two apps share data with each other. The producer app is written in C#, the consumer app talks plain old C. Both use VS2010.

MSDN says the "BinaryWriter.Write Method (String)" prepends the data with a UTF-7 encoded unsigned integer, and then writes the payload. This is exactly where I'm stuck. If I write a string which is 256 characters in length, the debugger of the C app shows me this byte sequence: 0x80 0x02 <256 times the payload char>. What's the best way to convert the length prefix to something that I can safely use in the consumer app?

Producer app:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading;
using System.Text;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        using (MemoryMappedFile mmf_read = MemoryMappedFile.CreateNew("mappedview", 4096))
        {
            using (MemoryMappedViewStream stream = mmf_read.CreateViewStream())
            {
                string str;
                BinaryWriter writer = new BinaryWriter(stream);

                str = string.Join("", Enumerable.Repeat("x", 256));

                writer.Write(str);
            }
        }
    }
}

Consumer app:

#include <windows.h>
#include <stdio.h>
#include <conio.h>
#include <tchar.h>
#pragma comment(lib, "user32.lib")

#define BUF_SIZE 4096
TCHAR szName[]=TEXT("Global\\mappedview");


int _tmain()
{
    HANDLE hMapFile;
    LPCSTR pBuf;

    hMapFile = OpenFileMapping(
               FILE_MAP_ALL_ACCESS,         // read/write access
               FALSE,                       // do not inherit the name
               szName);                     // name of mapping object

    if (hMapFile == NULL)
    {
        _tprintf(TEXT("Could not open file mapping object (%d).\n"),
         GetLastError());
        return 1;
    }

    pBuf = (LPCSTR) MapViewOfFile(hMapFile,     // handle to map object
           FILE_MAP_ALL_ACCESS,             // read/write permission
           0,
           0,
           BUF_SIZE);

    if (pBuf == NULL)
    {
        _tprintf(TEXT("Could not map view of file (%d).\n"),
                GetLastError());

        CloseHandle(hMapFile);
        return 1;
    }

    printf("Proc1: %s\n\n", pBuf);              // print mapped data

    UnmapViewOfFile(pBuf);

    CloseHandle(hMapFile);

    return 0;
}

br, Chris

Despite what the Microsoft documentation says:

  1. The prefix number written is in fact an LEB128 encoded count.
  2. This is a byte count, not a character count.

The Wiki page I linked gives you decoding code, but I would consider using my own scheme. You could convert the string to UTF-8 manually using Encoding.GetBytes() and write that to the MMF, prefixing it with a normal unsigned short. That way you have complete control over everything.
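A minimal sketch of what the consumer side of such a custom scheme could look like in C. It assumes the producer writes the byte count with `writer.Write((ushort)bytes.Length)` followed by `writer.Write(bytes)`, where `bytes` comes from `Encoding.UTF8.GetBytes(str)`; BinaryWriter writes a `ushort` as two bytes in little-endian order. The helper name `read_u16_prefixed` is made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 2-byte little-endian length from buf, copy
   that many payload bytes into out and NUL-terminate the copy.
   Returns the payload length in bytes, or -1 if buf is too short or
   out is too small to hold the payload plus the terminator. */
long read_u16_prefixed(const unsigned char *buf, size_t buf_len,
                       char *out, size_t out_size)
{
    uint16_t len;

    if (buf_len < 2)
        return -1;

    /* Low byte first, then high byte. */
    len = (uint16_t)(buf[0] | (buf[1] << 8));

    if ((size_t)len > buf_len - 2 || (size_t)len + 1 > out_size)
        return -1;

    memcpy(out, buf + 2, len);
    out[len] = '\0';
    return (long)len;
}
```

In the consumer this would be called on the mapped view, for example `read_u16_prefixed((const unsigned char *)pBuf, BUF_SIZE, text, sizeof text)` with a local `char text[...]` buffer. An unsigned short limits the payload to 65535 bytes, which is more than the 4096-byte mapping used here.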

While the MSDN documentation on BinaryWriter.Write states it “first writes the length of the string as a UTF-7 encoded unsigned integer”, it is wrong. First of all, UTF-7 is a string encoding; you cannot encode integers using UTF-7. What the documentation means (and the code does) is that it writes the length using variable-length 7-bit encoding, sometimes known as LEB128. In your specific case, the data bytes 80 02 mean the following:

1000 0000 0000 0010

Nbbb bbbb Eaaa aaaa

  • N set to one means this is not the final byte
  • E set to zero means this is the final byte
  • aaaaaaa and bbbbbbb are the real data; the result is therefore:

00000100000000

aaaaaaabbbbbbb

I.e. 100000000 in binary, which is 256 in decimal.
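The bit layout above can be turned into a small decoder for the C consumer. This is only a sketch: the function name `decode_7bit_length` is made up, and it assumes the prefix never takes more than five bytes, which is enough for a 32-bit length.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the variable-length 7-bit (LEB128-style) prefix at buf.
   On success stores the decoded value in *value and returns the number
   of prefix bytes consumed (1 to 5). Returns 0 if the prefix is
   truncated or longer than five bytes. */
size_t decode_7bit_length(const unsigned char *buf, size_t buf_len,
                          uint32_t *value)
{
    uint32_t result = 0;
    size_t i;

    for (i = 0; i < buf_len && i < 5; i++) {
        /* The low 7 bits carry data; the least significant group comes first. */
        result |= (uint32_t)(buf[i] & 0x7F) << (7 * i);

        /* A clear high bit marks the final byte of the prefix. */
        if ((buf[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```

For the bytes 80 02 this yields the value 256 and a prefix size of 2. The payload then starts at `pBuf + 2` and is 256 bytes long. BinaryWriter does not write a terminating NUL itself, so printing with an explicit length, e.g. `printf("%.*s\n", (int)value, pBuf + n)` with `n` being the returned prefix size, is safer than a plain `%s`.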
