Qt: Safe parsing of Windows format data under Linux

I have a server-client application in which JSON data is sent between the two. The client has a Linux and a Windows version, while the server application runs under Linux.
The Linux client communicates just fine, but I have problems with the Windows client.

The problematic JSON data contains a text field with an apostrophe. Let's say the content is "a dog’s name"; the Windows client sends this as "a dog\x92s name", while the Linux client sends "a dog\xE2\x80\x99s name". At least that is what qDebug() shows me.

I parse the JSON data with the lines

    QJsonDocument document = QJsonDocument::fromJson(body);

    if(document.isArray()) json_data = document.array();
    if(document.isObject()) json_data.append(document.object());

where body is a QByteArray and json_data is a QJsonArray.

If the Windows data is fed into this, it seems that the Qt JSON parser does not recognize it as valid JSON, and thus json_data ends up being empty.

I really don't want to handle those specific characters manually, as I want this to work not only with that apostrophe but with any special characters a user might enter. Is there some way to handle this in general? I assume the Windows data is in something like the Windows-1252 encoding?

I think the Windows client sends strings encoded in CP1251 or CP1252 (byte 0x92 is the right single quotation mark ’ in both), while the JSON decoder expects UTF-8.

Maybe the client's source code is not saved as UTF-8 and contains string literals. Qt4 has QTextCodec::setCodecForCStrings; Qt5 assumes that string literals are encoded in UTF-8.

$ echo -n "’" | iconv -f utf-8 -t cp1251 | xxd
00000000: 92
$ echo -n "’" | xxd
00000000: e280 99

If you don't want to fix the Windows client the proper way (by fixing its output encoding), you can deal with this situation on the server by converting all input from the Windows client to Unicode before building the QJsonDocument.

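// bodycp1252 holds the raw bytes received from the Windows client.
QByteArray bodycp1252;
QTextCodec* cp1252 = QTextCodec::codecForName("CP1252");
QTextCodec* utf8 = QTextCodec::codecForName("UTF-8");
// Decode as CP1252, then re-encode as UTF-8 before parsing.
QByteArray body = utf8->fromUnicode(cp1252->toUnicode(bodycp1252));
QJsonDocument document = QJsonDocument::fromJson(body);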
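To make the byte-level effect of that conversion concrete, here is a minimal sketch of the same CP1252 to UTF-8 transcoding in plain standard C++, without Qt. The names kCp1252High, appendUtf8 and cp1252ToUtf8 are hypothetical helpers invented for this illustration, and mapping the five bytes that CP1252 leaves undefined to the C1 control code points is an assumption, not necessarily what QTextCodec does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Unicode code points for CP1252 bytes 0x80..0x9F. The five bytes that
// CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped to the
// C1 control code point with the same value (an assumption of this sketch).
static const std::uint16_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// Appends the UTF-8 encoding of a code point in the Basic Multilingual
// Plane (one to three bytes) to `out`.
static void appendUtf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0xFFFF);  // CP1252 never maps outside the BMP
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: transcodes a CP1252 byte string to UTF-8.
std::string cp1252ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char b : in) {
        std::uint32_t cp = b;  // 0x00..0x7F and 0xA0..0xFF map one-to-one
        if (b >= 0x80 && b <= 0x9F) {
            cp = kCp1252High[b - 0x80];
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

With this sketch, cp1252ToUtf8("a dog\x92s name") yields the bytes "a dog\xE2\x80\x99s name", which is exactly what the Linux client sends. Note that a body that is already UTF-8 must not be passed through such a conversion, since every byte of a multi-byte sequence would be re-encoded separately.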

It's possible to check whether a QByteArray contains valid UTF-8 data with the QUtf8::isValidUtf8(const char *chars, qsizetype len) function. It is defined in a private header, so you need to add QT += core-private. Unfortunately, the implementation is not visible to the linker (it is not exported from QtCore.lib), so you need to add qutfcodec.cpp from the Qt sources to your project to resolve the linker errors.

////////////////// is-valid-utf8.pro

QT -= gui

QT += core core-private

CONFIG += c++11 console
CONFIG -= app_bundle

qt_src = "C:/Qt/5.15.1/Src"

SOURCES += \
        main.cpp \
        $$qt_src/qtbase/src/corelib/codecs/qutfcodec.cpp

////////////////// main.cpp

#include <QCoreApplication>
#include <private/qutfcodec_p.h>
#include <QTextCodec>
#include <QDebug>

bool isValidUtf8(const QByteArray& data) {
    return QUtf8::isValidUtf8(data.data(), data.size()).isValidUtf8;
}

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

    QTextCodec* utf8 = QTextCodec::codecForName("UTF-8");
    QTextCodec* cp1251 = QTextCodec::codecForName("CP1251");

    QByteArray utf8data1 = utf8->fromUnicode("Привет мир");
    QByteArray cp1251data1 = cp1251->fromUnicode("Привет мир");

    QByteArray utf8data2 = utf8->fromUnicode("Hello world");
    QByteArray cp1251data2 = cp1251->fromUnicode("Hello world");

    Q_ASSERT(isValidUtf8(utf8data1));
    Q_ASSERT(isValidUtf8(cp1251data1) == false);

    Q_ASSERT(isValidUtf8(utf8data2));
    Q_ASSERT(isValidUtf8(cp1251data2));

    qDebug() << "test passed";

    return 0;
}
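If depending on a private Qt header is not acceptable, the same kind of check can be written by hand. The following is a minimal sketch in plain standard C++, not Qt's implementation; the function name isValidUtf8 mirrors the wrapper above, but here it takes a std::string, which is an assumption made to keep the example free of Qt.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// surrogates (U+D800..U+DFFF) and values above U+10FFFF.
bool isValidUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;   // continuation bytes expected after b0
        unsigned long cp = 0;    // code point decoded so far
        unsigned long min = 0;   // smallest code point allowed at this length
        if (b0 < 0x80) {
            cp = b0;
        } else if ((b0 & 0xE0) == 0xC0) {
            extra = 1; cp = b0 & 0x1F; min = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) {
            extra = 2; cp = b0 & 0x0F; min = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) {
            extra = 3; cp = b0 & 0x07; min = 0x10000;
        } else {
            return false;        // continuation byte or invalid lead byte
        }
        if (n - i - 1 < extra) {
            return false;        // sequence is cut off at the end of input
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) {
                return false;    // expected a 10xxxxxx continuation byte
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return false;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

A server could use such a check to decide whether a request body needs transcoding at all: parse it directly when it is valid UTF-8, and decode it as CP1252 first otherwise. As in the test above, pure ASCII input is valid under both encodings, so the check cannot tell them apart in that case, and it does not need to.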
