JSON encoding issue attempting to load bulk data into ElasticSearch, only from Windows

I'm uploading a file to the ElasticSearch /_bulk API to insert/update records. From my local machine (OSX) I've had no problems, and I can still send the "problem" data without issue.

From our QA server, which is running Windows Server 2012, ES returns an error for a row that contains a name with a diacritic (accent).

The data is similar to this (I changed the name but left the accent): María

The error returned is:

MapperParsingException[failed to parse [name.display]]; nested: JsonParseException[Invalid UTF-8 middle byte 0x61 at [Source: [B@466e94e8; line: 1, column: 194]];

Based on some other Stack Overflow answers, I currently think it's some sort of encoding issue.
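To make the encoding suspicion concrete, here is a minimal sketch of how this exact error can arise for a name like María. It is written in Python rather than ColdFusion, purely to show the bytes. It assumes that somewhere on the Windows side the text ends up encoded with a single-byte code page such as Windows-1252; that is a guess for illustration, not something the error message confirms.

```python
# Illustration only: how "Invalid UTF-8 middle byte 0x61" can arise for "María".
# Assumption: the text was encoded as Windows-1252 somewhere before reaching ES.
name = "María"

utf8_bytes = name.encode("utf-8")      # í becomes two bytes: 0xC3 0xAD
cp1252_bytes = name.encode("cp1252")   # í becomes one byte:  0xED

print(utf8_bytes.hex(" "))     # 4d 61 72 c3 ad 61
print(cp1252_bytes.hex(" "))   # 4d 61 72 ed 61

# In UTF-8, 0xED (bit pattern 1110xxxx) starts a three-byte sequence, so a
# UTF-8 parser expects a continuation ("middle") byte next. It finds 0x61,
# the plain ASCII letter "a", and rejects the input.
decode_error = None
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
    offending = cp1252_bytes[exc.start]        # the lead byte that opened the sequence
    following = cp1252_bytes[exc.start + 1]    # the byte that should have continued it
    print(f"lead byte 0x{offending:02x} followed by 0x{following:02x}")
```

The 0x61 in the ES error matches the "a" that follows the accented character, which is consistent with the í arriving as a single non-UTF-8 byte.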

I'm uploading the file using Adobe ColdFusion 11, with the following code:

cfhttp( method=arguments.method, url=arguments.uri, result="result" ) {
    cfhttpparam( type="body", value="#fileReadBinary( file )#" );
}

Since I suspect an encoding issue, I also added a header to try to force the encoding to UTF-8, like so:

cfhttp( method=arguments.method, url=arguments.uri, result="result" ) {
    cfhttpparam( type="header", name="Content-Type", value="application/javascript; charset=UTF-8" );
    cfhttpparam( type="body", value="#fileReadBinary( file )#" );
}

No matter what I try, I continue to get the same error message. I'm not sure where to go from here.

After enough noodling around, I remembered that the function charsetEncode() might be of some use.

I tested this on both Windows and OSX to make sure that the Windows fix didn't break functionality on OSX, and so far it works perfectly in both locations:

cfhttp( method=arguments.method, url=arguments.uri, result="result" ) {
    cfhttpparam( type="header", name="Content-Type", value="application/javascript; charset=UTF-8" );
    cfhttpparam( type="body", value="#charsetEncode(fileReadBinary( file ), 'utf-8')#" );
}
