How can I remove wrongly encoded characters from a string when converting to a StringEntity in Java?

I am sending a POST request with the org.apache.http library and some characters are not being encoded correctly. I use jsoup to pull text from a web page and then send that text to an API. My code looks like this:

        DefaultHttpClient httpClient = new DefaultHttpClient();
        HttpPost postRequest = new HttpPost(url);

        ObjectMapper mapper = new ObjectMapper();
        String jsonString = mapper.writeValueAsString(object);

        StringEntity input = new StringEntity(jsonString);

        input.setContentType("application/json");
        postRequest.setEntity(input);
        HttpResponse response = httpClient.execute(postRequest);

The problem is that the text I grab from these web pages is sometimes improperly formatted, and normal characters such as apostrophes and hyphens are turned into question marks or other odd punctuation marks when the StringEntity is initialized. How can I take the improperly encoded jsonString and encode it properly, so that the characters arrive correctly when it is sent in the POST request? I don't want to remove the apostrophes or hyphens; I want them sent in a proper encoding such as UTF-8.

Setting the charset in the StringEntity constructor fixed this for me (setting the content type on the StringEntity after creation did not):

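import org.apache.http.entity.StringEntity;
import org.apache.http.protocol.HTTP;
...
// Pass the charset to the constructor so the JSON body is encoded as UTF-8
httpPost.setEntity(new StringEntity(jsonString, HTTP.UTF_8));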
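
A likely explanation for the question marks is that, in HttpClient 4.x, a StringEntity created without a charset falls back to ISO-8859-1, which cannot represent typographic apostrophes or dashes. The sketch below reproduces that effect with the standard library alone. The class name CharsetDemo, the helper roundTrip, and the sample text are made up for illustration and are not part of the original post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Encode the text to bytes with the given charset, then decode the bytes
    // with the same charset, as a sender and receiver that agree on it would.
    static String roundTrip(String text, Charset charset) {
        byte[] wire = text.getBytes(charset);
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        // \u2019 is a right single quotation mark, \u2013 is an en dash;
        // both are common in text scraped from web pages.
        String scraped = "It\u2019s 2010\u20132020";

        // ISO-8859-1 has no mapping for these characters, so each becomes '?'.
        System.out.println(roundTrip(scraped, StandardCharsets.ISO_8859_1)); // It?s 2010?2020

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(scraped, StandardCharsets.UTF_8).equals(scraped)); // true
    }
}
```

If this is indeed the cause, the characters are lost at the moment the bytes are produced, which would explain why changing the content type afterwards has no effect. With HttpClient 4.2 or later, new StringEntity(jsonString, ContentType.APPLICATION_JSON) should be another way to set the content type and UTF-8 in one step, since that constant carries a UTF-8 charset.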
