Hive UDF's treatment of URLs
I've created a Hive UDF that parses a URL. The URL contains query parameters. When I parse the input in my UDF, however, characters like '=' and '&' are converted to gibberish.
Initially, I relied on Text's toString() method to convert the Hive Text to a Java String, but the characters above came out as gibberish. I then tried new String(str, StandardCharsets.UTF_8) to convert the Hive Text to a Java String. This worked at first, but then it started producing gibberish as well.
My method is shown below. Any ideas on what I might be doing wrong?
public Text evaluate(final Text requestInput, final Text referrerInput) {
    if (requestInput == null || referrerInput == null)
        return null;
    final String request = new String(requestInput.getBytes(), StandardCharsets.UTF_8); // converts '=' and '&' in URL strings to gibberish
    final String referrer = new String(referrerInput.getBytes(), StandardCharsets.UTF_8); // converts '=' and '&' in URL strings to gibberish
}
}
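A side note on the byte-based conversion above: Hadoop's Text documents that getBytes() returns the raw backing array and that only the data up to getLength() is valid. Whether that is what happened in this case is not established, and it would not explain the output from toString(), but it is one way a conversion can appear to work at first and later pick up stray characters once a buffer is reused. The sketch below imitates that behaviour with a hypothetical ReusableBuffer class (an assumption for illustration, not the Hadoop class) and contrasts decoding the whole array with decoding only the valid range.

```java
import java.nio.charset.StandardCharsets;

public class BackingArrayDemo {
    // Hypothetical stand-in for a reusable buffer; NOT Hadoop's Text.
    // Like Text, its backing array may be longer than the valid data.
    public static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        public void set(String value) {
            byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
            if (encoded.length > bytes.length) {
                bytes = new byte[encoded.length]; // grow only when needed
            }
            System.arraycopy(encoded, 0, bytes, 0, encoded.length);
            length = encoded.length; // bytes past this index are stale
        }

        public byte[] getBytes() { return bytes; }
        public int getLength() { return length; }
    }

    // Decodes the entire backing array, including any stale bytes.
    public static String decodeWholeArray(ReusableBuffer b) {
        return new String(b.getBytes(), StandardCharsets.UTF_8);
    }

    // Decodes only the valid prefix of the backing array.
    public static String decodeValidRange(ReusableBuffer b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableBuffer buf = new ReusableBuffer();
        buf.set("GET /api/get_info?id=1465473313746 HTTP/1.1");
        buf.set("GET /a?b=1"); // shorter value reuses the same array
        System.out.println(decodeWholeArray(buf));
        System.out.println(decodeValidRange(buf));
    }
}
```

With a real Text value, the corresponding bounded call would be new String(input.getBytes(), 0, input.getLength(), StandardCharsets.UTF_8).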
When I run this HQL in Hive:
SELECT get_json_object(json, '$.base.request_url') FROM events
I get this:
GET /api/get_info?id=1465473313746 HTTP/1.1
In my UDF, the toString() method (with no additional processing) produces the following output:
GET /api/get_info?id\=1465473313746 HTTP/1.1
I learned that the = and & characters were being converted to their Unicode equivalents. Why this was happening is still unclear to me. Using the Apache Commons StringEscapeUtils utility made the problem easy to fix. Calling
StringEscapeUtils.unescapeJava(requestInput.toString())
solved the issue.
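To make the effect of that fix concrete, here is a minimal, dependency-free sketch of the kind of unescaping involved. It is not the Apache Commons implementation: the unescape helper is hypothetical, handles only \uXXXX sequences and a backslash in front of another character, and does not translate control escapes such as \n the way unescapeJava does.

```java
public class UnescapeSketch {
    // Hypothetical helper for illustration only; not StringEscapeUtils.
    // Decodes \uXXXX sequences and drops a backslash that precedes
    // any other character (so "\=" becomes "=").
    public static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                if (next == 'u' && i + 6 <= input.length() && isHex(input, i + 2, i + 6)) {
                    // Four hex digits follow: emit the code unit they encode.
                    out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                    i += 6;
                } else {
                    // Any other escaped character: keep it, drop the backslash.
                    out.append(next);
                    i += 2;
                }
            } else {
                out.append(c); // ordinary character, or a trailing backslash
                i++;
            }
        }
        return out.toString();
    }

    // True if every character in [from, to) is a hexadecimal digit.
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(unescape("GET /api/get_info?id\\=1465473313746 HTTP/1.1"));
    }
}
```

In the UDF itself, the library call from the answer is the simpler choice; the sketch only shows what the transformation does to the escaped request line.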