
Hive UDF's treatment of URLs

I've created a Hive UDF that parses a URL. The URL contains query parameters. When I read the input in my UDF, however, characters like '=' and '&' come out as gibberish.

Initially, I relied on Text's toString() method to convert the Hive Text to a Java String. With that approach, the characters above come out as gibberish. I then tried new String(str, StandardCharsets.UTF_8) to do the conversion. This worked at first, but then it started producing gibberish as well.

My method is shown below. Any ideas on what I might be doing wrong?

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.io.Text;

public Text evaluate(final Text requestInput, final Text referrerInput) {
    if (requestInput == null || referrerInput == null)
        return null;

    // Both conversions turn '=' and '&' in URL strings into gibberish.
    final String request = new String(requestInput.getBytes(), StandardCharsets.UTF_8);
    final String referrer = new String(referrerInput.getBytes(), StandardCharsets.UTF_8);

    // ... (rest of the method not shown in the original post)
}
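A side note on the snippet above: Hadoop's Text.getBytes() returns the backing array, and only the first getLength() bytes of it are valid. Decoding the whole array can therefore pick up stale bytes when the object is reused for a shorter value. That might be why the new String(...) variant seemed to work at first and later did not, though this is a guess rather than something the post confirms. The sketch below simulates the effect with a hypothetical Buffer class (a stand-in written for illustration, not the Hadoop API).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ValidPrefixDecode {
    // Hypothetical stand-in for a reusable byte container:
    // a backing array plus the number of bytes that are currently valid.
    static final class Buffer {
        byte[] bytes = new byte[0];
        int length = 0;

        // Overwrites the start of the backing array, growing it only when needed.
        void set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = Arrays.copyOf(src, src.length);
            } else {
                System.arraycopy(src, 0, bytes, 0, src.length);
            }
            length = src.length;
        }
    }

    // Decodes the whole backing array, including any stale tail.
    static String decodeAll(Buffer b) {
        return new String(b.bytes, StandardCharsets.UTF_8);
    }

    // Decodes only the valid prefix.
    static String decodeValid(Buffer b) {
        return new String(b.bytes, 0, b.length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Buffer b = new Buffer();
        b.set("/api/get_info?id=1465473313746");
        b.set("/home"); // shorter value reuses the same array
        System.out.println(decodeAll(b));   // still carries bytes from the first value
        System.out.println(decodeValid(b)); // only the second value
    }
}
```

With a real Text object, the equivalent of decodeValid would be new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8), or simply t.toString().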

When I run this HQL in Hive:

SELECT get_json_object(json, '$.base.request_url') FROM events

I get this:

GET /api/get_info?id=1465473313746 HTTP/1.1

In my UDF, the toString() method (with no additional processing) produces the following output:

GET /api/get_info?id\=1465473313746 HTTP/1.1

I learned that the = and & characters were being converted to their escaped Unicode equivalents. Why this happens is still unclear to me. With the StringEscapeUtils utility from Apache Commons, the fix turned out to be simple:

StringEscapeUtils.unescapeJava(requestInput.toString()) 

This solved the issue.
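To make the effect concrete without pulling in Commons Lang, here is a minimal stand-alone sketch. The unescape helper is hypothetical and covers only two cases: a backslash-u escape with four hex digits is decoded to its character, and a backslash before any other character is dropped. It does not handle control escapes such as backslash-n and is not a replacement for StringEscapeUtils.unescapeJava.

```java
public class UnescapeSketch {
    // Hypothetical helper for illustration only; not the Commons Lang implementation.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == 'u' && i + 6 <= s.length() && isHex(s, i + 2, i + 6)) {
                    // Backslash, 'u', then four hex digits: decode to one character.
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                } else {
                    // Backslash before any other character: keep only that character.
                    out.append(next);
                    i += 2;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True when every character in s[from, to) is a hexadecimal digit.
    static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String raw = "GET /api/get_info?id\\=1465473313746 HTTP/1.1";
        System.out.println(unescape(raw));
    }
}
```

In the UDF itself, the one-line StringEscapeUtils call shown above is the simpler choice, since it is the approach the post reports as working.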
