
How to extract substring in Hive

I am having trouble extracting a substring in Hive. The table I am working on has a column called referee_dict, which stores each rank together with the IDs of the corresponding players. For example, a record could look like this:

[('Bronze1', [2738653, 2738652, 2738655]), ('Bronze2', [2738653, 2738652]), ('Bronze3', []), ('Silver1', []), ('Silver2', []), ('Silver3', [])

I am trying to find the players who have achieved Bronze2, so I want to extract [2738653, 2738652] from the list. I know this is pretty easy in Python, but I looked through Hive's documentation and still do not know how to do it in SQL/Hive. Any help would be appreciated!

Well, I think I figured out a way, though I don't know if it is the easiest one. Since the column is a string, I am going to use a regex to capture the substring after "'Bronze2', [" and before the next "]". The function I am going to use is regexp_extract(string subject, string pattern, int index). Hope this helps if anyone has similar questions.
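For anyone who wants to see roughly what that call could look like, here is a minimal sketch. The table name player_ranks is made up for illustration; only the column referee_dict and the function regexp_extract come from the post above. The pattern is written in a double-quoted string so the single quotes around Bronze2 need no escaping, and the backslashes are doubled because Hive unescapes string literals before the regex engine sees them.

```sql
-- Sketch only: player_ranks is a hypothetical table name.
-- referee_dict is the string column described in the question.
SELECT
  -- Capture group 1 spans the opening '[' after 'Bronze2', up to the
  -- first closing ']', so the brackets are part of the result.
  regexp_extract(referee_dict, "'Bronze2', (\\[[^\\]]*\\])", 1) AS bronze2_ids
FROM player_ranks;
```

For the sample record in the question, group 1 of this pattern matches [2738653, 2738652]. Moving the parentheses inside the brackets would capture just the comma-separated IDs, which could then be passed to split() if an array is needed.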
