Scan HTable rows for specific column value using HBase shell

I want to scan rows in an HTable from the HBase shell where a column in a column family (e.g., user_id in tweet) has a particular value.

Specifically, I want to find all rows where tweet:user_id has the value test1, as in this cell:

column=tweet:user_id, timestamp=1339581201187, value=test1

I can scan the table for a particular column using

scan 'tweetsTable',{COLUMNS => 'tweet:user_id'}

but I did not find any way to scan rows for a particular value.

Is it possible to do this via the HBase shell?

I checked this question as well.

It is possible without Hive:

scan 'filemetadata', 
     { COLUMNS => 'colFam:colQualifier', 
       LIMIT => 10, 
       FILTER => "ValueFilter( =, 'binaryprefix:<someValue.e.g. test1 AsDefinedInQuestion>' )" 
     }

Note: to find all rows that contain test1 as the value, as specified in the question, use binaryprefix:test1 in the filter (see this answer for more examples).
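Because the HBase shell is a JRuby console, the filter string can also be assembled in a variable before calling scan. The following is only a minimal sketch and not part of the original answer: value_filter is a hypothetical helper name, and doubling embedded single quotes reflects my reading of the HBase filter-language documentation, so treat that step as an assumption.

```ruby
# Hypothetical helper (not part of HBase): builds a ValueFilter expression
# suitable for the shell's FILTER option.
def value_filter(op, comparator, value)
  # Assumption: a single quote inside the argument is escaped by doubling it.
  escaped = value.gsub("'", "''")
  "ValueFilter(#{op}, '#{comparator}:#{escaped}')"
end

filter = value_filter('=', 'binaryprefix', 'test1')
puts filter   # prints: ValueFilter(=, 'binaryprefix:test1')

# Inside the HBase shell the string would then be passed to scan, e.g.:
#   scan 'tweetsTable', { COLUMNS => 'tweet:user_id', FILTER => filter }
```

The scan line is left as a comment so the snippet also runs in plain Ruby; in the shell it can be uncommented as is.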

Nishu, here is a solution I use periodically. It is actually much more powerful than you need right now, but I think you will use its power some day. Yes, it is for the HBase shell.

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

scan 'yourTable', {LIMIT => 10, FILTER => SingleColumnValueFilter.new(Bytes.toBytes('family'), Bytes.toBytes('field'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('AAA')), COLUMNS => 'family:field' }

With this filter applied, only the family:field column is returned. The filter could be extended to perform more complicated comparisons.

Here are also some hints that I consider most useful:

As there were multiple requests to explain this answer, this additional answer has been posted.

Example 1

If

scan '<table>', { COLUMNS => '<column>', LIMIT => 3 }

would return:

ROW     COLUMN+CELL
ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

then this filter:

scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( =, 'binaryprefix:hello_value2') OR ValueFilter( =, 'binaryprefix:hello_value3')" }

would return:

ROW     COLUMN+CELL
ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

Example 2

Negation is supported as well:

scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( !=, 'binaryprefix:hello_value2' )" }

would return:

ROW     COLUMN+CELL
ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3
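To make the comparator behaviour in these examples easier to reason about, here is a plain-Ruby approximation that runs without HBase. It is an illustrative sketch only: comparator_match? is a hypothetical name, the matching rules reflect my understanding of the binary, binaryprefix and regexstring comparators, and Ruby's regex engine differs from Java's for advanced patterns.

```ruby
# Plain-Ruby approximation of three HBase comparators (not HBase code).
def comparator_match?(spec, value)
  kind, arg = spec.split(':', 2)
  case kind
  when 'binary'       then value == arg                    # whole value must be equal
  when 'binaryprefix' then value.start_with?(arg)          # value must begin with arg
  when 'regexstring'  then Regexp.new(arg).match?(value)   # pattern found anywhere in value
  else raise ArgumentError, "unsupported comparator: #{kind}"
  end
end

values = %w[hello_value hello_value2 hello_value3]

# Like ValueFilter( =, 'binaryprefix:hello_value2' ): keep matching values.
p values.select { |v| comparator_match?('binaryprefix:hello_value2', v) }
# => ["hello_value2"]

# Like ValueFilter( !=, 'binaryprefix:hello_value2' ): keep non-matching values.
p values.reject { |v| comparator_match?('binaryprefix:hello_value2', v) }
# => ["hello_value", "hello_value3"]

# A shorter prefix matches all three values; 'binary' demands an exact match.
p values.select { |v| comparator_match?('binaryprefix:hello_value', v) }
# => ["hello_value", "hello_value2", "hello_value3"]
p values.select { |v| comparator_match?('binary:hello_value', v) }
# => ["hello_value"]
```

In other words, binaryprefix:hello_value would not single out ROW1; for an exact match on a value that is a prefix of others, binary: is the safer choice.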

Here is an example of a text search for the value BIGBLUE in table t1, in column d:a_content (column family d, qualifier a_content). A scan of the table will show all the available values:

scan 't1'
...
column=d:a_content, timestamp=1404399246216, value=BIGBLUE
...

To search just for the value BIGBLUE with a limit of 1, try the command below:

scan 't1',{ COLUMNS => 'd:a_content', LIMIT => 1, FILTER => "ValueFilter( =, 'regexstring:BIGBLUE' )" }

COLUMN+CELL
column=d:a_content, timestamp=1404399246216, value=BIGBLUE

Removing the limit will show all occurrences in that table and column family.

To scan a table in HBase on the basis of any column value, SingleColumnValueFilter can be used as follows:

scan 'tablename' ,
   { 
     FILTER => "SingleColumnValueFilter('column_family','col_name',>, 'binary:1')" 
   } 
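One caveat when using > with binary: is that, to my understanding, the binary comparator orders values by their raw bytes, not numerically. The sketch below approximates that ordering in plain Ruby, assuming the numbers were stored as strings; bytewise_greater is a hypothetical helper, not an HBase call.

```ruby
# Approximates a bytewise "greater than" comparison on string-encoded values,
# similar in spirit to SingleColumnValueFilter(..., >, 'binary:<threshold>').
def bytewise_greater(values, threshold)
  values.select { |v| (v.bytes <=> threshold.bytes) > 0 }
end

stored = %w[1 2 9 10 100]

p bytewise_greater(stored, '1')   # => ["2", "9", "10", "100"]
p bytewise_greater(stored, '5')   # => ["9"]
```

With the threshold '5', the values 10 and 100 are dropped even though they are numerically larger, because '1' sorts before '5' byte by byte. If numeric ordering matters, fixed-width or zero-padded encodings are one common workaround.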

From the HBase shell I think this is not possible, because it amounts to a query for specific data. HBase is a NoSQL store, so when you want to run queries, or have a case like yours, I think you should use Hive or Pig. Hive is quite a good approach, because with Pig you have to deal with scripts.
In any case, you can get good guidance about Hive from "Hive integration with HBase" and from here.
If your only purpose is to view data rather than fetch it from (any client's) code, you can use HBase Explorer, or a new and very good product that is still in beta, "HBase Manager". You can get it from HBase Manager.
It is simple and, more importantly, it helps to insert and delete data and to apply filters on column qualifiers from a UI, like other DB clients. Have a try.
I hope this is helpful :)

A slightly different question, but if you want to query a specific column that is not present in all rows, DependentColumnFilter is your best friend:

import org.apache.hadoop.hbase.filter.DependentColumnFilter
scan 'orgtable2', {FILTER => "DependentColumnFilter('cf1','lan',false,=,'binary:fre')"}

The previous scan will return all columns for the rows in which the lan column is present and its associated value is equal to fre. The third argument is dropDependentColumn; if set to true, it prevents the lan column itself from being displayed in the results.
