简体   繁体   中英

Scan HTable rows for specific column value using HBase shell

I want to scan rows in a HTable from hbase shell where a column family (ie, Tweet) has a particular value (ie, user_id).

Now I want to find all rows where tweet:user_id has value test1 as this column has value 'test1'

column=tweet:user_id, timestamp=1339581201187, value=test1

Though I can scan table for a particular using,

scan 'tweetsTable',{COLUMNS => 'tweet:user_id'}

but I did not find any way to scan a row for a value.

Is it possible to do this via HBase Shell?

I checked this question as well.

It is possible without Hive:

scan 'filemetadata', 
     { COLUMNS => 'colFam:colQualifier', 
       LIMIT => 10, 
       FILTER => "ValueFilter( =, 'binaryprefix:<someValue.e.g. test1 AsDefinedInQuestion>' )" 
     }

Note: in order to find all rows that contain test1 as value as specified in the question, use binaryprefix:test1 in the filter (see this answer for more examples)

Nishu, here is solution I periodically use. It is actually much more powerful than you need right now but I think you will use it's power some day. Yes, it is for HBase shell.

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

scan 'yourTable', {LIMIT => 10, FILTER => SingleColumnValueFilter.new(Bytes.toBytes('family'), Bytes.toBytes('field'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('AAA')), COLUMNS => 'family:field' }

Only family:field column is returned with filter applied. This filter could be improved to perform more complicated comparisons.

Here are also hints for you that I consider most useful:

As there were multiple requests to explain this answer this additional answer has been posted.

Example 1

If

scan '<table>', { COLUMNS => '<column>', LIMIT => 3 }

would return:

ROW     COLUMN+CELL
ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

then this filter:

scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( =, 'binaryprefix:hello_value2') AND ValueFilter( =, 'binaryprefix:hello_value3')" }

would return:

ROW     COLUMN+CELL
ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

Example 2

If not is supported as well:

scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( !=, 'binaryprefix:hello_value2' )" }

would return:

ROW     COLUMN+CELL
ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

An example of a text search for a value BIGBLUE in table t1 with column family of d:a_content. A scan of the table will show all the available values :-

scan 't1'
...
column=d:a_content, timestamp=1404399246216, value=BIGBLUE
...

To search just for a value of BIGBLUE with limit of 1, try the below command :-

scan 't1',{ COLUMNS => 'd:a_content', LIMIT => 1, FILTER => "ValueFilter( =, 'regexstring:BIGBLUE' )" }

COLUMN+CELL
column=d:a_content, timestamp=1404399246216, value=BIGBLUE

Obviously removing the limit will show all occurrences in that table/cf.

To scan a table in hbase on the basis of any column value, SingleColumnValueFilter can be used as :

scan 'tablename' ,
   { 
     FILTER => "SingleColumnValueFilter('column_family','col_name',>, 'binary:1')" 
   } 

From HBAse shell i think it is not possible because it is some how like query from which we use want to find spsecific data. As all we know that HBAse is noSQL so when we want to apply query or if we have a case like you then i think you should use Hive or PIG where as Hive is quiet good approach because in PIG we need to mess with scripts.
Anyway you can get good guaidence about hive from here HIVE integration with HBase and from Here
If yout only purpose is to view data not to get from code (of any client) then you can use HBase Explorer or a new and very good product but it is in its beta release is "HBase manager". You can get this from HBase Manager
Its simple, and more importantly, it helps to insert and delete data, applying filters on column qualifiers from UI like other DBclients. Have a try.
I hope it would be helpful for you :)

Slightly different question but if you you want to query a specific column which is not present in all rows, DependentColumnFilter is your best friend:

import org.apache.hadoop.hbase.filter.DependentColumnFilter
scan 'orgtable2', {FILTER => "DependentColumnFilter('cf1','lan',false,=,'binary:fre')"}

The previous scan will return all columns for the rows in which the lan column is present and for which its associated value is equal to fre . The third argument is dropDependentColumn and would prevent the lan column itself to be displayed in the results if set to true .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM