Pig: Regular Expression syntax

I am doing a string comparison with a regular expression in a Pig script.

I know that regular expressions in Pig are the same as in Java.

The problem I am facing: I need to filter out every value that has white space at its trailing end.

My regular expression is this: (fname matches '!\\s+$')

Sample Script-----

raw_data = load '$input' using PigStorage(',') as (fname:chararray);
filter_data = filter raw_data by (fname matches '!\\s+$');
dump filter_data;

Sample Input-----

abcd    ,123
pqrs,234
xyz ,234
lmn,2345

Nothing is written to STDOUT, whereas it should have written "pqrs" and "lmn".

I don't know Pig, but in Java one syntactically correct regex to match pqrs,234 and lmn,2345 would be:

^\S+$

assuming you were in multiline mode.

  • In Java you escape backslashes, so that turns into ^\\S+$
  • In Java you can turn on multiline with (?m) so a regex could be (?m)^\\S+$
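
To make the two points above concrete, here is a minimal Java sketch. The class and method names (TrailingWhitespaceDemo, hasNoWhitespace, cleanLines) are made up for illustration, and the input string is just the sample from the question joined with newlines. It also shows why the pattern from the question never matches in Java: String.matches tests the whole string, and the leading ! is a literal exclamation mark, not a negation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class TrailingWhitespaceDemo {

    // Whole-string test: true only if the value consists of
    // one or more non-whitespace characters and nothing else.
    static boolean hasNoWhitespace(String value) {
        return value.matches("\\S+");
    }

    // Multiline scan: collect every line made up solely of
    // non-whitespace characters, using (?m) so ^ and $ match per line.
    static List<String> cleanLines(String text) {
        Matcher m = Pattern.compile("(?m)^\\S+$").matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample input from the question, one record per line.
        String input = "abcd    ,123\npqrs,234\nxyz ,234\nlmn,2345";
        System.out.println(cleanLines(input));

        // Field-level checks, as a filter on the first column would see them.
        System.out.println(hasNoWhitespace("pqrs"));      // true
        System.out.println(hasNoWhitespace("abcd    "));  // false

        // The original pattern: '!' is matched literally, so neither value matches.
        System.out.println("pqrs".matches("!\\s+$"));     // false
        System.out.println("abcd    ".matches("!\\s+$")); // false
    }
}
```

If Pig's matches operator applies the same whole-string semantics (an assumption here, since I have not run it in Pig), a filter along the lines of fname matches '\\S+' would be the field-level equivalent of hasNoWhitespace.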

See demo.
