
How to replace a pattern from a TSV using sed?

I have a TSV file where the character * is used as a null marker, and I want to delete it. The file looks like this:

Foo Foo foo FOO
Bar *   *   *
*Bar    Foo*    Foo * Bar   Foo bar
*   *   Bar Foobar

If I use s/(^| )\*( |$)/\1\2/g, I get this output:

Foo Foo foo FOO
Bar     *   
*Bar    Foo*    Foo * Bar   Foo bar
*   Bar Foobar

It matches one and skips the next. What can I do to replace all of them when they are surrounded by tabs?

The desired output should look like this:

Foo Foo foo FOO
Bar         
*Bar    Foo*    Foo * Bar   Foo bar
        Bar Foobar

Since it is not very clear where the tabs are, let's try with | as the field separator:

$ cat a
Foo|Foo|foo|FOO
Bar|*|*|*
*Bar|Foo*|Foo * Bar|Foo bar
*|*|Bar|Foobar

So with awk we can do:

$ awk 'BEGIN{FS=OFS="|"}{for (i=1; i<=NF; i++) if ($i=="*") $i=""}1' a
Foo|Foo|foo|FOO
Bar|||
*Bar|Foo*|Foo * Bar|Foo bar
||Bar|Foobar

This loops through all the fields and sets a field to the empty string when its value is exactly *.

Note: for this solution to work with your sample input, just change the definition of the field separator from BEGIN{FS=OFS="|"} to BEGIN{FS=OFS="\t"}.
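To try the tab-separated variant, here is a minimal sketch. The file names sample.tsv and cleaned_awk.tsv are made up for the example, and the sample rows are rebuilt with printf so that the separators are real tab characters:

```shell
# Recreate the sample data with real tabs (sample.tsv is a hypothetical name).
printf 'Foo\tFoo\tfoo\tFOO\n'                > sample.tsv
printf 'Bar\t*\t*\t*\n'                     >> sample.tsv
printf '*Bar\tFoo*\tFoo * Bar\tFoo bar\n'   >> sample.tsv
printf '*\t*\tBar\tFoobar\n'                >> sample.tsv

# Same loop as above, with tab as both input and output field separator.
# A field is emptied only when it is exactly "*".
awk 'BEGIN { FS = OFS = "\t" }
     { for (i = 1; i <= NF; i++) if ($i == "*") $i = "" }
     1' sample.tsv > cleaned_awk.tsv

cat cleaned_awk.tsv
```

Because the comparison is made per field, values such as *Bar, Foo* and Foo * Bar are left alone, and the number of columns in each row stays the same.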

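You could try the Perl command below. It uses lookarounds, so the tabs around each * are checked but not removed and the number of columns stays the same:

$ perl -pe 's/(?<![^\t])\*(?![^\t\n])//g' file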
Foo Foo foo FOO
Bar         
*Bar    Foo*    Foo * Bar   Foo bar
        Bar Foobar

Assuming the fields are tab-separated, with GNU sed:

sed -r 's@([^\t])[*]@\1__0x2A__@g; s@[*]([^\t\r])@__0x2A__\1@g; s@[*]@@g; s@__0x2A__@*@g' file

Shorter:

sed -r 's@([^\t])[*]@\1\a@g; s@[*]([^\t\r])@\a\1@g; s@[*]@@g; s@\a@*@g' file

Output:

Foo     Foo     foo     FOO
Bar
*Bar    Foo*    Foo * Bar       Foo bar
                Bar     Foobar
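For a sed-only route, note that the attempt in the question skips every other * because matches found with the g flag cannot overlap: the tab that closes one match has already been consumed, so it cannot open the next one. A minimal sketch that works around this by repeating a single substitution until nothing is left to replace (sample.tsv and cleaned_sed.tsv are hypothetical names; the tab is passed in as a literal character so nothing depends on \t support in sed):

```shell
# Recreate the sample data with real tabs (sample.tsv is a hypothetical name).
printf 'Foo\tFoo\tfoo\tFOO\n'                > sample.tsv
printf 'Bar\t*\t*\t*\n'                     >> sample.tsv
printf '*Bar\tFoo*\tFoo * Bar\tFoo bar\n'   >> sample.tsv
printf '*\t*\tBar\tFoobar\n'                >> sample.tsv

tab=$(printf '\t')   # a literal tab character

# :a  defines a label.
# s   removes one lone * (bounded by line start/tab and tab/line end),
#     putting the surrounding delimiters back via \1 and \2.
# ta  jumps back to :a whenever the substitution succeeded, so the
#     line is rescanned until no lone * remains.
sed -E -e ':a' -e "s/(^|$tab)\\*($tab|\$)/\\1\\2/" -e 'ta' sample.tsv > cleaned_sed.tsv

cat cleaned_sed.tsv
```

Each pass removes one * and rescans the line from the start, so the shared tab between two neighbouring * fields is available again on the next pass.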
