I have a TSV file where the character * is used as the null marker, and I want to delete it. The file looks like this:
Foo Foo foo FOO
Bar * * *
*Bar Foo* Foo * Bar Foo bar
* * Bar Foobar
If I use s/(^| )\*( |$)/\1\2/g, it gives this output:
Foo Foo foo FOO
Bar *
*Bar Foo* Foo * Bar Foo bar
* Bar Foobar
It matches one and skips the next. What can I do to replace all of them when they are surrounded by tabs?
The desired output should look like this:
Foo Foo foo FOO
Bar
*Bar Foo* Foo * Bar Foo bar
Bar Foobar
Since it is not very clear where the tabs are, let's try with | as the field separator:
$ cat a
Foo|Foo|foo|FOO
Bar|*|*|*
*Bar|Foo*|Foo * Bar|Foo bar
*|*|Bar|Foobar
So with awk we can do:
$ awk 'BEGIN{FS=OFS="|"}{for (i=1; i<=NF; i++) if ($i=="*") $i=""}1' a
Foo|Foo|foo|FOO
Bar|||
*Bar|Foo*|Foo * Bar|Foo bar
||Bar|Foobar
This loops through all the fields and sets a field to empty when its value is exactly *.
Note: for this solution to work with your sample input, just change the definition of the field separator from BEGIN{FS=OFS="|"} to BEGIN{FS=OFS="\t"}.
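For reference, here is a minimal sketch of the tab-separated variant. The function name blank_star_fields is only an illustrative helper, and the sample rows are rebuilt with printf on the assumption that the question's columns are separated by single tabs:

```shell
# Hypothetical helper: blank every field whose value is exactly "*"
# in tab-separated input (reads the named files, or stdin if none).
blank_star_fields() {
    awk 'BEGIN { FS = OFS = "\t" }
         { for (i = 1; i <= NF; i++) if ($i == "*") $i = "" }
         1' "$@"
}

# Sample rows from the question, rebuilt with real tabs (an assumption).
printf 'Bar\t*\t*\t*\n*Bar\tFoo*\tFoo * Bar\tFoo bar\n*\t*\tBar\tFoobar\n' |
    blank_star_fields
```

Because the comparison is against the whole field, values such as *Bar, Foo* and "Foo * Bar" are left alone, and the number of columns stays the same.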
You could try the Perl command below:
$ perl -pe 's/(\W|^)\*\t\*/$1/g;s/\t\*$//g' file
Foo Foo foo FOO
Bar
*Bar Foo* Foo * Bar Foo bar
Bar Foobar
Assuming the fields are tab-separated:
sed -r 's@([^\t])[*]@\1__0x2A__@g; s@[*]([^\t\r])@__0x2A__\1@g; s@[*]@@g; s@__0x2A__@*@g' file
Shorter:
sed -r 's@([^\t])[*]@\1\a@g; s@[*]([^\t\r])@\a\1@g; s@[*]@@g; s@\a@*@g' file
Output:
Foo Foo foo FOO
Bar
*Bar Foo* Foo * Bar Foo bar
Bar Foobar
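As for why the substitution in the question leaves every other * behind: with the g flag, matches cannot overlap, so the tab that closes one match has already been consumed and cannot open the next one. A minimal sketch of a workaround, assuming GNU sed (for -E and \t in the pattern), repeats a single substitution until nothing changes. The function name strip_star_fields is hypothetical:

```shell
# Hypothetical helper: remove a "*" that is a whole tab-delimited field.
# Each pass removes one "*" and keeps the surrounding delimiters;
# "t again" jumps back to the label while a substitution still succeeds.
strip_star_fields() {
    sed -E -e ':again' -e 's/(^|\t)\*(\t|$)/\1\2/' -e 't again' "$@"
}

printf 'Bar\t*\t*\t*\n*\t*\tBar\tFoobar\n' | strip_star_fields
```

Every successful pass shortens the line by one character, so the loop always terminates.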