
Delete any special character using Sed

I have yet another list of subdomains. I want to remove any wildcard subdomain entry that includes one of these special characters:

()!&$#*+?

Most of the stray characters appear as a prefix, but they can also occur in the middle of a name. Here is some sample data:

(www.imgur.com
***************diet.blogspot.com
*-1.gbc.criteo.com
------------------------------------------------------------i.imgur.com

This has been quite an inconvenience while scanning through the list. As always, I'm trying sed to fix it:

sed -i "/[!()#$&?+]/d" foo.txt ###Didn't work
sed -i "/[\!\(\)\#\$\&\?\+]/d" ###Escaping char didn't work

After running the commands above, the list is unchanged and the file is still in its original state. My next idea was to chain a series of sed commands to remove the characters one by one:

cat foo.txt | sed -e "/!/d" -e "/#/d" -e "/\*/d" -e "/\$/d" -e "/(/d" -e "/)/d" -e "/+/d" -e "/\'/d" -e "/&/d" >> foo2.txt
cat foo.txt | sed -e "/\!/d" | sed -e "/\#/d" | sed -e "/\*/d" | sed -e "/\$/d" | sed -e "/\+/d" | sed -e "/\'/d" | sed -e "/\&/d" >> foo2.txt

If escaping every special character doesn't work, my logic must be wrong. I also tried the /g flag, with no more luck.

As a side note, I don't want - to be deleted, because valid subdomains can contain the - character:

line-apps.com
line-apps-beta.com
line-apps-rc.com
line-apps-dev.com

Any help would be cherished.

Using sed

$ sed '/[[:punct:]]/d' input_file

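This deletes every line that contains a punctuation character. Note that [[:punct:]] also matches . and -, so on a list of domain names it would delete every line; it would help if you provided sample data.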
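As a quick check of how [[:punct:]] treats hostnames, here is a minimal sketch. The two input lines are borrowed from the question, and the variable name remaining is only illustrative.

```shell
# One valid name and one broken name from the question.
# Both contain '.', which is a member of [[:punct:]], so both lines are deleted.
remaining=$(printf '%s\n' 'line-apps.com' '(www.imgur.com' | sed '/[[:punct:]]/d')
printf 'remaining=[%s]\n' "$remaining"   # prints: remaining=[]
```

An explicit list of unwanted characters avoids this, because . and - can then be left out of the set.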

I ended up using single quotes '', as mentioned by @potong:

sed '/[\!\?\+\,\#\$\&\*\(\)\[\]\ ]/d'

I have no idea why the double-quoted version behaves differently, but the shell is the usual suspect.
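One concrete way the shell interferes can be sketched as follows. Inside double quotes the shell rewrites \$ to $ before sed sees the script, so /\$/d arrives as /$/d, where $ is the end-of-line anchor and matches every line. This is only one possible cause, shown on an illustrative input line; in an interactive bash session, ! inside double quotes can additionally trigger history expansion.

```shell
line='line-apps.com'

# Double quotes: the shell turns \$ into $, so sed receives /$/d and deletes every line.
double=$(printf '%s\n' "$line" | sed "/\$/d")

# Single quotes: sed receives /\$/d, which only matches a literal dollar sign.
single=$(printf '%s\n' "$line" | sed '/\$/d')

printf 'double=[%s] single=[%s]\n' "$double" "$single"
# prints: double=[] single=[line-apps.com]
```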

To do what you're trying to do in your answer (which adds [ and ] and more to the set of characters in your question) would be:

sed '/[][!?+,#$&*() ]/d'

or just:

grep -v '[][!?+,#$&*() ]'

Per POSIX, to include ] in a bracket expression it must be the first character; otherwise it marks the end of the bracket expression.
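A small sketch of that bracket expression applied to lines taken from the question (the variable names sample and kept are illustrative):

```shell
# Three broken entries from the question plus one valid hyphenated name.
sample='(www.imgur.com
***************diet.blogspot.com
*-1.gbc.criteo.com
line-apps-beta.com'

# ']' is listed first, so it is a literal member of the set; '-' is not in the set.
kept=$(printf '%s\n' "$sample" | sed '/[][!?+,#$&*() ]/d')
printf '%s\n' "$kept"   # prints: line-apps-beta.com
```

Because - is not in the set, a line made of leading dashes followed by a hostname would still pass through this filter.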

Consider printing the lines you want instead of deleting the lines you do not want, though, e.g.:

grep -E '^[[:alnum:]_.-]+$' file

to print lines that only contain letters, numbers, underscores, dashes, and/or periods.
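As a rough illustration of the allow-list approach, the sketch below uses grep -E with a + quantifier so that a whole line must consist of one or more allowed characters. The input lines come from the question, and the variable names are illustrative.

```shell
sample='(www.imgur.com
*-1.gbc.criteo.com
line-apps.com
line-apps-dev.com'

# Keep only lines made entirely of letters, digits, '_', '.' and '-'.
valid=$(printf '%s\n' "$sample" | grep -E '^[[:alnum:]_.-]+$')
printf '%s\n' "$valid"
# prints:
# line-apps.com
# line-apps-dev.com
```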
