
How to use the “sed” shell command to obfuscate the info that follows a certain phrase

I'm trying to write a shell command using "sed" or "grep" that obfuscates the information following "Scraped from" by replacing it with a single "*".

For example, the sample file has:

2016-12-09 18:57:32 [scrapy.core.engine] INFO: Spider opened
2016-12-09 18:57:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-09 18:57:32 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None)
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'tags': ['change', 'deep-thoughts', 'thinking', 'world'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'tags': ['abilities', 'choices'], 'author': 'J.K. Rowling'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'tags': ['aliteracy', 'books', 'classic', 'humor'], 'author': 'Jane Austen'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'tags': ['be-yourself', 'inspirational'], 'author': 'Marilyn Monroe'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“Try not to become a man of success. Rather become a man of value.”', 'tags': ['adulthood', 'success', 'value'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>

The output should have:

2016-12-09 18:57:32 [scrapy.core.engine] INFO: Spider opened
2016-12-09 18:57:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-09 18:57:32 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None)
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'tags': ['change', 'deep-thoughts', 'thinking', 'world'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'tags': ['abilities', 'choices'], 'author': 'J.K. Rowling'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'tags': ['aliteracy', 'books', 'classic', 'humor'], 'author': 'Jane Austen'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'tags': ['be-yourself', 'inspirational'], 'author': 'Marilyn Monroe'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“Try not to become a man of success. Rather become a man of value.”', 'tags': ['adulthood', 'success', 'value'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *

I know that you can use sed 's/bla/BLA/g' to do a replacement, but in my case I need to replace the information that follows a certain phrase, and I'm not sure how to do it.

You want to obfuscate the information that follows "Scraped from" with a single "*".

So just replace everything that follows "Scraped from" with a single *:

sed 's/Scraped from .*/Scraped from */'
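As a quick check, the command above can be wrapped in a small function and fed a couple of lines shaped like the sample log. This is only a sketch: the function name obfuscate_scraped is made up for illustration, and the input lines are shortened versions of the ones in the question.

```shell
# obfuscate_scraped is a hypothetical helper name, not part of sed.
# With no arguments it reads stdin; otherwise it reads the named files.
obfuscate_scraped() {
    # In the pattern, ".*" matches the rest of the line.
    # In the replacement, "*" is a literal asterisk.
    sed 's/Scraped from .*/Scraped from */' "$@"
}

# The first line is rewritten; the second has no "Scraped from"
# and passes through unchanged.
printf '%s\n' \
    '2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>' \
    '2016-12-09 18:57:32 [scrapy.core.engine] INFO: Spider opened' \
    | obfuscate_scraped
```

Because sed writes to standard output, the original file is left untouched unless the output is redirected somewhere.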

Here is a solution that keeps the punctuation mark (or lack thereof) after the keyword from. It also assumes you only want this change after the key phrase Scraped from, rather than after any from.

sed -E 's/(Scraped from[:=]?).*/\1 */g' sample_file
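To see what the optional [:=]? part of the pattern does, here is a small sketch that runs the same substitution over three variants of the line. The function name obfuscate_keep_punct and the test lines are assumptions made for illustration.

```shell
# obfuscate_keep_punct is a hypothetical helper name.
# \1 holds "Scraped from" plus an optional ":" or "=" right after it,
# so that punctuation survives in the output.
obfuscate_keep_punct() {
    sed -E 's/(Scraped from[:=]?).*/\1 */' "$@"
}

# One line with no punctuation, one with ":", one with "=".
printf '%s\n' \
    'DEBUG: Scraped from <200 http://quotes.toscrape.com/>' \
    'DEBUG: Scraped from: <200 http://quotes.toscrape.com/>' \
    'DEBUG: Scraped from= <200 http://quotes.toscrape.com/>' \
    | obfuscate_keep_punct
```

Each line should end in "Scraped from", the original punctuation if there was any, then a space and a single *.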

Dealing with two from clauses in the same line is a bit more complicated. Here's one way to do it.

Sample file (simplified):

cat sample_file

2016-12-09 [scrapy.core.engine] INFO: Spider opened
2016-12-09 [scrapy.logstats] INFO: Scraped 0 items (at 0 items/min)
2016-12-09 [scrapy.ext.telnet] DEBUG: Telnet listening on 127.0.0.1:6023
2016-12-09 [scrapy] DEBUG: Crawled (200) <GET http://quotes.com/> (ref: None)
2016-12-09 [scrapy] DEBUG: Scraped from= <200 http://quotes.toscrape.com/>
2016-12-09 [scrapy] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
2016-12-09 [scrapy] DEBUG: Scraped from: <200 http://first/> and from: me.org
2016-12-09 [scrapy.core.scraper] DEBUG: Scraped from <3 http://toscrape.com/>

Solution and output:

sed -E 's/(Scraped from[:=]?) .*and from/\1 * and from/;
        s/(Scraped( from[:=]? \* and)? from[:=]?).*$/\1 */' sample_file

2016-12-09 [scrapy.core.engine] INFO: Spider opened
2016-12-09 [scrapy.logstats] INFO: Scraped 0 items (at 0 items/min)
2016-12-09 [scrapy.ext.telnet] DEBUG: Telnet listening on 127.0.0.1:6023
2016-12-09 [scrapy] DEBUG: Crawled (200) <GET http://quotes.com/> (ref: None)
2016-12-09 [scrapy] DEBUG: Scraped from= *
2016-12-09 [scrapy] DEBUG: Scraped from *
2016-12-09 [scrapy] DEBUG: Scraped from: * and from: *
2016-12-09 [scrapy.core.scraper] DEBUG: Scraped from *
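The two expressions are easier to follow when applied one at a time to the line that contains two from clauses. The sketch below does that; the variable names line, step1 and step2 are just for illustration.

```shell
line='2016-12-09 [scrapy] DEBUG: Scraped from: <200 http://first/> and from: me.org'

# Expression 1: mask only the text between the first "from" and "and from".
step1=$(printf '%s\n' "$line" |
    sed -E 's/(Scraped from[:=]?) .*and from/\1 * and from/')
printf '%s\n' "$step1"

# Expression 2: the optional group "( from[:=]? \* and)?" skips over the part
# already masked by expression 1, so only the text after the last "from" is replaced.
step2=$(printf '%s\n' "$step1" |
    sed -E 's/(Scraped( from[:=]? \* and)? from[:=]?).*$/\1 */')
printf '%s\n' "$step2"
```

On lines with a single from clause, expression 1 does not match and the optional group in expression 2 matches nothing, so expression 2 behaves like the simpler one-clause command.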


 