I'm trying to write a shell command using "sed" or "grep" that replaces the information following "Scraped from" with a single "*".
For example, the sample file has:
2016-12-09 18:57:32 [scrapy.core.engine] INFO: Spider opened
2016-12-09 18:57:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-09 18:57:32 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None)
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'tags': ['change', 'deep-thoughts', 'thinking', 'world'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'tags': ['abilities', 'choices'], 'author': 'J.K. Rowling'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'tags': ['aliteracy', 'books', 'classic', 'humor'], 'author': 'Jane Austen'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'tags': ['be-yourself', 'inspirational'], 'author': 'Marilyn Monroe'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“Try not to become a man of success. Rather become a man of value.”', 'tags': ['adulthood', 'success', 'value'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
The output should have:
2016-12-09 18:57:32 [scrapy.core.engine] INFO: Spider opened
2016-12-09 18:57:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-09 18:57:32 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None)
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'tags': ['change', 'deep-thoughts', 'thinking', 'world'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'tags': ['abilities', 'choices'], 'author': 'J.K. Rowling'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'tags': ['aliteracy', 'books', 'classic', 'humor'], 'author': 'Jane Austen'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'tags': ['be-yourself', 'inspirational'], 'author': 'Marilyn Monroe'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“Try not to become a man of success. Rather become a man of value.”', 'tags': ['adulthood', 'success', 'value'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
I know that you can use sed 's/bla/BLA/g' to do a replacement, but in my case I need to replace the information that follows a certain phrase, and I'm not sure how I should do it.
The goal is to obfuscate the information following "Scraped from" with a single "*".
So just replace everything after "Scraped from" with a single *:
sed 's/Scraped from .*/Scraped from */'
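As a quick check of the command above, here is a minimal sketch that pipes two lines through it. The variable names (log_line, data_line, result) and the shortened data line are assumptions made for illustration; they are not part of the original log.

```shell
# Illustrative input: one "Scraped from" log line and one (shortened) item line.
log_line='2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>'
data_line="{'author': 'Albert Einstein'}"

# In the pattern, ".*" matches the rest of the line; in the replacement, "*" is a literal asterisk.
# Lines that do not contain "Scraped from " pass through unchanged.
result=$(printf '%s\n%s\n' "$log_line" "$data_line" | sed 's/Scraped from .*/Scraped from */')
printf '%s\n' "$result"
```

To process a file instead, pass its name as the last argument to sed and redirect the output to a new file.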
Here is a solution that keeps the punctuation mark (or lack thereof) after the keyword from. It also assumes you only want this change after the key phrase Scraped from rather than after "any" from.
sed -E 's/(Scraped from[:=]?).*/\1 */g' sample_file
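To see what the capture group preserves, the following sketch runs the same substitution over a few made-up lines, one per separator that [:=]? accepts. The input and masked variables and the shortened log lines are assumptions for illustration, and -E is assumed to be available (it is in GNU and BSD sed).

```shell
# Illustrative input: no separator, ":" and "=", plus a line that must stay untouched.
input='DEBUG: Scraped from <200 http://quotes.toscrape.com/>
DEBUG: Scraped from: <200 http://quotes.toscrape.com/>
DEBUG: Scraped from= <200 http://quotes.toscrape.com/>
INFO: Crawled 0 pages, scraped 0 items'

# \1 re-emits "Scraped from" together with the optional ":" or "=" that followed it.
masked=$(printf '%s\n' "$input" | sed -E 's/(Scraped from[:=]?).*/\1 */')
printf '%s\n' "$masked"
```

The g flag is left out here because .* already consumes the rest of the line, so at most one match per line is possible.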
Dealing with two from clauses in the same line is a bit more complicated. Here's one way to do it.
Sample file (simplified):
cat sample_file
2016-12-09 [scrapy.core.engine] INFO: Spider opened
2016-12-09 [scrapy.logstats] INFO: Scraped 0 items (at 0 items/min)
2016-12-09 [scrapy.ext.telnet] DEBUG: Telnet listening on 127.0.0.1:6023
2016-12-09 [scrapy] DEBUG: Crawled (200) <GET http://quotes.com/> (ref: None)
2016-12-09 [scrapy] DEBUG: Scraped from= <200 http://quotes.toscrape.com/>
2016-12-09 [scrapy] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
2016-12-09 [scrapy] DEBUG: Scraped from: <200 http://first/> and from: me.org
2016-12-09 [scrapy.core.scraper] DEBUG: Scraped from <3 http://toscrape.com/>
Solution and output:
sed -E 's/(Scraped from[:=]?) .*and from/\1 * and from/;
s/(Scraped( from[:=]? \* and)? from[:=]?).*$/\1 */' sample_file
2016-12-09 [scrapy.core.engine] INFO: Spider opened
2016-12-09 [scrapy.logstats] INFO: Scraped 0 items (at 0 items/min)
2016-12-09 [scrapy.ext.telnet] DEBUG: Telnet listening on 127.0.0.1:6023
2016-12-09 [scrapy] DEBUG: Crawled (200) <GET http://quotes.com/> (ref: None)
2016-12-09 [scrapy] DEBUG: Scraped from= *
2016-12-09 [scrapy] DEBUG: Scraped from *
2016-12-09 [scrapy] DEBUG: Scraped from: * and from: *
2016-12-09 [scrapy.core.scraper] DEBUG: Scraped from *
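To make the two substitutions easier to follow, this sketch applies them one at a time to the line with two from clauses. The variable names (line, step1, step2) are assumptions for illustration; the sed expressions are the ones from the solution above.

```shell
line='2016-12-09 [scrapy] DEBUG: Scraped from: <200 http://first/> and from: me.org'

# Step 1: mask the text between the first "from" and "and from".
step1=$(printf '%s\n' "$line" | sed -E 's/(Scraped from[:=]?) .*and from/\1 * and from/')

# Step 2: mask whatever follows the last "from". The optional group
# "( from[:=]? \* and)?" steps over the clause that step 1 already masked.
step2=$(printf '%s\n' "$step1" | sed -E 's/(Scraped( from[:=]? \* and)? from[:=]?).*$/\1 */')

printf '%s\n%s\n' "$step1" "$step2"
```

On lines with a single from clause, step 1 does not match and step 2 does all the work.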