
hex search and replace characters with sed linux

I am trying to reformat and condense the weather forecast that I get from the National Weather Service and then display it on a section of my conky screen. To do this I remove the unimportant line feeds, turn the paragraphs into a series of sentences, condense the text, and then reformat it to the line length needed for display.

The data is in the files testsed.in1.txt and testsed.in2.txt; for simplicity's sake we can use the same data for both. There are no non-ASCII characters in the files. I shortened the data to illustrate the problems; normally it is quite long and verbose, which is why I need to condense it.

918 PM CST Sun Dec 24 2017~. TONIGHT...Cloudy with ~flurries. Lows 11 to 15. ~.CHRISTMAS DAY...Windy. Flurries and~light snow showers. Temperatures~nearly steady 12 to 16. ~.MONDAY NIGHT...Partly cloudy. Lows 1 below to 3 above zero. Wind~chills as low as 10 below zero. Northwest winds 10 to~15 mph. ~.TUESDAY...Mostly sunny. Wind chills as low~as 10 below to 20 below zero. ~.SATURDAY NIGHT...Mostly cloudy. A 30 percent chance of snow in~the evening. ~.SUNDAY...Mostly cloudy. Highs 15 to 19. ~$$~

The forecast contains ... sequences, which I would like to replace with a - (dash) character, so that the periods do not cause problems in the next section of code, where I look for the important line feeds. The command below doesn't work at all: instead of replacing only the sequences of three periods in a row, it converts the entire file to a series of dashes, except for the $~ at the end of the file.

cat testsed.in1.txt | sed -e "s/\x2E\x2E\x2E/\x2D/g" > testsed.out1.txt

----------------------------------------------------------------------------------------------------------------------------------------------------------------------$~

Secondly, I need to find where the important line feeds should go, which is at each \x7E\x2E sequence, and convert those sequences to \x07\x2E. This works partially, but it overwrites the following character each time. I am doing this because I really want to convert every \x7E that is not followed by a \x2E to a space, and then use tr to convert the \x07 characters into \x0A line feeds.

cat testsed.in2.txt | sed -e "s/\x7E\x2E/\x07\x2E/g" > testsed.out2.txt

918 PM CST Sun Dec 24 2017. TONIGHT...Cloudy with .lurries. Lows 11 to 15. .CHRISTMAS DAY...Windy. Flurries and.ight snow showers. Temperatures.early steady 12 to 16. .MONDAY NIGHT...Partly cloudy. Lows 1 below to 3 above zero. Wind.hills as low as 10 below zero. Northwest winds 10 to.5 mph. .TUESDAY...Mostly sunny. Wind chills as low.s 10 below to 20 below zero. .SATURDAY NIGHT...Mostly cloudy. A 30 percent chance of snow in.he evening. .SUNDAY...Mostly cloudy. Highs 15 to 19. .$~

This is my first question here, so I apologize in advance if I made any mistakes. Hopefully someone here is familiar with converting strings under linux and willing to show me how to make it work.

\x2E\x2E\x2E is the same as ..., which matches any three consecutive characters (the conversion from hex notation is performed before the regexp is parsed). Since the sample text in testsed.in1.txt is 500 characters long, sed will convert it to 166 dashes and leave 2 characters untouched (500 = 166*3 + 2).
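The arithmetic can be checked on a much smaller input. This is only a sketch with a made-up eight-character string (the variable names are mine, not from the question); it writes the dots directly instead of as \x2E, which is what the regexp ends up being after the conversion described above.

```shell
# Hypothetical sample input: 8 letters and no periods at all.
sample='abcdefgh'

# Unescaped dots: each "." matches any character, so every group of
# three characters becomes one dash and the remainder is left alone.
unescaped=$(printf '%s\n' "$sample" | sed -e 's/.../-/g')

printf '%s\n' "$unescaped"
```

With 8 characters there are two full groups of three and 2 characters left over (8 = 2*3 + 2), the same pattern as the 166 dashes followed by $~ in the question.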

I would use something like

sed -e "s/\.\.\./-/g" testsed.in1.txt > testsed.out1.txt

or perhaps

sed -e "s/[.]\{3\}/-/g" testsed.in1.txt > testsed.out1.txt
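As a quick check, both forms can be run against a fragment of the sample text. The fragment and variable names below are just for illustration; the single periods at the ends of sentences should survive, and only the run of three should become a dash.

```shell
# Hypothetical fragment taken from the sample forecast line.
sample='. TONIGHT...Cloudy with flurries. Lows 11 to 15.'

# Backslash-escaped periods match only literal periods.
with_backslash=$(printf '%s\n' "$sample" | sed -e 's/\.\.\./-/g')

# A bracket expression with an interval: exactly three literal periods.
with_bracket=$(printf '%s\n' "$sample" | sed -e 's/[.]\{3\}/-/g')

printf '%s\n%s\n' "$with_backslash" "$with_bracket"
```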

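The second part of your question suffers from the same problem with \x2E: it ends up as an unescaped . in the regexp, so \x7E\x2E matches a ~ followed by any character, and that character is then replaced by the period in the replacement text.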
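Putting the pieces together, the plan from the question could look roughly like the sketch below. It assumes a shortened, made-up sample line, and the bel variable is my own helper: it holds a literal \x07 byte so that the sed script does not depend on hex escapes at all.

```shell
# Hypothetical shortened sample in the same format as the forecast text.
sample='2017~. TONIGHT...Cloudy. ~.CHRISTMAS DAY...Flurries and~light snow.'

# A literal BEL (0x07) byte, produced with an octal escape.
bel=$(printf '\007')

# 1. Replace each run of three literal periods with a dash.
# 2. Mark each "~." (an important line feed) with BEL, keeping the period.
# 3. Turn every remaining "~" into a space.
# 4. Convert the BEL markers into real line feeds with tr.
result=$(printf '%s\n' "$sample" |
  sed -e 's/\.\.\./-/g' \
      -e "s/~\\./${bel}./g" \
      -e 's/~/ /g' |
  tr '\007' '\n')

printf '%s\n' "$result"
```

The sample comes out as three lines: "2017", ". TONIGHT-Cloudy. " and ".CHRISTMAS DAY-Flurries and light snow.". The leading periods are kept here because the question converts \x7E\x2E to \x07\x2E; dropping the . from the replacement in step 2 would remove them.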


 