
Remove end-of-line characters with a bash script?

I'm trying to write a script that removes the \r\n characters that Windows adds, but only when they appear between double quotes ("). Why? Because the dump file inserts these characters, and I don't know why. And why only between quotes? Because they only affect me when they split one of my results across two lines.

For Example. "this","is","a","result","from","database"

The problem:

"this","is","a","result","from","da
tabase"

[EDIT]

Thanks to the answer from @Cyrus I got something like this (screenshot not preserved), but it fails with "bad flag in substitute command '}'". I'm on Mac OS X.

Can you help me?

Thanks

OS X uses a different sed than the one that's typically installed in Linux.

The big differences are that sequences like \r and \n don't get expanded or used as part of the expression as you might expect, and that you tend to need to separate commands with semicolons a little more.

If you can get by with a sed one-liner that implements a rule like "Remove any \r\n on lines containing quotes", it will certainly simplify your task...
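As a minimal sketch of working around the missing \r expansion, one option is to let the shell produce a literal carriage return and splice it into the sed expression. The helper name strip_cr_on_quoted_lines is hypothetical, and note that this only strips the CR from lines containing a quote; it does not join the line with the next one.

```shell
# Assumption: printf gives us a literal carriage return to embed in the
# sed script, so we do not rely on sed expanding \r itself.
cr=$(printf '\r')

# Hypothetical helper (not part of the original answer):
# on lines that contain a double quote, delete a trailing CR.
strip_cr_on_quoted_lines() {
  sed "/\"/s/${cr}\$//" "$@"
}
```

Usage would look like strip_cr_on_quoted_lines input.txt > output.txt; lines without a quote keep their CR.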

For my experiments, I used what I infer is your sample input data:

$ od -c input.txt
0000000    F   o   r       E   x   a   m   p   l   e   .       "   t   h
0000020    i   s   "   ,   "   i   s   "   ,   "   a   "   ,   "   r   e
0000040    s   u   l   t   "   ,   "   f   r   o   m   "   ,   "   d   a
0000060    t   a  \r  \n   b   a   s   e   "  \n                        
0000072

First off, a shell-only solution might be to use smaller tools that are built in to the operating system. For example, here's a one-liner:

od -A n -t o1 -v input.txt | rs 0 1 | while read n; do [ $n -eq 015 ] && read n && continue; printf "\\$n"; done

Broken out for easier reading, here's what this looks like:

  • od -A n -t o1 -v input.txt | rs 0 1 - convert the file into a stream of octal numbers, one per line
  • | while read n; do - step through the numbers...
    • [ $n -eq 015 ] && - if the current number is 015 (i.e. octal for a carriage return),
    • read n - read the next number (the newline that follows the CR), without printing the CR,
    • && continue - and continue to the next octal number without printing (thus skipping the newline after a CR)
    • printf "\\$n"; done - print the byte for the current octal number.
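The steps above can be sketched as a small function. This is an illustrative variant, not the original one-liner: the name join_crlf_octal is hypothetical, and tr -s stands in for rs on the assumption that rs may not be installed on every system.

```shell
# Hypothetical helper: print the file named in $1 with every CR and the
# byte that follows it (normally the LF) removed.
join_crlf_octal() {
  # od emits octal byte values; tr puts one value per line
  # (assumption: used here in place of "rs 0 1").
  od -A n -t o1 -v "$1" | tr -s ' ' '\n' | while read -r n; do
    # Skip the empty lines left over from od's leading spaces.
    [ -z "$n" ] && continue
    if [ "$n" = 015 ]; then
      # Carriage return: consume the next value too and print neither.
      read -r n
      continue
    fi
    # Turn the octal value back into its byte.
    printf "\\$n"
  done
}
```

Like the one-liner, this removes every CR plus the byte after it, regardless of whether it sits between quotes.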

This kind of data conversion and stream logic works nicely in a pipeline, but is a bit harder to implement in sed, which only knows how to deal with the original input rather than its converted form.

Another bash option might be to use conditional expressions matching the original lines of input:

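while IFS= read -r line; do
  # Match lines that contain a double quote and end in a carriage return
  if [[ $line =~ .*\".*$'\r'$ ]]; then
    # Print the line without its trailing CR and without a newline
    printf '%s' "${line%$'\r'}"
  else
    printf '%s\n' "$line"
  fi
done < input.txt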

This walks through the text line by line. If a line contains a quote and ends in a CR, it prints everything up to but not including the CR, with no trailing newline. All other lines are printed as usual. The result is that lines ending in a carriage return are joined to the line that follows, and other lines are left alone.
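For shells without bash's conditional expressions, the same idea can be sketched with a case pattern. This is a variant under stated assumptions rather than the loop above: the function name join_quoted_crlf is hypothetical, and it reads standard input instead of input.txt.

```shell
# Literal carriage return for use in patterns.
cr=$(printf '\r')

# Hypothetical helper: join lines that contain a double quote and end in CR
# with the line that follows; pass every other line through unchanged.
join_quoted_crlf() {
  # "|| [ -n "$line" ]" also handles a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *\"*"$cr")
        # Drop the trailing CR and suppress the newline.
        printf '%s' "${line%"$cr"}"
        ;;
      *)
        printf '%s\n' "$line"
        ;;
    esac
  done
}
```

A call such as join_quoted_crlf < input.txt > output.txt would leave CRs on lines without quotes untouched.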

From sed's perspective, we're dealing with two input lines, the first of which ends in a carriage return. The strategy for this would be to search for carriage returns, remove them and join the lines. I struggled for a while trying to come up with something that would do this, then gave up. Not to say it's impossible, but I suspect a generally useful script will be lengthy (by sed standards).
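For what it's worth, here is a rough sketch of the search, remove and join strategy in sed. It is an assumption-laden attempt rather than a vetted solution: the name join_crlf_sed is hypothetical, it joins every line ending in CR (not only those inside quotes), and each command is passed as its own -e on the assumption that this avoids the stricter separator rules in the OS X sed, which may be what triggers the "bad flag in substitute command" error.

```shell
# Literal carriage return, since sed may not expand \r on its own.
cr=$(printf '\r')

# Hypothetical helper: while the pattern space ends in CR, append the next
# input line (unless on the last line), delete the CR plus the embedded
# newline, and loop again only if that substitution actually happened.
join_crlf_sed() {
  sed -e ':a' \
      -e "/${cr}\$/{" \
      -e '$!N' \
      -e "s/${cr}\\n//" \
      -e 'ta' \
      -e '}' "$@"
}
```

Using ta rather than an unconditional branch means a CR on the very last line does not cause an endless loop.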
