
How can I change a value in a column if that line contains a specific word using sed/awk or bash while keeping the whitespacing?

I have a pdb that looks like:

ATOM      1  P     A 2   1     224.160 179.728 151.662  1.00 40.00           P  
ATOM      2  OP1   A 2   1     225.507 179.132 151.738  1.00 40.00           O  
ATOM      3  CA    A 2   1     223.640 180.497 152.816  1.00 40.00           O  
ATOM      4  O5'   A 2   1     224.374 180.738 150.465  1.00 40.00           O 

I want to change the 11th column to 1.0000 if a line contains atom CA and save these changes in the same file.

How can I do that using sed, awk or bash so that I keep the same spacing between the columns? Thank you

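Awk can do the job, with one caveat.

awk '$1 == "ATOM" && $3 == "CA" { $11 = "1.0000" } { print }' infile > outfile

Assign the value as a string: $11 = 1.0 would print 1, because awk formats an integral number without decimals. Note also that assigning to any field makes awk rebuild the line with a single space between fields, so this version does not keep the original column spacing. To update the same file, write to a temporary file and then move it over the original.

awk is a basic tool worth learning, so it pays to read up on it.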

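Assuming fixed-width columns, as per the comment below, the awk script can be modified to specify FIELDWIDTHS (a GNU awk feature). The widths here are derived from the sample lines in the question and need to be checked against the real file, as the question is not clear about the exact widths. Setting OFS to the empty string stops awk from inserting spaces when it rebuilds the line.

awk -v 'FIELDWIDTHS=4 7 8 2 4 6 8 8 9 5 16 3' -v OFS= '
# Fixed-width fields keep their padding, so match CA with optional blanks.
$1 == "ATOM" && $3 ~ /^ *CA *$/ { $11 = sprintf("%-16s", "1.0000") }
{ print }
' infile > outfile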
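If GNU awk is not available, one way to keep the spacing with plain awk is to leave the fields alone and overwrite the 11th token inside the raw line. The following is only a sketch under assumptions: the file names sample.pdb and sample.fixed.pdb are made up, columns are separated by spaces (not tabs), and the new value borrows room from the blanks that follow the old one.

```shell
# Hypothetical sample file built from the lines in the question.
cat > sample.pdb <<'EOF'
ATOM      1  P     A 2   1     224.160 179.728 151.662  1.00 40.00           P  
ATOM      2  OP1   A 2   1     225.507 179.132 151.738  1.00 40.00           O  
ATOM      3  CA    A 2   1     223.640 180.497 152.816  1.00 40.00           O  
ATOM      4  O5'   A 2   1     224.374 180.738 150.465  1.00 40.00           O 
EOF

awk -v col=11 -v new="1.0000" '
$1 == "ATOM" && $3 == "CA" {
    # Walk along the raw line to find where token number "col" starts and ends.
    end = 0; found = 1
    for (i = 1; i <= col; i++) {
        if (!match(substr($0, end + 1), /[^ ]+/)) { found = 0; break }
        start = end + RSTART
        end = start + RLENGTH - 1
    }
    if (found) {
        head = substr($0, 1, start - 1)
        tail = substr($0, end + 1)
        diff = length(new) - (end - start + 1)
        # Count the blanks that follow the old value.
        if (match(tail, /[^ ]/)) pad = RSTART - 1
        else pad = length(tail)
        if (diff > 0) {
            # Longer value: take the extra width from the padding,
            # leaving at least one blank when padding exists.
            eat = (diff < pad) ? diff : (pad > 0 ? pad - 1 : 0)
            tail = substr(tail, eat + 1)
        }
        # Shorter value: add blanks so later columns stay put.
        while (diff++ < 0) tail = " " tail
        $0 = head new tail
    }
}
{ print }
' sample.pdb > sample.fixed.pdb

cat sample.fixed.pdb
```

Because the script only ever assigns to $0, awk never re-joins the fields, so lines that do not match are printed byte for byte.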

sed -E '/ CA /s/[^ ]+/1.000/11' file

(GNU sed, assuming spaces and not tabs)

This uses 11 after the replacement to replace the 11th word. The replacement only happens on lines matching / CA /

-E is required for the + to work as intended.

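You may want to tailor the whitespace or replacement string to your exact requirements. Since it only affects the 11th column, you can put whatever you want there. Note that 1.000 has the same width as 40.00, so the alignment is kept; the requested 1.0000 is one character longer and would shift the rest of the line right by one.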
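As a quick check, the command can be tried on the CA line from the question without touching any file. This is just a sketch; the variable names are made up for the illustration.

```shell
# CA line copied from the question (spaces, not tabs).
line='ATOM      3  CA    A 2   1     223.640 180.497 152.816  1.00 40.00           O  '

# Replace the 11th run of non-space characters on lines containing " CA ".
result=$(printf '%s\n' "$line" | sed -E '/ CA /s/[^ ]+/1.000/11')

printf '%s\n' "$result"

# Equal lengths mean the columns after the 11th did not move.
echo "before: ${#line} characters, after: ${#result} characters"
```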

The following sed command(s) will work:

sed '/ CA /s/\([^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+\)....../\11.0000/'

or:

sed -E '/ CA /s/([^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +)....../\11.0000/'

or (with bash):

X="[^ ]+ +"; sed -E "/ CA /s/($X$X$X$X$X$X$X$X$X$X)....../\11.0000/"

or:

X="[^ ]\+ \+"; sed "/ CA /s/\($X$X$X$X$X$X$X$X$X$X\)....../\11.0000/"

to give:

ATOM      1  P     A 2   1     224.160 179.728 151.662  1.00 40.00           P  
ATOM      2  OP1   A 2   1     225.507 179.132 151.738  1.00 40.00           O  
ATOM      3  CA    A 2   1     223.640 180.497 152.816  1.00 1.0000          O  
ATOM      4  O5'   A 2   1     224.374 180.738 150.465  1.00 40.00           O

Explanation:

  • / CA / if a line contains the token "CA", then
  • s/($X$X$X$X$X$X$X$X$X$X)....../ replace the first ten columns and the first six characters of the 11th column by
  • \11.0000/ what was already in the ten columns, and by "1.0000" in the 11th.

Refinements:

  • This assumes the "CA" is not at the start of the first column; this can be fixed using /\<CA\>/ .
  • If there are tabs, replace spaces in the above with [[:space:]] .
  • The above fails if the existing 11th column has more than six non-blank characters. If you know in advance that it has, say, at most eight characters, add two extra dots to ...... and two spaces after the "1.0000".
  • Otherwise, you can first reduce the 11th column to a single non-blank character by running:

     X="[^ ]\+ \+"; sed "/ CA /{:a;s/\($X$X$X$X$X$X$X$X$X$X\)\([^ ]\+\)[^ ] /\1\2  /;ta}" 
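To illustrate the eight-character refinement, here is a sketch run on a made-up CA line whose 11th column holds 1234.567. The value and the variable names are assumptions for the example; only the pattern comes from the answer.

```shell
# Hypothetical CA line whose 11th column holds an eight-character value.
wide='ATOM      3  CA    A 2   1     223.640 180.497 152.816  1.00 1234.567        O  '

# One "column plus trailing blanks" pattern, repeated ten times below.
X='[^ ]+ +'

# Eight dots instead of six, and two spaces after 1.0000 to keep the width.
fixed=$(printf '%s\n' "$wide" | sed -E "/ CA /s/($X$X$X$X$X$X$X$X$X$X)......../\11.0000  /")

printf '%s\n' "$fixed"
```

The eight dots consume the whole old value, and the two added spaces give back the width that the shorter 1.0000 does not use.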

If you know that the 11th column is always 16 characters wide, the following sed command:

sed '/ CA /s/[^ ]\+ \+/1.0000          /11'

will give:

ATOM      1  P     A 2   1     224.160 179.728 151.662  1.00 40.00           P  
ATOM      2  OP1   A 2   1     225.507 179.132 151.738  1.00 40.00           O  
ATOM      3  CA    A 2   1     223.640 180.497 152.816  1.00 1.0000          O  
ATOM      4  O5'   A 2   1     224.374 180.738 150.465  1.00 40.00           O

Explanation: On lines with the token CA, this replaces the 11th column with 1.0000 followed by 10 spaces.

With some versions of sed, you may need to replace \+ with \{1,\} , as in:

sed '/ CA /s/[^ ]\{1,\} \{1,\}/1.0000          /11'

Alternatively, if you know that the 11th column always begins at the 62nd character and is 16 characters wide, the following will also work:

sed -i '/ CA /s/\(.\{61\}\).\{16\}/\11.0000          /' filename

Explanation:

  • On lines with the token "CA", / CA /
  • Capture the first 61 characters with \(.\{61\}\) , and keep them with \1
  • And replace the next 16 characters, .\{16\} , with 1.0000 followed by 10 spaces.
  • The -i switch modifies the file in place.
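Since -i overwrites the file, it can be worth keeping a backup and checking that no line changed length. The sketch below assumes a throwaway file called demo.pdb (a made-up name) and uses the -i.bak form, which keeps the original as demo.pdb.bak.

```shell
# Hypothetical demo file containing the CA line from the question.
printf '%s\n' 'ATOM      3  CA    A 2   1     223.640 180.497 152.816  1.00 40.00           O  ' > demo.pdb

# Edit in place, keeping the untouched original as demo.pdb.bak.
sed -i.bak '/ CA /s/\(.\{61\}\).\{16\}/\11.0000          /' demo.pdb

# Compare line lengths in the backup and in the edited file.
if awk 'NR == FNR { len[FNR] = length($0); next }
        length($0) != len[FNR] { bad = 1 }
        END { exit bad + 0 }' demo.pdb.bak demo.pdb
then
    echo "column widths preserved"
else
    echo "line lengths changed"
fi
```

If the check reports a change, the backup can simply be moved back over the edited file.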
