I have a pdb that looks like:
ATOM 1 P A 2 1 224.160 179.728 151.662 1.00 40.00 P
ATOM 2 OP1 A 2 1 225.507 179.132 151.738 1.00 40.00 O
ATOM 3 CA A 2 1 223.640 180.497 152.816 1.00 40.00 O
ATOM 4 O5' A 2 1 224.374 180.738 150.465 1.00 40.00 O
I want to change the 11th column to 1.0000 if a line contains atom CA and save these changes in the same file.
How can I do that using sed, awk or bash so that I keep the same spacing between the columns? Thank you
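Awk will do the job.
awk '$1 == "ATOM" && $3 == "CA" { $11 = "1.0000" } { print }' infile > outfile
Note that assigning to a field makes awk rebuild the line with single spaces between the fields, so the original column spacing is not kept on the modified lines.
Search for an awk tutorial for more information, as this is a basic tool worth learning.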
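Assuming fixed-width columns, as per the comment below, the awk script can be modified to specify FIELDWIDTHS (a GNU awk feature). The values need to be checked, as the question is not clear about the exact widths. OFS is set to empty so that the rebuilt line gets no extra spaces, and the new value is padded to the width of the field it replaces.
awk -v 'FIELDWIDTHS=4 8 6 4 1 6 9 9 9 6 5 12' -v 'OFS=' '
$1 == "ATOM" && $3 ~ /^ *CA *$/ { $11 = sprintf("%-*s", length($11), "1.0000") }
{ print }
' infile > outfile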
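If the exact column widths are not known, the spacing can still be kept by never assigning to a field and editing the text of the line instead. The following is only a sketch, assuming whitespace-separated columns, at least one blank before column 11, and a POSIX awk; set_col11 is a hypothetical helper name, not something from the answers above.

```shell
# set_col11 is a hypothetical helper: it filters stdin to stdout.
set_col11() {
  awk '
    BEGIN { new = "1.0000" }              # replacement text for field 11
    $1 == "ATOM" && $3 == "CA" && NF >= 11 {
      head = ""; rest = $0
      # Copy the first 10 fields, with their leading blanks, verbatim.
      for (i = 1; i <= 10; i++) {
        match(rest, /^[ \t]*[^ \t]+/)
        head = head substr(rest, 1, RLENGTH)
        rest = substr(rest, RLENGTH + 1)
      }
      # Split off the blanks before field 11 and measure the old value.
      match(rest, /^[ \t]+/)
      gap = substr(rest, 1, RLENGTH); rest = substr(rest, RLENGTH + 1)
      match(rest, /^[^ \t]+/)
      oldlen = RLENGTH; rest = substr(rest, RLENGTH + 1)
      out = new
      if (length(new) < oldlen) {
        # Shorter value: pad on the right to the old width.
        out = sprintf("%-" oldlen "s", new)
      } else {
        # Longer value: drop following blanks, but keep one separator.
        extra = length(new) - oldlen
        match(rest, /^ +/)
        avail = (RLENGTH > 1 ? RLENGTH - 1 : 0)
        rest = substr(rest, (extra < avail ? extra : avail) + 1)
      }
      print head gap out rest
      next
    }
    { print }
  '
}
```

It could be used as set_col11 < infile > outfile. Lines that are not CA atoms pass through unchanged, and on CA lines the columns after the 11th stay where they were as long as there are enough blanks to absorb the longer value.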
sed -E '/ CA /s/[^ ]+/1.0000/11' file
(GNU sed, assuming spaces and not tabs)
This uses 11 after the replacement to replace the 11th word. The replacement only happens on lines matching / CA /. -E is required for the + to work as intended.
You may want to tailor the whitespace or replacement string to your exact requirements. Since it only affects the 11th column, you can use whatever replacement you want.
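The numeric flag can be seen in isolation on a toy line. This is a small sketch, assuming a sed that accepts -E (GNU and BSD sed both do); the input string and the name nth_demo are made up for illustration.

```shell
# Replace only the 3rd run of non-space characters. The spacing is untouched
# because the pattern never matches the spaces themselves.
nth_demo() {   # hypothetical helper name
  printf 'a  b   c d\n' | sed -E 's/[^ ]+/X/3'
}
nth_demo
```

This prints "a  b   X d": the third word is replaced and the runs of spaces around it are left as they were. The width of the line only changes if the replacement has a different length from the word it replaces.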
The following sed command(s) will work:
sed '/ CA /s/\([^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+[^ ]\+ \+\)....../\11.0000/'
or:
sed -E '/ CA /s/([^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +)....../\11.0000/'
or (with bash):
X="[^ ]+ +"; sed -E "/ CA /s/($X$X$X$X$X$X$X$X$X$X)....../\11.0000/"
or:
X="[^ ]\+ \+"; sed "/ CA /s/\($X$X$X$X$X$X$X$X$X$X\)....../\11.0000/"
to give:
ATOM 1 P A 2 1 224.160 179.728 151.662 1.00 40.00 P
ATOM 2 OP1 A 2 1 225.507 179.132 151.738 1.00 40.00 O
ATOM 3 CA A 2 1 223.640 180.497 152.816 1.00 1.0000 O
ATOM 4 O5' A 2 1 224.374 180.738 150.465 1.00 40.00 O
Explanation: / CA / selects the lines that contain the token "CA". On those lines, s/($X$X$X$X$X$X$X$X$X$X)....../ matches the first ten columns and the first six characters of the 11th column, and \11.0000/ replaces them with what was already in the ten columns (\1), followed by "1.0000" in the 11th. Refinements:
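/\<CA\>/ matches CA only as a whole word. [[:space:]] can be used in place of a literal space to allow for tabs. If the 11th column has a known width, match it with the corresponding number of dots instead of ...... and pad the replacement to the same width, for example with two spaces after the "1.0000". Otherwise, you can first reduce the 11th column to a single non-blank character by running: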
X="[^ ]\+ \+"; sed "/ CA /{:a;s/\($X$X$X$X$X$X$X$X$X$X\)\([^ ]\+\)[^ ] /\1\2 /;ta}"
If you know that the 11th column is always 16 characters wide, the following sed command:
sed '/ CA /s/[^ ]\+ \+/1.0000          /11'
will give:
ATOM 1 P A 2 1 224.160 179.728 151.662 1.00 40.00 P
ATOM 2 OP1 A 2 1 225.507 179.132 151.738 1.00 40.00 O
ATOM 3 CA A 2 1 223.640 180.497 152.816 1.00 1.0000 O
ATOM 4 O5' A 2 1 224.374 180.738 150.465 1.00 40.00 O
Explanation: On lines with the token CA, this replaces the 11th column with 1.0000 followed by 10 spaces.
With some versions of sed, you may need to replace \+ with \{1,\}, as in:
sed '/ CA /s/[^ ]\{1,\} \{1,\}/1.0000          /11'
Alternatively, if you know that the 11th column always begins at the 62nd character and is 16 characters wide, the following will also work:
sed -i '/ CA /s/\(.\{61\}\).\{16\}/\11.0000          /' filename
Explanation: On lines matching / CA /, capture the first 61 characters with \(.\{61\}\) and keep them with \1, then replace the next 16 characters, .\{16\}, with 1.0000 followed by 10 spaces. The -i switch modifies the file in place.
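The -i switch is not available in every sed in the same form (it is not part of POSIX, and BSD sed expects a suffix argument after it). A more portable way to get the same effect is to write to a temporary file and copy the result back. This is a sketch under the same assumption as above, that the 11th column starts at character 62 and is 16 characters wide; edit_col11 is a hypothetical helper name, and printf is used to build the padded replacement so that the spaces do not have to be counted by hand.

```shell
# edit_col11 is a hypothetical helper; $1 is the file to modify.
edit_col11() {
  file=$1
  repl=$(printf '%-16s' 1.0000)      # "1.0000" padded with spaces to 16 chars
  tmp=$(mktemp) || return 1
  # Same substitution as the sed -i command, written to a temporary file first.
  if sed "/ CA /s/\(.\{61\}\).\{16\}/\1$repl/" "$file" > "$tmp"; then
    cat "$tmp" > "$file"             # overwrite contents, keep permissions
  fi
  rm -f "$tmp"
}
```

It would be called as edit_col11 filename. Copying the contents back with cat, rather than moving the temporary file over the original, keeps the original file's permissions and ownership.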