Math operations on 3 lists at the same time
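I have six files (from the Protein Data Bank) that contain the x, y, z coordinates of two types of amino-acid residue, CYS and LYS. The end goal is to calculate the distance between every LYS and every CYS in each file.
I have already extracted the coordinates and put them in six separate lists. Now I need to calculate the distance from the xyz coordinates as: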
dist = math.sqrt((xc - xl)**2 + (yc - yl)**2 + (zc - zl)**2)
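But I don't know how to iterate over the six lists to calculate the distance between CYS and LYS in each file.
Here is what the file contents look like (only the part containing LYS is copied from the file, as an example):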
ATOM 43 CA LYS A 7 106.336 41.686 -11.244 1.00 21.93 C
ATOM 44 C LYS A 7 106.561 41.901 -12.727 1.00 21.10 C
ATOM 45 O LYS A 7 106.327 43.032 -13.214 1.00 24.85 O
ATOM 46 CB LYS A 7 107.553 41.913 -10.402 1.00 24.26 C
ATOM 47 CG LYS A 7 107.550 41.181 -9.058 1.00 33.89 C
ATOM 48 CD LYS A 7 108.522 41.766 -8.051 1.00 35.19 C
ATOM 49 CE LYS A 7 109.455 40.737 -7.453 1.00 58.09 C
ATOM 50 NZ LYS A 7 110.799 40.722 -8.120 1.00 55.93 N
ATOM 51 N THR A 8 106.979 40.859 -13.401 1.00 19.73 N
ATOM 52 CA THR A 8 107.196 40.777 -14.860 1.00 21.18 C
ATOM 53 C THR A 8 105.925 41.136 -15.620 1.00 21.07 C
ATOM 54 O THR A 8 105.925 42.020 -16.497 1.00 14.72 O
Here is my code:
import os
import math
from glob import glob
import numpy as np

BaseDir = os.getcwd()
all_files = np.sort(glob('*[0-600]*.ent'))
for filename in all_files:
    Xc = []  # X coordinates of CYS
    Yc = []
    Zc = []
    Xl = []  # X coordinates of LYS
    Yl = []
    Zl = []
    with open(filename) as f:
        Lines = f.readlines()
    for i in range(1, len(Lines)):
        if 'CA CYS' in Lines[i]:
            linec = Lines[i].split()
            if linec[0] == 'ATOM':
                xc, yc, zc = linec[6], linec[7], linec[8]
                Xc.append(xc)
                Yc.append(yc)
                Zc.append(zc)
        if 'CA LYS' in Lines[i]:
            linel = Lines[i].split()
            if linel[0] == 'ATOM':
                xl, yl, zl = linel[6], linel[7], linel[8]
                Xl.append(xl)
                Yl.append(yl)
                Zl.append(zl)
    # not working: only sees the last values read, and they are still strings
    dist = math.sqrt((xc - xl)**2 + (yc - yl)**2 + (zc - zl)**2)
When I print(Xc, filename), it returns:
['87.372', '73.504', '86.059', '82.490', '74.176', '80.312'] 1.ent
['22.872', '13.708'] 2.ent
[] 3.ent
['62.740', '33.741', '18.064', '46.480', '36.255', '63.534', '49.543', '22.826'] 4.ent
['23.404', '-2.617', '50.714', '11.544', '38.216', '-17.818', '-7.237', '21.019', '-19.612', '37.235', '8.371', '51.634'] 5.ent
['66.407', '63.032', '60.134', '14.158', '17.494', '20.312'] 6.ent
When I print(Xl, filename):
['106.336', '105.826', '101.645', '81.196', '90.656', '96.290', '97.616', '93.983'] 1.ent
['4.430', '5.438', '19.787', '14.569', '23.059', '22.801', '16.723', '15.916'] 2.ent
['22.609', '32.122', '43.387', '41.576', '41.878', '38.004', '33.163', '38.948', '30.836', '23.899'] 3.ent
['21.847', '11.694', '10.507', '11.545', '11.775', '19.945', '27.931', '37.720', '46.445', '32.629', '30.896', '20.769', '16.377', '9.590', '15.170', '14.925', '47.464', '41.800', '24.277', '51.964', '36.706', '30.401', '25.410', '30.474', '50.309', '49.434', '40.009', '44.067', '43.220', '47.551', '52.487', '48.386', '40.121', '37.329', '21.309', '29.918', '35.721', '16.986', '14.680', '11.808', '11.466', '12.679', '17.290', '27.441', '27.388', '16.853', '52.991', '63.359', '67.769', '73.203', '68.424', '71.665', '34.917', '43.296', '60.160', '34.711', '50.052', '56.439', '60.780', '55.977', '37.295', '37.875', '47.683', '44.875', '42.006', '37.175', '32.072', '39.541', '48.253', '49.848', '65.227', '57.237', '48.009', '67.401', '70.352', '73.582', '74.629', '73.458', '70.474', '61.632', '60.699', '68.440'] 4.ent
['-0.840', '32.630', '27.111', '5.772', '0.552', '5.795', '27.208', '25.416', '24.445', '15.503', '33.113', '19.430', '17.972', '22.147', '27.065', '16.759', '12.083', '-3.498', '10.533', '-10.681', '-8.709', '2.418', '-7.800', '-22.468', '-19.818', '-22.713', '-19.877', '-10.223', '-12.596', '-21.356', '1.043', '-4.927', '-21.858', '-21.388', '-15.276', '3.474', '1.652', '-0.966', '-8.278', '23.326', '-1.463', '9.358', '13.785', '18.642', '7.074', '1.475', '-6.532', '-3.374', '-14.994', '2.388', '18.468', '-1.254', '55.980'] 5.ent
['67.045', '49.407', '52.772', '52.214', '55.680', '55.832', '78.610', '67.134', '79.549', '80.258', '80.339', '74.666', '73.443', '65.523', '67.405', '70.133', '66.798', '61.540', '49.690', '49.952', '50.093', '43.900', '49.549', '45.703', '39.861', '54.826', '59.250', '66.840', '43.908', '37.976'] 6.ent
Here is a start:
import numpy as np
from scipy.spatial.distance import cdist
cys_coords = np.loadtxt("cys_data.txt", usecols=(6, 7, 8))
lys_coords = np.loadtxt("lys_data.txt", usecols=(6, 7, 8)) # assuming the same format
distances = cdist(cys_coords, lys_coords)
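For reference, cdist returns a matrix whose entry [i, j] is the Euclidean distance between row i of the first array and row j of the second, which is the formula from the question applied to every CYS/LYS pair. A small sketch with made-up coordinates (not taken from the files in the question) that checks this against an explicit NumPy broadcasting version:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Made-up coordinates for illustration only (not from the question's files).
cys_coords = np.array([[0.0, 0.0, 0.0],
                       [1.0, 2.0, 2.0]])
lys_coords = np.array([[3.0, 4.0, 0.0],
                       [1.0, 2.0, 2.0],
                       [0.0, 0.0, 1.0]])

# One row per CYS, one column per LYS: shape (2, 3).
distances = cdist(cys_coords, lys_coords)

# The same numbers from sqrt((xc - xl)**2 + (yc - yl)**2 + (zc - zl)**2),
# evaluated for all pairs at once via broadcasting.
diff = cys_coords[:, None, :] - lys_coords[None, :, :]
manual = np.sqrt((diff ** 2).sum(axis=-1))

print(np.allclose(distances, manual))
```

So distances[0, 0] here is the distance between the first CYS and the first LYS, and no explicit loop over the coordinate lists is needed.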
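You can adapt the loadtxt/cdist code above to loop over a list of file path strings to read your data. If you know in advance how many data points you have, you can preallocate NumPy arrays for your CYS and LYS data.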
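A minimal sketch of that loop, under some assumptions: the helper names ca_coords and cys_lys_distances and the "*.ent" glob pattern are hypothetical, and the parsing assumes whitespace-separated ATOM records laid out like the excerpt in the question (x, y, z in fields 6, 7, 8 after split()). Real PDB files are fixed-column, so adjacent fields can run together and would need column slicing instead.

```python
from glob import glob

import numpy as np
from scipy.spatial.distance import cdist


def ca_coords(lines, resname):
    """Return an (n, 3) float array of CA coordinates for one residue type.

    Assumes whitespace-separated ATOM records like the question's excerpt,
    so x, y, z are fields 6, 7, 8 after split().
    """
    coords = []
    for line in lines:
        fields = line.split()
        if (len(fields) > 8 and fields[0] == "ATOM"
                and fields[2] == "CA" and fields[3] == resname):
            coords.append([float(v) for v in fields[6:9]])
    # reshape keeps a (0, 3) shape when no residue of this type is found
    return np.array(coords, dtype=float).reshape(-1, 3)


def cys_lys_distances(lines):
    """Distance matrix: rows are CYS residues, columns are LYS residues."""
    lines = list(lines)  # allows two passes, even over a file object
    cys = ca_coords(lines, "CYS")
    lys = ca_coords(lines, "LYS")
    if len(cys) == 0 or len(lys) == 0:
        # e.g. a file with no CYS at all: return an empty matrix
        return np.empty((len(cys), len(lys)))
    return cdist(cys, lys)


if __name__ == "__main__":
    for filename in sorted(glob("*.ent")):  # hypothetical file pattern
        with open(filename) as f:
            distances = cys_lys_distances(f)
        print(filename, distances.shape)
```

Collecting rows in a Python list and converting once at the end avoids having to know the number of residues up front. Preallocating is an alternative when the counts are known.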