How can I sort the lines of a text file and select some characters from one of them in Python?
The lines of my text file are:
<< end of ENERGY.
iupac_m_486_> OE1/2 will be swapped: -136.1396 1 1
openf___224_> Open Dominio1.BL00100001.pdb
wrpdb___568_> Residues, atoms, selected atoms: 268 2115 2115
>> Summary of successfully produced loop models:
Filename molpdf
----------------------------------------
Dominio1.BL00010001.pdb 24.69530
Dominio1.BL00020001.pdb 14.33748
Dominio1.BL00030001.pdb 30.53454
Dominio1.BL00040001.pdb 23.82516
Dominio1.BL00050001.pdb 27.48684
Dominio1.BL00060001.pdb 18.17364
Dominio1.BL00070001.pdb 30.98407
Dominio1.BL00080001.pdb 17.19927
Dominio1.BL00090001.pdb 19.02460
Dominio1.BL00100001.pdb 22.57086
I want to write code that looks at the last 10 lines, finds the line with the smallest number, and reads the name of the .pdb from it (just the 24 characters of the line that has the smallest number). I need to identify which .pdb has the smallest number and use its name as a string in another script, like this:
model='%s'%R
where '%s'%R is the name of the .pdb that I need.
How can I do it?
You need to use the min function with a proper key:
>>> min(s.split('\n\n'),key=lambda x:float(x.split()[-1])).split()[0]
'Dominio1.BL00020001.pdb'
Demo:
>>> s="""Dominio1.BL00010001.pdb 24.69530
...
... Dominio1.BL00020001.pdb 14.33748
...
... Dominio1.BL00030001.pdb 30.53454
...
... Dominio1.BL00040001.pdb 23.82516
...
... Dominio1.BL00050001.pdb 27.48684
...
... Dominio1.BL00060001.pdb 18.17364
...
... Dominio1.BL00070001.pdb 30.98407
...
... Dominio1.BL00080001.pdb 17.19927
...
... Dominio1.BL00090001.pdb 19.02460
...
... Dominio1.BL00100001.pdb 22.57086"""
>>> min(s.split('\n\n'),key=lambda x:float(x.split()[-1]))
'Dominio1.BL00020001.pdb 14.33748'
>>> min(s.split('\n\n'),key=lambda x:float(x.split()[-1])).split()[0]
'Dominio1.BL00020001.pdb'
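The demo splits on '\n\n' only because its sample string has a blank line between rows. As a minimal sketch, assuming the rows may or may not be separated by blank lines, the same min-with-key idea can be written with splitlines() so that it tolerates both layouts. The sample string and the names rows and best_name are illustrative, not part of the original answer.

```python
# Sample text with a blank line between some rows (an assumption for illustration).
s = """Dominio1.BL00010001.pdb 24.69530

Dominio1.BL00020001.pdb 14.33748
Dominio1.BL00030001.pdb 30.53454
"""

# Keep only non-blank lines, then split each one into [name, value].
rows = [line.split() for line in s.splitlines() if line.strip()]

# Pick the row whose last field is numerically smallest and take its name.
best_name = min(rows, key=lambda fields: float(fields[-1]))[0]
print(best_name)
```

Converting with float() in the key matters: comparing the raw strings would order them lexically rather than numerically.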
A normal file read operation will do:
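with open('file.txt') as f:  # 'file.txt' is a placeholder for your file name
    data = f.readlines()
pdb_files = []
float_values = []
for line in data:
    pdb, float_value = line.split()
    pdb_files.append(pdb)
    float_values.append(float(float_value))
# index of the smallest value, then the file name stored at that index
min_float_index = float_values.index(min(float_values))
print(pdb_files[min_float_index])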
This code stores the data in two lists and finds the smallest of the float values. It then prints the pdb filename at the same position.
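The question's file also contains log lines above the table, so the two-list approach needs to look only at the last rows. Here is a rough sketch, assuming the last n non-blank lines each have the form "name value"; the helper name smallest_in_tail and the shortened sample log are hypothetical.

```python
from collections import deque

def smallest_in_tail(lines, n=10):
    """Return the name on the row with the smallest value among the last n lines.

    Hypothetical helper: assumes each of the last n non-blank lines
    looks like 'name value'.
    """
    # deque with maxlen keeps only the last n items it is fed.
    tail = deque((ln for ln in lines if ln.strip()), maxlen=n)
    pdb_files = []
    float_values = []
    for line in tail:
        pdb, value = line.split()
        pdb_files.append(pdb)
        float_values.append(float(value))
    return pdb_files[float_values.index(min(float_values))]

# Shortened sample in the shape of the question's log (only 3 data rows).
log = """>> Summary of successfully produced loop models:
Filename molpdf
----------------------------------------
Dominio1.BL00010001.pdb 24.69530
Dominio1.BL00020001.pdb 14.33748
Dominio1.BL00030001.pdb 30.53454
"""
R = smallest_in_tail(log.splitlines(), n=3)
model = '%s' % R  # the usage shown in the question
print(model)
```

With the real log you would pass the open file object instead of log.splitlines() and leave n at 10.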
Try this:
def get_minimal_value_entry(file_name):
    with open(file_name, 'r') as f:
        # the value of a line is the second member of 'split' result
        key = lambda x: float(x.strip().split()[1])
        return min(f, key=key).split()[0]

# 'test' file holds the data...
print(get_minimal_value_entry('test'))
# prints Dominio1.BL00020001.pdb
If you have empty lines, use itertools.ifilter (Python 2) to filter them out:
from itertools import ifilter

def get_minimal_value_entry(file_name):
    with open(file_name, 'r') as f:
        # the value of a line is the second member of 'split' result
        key = lambda x: float(x.strip().split()[1])
        return min(ifilter(lambda x: x.split(), f), key=key).split()[0]

# 'test' file holds the data...
print(get_minimal_value_entry('test'))
# prints Dominio1.BL00020001.pdb
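itertools.ifilter exists only in Python 2. On Python 3, a generator expression (or the built-in filter, which is lazy there) does the same job. The following is a sketch under that assumption; the temporary file just stands in for the 'test' file, and its contents are two rows from the question's data.

```python
import os
import tempfile

def get_minimal_value_entry(file_name):
    with open(file_name, 'r') as f:
        # The generator skips blank lines lazily, so no ifilter is needed.
        rows = (line.split() for line in f if line.strip())
        # Compare rows by their second field, the numeric value.
        return min(rows, key=lambda fields: float(fields[1]))[0]

# Write a small sample file with an empty line between the rows.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Dominio1.BL00010001.pdb 24.69530\n\n"
              "Dominio1.BL00020001.pdb 14.33748\n")
    path = tmp.name

result = get_minimal_value_entry(path)
os.remove(path)  # clean up the sample file
print(result)
```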
I'd use Python's re module.
file.txt
Dominio1.BL00010001.pdb 24.69530
Dominio1.BL00020001.pdb 14.33748
Dominio1.BL00030001.pdb 30.53454
Dominio1.BL00040001.pdb 23.82516
Dominio1.BL00050001.pdb 27.48684
Dominio1.BL00060001.pdb 18.17364
Dominio1.BL00070001.pdb 30.98407
Dominio1.BL00080001.pdb 17.19927
Dominio1.BL00090001.pdb 19.02460
Dominio1.BL00100001.pdb 22.57086
sorts.py
import re

with open('file.txt') as f:
    lines = f.readlines()                          # read all lines
lines = [i.strip() for i in lines]                 # remove newlines
lines = [re.sub(r'\s+', ' ', i) for i in lines]    # collapse runs of whitespace
lines = [i.split(' ') for i in lines]              # split by space
lines = [i for i in lines if i != ['']]            # remove empty lines
lines = sorted(lines, key=lambda i: float(i[1]))   # sort by the numeric value
print(lines[0][0])                                 # name with the smallest value
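A regular expression can also pick the table rows out of the full log, so the header and status lines do not have to be stripped by hand. This is only a sketch: the pattern assumes each wanted row is a name ending in .pdb followed by a single decimal number, and the names ROW and matches and the shortened sample log are illustrative.

```python
import re

# Assumed row shape: a name ending in .pdb, whitespace, then a decimal number.
ROW = re.compile(r'^\s*(\S+\.pdb)\s+(-?\d+(?:\.\d+)?)\s*$')

# Shortened sample in the shape of the question's log.
log = """openf___224_> Open Dominio1.BL00100001.pdb
wrpdb___568_> Residues, atoms, selected atoms: 268 2115 2115
>> Summary of successfully produced loop models:
Filename molpdf
----------------------------------------
Dominio1.BL00010001.pdb 24.69530
Dominio1.BL00020001.pdb 14.33748
Dominio1.BL00030001.pdb 30.53454
"""

# Keep (name, value) pairs only for lines that match the row pattern.
matches = [m.groups() for m in map(ROW.match, log.splitlines()) if m]

# The smallest value wins; the captured value is still a string, so convert it.
name, value = min(matches, key=lambda pair: float(pair[1]))
print(name)
```

The "Open Dominio1.BL00100001.pdb" line is skipped because nothing numeric follows the file name, which is what keeps the log lines out of the comparison.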