I am trying to print a few key lines from a directory of like files using line.split('\n'), but the lines are not recognized

So this input file already has line breaks. It's the natural format in which it's created. When I attempt to identify certain lines so that I can go back and pull the values from those lines, I get:

name = line[2]
IndexError: list index out of range

Any thoughts? I know there has to be an easy solution, as this is fairly basic, but I have sifted through every entry on splitting, and on splitting with ('\n') in particular, and nothing has worked. Any help from you folks would be greatly appreciated!

-Ut prosim

Input:

    ID  rpmI_bact
    AC  TIGR00001
    DE  ribosomal protein L35

Script:

for i in info.readlines():

    line = i.split('\n')
    id_hit = line[0]
    ac = line[1]
    name = line[2]

    print(name)

Error:

name = line[2]
IndexError: list index out of range

First of all, when you call readlines, you get back a list of all the lines in your file. Each element still ends with its '\n'; with those trailing newlines left out for readability, the list will probably look something like this:

['    ID  rpmI_bact', '    AC  TIGR00001', '    DE  ribosomal protein L35']

If you take one of these values and then try to split it on newlines, nothing gets split:

'    ID  rpmI_bact'.split('\n')
['    ID  rpmI_bact']

Notice that the return value is a list with one element (with the trailing '\n' still attached you would get a second, empty string, but never a third element), so line[2] does not exist. That's why you get your IndexError.
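To see this for yourself, here is a minimal sketch that uses io.StringIO as a stand-in for your opened file. The sample text and the variable names are assumptions for illustration, not your actual data:

```python
import io

# Stand-in for the opened input file; io.StringIO behaves like a text file.
sample = "ID  rpmI_bact\nAC  TIGR00001\nDE  ribosomal protein L35\n"
info = io.StringIO(sample)

lines = info.readlines()
# Each element is one line of the file, still ending with its newline.
print(lines)

parts = lines[0].split('\n')
# Splitting a single line on '\n' only separates off the trailing newline.
print(parts)       # ['ID  rpmI_bact', '']
print(len(parts))  # 2, so parts[2] would raise IndexError
```

The file has already been broken into lines by readlines, so there is nothing left for split('\n') to do on an individual line.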

Now, it seems like you want to take each line and split on whitespace? If so, one way to do that is to use split(' '), but because the fields are separated by runs of spaces, this is going to give you empty strings mixed in with the content:

In [8]: for line in lines:
   ...:     print(line.split(' '))
   ...:     
['', '', '', '', 'ID', '', 'rpmI_bact']
['', '', '', '', 'AC', '', 'TIGR00001']
['', '', '', '', 'DE', '', 'ribosomal', 'protein', 'L35']

Notice how it's not obvious where the "content" is? We can solve this in a few ways. One is to introduce regexes; the other is to simply keep the values that are not empty strings (note that empty strings in Python are falsy):

In [9]: bool("")
Out[9]: False
In [10]: for line in lines:
    ...:     print([elem for elem in line.split(' ') if elem])
    ...:     
['ID', 'rpmI_bact']
['AC', 'TIGR00001']
['DE', 'ribosomal', 'protein', 'L35']
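As a side note, calling split() with no arguments is a shorter route to the same lists: it splits on any run of whitespace and ignores leading and trailing whitespace, so no empty strings appear. A small sketch, assuming lines holds the same three strings as above (the names fields and key_value are made up for this example):

```python
lines = ['    ID  rpmI_bact', '    AC  TIGR00001', '    DE  ribosomal protein L35']

# split() with no separator treats any run of whitespace as one delimiter.
fields = [line.split() for line in lines]
for f in fields:
    print(f)

# Passing maxsplit=1 keeps everything after the key together as one string.
key_value = lines[2].split(None, 1)
print(key_value)  # ['DE', 'ribosomal protein L35']
```

The maxsplit form is handy here because the description on the DE line contains spaces of its own.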

Now you have to figure out what you want to do with those lists. I didn't really get that from the question, though.

I'd probably consider just making it a dictionary. Then you can query it by the two-letter key. No need for .readlines() either, since iterating over the file object already yields one line at a time.

d = dict(line.strip().split(None, 1) for line in info if line.strip())

That should give you a dictionary looking like this:

{'AC': 'TIGR00001', 'DE': 'ribosomal protein L35', 'ID': 'rpmI_bact'}

Then you can just look up the field you're interested in by its key:

name = d['DE']
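Since the question mentions a directory of similar files, the dictionary idea could be extended along these lines. This is only a sketch under assumptions: the function name parse_info, the .INFO extension, and the second demo file (whose contents are made up) are all hypothetical, and the temporary directory exists only so the example runs as-is.

```python
import tempfile
from pathlib import Path

def parse_info(path):
    """Return a dict mapping each two-letter key to the rest of its line."""
    record = {}
    with open(path) as info:
        for line in info:
            parts = line.strip().split(None, 1)
            if len(parts) == 2:  # skip blank lines and lines with only a key
                key, value = parts
                record[key] = value
    return record

# Demo: build a throwaway directory with two files (the second is made up).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.INFO").write_text(
        "ID  rpmI_bact\nAC  TIGR00001\nDE  ribosomal protein L35\n")
    (folder / "b.INFO").write_text(
        "ID  example_id\nAC  EXAMPLE0002\nDE  made-up description\n")
    # Collect the DE field from every matching file, in sorted filename order.
    names = [parse_info(p).get("DE") for p in sorted(folder.glob("*.INFO"))]

print(names)
```

Using .get("DE") rather than ["DE"] means a file without a DE line yields None instead of raising KeyError; whether that is the behavior you want depends on your data.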
