
Extract column from tabular data

My task is to pull a column out of a table and get its length with len(). But my code produces each value separately, which is why len() counts each element of the column on its own instead of giving their total:

water = water.readlines()
for col in water:
    el = list(col.split()[2])

water.txt:

     HETATM    1  H   HOH A   1      27.265  36.739  58.126
     HETATM    2  H   HOH A   1      27.109  35.124  57.944                          
     HETATM    3  O   HOH A   1      27.486  35.958  57.542
...
     HETATM 9999  O   HOH A3333      30.490  83.899  10.929

Desired intermediate output:

H
H
O
H
H
O
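To see what the loop in the question actually does, it helps to print inside it. This is a sketch that assumes io.StringIO in place of the opened water.txt, holding three of the sample lines:

```python
import io

# Assumption: io.StringIO stands in for the opened water.txt,
# holding three of the sample lines from the question.
water = io.StringIO(
    "HETATM    1  H   HOH A   1      27.265  36.739  58.126\n"
    "HETATM    2  H   HOH A   1      27.109  35.124  57.944\n"
    "HETATM    3  O   HOH A   1      27.486  35.958  57.542\n"
)

water = water.readlines()
for col in water:
    el = list(col.split()[2])  # list('H') -> ['H']: a new one-element list each time
    print(el, len(el))         # prints a length of 1 for every line

print(el)          # only the last line's value survives the loop: ['O']
print(list("CA"))  # a multi-character value is split into characters: ['C', 'A']
```

Each iteration throws away the previous el, and list() on a string splits it into characters, so len() never sees the whole column.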

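You are not extracting the column correctly. Inside your loop, el is reassigned on every iteration, and list() applied to a string such as 'H' splits it into characters (['H']), so len(el) only ever reflects one line's value. The correct way is to collect all values in a single list, most simply with a list comprehension: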

with open(...) as water:
    el = [line.split()[2] for line in water]

With your sample data, I get ['H', 'H', 'O'] for el, which is the third column, and len(el) is 3.
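A self-contained version of the list comprehension, including the len() call the question asks for. As an assumption for runnability, io.StringIO replaces open(...) and holds the three sample lines:

```python
import io

# Assumption: io.StringIO stands in for open(...) on water.txt.
sample = (
    "HETATM    1  H   HOH A   1      27.265  36.739  58.126\n"
    "HETATM    2  H   HOH A   1      27.109  35.124  57.944\n"
    "HETATM    3  O   HOH A   1      27.486  35.958  57.542\n"
)

with io.StringIO(sample) as water:
    # One list built across all lines: the third field (index 2) of each line.
    el = [line.split()[2] for line in water]

print(el)       # ['H', 'H', 'O']
print(len(el))  # 3, the total number of extracted values
```

If the real file may contain blank lines, line.split()[2] raises IndexError on them; adding "if line.strip()" at the end of the comprehension skips them.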

In the future you will likely use other means to import tabular data. But this is an important exercise, because the approach below applies to most problems you will face. The most important initial concept is to use plenty of print statements to understand what each step does.

file = "HETATM    1  H   HOH A   1      27.265  36.739  58.126\nHETATM    2  H   HOH A   1      27.109  35.124  57.944\nHETATM    3  O   HOH A   1      27.486  35.958  57.542\n"
        
lines=file.split('\n')
print(lines)

The output is a list of strings:

['HETATM    1  H   HOH A   1      27.265  36.739  58.126',
 'HETATM    2  H   HOH A   1      27.109  35.124  57.944',
 'HETATM    3  O   HOH A   1      27.486  35.958  57.542',
 '']
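The trailing '' comes from the final newline: split('\n') produces an empty string after it. str.splitlines() does not, so it is one alternative (a swap-in, not what this answer uses) that avoids the empty last entry:

```python
file = (
    "HETATM    1  H   HOH A   1      27.265  36.739  58.126\n"
    "HETATM    2  H   HOH A   1      27.109  35.124  57.944\n"
    "HETATM    3  O   HOH A   1      27.486  35.958  57.542\n"
)

by_split = file.split('\n')
by_splitlines = file.splitlines()

print(len(by_split), repr(by_split[-1]))  # 4 '' : empty string after the last newline
print(len(by_splitlines))                 # 3 : no trailing empty entry
```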

Each line is still a single string, so you need to split it into a list, for example:

a=lines[2].split()
print(a)

The output is a list of strings, each one a column value for this particular line (row):

['HETATM', '3', 'O', 'HOH', 'A', '1', '27.486', '35.958', '57.542']
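Calling split() with no argument matters here: it splits on runs of whitespace and ignores leading and trailing whitespace (such as the indentation in water.txt). Splitting on a single explicit space does not, as this short comparison shows:

```python
line = "HETATM    3  O   HOH A   1      27.486  35.958  57.542"

# No argument: runs of whitespace count as one separator.
print(line.split()[2])          # O

# Explicit ' ': every single space is a separator, so runs of spaces yield empty strings.
print(repr(line.split(' ')[2]))  # ''
```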

To do that for every line and keep the third column (index 2):

col2 = []  # empty list to collect the values of column index 2

for line in lines:
    if len(line) > 1:  # skip empty lines, including the one at the end of the file
        cols = line.split()
        col2.append(cols[2])

print(col2)

The output is a list holding your third column (index 2):

['H', 'H', 'O']
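The same loop can be wrapped in a small function so it works for any column and any iterable of lines, including an open file. The name extract_column is hypothetical, and skipping lines that are too short (rather than only empty ones) is a design choice of this sketch; it avoids an IndexError but would also silently skip malformed rows:

```python
def extract_column(lines, index):
    """Return the field at position `index` from every line that has one.

    `lines` can be any iterable of strings: a list, or an open file object.
    """
    column = []
    for line in lines:
        fields = line.split()
        if len(fields) > index:  # skip blank or too-short lines instead of raising IndexError
            column.append(fields[index])
    return column


lines = [
    "HETATM    1  H   HOH A   1      27.265  36.739  58.126",
    "HETATM    2  H   HOH A   1      27.109  35.124  57.944",
    "HETATM    3  O   HOH A   1      27.486  35.958  57.542",
    "",
]
col2 = extract_column(lines, 2)
print(col2, len(col2))  # ['H', 'H', 'O'] 3
```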

Because Python is often used with packages that do a lot in a single line, and because of duck typing, it is more important than in many other languages to always know what your last line produced, both its type and its meaning.
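One habit that follows from this is to print both the type and the repr of each intermediate value before moving on. A minimal sketch, with the sample lines shortened for brevity:

```python
lines = "HETATM    1  H   HOH A   1\nHETATM    3  O   HOH A   1\n".split('\n')
a = lines[1].split()

# Show the type and exact value of each intermediate result.
for name, value in [("lines", lines), ("lines[1]", lines[1]), ("a", a), ("a[2]", a[2])]:
    print(name, type(value).__name__, repr(value))
```

repr() makes otherwise invisible details, such as the empty string at the end of lines, show up in the output.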

In the future you will likely use numpy or pandas to read tabular data in a single line. But that single line can be hard to understand, and it is also hard to memorize. Doing it yourself in low-level code, as shown above, helps you stay connected to your code. It also helps you understand how other people implemented higher-level functions.
