How to read specific lines in a file with numbers in Python?

I am writing a script to calculate the average and standard deviation of some measurements that I have. I would like the script to read the file and select only the data that I want.

Let's say I have a table like the one below:

(1 2 3 4;
 4 x x x; 
 4 x x x; 
 4 x x x; 
 4 x x x)

Now I want to write the script so that I can select all the values that are under 1, then all the values that are under 2, and so on, so that what I import depends on the value in the first line.

You want to use the enumerate() function.

    with open(filename,'r') as file_object:
        for line_number, line in enumerate(file_object):
            if line_number in list_of_line_numbers:
                do_stuff_to(line)

Here list_of_line_numbers is a list containing the numbers of the lines you want to take. Note that enumerate() starts counting at 0 by default, so the first line of the file has line_number 0. This approach also has the advantage of not loading the entire file into memory, which matters if you're working with a large file.
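As a minimal, self-contained sketch of the loop above: the helper name read_selected_lines and the sample file contents are made up for illustration and are not part of the original answer. It assumes 0-based line numbers and converts them to a set so that each membership test stays cheap.

```python
import os
import tempfile


def read_selected_lines(path, wanted_line_numbers):
    """Return the lines whose 0-based index is in wanted_line_numbers."""
    wanted = set(wanted_line_numbers)  # set gives fast membership tests
    selected = []
    with open(path, "r") as file_object:
        # The file is read one line at a time, never loaded as a whole.
        for line_number, line in enumerate(file_object):
            if line_number in wanted:
                selected.append(line.rstrip("\n"))
    return selected


# Hypothetical sample file, written only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 3 4\n4 5 6 7\n4 8 9 10\n4 11 12 13\n")
    sample_path = tmp.name

result = read_selected_lines(sample_path, [0, 2])
os.remove(sample_path)
print(result)  # ['1 2 3 4', '4 8 9 10']
```

If you prefer to count lines from 1, enumerate(file_object, start=1) does that without changing the rest of the loop.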

More info on the enumerate function:

https://docs.python.org/3/library/functions.html#enumerate

If your data set is not too large, I would consider using a pandas.DataFrame from the pandas data-wrangling library:

pandas.DataFrame(two_dimensional_array_like_object)

If you have a CSV file (example.csv) that looks like:

1,2,3
2,3,4
3,4,5

Importing this into a pandas.DataFrame:

In[7]: import pandas as pd

In[8]: df = pd.read_csv('example.csv', header=None)  # header=None: the file has no header row

In[9]: print(df)
   0  1  2
0  1  2  3
1  2  3  4
2  3  4  5

Now you have a very capable object (df) that has many built-in methods for data wrangling.

To perform your intended slicing:

In[10]: df_copy = df.loc[df[0]==2, :] # select rows that have the number 2 in the first column and make a copy
In[11]: print(df_copy) # print selected rows
   0  1  2
1  2  3  4
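Since the goal in the question is an average and a standard deviation, the selection above can be followed directly by the mean() and std() methods. The sketch below is only an illustration: the sample data is invented, and it assumes the first column holds the value you select on while the second column holds the measurements. Note that pandas' std() computes the sample standard deviation (ddof=1) by default.

```python
import io

import pandas as pd

# Hypothetical sample data: column 0 is the selection key,
# columns 1 and 2 are measurements.
csv_text = "1,2.0,3.0\n1,4.0,5.0\n2,6.0,7.0\n2,10.0,11.0\n"

# header=None because the data has no header row; columns are named 0, 1, 2.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Select the rows that have the number 2 in the first column.
selected = df.loc[df[0] == 2, :]

mean_col1 = selected[1].mean()  # average of column 1 for the selected rows
std_col1 = selected[1].std()    # sample standard deviation (ddof=1)

print(mean_col1, std_col1)
```

Pass ddof=0 to std() if you want the population standard deviation instead.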
