Python: Is there a way to extract and concatenate several series of text files, dropping the top 3 rows of each file as I go along?
I have a folder containing several series of text files, each containing a single column of residuals from some analysis. Their file names look like this:
'residual_x01'
'residual_x02'
...
'residual_y01'
'residual_y02'
...
'residual_z01'
'residual_z02'
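One way to split these names into series is to parse the letter and the number with a regular expression. This is a minimal sketch that assumes every name follows the `residual_<letter><digits>` pattern shown above (the helper `group_by_series` and the pattern `NAME_PATTERN` are hypothetical). Sorting on the integer index keeps `residual_x100` after `residual_x99` if the numbering ever outgrows two digits, which plain string sorting would not:

```python
import re
from collections import defaultdict

# Hypothetical pattern: "residual_" + one series letter + a file number,
# optionally followed by an extension such as ".txt".
NAME_PATTERN = re.compile(r"residual_([a-z])(\d+)(?:\.\w+)?$")


def group_by_series(names):
    """Map each series letter to its file names, ordered by file number."""
    groups = defaultdict(list)
    for name in names:
        match = NAME_PATTERN.search(name)
        if match is None:
            continue  # skip anything that is not a residual file
        series, number = match.group(1), int(match.group(2))
        groups[series].append((number, name))
    # Sort numerically so that e.g. x100 comes after x99.
    return {s: [name for _, name in sorted(items)] for s, items in groups.items()}


names = ["residual_y01", "residual_x02", "residual_x01", "residual_z01"]
print(group_by_series(names))
# {'y': ['residual_y01'], 'x': ['residual_x01', 'residual_x02'], 'z': ['residual_z01']}
```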
The contents of the files look like this:
1 ### This is the file number in the series
c:\file\location\goes\here
983 1051 0 0 983 1051 ### other identifier
1.1 ### this is where the data I want starts
3.5
0.8
0.7
1.3
... ## so on for about a million lines.
Using Python, I would like to extract the residuals from these files, concatenate them to form one long file for each series (i.e. x, y, z), and remove the top three lines of each file as I go, i.e. to form this:
1.1 ### data from first file of series 'residual_x01 / _y01 / _z01'
3.5
0.8
0.7
1.3
...
1.1 ### data from second file of series 'residual_x02 / _y02 / _z02'
3.5
0.8
0.7
1.3
...
1.1 ### data from third file of series 'residual_x03 / _y03 / _z03'
3.5
0.8
0.7
1.3
... ... and so on.
I am at a loss as to how to do this. Can anyone help?
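Dropping the header lines "as I go" can be done without reading a whole file into memory: `itertools.islice(f, 3, None)` skips the first three lines and then yields the rest one at a time. A small sketch, using an in-memory `io.StringIO` in place of a real file and the sample contents shown above:

```python
import io
from itertools import islice

# Stand-in for an opened residual file (header + five residuals).
sample = io.StringIO(
    "1\n"
    "c:\\file\\location\\goes\\here\n"
    "983 1051 0 0 983 1051\n"
    "1.1\n3.5\n0.8\n0.7\n1.3\n"
)

# islice(iterable, start, stop): start=3 discards the three header lines,
# stop=None means "continue to the end of the file".
residuals = [float(line) for line in islice(sample, 3, None)]
print(residuals)  # [1.1, 3.5, 0.8, 0.7, 1.3]
```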
You didn't provide much data, so I made some bogus data. I didn't want to make a bunch of files, so I only made three fake data files, but the code should work for any number of files, and the length of each file can vary, too.
Let's say you've got the following three text files:
files/residual_x01.txt
1
c:\file\location\goes\here
983 1051 0 0 983 1051
1.1
3.5
0.8
0.7
1.3
files/residual_x02.txt
2
c:\file\location\goes\here
983 1051 0 0 983 1051
7.1
8.4
0.3
2.3
0.1
files/residual_y01.txt
1
c:\file\location\goes\here
983 1051 0 0 983 1051
4.2
4.3
1.3
0.2
0.0
Code:
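from itertools import islice
from pathlib import Path
import sys


def get_file_lines(path_to_file, number_of_lines_to_skip=3):
    """Yield the stripped data lines of a file, skipping its header."""
    with path_to_file.open("r") as file:
        # islice(file, n, None) discards the first n (header) lines.
        for line in islice(file, number_of_lines_to_skip, None):
            line = line.strip()
            if line:  # ignore blank lines
                yield line


def get_all_floats(path_to_dir):
    # Path.glob() returns files in arbitrary order, so sort them to keep
    # x01, x02, ..., y01, ... in sequence.
    for path in sorted(Path(path_to_dir).glob("residual_*.txt")):
        for line in get_file_lines(path):
            yield float(line)


def main():
    for f in get_all_floats("files/"):
        print(f)
    return 0


if __name__ == "__main__":
    sys.exit(main())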
Output:
1.1
3.5
0.8
0.7
1.3
7.1
8.4
0.3
2.3
0.1
4.2
4.3
1.3
0.2
0.0
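The code above prints all series as one stream. A minimal sketch of one output file per series in the same style, assuming the `files/` layout above, series letters `x`, `y` and `z`, and a hypothetical output name `residual_<letter>_all.txt` written to a separate output folder (so the glob never picks up its own output). The helpers `data_lines` and `write_series_files` are illustrative names, not from the original answer:

```python
from itertools import islice
from pathlib import Path


def data_lines(path, skip=3):
    """Yield the non-blank lines of a file that follow its header."""
    with path.open("r") as file:
        for line in islice(file, skip, None):
            line = line.strip()
            if line:
                yield line


def write_series_files(input_dir, output_dir, series_letters="xyz"):
    """Write one concatenated residual file per series letter."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    for letter in series_letters:
        # sorted() because Path.glob() yields files in arbitrary order.
        paths = sorted(input_dir.glob(f"residual_{letter}*.txt"))
        if not paths:
            continue  # no files for this series, so no output file
        output_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical output name, e.g. residual_x_all.txt
        out_path = output_dir / f"residual_{letter}_all.txt"
        with out_path.open("w") as out:
            for path in paths:
                for line in data_lines(path):
                    out.write(line + "\n")  # copy each residual unchanged
```

With the three sample files, calling `write_series_files("files/", "output/")` would produce `output/residual_x_all.txt` (the x01 values followed by the x02 values) and `output/residual_y_all.txt`; no z file is written because there are no z inputs.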
For each series, you can create a file containing all the lines from its files except the first 3 lines of each, using this code:
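filenames = ['residual_x01', 'residual_x02']  # ... add the rest of the series
output_file = 'path/to/output/residual_x'
lines_to_skip = 3

with open(output_file, 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            # Drop the header lines, keep the residuals.
            lines = infile.readlines()[lines_to_skip:]
        # If a file lacks a final newline, add one so its last value is not
        # glued to the first value of the next file.
        if lines and not lines[-1].endswith('\n'):
            lines[-1] += '\n'
        outfile.writelines(lines)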
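`readlines()` loads each whole file into memory, which is probably fine here but adds up with files of about a million lines. A streaming variant of the same loop, sketched with `itertools.islice` (the function name `concat_without_headers` is hypothetical), copies one line at a time instead:

```python
from itertools import islice


def concat_without_headers(filenames, output_file, lines_to_skip=3):
    """Append every file's lines, minus its header, to output_file."""
    with open(output_file, "w") as outfile:
        for fname in filenames:
            last = ""
            with open(fname) as infile:
                # islice skips the header lazily; lines are copied one by one.
                for line in islice(infile, lines_to_skip, None):
                    outfile.write(line)
                    last = line
            # Add a newline if the file did not end with one, so the last
            # value is not glued to the next file's first value.
            if last and not last.endswith("\n"):
                outfile.write("\n")
```

It takes the same `filenames`, `output_file` and `lines_to_skip` values as the loop above.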
Change the filenames list and output_file according to your needs. You can also tweak the lines_to_skip variable.
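Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.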