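Python: Is there a way to extract and concatenate several series of text files, dropping the top 3 rows of each file as I go along?
I have a folder containing several series of text files, each holding the residuals from some analysis, one value per line. Their filenames look like this: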
'residual_x01'
'residual_x02'
...
'residual_y01'
'residual_y02'
...
'residual_z01'
'residual_z02'
The contents of each file look like this:
1 ### This is the file number in the series
c:\file\location\goes\here
983 1051 0 0 983 1051 ### other identifier
1.1 ### this is where the data I want starts
3.5
0.8
0.7
1.3
... ## so on for about a million lines.
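Using Python, I would like to extract the residuals from these files and concatenate them into one long file per series (i.e. x, y, z), dropping the top three rows of each file as I go, to give: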
1.1 ### data from first file of series 'residual_x01 / _y01 / _z01'
3.5
0.8
0.7
1.3
...
1.1 ### data from second file of series 'residual_x02 / _y02 / _z02'
3.5
0.8
0.7
1.3
...
1.1 ### data from third file of series 'residual_x03 / _y03 / _z03'
3.5
0.8
0.7
1.3
... ... and so on.
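I don't know how to go about this. Can anyone help?
You didn't provide much data, so I made up some fake data. I didn't want to create a whole bunch of files, so I made only three, but the code should work for any number of files, and the files can have different lengths.
Suppose you have the following three text files: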
files/residual_x01.txt
1
c:\file\location\goes\here
983 1051 0 0 983 1051
1.1
3.5
0.8
0.7
1.3
files/residual_x02.txt
2
c:\file\location\goes\here
983 1051 0 0 983 1051
7.1
8.4
0.3
2.3
0.1
files/residual_y01.txt
1
c:\file\location\goes\here
983 1051 0 0 983 1051
4.2
4.3
1.3
0.2
0.0
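If you want to reproduce this, here is a minimal sketch that writes the three sample files listed above. The helper name write_sample_files and the default directory name are my own choices for illustration; the file contents are copied from the listings above.

```python
from pathlib import Path

# Three-line header shared by every sample file; {n} is the file number.
HEADER = "{n}\nc:\\file\\location\\goes\\here\n983 1051 0 0 983 1051\n"

# File name -> (file number in its series, residual values)
SAMPLES = {
    "residual_x01.txt": (1, [1.1, 3.5, 0.8, 0.7, 1.3]),
    "residual_x02.txt": (2, [7.1, 8.4, 0.3, 2.3, 0.1]),
    "residual_y01.txt": (1, [4.2, 4.3, 1.3, 0.2, 0.0]),
}


def write_sample_files(directory="files"):
    """Write the sample files into `directory` and return their sorted names."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, (number, values) in SAMPLES.items():
        body = "".join(f"{value}\n" for value in values)
        (out_dir / name).write_text(HEADER.format(n=number) + body)
    return sorted(p.name for p in out_dir.glob("residual_*.txt"))
```

Calling write_sample_files() once creates a files/ directory next to the script.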
Code:
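from itertools import islice
from pathlib import Path
import sys

NUMBER_OF_LINES_TO_SKIP = 3


def get_file_lines(path_to_file):
    # Yield the data lines of one file, skipping its header lines.
    with path_to_file.open("r") as file:
        for line in islice(file, NUMBER_OF_LINES_TO_SKIP, None):
            line = line.strip()
            if line:  # ignore blank lines, e.g. at the end of the file
                yield line


def get_all_floats(path_to_dir):
    # Path.glob() gives no ordering guarantee, so sort the paths to get
    # x01, x02, ..., y01, ... in a predictable order.
    for path in sorted(Path(path_to_dir).glob("residual_*.txt")):
        for line in get_file_lines(path):
            yield float(line)


def main():
    for f in get_all_floats("files/"):
        print(f)
    return 0


if __name__ == "__main__":
    sys.exit(main())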
Output:
1.1
3.5
0.8
0.7
1.3
7.1
8.4
0.3
2.3
0.1
4.2
4.3
1.3
0.2
0.0
>>>
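The code above prints every residual from every matching file as one stream. The question asks for one combined file per series, so here is a rough sketch of how that grouping could look. I am assuming the names follow residual_<letters><digits>.txt as in my sample files; the function name concatenate_by_series and the output names combined_<series>.txt are made up for this example.

```python
import re
from collections import defaultdict
from itertools import islice
from pathlib import Path

# Assumed naming scheme: residual_<letters><digits>.txt, e.g. residual_x01.txt
NAME_PATTERN = re.compile(r"residual_([A-Za-z]+)(\d+)\.txt$")


def concatenate_by_series(input_dir, output_dir, lines_to_skip=3):
    """Write one combined file per series, dropping each file's header lines."""
    groups = defaultdict(list)
    for path in Path(input_dir).glob("residual_*.txt"):
        match = NAME_PATTERN.match(path.name)
        if match:
            # Store the file number as an int so the files sort numerically.
            groups[match.group(1)].append((int(match.group(2)), path))

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for series, members in groups.items():
        target = out_dir / f"combined_{series}.txt"
        with target.open("w") as outfile:
            for _, path in sorted(members):
                with path.open() as infile:
                    # islice reads line by line, so a file with a million
                    # lines is never held in memory all at once.
                    for line in islice(infile, lines_to_skip, None):
                        # Make sure a file without a final newline does not
                        # run into the first value of the next file.
                        outfile.write(line if line.endswith("\n") else line + "\n")
        written[series] = target
    return written
```

For example, concatenate_by_series("files/", "combined/") would return a dict mapping each series name to the path of its combined file.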
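For each series, you can use the following code to create one file containing all the lines of the input files except the first 3 lines of each: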
filenames = ['residual_x01', 'residual_x02', ...]
output_file = 'path/to/output/residual_x'
lines_to_skip = 3

with open(output_file, 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            # Read the whole file, then drop the header lines.
            lines = infile.readlines()[lines_to_skip:]
        for line in lines:
            outfile.write(line)
Change the filenames list and output_file to suit your needs. You can also adjust the lines_to_skip variable.
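Since each file has about a million lines, readlines() will hold a whole file in memory at once. If that becomes a problem, a streaming variant along these lines might help. It also builds the file list with glob instead of typing it out. This is only a sketch: the function name concatenate_series is my own, and it assumes the zero-padded names from the question (residual_x01, residual_x02, ...) with all files in one directory.

```python
import glob
import os
import shutil


def concatenate_series(input_dir, series, output_file, lines_to_skip=3):
    """Append every residual_<series>NN file to output_file, minus its header."""
    # Requiring a digit after the series letter keeps an output file named
    # e.g. 'residual_x' from being picked up as an input on a later run.
    pattern = os.path.join(input_dir, f"residual_{series}[0-9]*")
    # Zero-padded numbers (x01, x02, ...) sort correctly as plain strings.
    filenames = sorted(glob.glob(pattern))
    with open(output_file, "w") as outfile:
        for fname in filenames:
            with open(fname) as infile:
                for _ in range(lines_to_skip):
                    infile.readline()  # discard one header line
                # Copy the rest in chunks rather than reading it all at once.
                # Assumes every input file ends with a newline.
                shutil.copyfileobj(infile, outfile)
    return filenames
```

You would call it once per series, e.g. concatenate_series('.', 'x', 'path/to/output/residual_x'), and it returns the list of files it used so you can check the order.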