Read multiple TSV files and write to one TSV file in Python
I have multiple TSV files with the following format:
a b c d e f g h
a_1 b_1 c_1 d_1 e_1 f_1 g_1 h_1
a_2 b_2 c_2 d_2 e_2 f_2 g_2 h_2
. . . . . . . .
. . . . . . . .
. . . . . . . .
a_n b_n c_n d_n e_n f_n g_n h_n
(The first line (a, b, ...) holds the column titles.)
I want to read them all and, for each line, if one of the columns has the value I want (let's say it's equal to 1), save that line to a different TSV file with the same format as above, containing only the filtered data.
I have the code to extract the lines I want and write them to a TSV file, but I am not sure how to read multiple TSV files and write to a single TSV file.
Here's what I have so far:
    with open("./someDirectory/file.tsv") as in_file, \
         open("newFile.tsv", "w") as out_file:
        first_line = True
        for line in in_file:
            if first_line:  # copy the title row as-is
                out_file.write(line)  # line already ends with a newline
                first_line = False
                continue
            # strip the newline so the last column compares cleanly
            columns = line.rstrip("\n").split("\t")
            columnToLookAt = columns[7]
            if columnToLookAt == "1":
                out_file.write(line)
Say that someDirectory has around 80 TSV files. What's the best way to iterate through all of them and write the needed lines to out_file?
You can use glob.glob from the standard library to get the list of filenames matching some pattern:
    >>> import glob
    >>> glob.glob('/tmp/*.tsv')
    ['/tmp/file1.tsv', '/tmp/file2.tsv', ...]
and then iterate over all of those as input files. For example:
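    import glob

    first_line = True  # stays True until the title row has been written once
    with open("newFile.tsv", "w") as out_file:
        for in_path in glob.glob("./someDirectory/*.tsv"):
            with open(in_path) as in_file:
                for i, line in enumerate(in_file):
                    if i == 0:  # title row of this input file
                        if first_line:  # write the titles only once
                            out_file.write(line)
                            first_line = False
                        continue
                    # strip the newline so the last column compares cleanly
                    columns = line.rstrip("\n").split("\t")
                    columnToLookAt = columns[7]
                    if columnToLookAt == "1":
                        out_file.write(line)  # line already ends with a newline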
As a side note, you can also use csv.reader from the csv module to read tab-separated-value files, by setting dialect='excel-tab'.
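For illustration, here is a minimal sketch of the same filtering done with the csv module. The function name filter_tsv_files and its parameters (pattern, out_path, column_index, wanted) are made up for this example and are not part of the question's code. It assumes every input file starts with the same title row and that the value to match is the plain string "1".

```python
import csv
import glob
import os


def filter_tsv_files(pattern, out_path, column_index=7, wanted="1"):
    """Copy rows whose column `column_index` equals `wanted` from every
    file matching `pattern` into `out_path`; the title row is written once.
    Returns the number of data rows kept. (Hypothetical helper.)"""
    header_written = False
    kept = 0
    # newline="" is what the csv docs recommend for files given to csv objects
    with open(out_path, "w", newline="") as out_file:
        # lineterminator="\n" overrides the dialect's default "\r\n"
        writer = csv.writer(out_file, dialect="excel-tab", lineterminator="\n")
        for in_path in sorted(glob.glob(pattern)):
            # skip the output file if it happens to match the pattern
            if os.path.abspath(in_path) == os.path.abspath(out_path):
                continue
            with open(in_path, newline="") as in_file:
                reader = csv.reader(in_file, dialect="excel-tab")
                header = next(reader, None)
                if header is None:
                    continue  # empty input file
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    # ignore short rows instead of raising IndexError
                    if len(row) > column_index and row[column_index] == wanted:
                        writer.writerow(row)
                        kept += 1
    return kept
```

It could be called as filter_tsv_files("./someDirectory/*.tsv", "newFile.tsv"). Sorting the glob result only makes the output order predictable, since glob.glob does not guarantee any particular order.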