How to read a lot of txt files in a specific folder using Python
Please help me. I have some txt files in a folder, and I want to read them and summarize all the data into one txt file. How can I do it with Python? For example:
folder name: data
files in that folder: log1.txt, log2.txt, log3.txt, log4.txt
data in log1.txt: Size: 1,116,116,306 bytes
data in log2.txt: Size: 1,116,116,806 bytes
data in log3.txt: Size: 1,457,116,806 bytes
data in log4.txt: Size: 1,457,345,000 bytes
My expected output is a txt file, result.txt, containing:
1,116,116,306
1,116,116,806
1,457,116,806
1,457,345,000
Do you mean you want to read the contents of each file and write all of them into a different file?
import os

# os.listdir returns the names of the files in the directory "data" as a list
list_of_files = os.listdir("data")
lines = []
for file in list_of_files:
    # listdir returns bare names, so join them with the folder path before opening
    with open(os.path.join("data", file), "r") as f:
        # extend (not append) keeps lines a flat list of strings for writelines
        lines.extend(f.readlines())
# write all collected lines to result.txt
with open("result.txt", "w") as result:
    result.writelines(lines)
If you are looking for the size of each file instead of its contents, replace the body of the loop (the code that opens, reads and closes the file) with:
lines.append(str(os.stat(os.path.join("data", file)).st_size) + "\n")
File concat.py
#!/usr/bin/env python
import os
import sys

def main():
    folder = sys.argv[1]  # argument contains the path
    with open('result.txt', 'w') as result:  # result file will be in the current working directory
        for name in next(os.walk(folder))[2]:  # list all files (not subdirectories) in the provided path
            with open(os.path.join(folder, name), 'r') as source:
                result.write(source.read())  # write each file's contents to the result

if __name__ == '__main__':
    main()
Usage: concat.py <your path>
First, find all the files you are going to read:
import os

path = "data"
files = os.listdir(path)
Then read all the files, collecting the size and the content of each:
all_sz = {i: os.path.getsize(os.path.join(path, i)) for i in files}  # file name -> size in bytes
all_data = ''.join(open(os.path.join(path, i)).read() for i in files)  # all contents concatenated
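Finally, print the sizes with formatting:
msg = 'this is ...;'
sp2 = ' ' * 4
sp = ' ' * len(msg) + sp2  # blank padding as wide as msg + sp2
for n, i in enumerate(all_sz):
    # the first size goes on the msg line; later ones are padded to line up under it
    print((msg + sp2 if n == 0 else sp) + "{:,}".format(all_sz[i]))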
Import os. Then list the folder contents using os.listdir('data') and store the result in a list. For each entry you can get the size by calling os.stat(os.path.join('data', entry)).st_size; os.listdir returns bare names, so they must be joined with the folder path. Each of these sizes can then be written to a file.
Combined:
import os

path = 'data'
files = os.listdir(path)
with open('result.txt', 'w') as outfile:
    for file in files:
        # write each file's size in bytes on its own line
        outfile.write(str(os.stat(os.path.join(path, file)).st_size) + '\n')
If you need to merge sorted files so that the output file is sorted too, you can use the merge function from the heapq standard library module.
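from heapq import merge
from os import listdir
from os.path import join

path = 'data'  # folder containing the already-sorted input files
outfile = 'result.txt'  # name of the merged output file
files = [open(join(path, f)) for f in listdir(path)]  # listdir gives bare names, so join with path
with open(outfile, 'w') as out:
    for rec in merge(*files):  # lazily yields lines in sorted order across all files
        out.write(rec)
for f in files:
    f.close()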
Records are kept in lexical (string) order. If you need a different ordering, merge accepts an optional key=... argument (Python 3.5+) that specifies a key function used for the comparison.
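For example, files that are sorted numerically would come out in the wrong order under string comparison ("10" sorts before "9"). A minimal sketch, assuming each line holds a single integer; `merge_numeric` is a hypothetical helper and io.StringIO objects stand in for real files:

```python
import heapq
import io

def merge_numeric(sources):
    """Merge already-sorted iterables of lines, comparing the lines as integers."""
    # key=int compares '9\n' and '10\n' as numbers; int() ignores the trailing newline
    return list(heapq.merge(*sources, key=int))

# two in-memory "files", each sorted numerically
a = io.StringIO("2\n9\n100\n")
b = io.StringIO("10\n50\n")
print(''.join(merge_numeric([a, b])), end='')
```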