
How to read a lot of txt files in a specific folder using Python

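Please help me. I have some txt files in a folder. I want to read all of them and collect the data into a single txt file. How can I do this with Python? For example: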

folder name : data
file name in that folder : log1.txt
                           log2.txt
                           log3.txt
                           log4.txt
data in log1.txt : Size:         1,116,116,306 bytes
data in log2.txt : Size:         1,116,116,806 bytes
data in log3.txt : Size:         1,457,116,806 bytes
data in log4.txt : Size:         1,457,345,000 bytes

My expected output:

   a txt file result.txt containing : 1,116,116,306
                                      1,116,116,806
                                      1,457,116,806
                                      1,457,345,000

Do you mean that you want to read the contents of each file and write them all into another file?

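import os

folder = "data"
# os.listdir returns the names of the entries in the directory as a list;
# sorting makes the output order predictable
list_of_files = sorted(os.listdir(folder))
lines = []
for file in list_of_files:
    # listdir returns bare names, so join each one with the folder path
    with open(os.path.join(folder, file), "r") as f:
        # add each line of the file to one flat list
        lines.extend(f.readlines())

# write the collected lines to result.txt
with open("result.txt", "w") as result:
    result.writelines(lines)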

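If you want each file's size rather than its contents, replace the lines in the loop body that open and read the file (together with any f.close() call) with the line below. writelines expects strings, so the size is converted to text and given a newline:

lines.append(str(os.stat(os.path.join("data", file)).st_size) + "\n")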
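
os.listdir returns every entry in the folder, including subfolders and files that are not text files, so it may be worth restricting the loop to *.txt files. A minimal sketch using the standard glob module follows; list_txt_files is a made-up helper name for illustration and is not part of the answer above.

```python
import glob
import os


def list_txt_files(folder):
    """Return the sorted full paths of the regular *.txt files in folder.

    Hypothetical helper for illustration only.
    """
    # glob.escape protects any wildcard characters in the folder name itself
    pattern = os.path.join(glob.escape(folder), "*.txt")
    # glob.glob returns paths that already include the folder, so they can
    # be passed straight to open() or os.stat()
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

The returned paths could replace list_of_files in the loop, in which case no further joining with the folder name is needed.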

File concat.py:

#!/usr/bin/env python
import os
import sys

def main():
    folder = sys.argv[1]  # the argument contains the path
    with open('result.txt', 'w') as result:  # result file will be in the current working directory
        for name in next(os.walk(folder))[2]:  # list all files directly inside the provided path
            with open(os.path.join(folder, name), 'r') as source:
                result.write(source.read())  # write each file to result

main()

Usage: concat.py <your path>
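
source.read() loads each file into memory in one go. If the logs are large, a streaming copy may be preferable. The sketch below swaps in shutil.copyfileobj for that purpose; concat_files and its parameters are hypothetical names, and it assumes the result file lives outside the source folder so it is not copied into itself.

```python
import os
import shutil


def concat_files(folder, result_path):
    """Concatenate every regular file in folder into result_path.

    Hypothetical helper; files are taken in sorted name order.
    """
    with open(result_path, "wb") as result:
        for name in sorted(os.listdir(folder)):
            source_path = os.path.join(folder, name)
            if not os.path.isfile(source_path):
                continue  # skip subfolders
            with open(source_path, "rb") as source:
                # copy in chunks instead of loading the whole file
                shutil.copyfileobj(source, result)
```

Calling concat_files("data", "result.txt") from the folder that contains data would then write the combined file next to it.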

  1. You have to find all the files you want to read:

     import os
     path = "data"
     files = os.listdir(path)

  2. You have to read all the files, collecting the size and the content of each one:

     all_sz = {i: os.path.getsize(os.path.join(path, i)) for i in files}
     all_data = ''.join([open(os.path.join(path, i)).read() for i in files])

  3. You need a formatted printout:

     msg = 'this is ...;'
     sp2 = ' '*4
     sp = ' '*len(msg) + sp2
     print(msg + sp2, end='')
     for i in all_sz:
         print(sp, "{:,}".format(all_sz[i]))
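
The "{:,}" format specifier used in step 3 is what produces the comma-grouped numbers. As a small sketch of the same idea, the functions below build the formatted lines instead of printing them; format_size and size_report are made-up names, and the column widths (10 and 15) are arbitrary choices for illustration.

```python
def format_size(num_bytes):
    """Format an integer with comma thousands separators."""
    return "{:,}".format(num_bytes)


def size_report(sizes):
    """Return one 'name   size' line per entry of a {name: size} dict.

    Names are left-aligned in 10 columns and sizes right-aligned in 15.
    """
    return ["{:<10}{:>15,}".format(name, sizes[name]) for name in sorted(sizes)]
```

For example, size_report(all_sz) could be joined with newlines and written to result.txt.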

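Import os, then use os.listdir('data') to list the folder's contents and store them in a list. For each entry you can get the size by calling os.stat(entry).st_size, where entry is the entry's path including the folder. Each size can then be written to the file.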

Combined:

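import os

path = 'data'
files = os.listdir(path)
# the with block closes result.txt even if an error occurs
with open('result.txt', 'w') as outfile:
    for file in files:
        # write the size in bytes of each entry, one per line
        outfile.write(str(os.stat(os.path.join(path, file)).st_size) + '\n')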
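
The expected output in the question lists the numbers from the Size: lines inside each log rather than the size of the log files on disk. If that is what is wanted, a minimal sketch could look like the following. It assumes every relevant line looks like the sample in the question (Size: followed by a comma-grouped number and the word bytes); SIZE_LINE, extract_sizes, and write_sizes are made-up names for illustration.

```python
import os
import re

# Matches lines such as "Size:         1,116,116,306 bytes" and captures
# the number; the exact log layout is assumed from the question's sample.
SIZE_LINE = re.compile(r"Size:\s*([\d,]+)\s*bytes")


def extract_sizes(folder):
    """Return the captured size strings from every *.txt file in folder."""
    sizes = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".txt"):
            continue  # ignore anything that is not a txt file
        with open(os.path.join(folder, name), "r") as source:
            for line in source:
                match = SIZE_LINE.search(line)
                if match:
                    sizes.append(match.group(1))
    return sizes


def write_sizes(folder, result_path):
    """Write one extracted size per line to result_path."""
    sizes = extract_sizes(folder)  # read everything before opening the output
    with open(result_path, "w") as result:
        for size in sizes:
            result.write(size + "\n")
```

Note that sorted() orders names as text, so log10.txt would come before log2.txt; with log1.txt to log4.txt the order matches the question.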

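If someone needs to merge files that are already sorted so that the output file is sorted as well, they can use the merge function from the heapq standard library module.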

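from heapq import merge
from os import listdir
from os.path import join

path = 'data'           # example value: folder holding the already-sorted files
outfile = 'result.txt'  # example value: merged output file

files = [open(join(path, f)) for f in listdir(path)]
with open(outfile, 'w') as out:
    # merge lazily yields lines from all files in sorted order
    for rec in merge(*files):
        out.write(rec)
for f in files:
    f.close()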

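By default the records are merged in lexical order. If they need to be merged in a different order, merge accepts an optional key=... argument (Python 3.5 and later) to specify a different sort function.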
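
As an illustration of the key argument, the sketch below merges lines by their numeric value rather than lexically, which matters for comma-grouped numbers such as those in the question ("999" sorts after "1,000" as text). size_key and merge_sorted are hypothetical names, and each input is assumed to be already sorted by that numeric value.

```python
from heapq import merge


def size_key(line):
    """Sort key: the integer value of a line such as '1,116,116,306'."""
    return int(line.strip().replace(",", ""))


def merge_sorted(*iterables):
    """Lazily merge already-sorted iterables of lines by numeric value."""
    return merge(*iterables, key=size_key)
```

Open file objects can be passed directly, for example merge_sorted(*files), since iterating over a file yields its lines.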

