
How can I simultaneously sort and deduplicate files from a directory using Python?

I am trying to sort and deduplicate 30 files of different sizes into one single file.
Each file contains plain text with one entry per line, separated by newlines.
Here is what I attempted:

lines_seen = set() # holds lines already seen
outfile = open('out.txt', "w")
for line in open('d:\\testing\\*', "r"):
    if line not in lines_seen: # not a duplicate
        outfile.write(line)
        lines_seen.add(line)
outfile.close()

The folder is named testing and contains 30 different files, which I am trying to combine into the file out.txt. The output should be the sorted, unique text, with one entry on each line of the output file.
I thought it would be easy: if I wrote d:\\testing\\*, it would read the files from the folder. But I got this error:

Traceback (most recent call last):
  File "sort and unique.py", line 3, in <module>
    for line in open('d:\\testing\\*', "r"):
OSError: [Errno 22] Invalid argument: 'd:\\testing\\*'

I would like to know how I can get rid of this error and process all my files efficiently into one single output without failures.
Please note: RAM is 8 GB and the folder size is about 10 GB.

open() does not expand wildcards such as *, so you need to loop over the files yourself using os.listdir. Something like this:

import os

lines_seen = set()  # holds lines already seen
path = r'd:\testing'
with open('out.txt', 'w') as outfile:
    for file in os.listdir(path):  # iterate over every entry in the folder
        current_file = os.path.join(path, file)
        with open(current_file, 'r') as infile:
            for line in infile:
                if line not in lines_seen:  # not a duplicate
                    outfile.write(line)
                    lines_seen.add(line)
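
Note that the loop above removes duplicates but does not sort the output, and the set of seen lines may not fit in 8 GB of RAM when the folder holds about 10 GB. One possible way around both limits is an external merge sort: sort and deduplicate fixed-size chunks in memory, write each chunk to a temporary file, then merge the sorted chunks with heapq.merge. The following is only a rough sketch under stated assumptions: the names sort_unique_directory, _write_sorted_chunk and max_lines are hypothetical, the files are assumed to be UTF-8 text, and the output file is assumed to live outside the source folder.

```python
import heapq
import os
import tempfile


def _write_sorted_chunk(lines, tmp_dir):
    """Sort and deduplicate one in-memory chunk and write it to a temp file."""
    fd, chunk_path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk")
    with os.fdopen(fd, "w", encoding="utf-8") as chunk:
        chunk.writelines(sorted(set(lines)))
    return chunk_path


def sort_unique_directory(src_dir, out_path, max_lines=1_000_000):
    """Write all lines from the files in src_dir to out_path, sorted and unique.

    max_lines (hypothetical tuning knob) caps how many lines are held in
    memory at once; lower it if the lines are long.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        chunk_paths, buffer = [], []
        for name in sorted(os.listdir(src_dir)):
            file_path = os.path.join(src_dir, name)
            if not os.path.isfile(file_path):
                continue  # skip sub-directories
            with open(file_path, "r", encoding="utf-8") as src:
                for line in src:
                    # Normalise the ending so a last line without a trailing
                    # newline still compares equal to its duplicates.
                    buffer.append(line.rstrip("\n") + "\n")
                    if len(buffer) >= max_lines:
                        chunk_paths.append(_write_sorted_chunk(buffer, tmp_dir))
                        buffer = []
        if buffer:
            chunk_paths.append(_write_sorted_chunk(buffer, tmp_dir))

        # Merge the sorted chunks lazily; equal lines arrive next to each
        # other, so comparing with the previous line removes duplicates.
        chunks = [open(p, "r", encoding="utf-8") for p in chunk_paths]
        try:
            with open(out_path, "w", encoding="utf-8") as out:
                previous = None
                for line in heapq.merge(*chunks):
                    if line != previous:
                        out.write(line)
                        previous = line
        finally:
            for chunk in chunks:
                chunk.close()
```

It could be called as sort_unique_directory(r'd:\testing', 'out.txt'). All chunk files are open at the same time during the merge, so a very small max_lines on a large folder may hit the operating system's limit on open files.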
