Python: concatenating text files

Using Python, I'm seeking to iteratively combine two sets of txt files to create a third set of txt files.

I have a directory of txt files in two categories:

  1. text_[number].txt (e.g. text_0.txt, text_1.txt, text_2.txt ... text_20.txt)
  2. comments_[number].txt (e.g. comments_0.txt, comments_1.txt, comments_2.txt ... comments_20.txt)

I'd like to iteratively combine the text_[number] files with the matching comments_[number] files into a new file category, feedback_[number].txt. The script would combine text_0.txt and comments_0.txt into feedback_0.txt, and continue through each pair in the directory. The number of text and comments files will always match, but the total number of files varies depending on preceding scripts.

I can combine a single pair using the code below, given a list of the two file names:

filenames = ['text_0.txt', 'comments_0.txt']

with open("feedback_0.txt", "w") as outfile:
    for filename in filenames:
        with open(filename) as infile:
            contents = infile.read()
            outfile.write(contents)

However, I'm uncertain how to structure the iteration for the rest of the files. I'm also curious how to generate the lists from the contents of the directory. Any advice or assistance on moving forward is greatly appreciated.

It would be far simpler (and possibly faster) to just fork a cat process for each pair:

import subprocess


n = ... # number of files
for i in range(n):
    with open(f'feedback_{i}.txt', 'w') as f:
        subprocess.run(['cat', f'text_{i}.txt', f'comments_{i}.txt'], stdout=f)

Or, if you already have lists of the file names:

for text, comment, feedback in zip(text_files, comment_files, feedback_files):
    with open(feedback, 'w') as f:
        subprocess.run(['cat', text, comment], stdout=f)

Unless these are all extremely small files, the cost of reading and writing the bytes will outweigh the cost of forking a new process for each pair.
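On a system without a cat executable (a stock Windows install, for example), roughly the same effect can be had in pure Python. This is only a minimal sketch: concat_pair is a hypothetical helper name, not something from the answer above, and it assumes both input files exist.

```python
import shutil


def concat_pair(text_path, comments_path, out_path):
    """Write text_path followed by comments_path into out_path.

    concat_pair is a hypothetical helper name used for illustration.
    """
    # Binary mode copies the bytes unchanged, with no decoding step.
    with open(out_path, "wb") as outfile:
        for src in (text_path, comments_path):
            with open(src, "rb") as infile:
                # copyfileobj streams in chunks, so a large file is not
                # loaded into memory all at once.
                shutil.copyfileobj(infile, outfile)
```

Inside either loop above, concat_pair(text, comment, feedback) would then stand in for the subprocess.run call.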

Maybe not the most elegant, but...

length = 10
txt = [f"text_{n}.txt" for n in range(length)]
com = [f"comments_{n}.txt" for n in range(length)]
feed = [f"feedback_{n}.txt" for n in range(length)]

for f, t, c in zip(feed, txt, com):
    with open(f, "w") as outfile:
        with open(t) as infile1:
            contents = infile1.read()
            outfile.write(contents)
        with open(c) as infile2:
            contents = infile2.read()
            outfile.write(contents)
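Since the question says the number of files varies, the hard-coded length could instead be read from the directory. A rough sketch, assuming the files are numbered from 0 with no gaps as in the question; count_text_files is a made-up name for illustration:

```python
from pathlib import Path


def count_text_files(directory="."):
    """Count the files in `directory` whose names match text_*.txt.

    Hypothetical helper: the count only equals the number of pairs if
    the files are numbered 0..n-1 without gaps (an assumption).
    """
    return sum(1 for _ in Path(directory).glob("text_*.txt"))
```

With that, length = count_text_files() would replace length = 10 and the three list comprehensions stay as they are.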

The simplest way would probably be to just iterate from 0 onwards, stopping at the first missing file. This works assuming that your files are numbered in increasing order with no gaps (e.g. you have 0, 1, 2 and not 0, 2).

import os
from itertools import count

for i in count(0):  # the question's files start at text_0.txt
    t = f'text_{i}.txt'
    c = f'comments_{i}.txt'

    # stop at the first index where either file of the pair is missing
    if not os.path.isfile(t) or not os.path.isfile(c):
        break

    with open(f'feedback_{i}.txt', 'wb') as outfile:
        for name in (t, c):
            with open(name, 'rb') as infile:  # closed promptly after reading
                outfile.write(infile.read())

There are many ways to achieve this, but I don't see any solution here that is both beginner-friendly and takes into account the file structure you described.

You can iterate through the files and, for every text_[num].txt, fetch the corresponding comments_[num].txt and write to feedback_[num].txt, as shown below. There's no need to add any counters or make any other assumptions about the files that might not always be true:

import os

srcpath = 'path/to/files'

for f in os.listdir(srcpath):
    if f.startswith('text'):
        index = f[5:-4] # extract the [num] part

        # Build the paths to text, comment, feedback files
        txt_path = os.path.join(srcpath, f)
        cmnt_path = os.path.join(srcpath, f'comments_{index}.txt')
        fb_path = os.path.join(srcpath, f'feedback_{index}.txt')

        # write to output, reading in byte mode following chepner's advice
        with open(fb_path, 'wb') as outfile:
            outfile.write(open(txt_path, 'rb').read())
            outfile.write(open(cmnt_path, 'rb').read())
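The slice f[5:-4] relies on every matching name being exactly text_ + number + .txt, and startswith('text') would also pick up something like textbook.txt. If the directory might contain other files, a regular expression is one way to be stricter. This is just a sketch: find_pairs and TEXT_NAME are names I made up, and skipping a text file that has no comments partner is my assumption about what you would want.

```python
import re
from pathlib import Path

# Matches names such as text_0.txt and captures the number part.
TEXT_NAME = re.compile(r"text_(\d+)\.txt")


def find_pairs(srcpath):
    """Yield (index, text_path, comments_path) for each complete pair.

    Hypothetical helper; the index is kept as a string so the
    feedback file can reuse it exactly as it appears in the name.
    """
    src = Path(srcpath)
    for entry in src.iterdir():
        match = TEXT_NAME.fullmatch(entry.name)
        if match is None:
            continue  # not a text_[num].txt file
        index = match.group(1)
        comments = src / f"comments_{index}.txt"
        if comments.is_file():  # skip text files with no partner
            yield index, entry, comments
```

Each yielded triple gives everything needed to open feedback_{index}.txt in the same directory and write the two inputs into it.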

You can try this:

filenames = ['text_0.txt', 'comments_0.txt', 'text_1.txt', 'comments_1.txt',
             'text_2.txt', 'comments_2.txt', 'text_3.txt', 'comments_3.txt']

# pair each text file (even positions) with the comments file that follows it
for i, pair in enumerate(zip(filenames[::2], filenames[1::2])):
    with open(f'feedback_{i}.txt', 'w') as outfile:
        for name in pair:
            with open(name, 'r') as infile:
                outfile.write(infile.read())

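I have hard-coded the list here. Instead, you can read the names from the folder, but note that os.listdir returns them in arbitrary order, so the list must be rearranged into alternating text/comments order before the slicing above will work:

import os
filenames = os.listdir('path/to/folder')  # arbitrary order; contains every entry in the folder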
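One possible way to turn a raw directory listing into the alternating text/comments order that the [::2] and [1::2] slices expect is sketched below. paired_filenames is a hypothetical helper, and it assumes the numbers are not zero-padded (text_1.txt, not text_01.txt), as in the question.

```python
import os
import re


def paired_filenames(folder):
    """Return ['text_0.txt', 'comments_0.txt', 'text_1.txt', ...].

    Hypothetical helper. Assumes unpadded numbers in the file names.
    The returned names are bare file names, not joined with `folder`.
    """
    names = set(os.listdir(folder))
    matches = (re.fullmatch(r"text_(\d+)\.txt", n) for n in names)
    # Sort numerically so that 10 comes after 2, not after 1.
    indices = sorted(int(m.group(1)) for m in matches if m)
    filenames = []
    for i in indices:
        if f"comments_{i}.txt" in names:  # keep complete pairs only
            filenames += [f"text_{i}.txt", f"comments_{i}.txt"]
    return filenames
```

Keep in mind that enumerate in the loop above numbers the output files by position, so the feedback numbers only line up with the input numbers when the files run from 0 with no gaps.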
