
How to combine two text files into one?

I have two text files. I want to combine some of their columns into a new text file.

I am trying this, but it is not working:

with open('1','r') as first:
    with open('2', 'r') as second:
        data1 = first.readlines()
        for line in data1:
            output = [(item.strip(), line.split(' ')[2]) for item in second]
            f = open("1+2","w")
            f.write("%s  %s\n" .format(output))
            f.close()

The first text file that I have:

1
2
3
4

The second text file that I have:

1 3
2 5
5 7
7 3

I want a new file with the column from the first file and the second column from the second file, like this:

1 3
2 5
3 7
4 3

You can iterate over the corresponding pairs of lines and concatenate the first column of the first file with the second column of the second:

with open('file_1.txt') as f1, open('file_2.txt') as f2, open('new_file.txt', 'w') as fr:
    for line in ("{} {}".format(l1.rstrip('\n'), l2.split(maxsplit=1)[1]) for l1, l2 in zip(f1, f2)):
        fr.write(line)

If you're sure that the columns are separated by a single space, you can also use str.partition, like:

l2.partition(' ')[-1]

Example:

In [28]: with open('file_1.txt') as f1, open('file_2.txt') as f2, open('new_file.txt', 'w') as fr:
    ...:     for line in ("{} {}".format(l1.rstrip('\n'), l2.split(maxsplit=1)[1]) for l1, l2 in zip(f1, f2)):
    ...:         fr.write(line)
    ...:     

In [29]: cat new_file.txt
1 3
2 5
3 7
4 3
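
To make the difference between split(maxsplit=1) and partition(' ') concrete, here is a small sketch on in-memory strings. The variable names and the two-space sample line are made up for illustration; they are not from the question.

```python
# One line from the second file, as it would be read from disk
# (the trailing newline is kept).
l2 = "2 5\n"

# split(maxsplit=1) splits once on any run of whitespace.
via_split = l2.split(maxsplit=1)[1]
# partition(' ') splits at the first single space only.
via_partition = l2.partition(' ')[-1]

print(repr(via_split))      # '5\n'
print(repr(via_partition))  # '5\n'

# Hypothetical line with two spaces between the columns:
wide = "2  5\n"
wide_split = wide.split(maxsplit=1)[1]
wide_partition = wide.partition(' ')[-1]

print(repr(wide_split))      # '5\n'
print(repr(wide_partition))  # ' 5\n'  (the second space is kept)
```

So partition is only a safe substitute when the separator really is exactly one space.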

As an aside, when the two files don't have the same number of rows and you want to keep going until the longer one is exhausted, you can look at itertools.zip_longest instead of zip.
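
A minimal sketch of that zip_longest variant, using io.StringIO objects as stand-ins for the two files. The extra fifth row and the choice of an empty string as fillvalue are assumptions for illustration; what to write for a missing column depends on your data.

```python
from io import StringIO
from itertools import zip_longest

# In-memory stand-ins for the two files; the first one has an extra row.
f1 = StringIO("1\n2\n3\n4\n5\n")
f2 = StringIO("1 3\n2 5\n5 7\n7 3\n")

merged = []
# fillvalue="" is substituted once the shorter file runs out.
for l1, l2 in zip_longest(f1, f2, fillvalue=""):
    left = l1.strip()
    parts = l2.split()
    # Guard against the filled-in empty line, which has no second column.
    right = parts[1] if len(parts) > 1 else ""
    merged.append("{} {}".format(left, right).rstrip())

print(merged)  # ['1 3', '2 5', '3 7', '4 3', '5']
```

With plain zip the loop would stop after the fourth pair and the row "5" would be dropped silently.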

Assuming both of your files are data files, you can use the numpy module.

  • loadtxt loads a text file into an array.
  • savetxt saves an array to a text file. You can also specify the format of the saved numbers with the fmt option.

Here is the code:

import numpy as np

data1 = np.loadtxt("file1.txt")
data2 = np.loadtxt("file2.txt")
print(data1)
# [1. 2. 3. 4.]
print(data2)
# [[1. 3.]
#  [2. 5.]
#  [5. 7.]
#  [7. 3.]]

data2[:, 0] = data1
print(data2)
# [[1. 3.]
#  [2. 5.]
#  [3. 7.]
#  [4. 3.]]
np.savetxt('output.txt', data2, fmt="%d")
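
# The original used itertools.izip (Python 2); in Python 3 the built-in zip is already lazy.
with open("file1.txt") as textfile1, open("file2.txt") as textfile2, open('output.txt', 'w') as out:
    for x, y in zip(textfile1, textfile2):
        x = x.strip()                    # the single column of the first file
        y = y.split(" ")[1].strip()      # the second column of the second file
        print("{0} {1}".format(x, y))
        out.write("{0} {1}\n".format(x, y))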

There are many interesting answers as to how to do that, but none of them show how to fix your code. I find it better for learning when we understand our own mistakes, rather than just get a solution ;)


The tuple in the output line has the object names the other way around: you want line (from the 1st file) stripped, and item (from the 2nd file) split, taking its second element (that would be [1]).

With those small changes (and others, described in the comments), we get:

with open('1','r') as first:
    with open('2', 'r') as second:
        #data1 = first.readlines() #don't do that, iterate over the file
        for line in first: #changed
            output = [(line.strip(), item.split(' ')[1]) for item in second]
            f = open("1+2","a") #use mode "a" for appending - otherwise you're overwriting your data!
            f.write("{}  {}".format(output)) # don't mix python2 and python3 syntax, removed extra newline
            f.close()

But it's still wrong. Why? Because of for item in second: you're parsing the whole second file here, on the very first line from the 1st file.
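
A small sketch of that effect, with io.StringIO objects standing in for the files '1' and '2'. The results list and the extra .strip() on the second column are added only to make the output easy to read.

```python
from io import StringIO

# In-memory stand-ins for the question's two files.
first = StringIO("1\n2\n3\n4\n")
second = StringIO("1 3\n2 5\n5 7\n7 3\n")

results = []
for line in first:
    # The comprehension walks through ALL remaining lines of `second`.
    output = [(line.strip(), item.split(' ')[1].strip()) for item in second]
    results.append(output)

print(results[0])   # [('1', '3'), ('1', '5'), ('1', '7'), ('1', '3')]
print(results[1:])  # [[], [], []]
```

The first line of the first file gets paired with every row of the second file, and after that the second file is exhausted, so the remaining lines get nothing.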

We need to change it so that we only take one element. I'd recommend you read this question and the explanations about iterators.
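
As a quick illustration of taking one element at a time from an iterator, here is a sketch with an in-memory stand-in for the second file. The two-row sample and the variable names are assumptions for illustration.

```python
from io import StringIO

# A file object is its own iterator; StringIO behaves the same way.
second = StringIO("1 3\n2 5\n")

row1 = next(second)        # advances by exactly one line
row2 = next(second)
row3 = next(second, None)  # the default is returned once the iterator is exhausted

print(repr(row1))  # '1 3\n'
print(repr(row2))  # '2 5\n'
print(repr(row3))  # None
```

Without the second argument, next raises StopIteration when there is nothing left, which matters if the second file can be shorter than the first.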

Now let's apply this knowledge: second is an iterator. We only need one element from it, and we need to take it manually (because we're already inside another loop, and looping over 2 things at once is tricky), so we'll be using next(second):

with open('1','r') as first:
    with open('2', 'r') as second:
        for line in first: 
            item = next(second)
            output = (line.strip(), item.split(' ')[1]) #no list comprehension here
            f = open("1+2","a") 
            f.write("{}  {}".format(*output)) #you have to unpack the tuple
            f.close()

Explanation about unpacking: basically, when you pass just output, Python sees it as one element and doesn't know what to fill the other {} with. You have to say "hey, treat this iterable (in this case, a 2-element tuple) as separate elements, not as a whole", and that's what the * does. :)
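
A short sketch of both cases, using a hard-coded tuple in place of the values read from the files. The names joined and error are made up for illustration.

```python
output = ("1", "3")

# Unpacked: each tuple element fills one {} placeholder.
joined = "{} {}".format(*output)
print(joined)  # 1 3

# Not unpacked: the whole tuple is a single argument,
# so there is nothing left for the second {}.
error = None
try:
    "{} {}".format(output)
except IndexError as exc:
    error = exc
    print("IndexError:", exc)
```

The exact wording of the IndexError message differs between Python versions, so it is not shown here.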
