
python file concatenation and combining files

My main problem is this:

I have a set of files, and I am concatenating them this way in Python:

import sys
sys.stdout=open("out.dat","w")
filenames = ['bla.txt', 'bla.txt', 'bla.txt']
with open('out.dat', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
with open('out.dat') as f:
    print "".join(line.strip() for line in f)  
sys.stdout.close()

The bla.txt file looks like

aaa

and the intention is to make it look like

aaaaaaaaa

(3 times the same string, not on a new line each time...)

For some reason, what I do produces an output that looks like

aaaaaa

a

I am not sure why this is happening, or whether there is a simpler/more elegant solution.

My second problem is that eventually I plan to have a number of different files (letter triplets, for example) that I could concatenate in all possible combinations: aaabbbccc, aaacccbbb, etc.

Any guidance appreciated! Thank you!

There are some confusing things about your code. I'll leave some comments in the respective places:

# Not sure what the reason for this is
sys.stdout=open("out.dat","w")

filenames = ['bla.txt', 'bla.txt', 'bla.txt']

# This does what you need
with open('out.dat', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

# Here, you open `out.dat` and write its content back into it - 
# because you set `sys.stdout = open("out.dat", "w")` above.
# All these lines could be removed (along with the `sys.stdout` assignment above)
with open('out.dat') as f:
    print "".join(line.strip() for line in f)  
sys.stdout.close()

The most minimalistic approach I could think of:

# Open output
with open('out.dat', 'w') as outfile:
    # Iterate over each input
    for infilename in ['bla.txt'] * 3:
        # Open each input and write it to output
        with open(infilename) as infile:
            outfile.write(infile.read())

As for your error, it should not be happening. Could you confirm that the content of bla.txt is exactly aaa?

Nihey Takizawa's post almost answers why you got this error. First, let's see what is going on at each step of the program execution.

sys.stdout=open("out.dat","w")

This is pretty important. Because you replace stdout with a file handle to "out.dat", every built-in function or statement that uses it will write to "out.dat" from now on.

with open('out.dat', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

After this block, the content of the file "out.dat" is:

aaa
aaa
aaa

...or in other words: aaa\naaa\naaa\n, where \n is a single character standing for a newline. Number of characters: 12 (9 times a and 3 times the newline \n).

with open('out.dat') as f:
    print "".join(line.strip() for line in f)

Here is the important thing. Remember that because you changed sys.stdout to "out.dat" in step 1, the print statement writes its output to "out.dat".

You strip each line and join them, so you write "aaaaaaaaa" to "out.dat".

1  2  3  4  5  6  7  8  9 10 11 12
a  a  a \n  a  a  a \n  a  a  a \n  # this is content of the file before print
a  a  a  a  a  a  a  a  a \n       # that you write, 9 a chars + \n
                                   # which is added by print function by default

Note that the sys.stdout handle still points at the start of the file, so the write begins at character 1. You've replaced 10 out of 12 characters and then closed the file, so characters 11 and 12 remain the same. The result is your output.
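The overwrite can be reproduced in isolation. The following is only a sketch: the function name `reproduce_overwrite` and the throwaway file path are made up for illustration, the bytes that `print` would emit are hard-coded instead of being read back from the file, and binary mode is used so the byte counts are the same on every platform.

```python
import os
import tempfile


def reproduce_overwrite(path):
    """Mimic the question's sequence of opens and writes on `path`."""
    # Stands in for `sys.stdout = open("out.dat", "w")`: empty file, position 0.
    stale = open(path, "wb")
    # Stands in for the concatenation loop: a second handle writes 12 bytes.
    with open(path, "wb") as outfile:
        outfile.write(b"aaa\n" * 3)
    # Stands in for `print`: 9 letters plus the newline that print appends.
    stale.write(b"aaaaaaaaa\n")
    # Closing flushes the 10 buffered bytes at offset 0; bytes 11 and 12 survive.
    stale.close()
    with open(path, "rb") as f:
        return f.read()


# Run the sketch against a throwaway file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "out.dat")
result = reproduce_overwrite(demo_path)
print(repr(result))
```

The file ends up holding 12 bytes: nine a characters, a newline, then the leftover a and newline from the earlier write.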

Solution? NEVER mess with things like the sys.stdout file handle unless you know what you're doing.
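If redirecting printed output really is needed, one option (assuming Python 3.4 or later, which is newer than the Python 2 code in the question) is `contextlib.redirect_stdout`, which restores `sys.stdout` automatically when the block ends. This is a sketch only; the helper `joined_without_newlines` and the in-memory `buffer` are illustrative names, not part of the original code.

```python
import contextlib
import io
import sys


def joined_without_newlines(text):
    """Return `text` with each line stripped and all lines joined together."""
    return "".join(line.strip() for line in text.splitlines())


# Remember the current stdout so we can check that it is restored afterwards.
original_stdout = sys.stdout
buffer = io.StringIO()

# sys.stdout is swapped only inside this `with` block.
with contextlib.redirect_stdout(buffer):
    print(joined_without_newlines("aaa\naaa\naaa\n"))

# Outside the block, printing goes to the original stdout again.
captured = buffer.getvalue()
```

Passing the target explicitly with `print(..., file=outfile)` is another way to avoid touching `sys.stdout` at all.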

EDIT: How to fix your code. I thought that Nihey Takizawa had nicely explained how to fix your code, but as far as I can see it's not completely correct. Here's the solution:

filenames = ['bla.txt', 'bla.txt', 'bla.txt']
with open('out.dat', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read().strip())

Now your out.dat file contains only the string aaaaaaaaa, without newlines.
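The second part of the question (concatenating different files in every possible order) can be handled along the same lines. Here is a rough sketch, assuming "all possible combinations" means every ordering of the files and that each file is small enough to read into memory. The function name `concatenate_in_all_orders` and the sample files a.txt, b.txt and c.txt are hypothetical.

```python
import itertools
import os
import tempfile


def concatenate_in_all_orders(filenames):
    """Return one concatenated string per ordering of `filenames`."""
    # Read and strip each file once so trailing newlines do not land mid-string.
    contents = {}
    for fname in filenames:
        with open(fname) as infile:
            contents[fname] = infile.read().strip()
    # itertools.permutations yields every ordering of the given names.
    return ["".join(contents[fname] for fname in order)
            for order in itertools.permutations(filenames)]


# Build three hypothetical sample files in a temporary directory.
sample_dir = tempfile.mkdtemp()
paths = []
for letter in "abc":
    path = os.path.join(sample_dir, letter + ".txt")
    with open(path, "w") as f:
        f.write(letter * 3 + "\n")
    paths.append(path)

results = concatenate_in_all_orders(paths)
print(results)
```

With three input files this produces six strings, starting with aaabbbccc and aaacccbbb. Each one could then be written to its own output file with `outfile.write(...)` as in the code above.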
