
Python script to convert multiple txt files into a single one

I'm quite new to Python and have run into a problem. I want to write a script that starts in a base directory containing several folders, which all have the same subdirectory structure and are numbered with a control variable. The script should then go into a subdirectory of each of these folders, where multiple txt files are stored. Each txt file consists of 3 tab-separated columns of float values and has 3371 rows (all files have the same number of rows and columns). I want the script to copy only the third column (starting with the second row) into a new txt file. The only exception is the first txt file: there, all three columns must be copied to the new file. For every other file, the third column should be copied into an adjacent column of the new txt file (or csv file, if that is easier). I would like to end up with 300 columns of those values side by side. If possible, I would also like to write the corresponding file names in the first line of the new txt file (as column names).

import os
import glob

foldernames1 = []
for foldernames in os.listdir("W:/certaindirectory/"):
    if foldernames.startswith("scan"):
        # print(foldernames)
        foldernames1.append(foldernames)
        

for i in range(1, len(foldernames1)):
    workingpath = "W:/certaindirectory/"+foldernames1[i]+"/.../"
    os.chdir(workingpath)
    myFiles = glob.glob('*.txt')
    column_names = ['X','Y']+myFiles[1:len(myFiles)]
    
    
    files = [open(f) for f in glob.glob('*.txt')]  
    fout = open ("ResultsCombined.txt", 'w')
    
    for row in range(1, 3371): #len(files)):

        for f in files:
            fout.write(f.readline().strip().split('\t')[2])
            fout.write('\t')
        fout.write('\t')
     
    
    fout.close()

As an alternative, I also tried to solve it via a csv file, but that did not fix my problem:

import os
import glob
import csv

foldernames1 = []
for foldernames in os.listdir("W:/certain directory/"):
    if foldernames.startswith("scan"):
        # print(foldernames)
        foldernames1.append(foldernames)
        

for i in range(1, len(foldernames1)):
    workingpath = "W:/certain directory/"+foldernames1[i]+"/.../"
    os.chdir(workingpath)
    myFiles = glob.glob('*.txt')
    column_names = ['X','Y']+myFiles[0:len(myFiles)]
    # print(column_names)
    
    with open(""+foldernames1[i]+".csv", 'w', newline='') as target:
        writer = csv.DictWriter(target, fieldnames=column_names)
        writer.writeheader() # if you want a header
        
        for path in glob.glob('*.txt'):
            with open(path, newline='') as source:
                reader = csv.DictReader(source, delimiter='\t', fieldnames=column_names)
                writer.writerows(reader)

Can anyone help me? Neither script delivers what I want. They read out something, but not the values I am interested in. I have the feeling my code also has some issues with float numbers?

Many thanks and best regards, quester

pathlib and pandas should make the solution here relatively simple, even without knowing the specific file names:

import pandas as pd
from pathlib import Path

p = Path("W:/certain directory/")
# recursively search for .txt files inside all sub directories
txt_files = sorted(p.rglob("*.txt"))  # use p.glob("*.txt") for non-recursive iteration
df = pd.DataFrame()
for path in txt_files:
    # use tab separator, read only 3rd column, name the column, read as floats
    current = pd.read_csv(path, 
                          sep="\t", 
                          usecols=[2], 
                          names=[path.name], 
                          dtype="float64")
    # add header=0 to pd.read_csv if there's a header row in the .txt files
    # pd.concat returns a new DataFrame, so assign the result back to df
    df = pd.concat([df, current], axis=1)
df.to_csv("W:/certain directory/floats_third_column.csv", index=False)
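
The snippet above collects only the third column of every file found under the base directory. The question also asks for one combined file per folder, with the X and Y columns of the first file kept. A minimal standard-library sketch of that variant could look like the following. The function name `combine_folder` and the default output name are hypothetical, and the code assumes that every file has exactly one header row, that all files have the same number of data rows, and that sorting by file name gives the desired column order.

```python
import csv
from pathlib import Path


def combine_folder(folder, out_name="ResultsCombined.csv"):
    """Merge all .txt files in `folder` side by side into one CSV file.

    Assumptions (not confirmed by the question): every file is
    tab-separated, has exactly one header row, and has the same
    number of data rows as the others.
    """
    folder = Path(folder)
    txt_files = sorted(folder.glob("*.txt"))  # sort by name for a stable column order
    if not txt_files:
        return None

    header = ["X", "Y"] + [path.name for path in txt_files]
    columns = []  # one list of strings per output column
    for index, path in enumerate(txt_files):
        with open(path, newline="") as source:
            # drop blank lines, then skip the header row
            rows = [row for row in csv.reader(source, delimiter="\t") if row][1:]
        if index == 0:
            # first file only: keep X and Y in addition to the third column
            columns.append([row[0] for row in rows])
            columns.append([row[1] for row in rows])
        columns.append([row[2] for row in rows])

    out_path = folder / out_name
    with open(out_path, "w", newline="") as target:
        writer = csv.writer(target)
        writer.writerow(header)
        writer.writerows(zip(*columns))  # transpose columns into rows
    return out_path
```

Calling `combine_folder` once for the relevant subdirectory of each `scan...` folder would then produce one combined file per folder. Writing the result as `.csv` keeps it from being picked up by `glob("*.txt")` on a later run.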
    

Hope this helps!
