Python script to convert multiple txt files into single one

I'm quite new to Python and have run into a problem. I want to write a script that starts in a base directory containing several folders, which all have the same subdirectory structure and are numbered with a control variable. The script should go into a subdirectory of each of these folders, where multiple txt files are stored. Each txt file consists of 3 tab-separated columns of float values and has 3371 rows (all files have the same number of rows and columns). I want the script to copy only the third column (starting with the second row) of each file into a new txt file. The only exception is the first txt file, where all three columns must be copied to the new file. For the other files, the third column should be placed in the next adjacent column of the new txt file (or csv file, if that is easier). I would like to end up with 300 columns of values side by side. If possible, I would also like to write the corresponding file names in the first line of the new txt file as column names.

import os
import glob

foldernames1 = []
for foldernames in os.listdir("W:/certaindirectory/"):
    if foldernames.startswith("scan"):
        # print(foldernames)
        foldernames1.append(foldernames)
        

for i in range(1, len(foldernames1)):
    workingpath = "W:/certaindirectory/"+foldernames1[i]+"/.../"
    os.chdir(workingpath)
    myFiles = glob.glob('*.txt')
    column_names = ['X','Y']+myFiles[1:len(myFiles)]
    
    
    files = [open(f) for f in glob.glob('*.txt')]  
    fout = open ("ResultsCombined.txt", 'w')
    
    for row in range(1, 3371): #len(files)):

        for f in files:
            fout.write(f.readline().strip().split('\t')[2])
            fout.write('\t')
        fout.write('\t')
     
    
    fout.close()

As an alternative I also tried to fix it via a csv file, but I wasn't able to fix my problem:

import os
import glob
import csv

foldernames1 = []
for foldernames in os.listdir("W:/certain directory/"):
    if foldernames.startswith("scan"):
        # print(foldernames)
        foldernames1.append(foldernames)
        

for i in range(1, len(foldernames1)):
    workingpath = "W:/certain directory/"+foldernames1[i]+"/.../"
    os.chdir(workingpath)
    myFiles = glob.glob('*.txt')
    column_names = ['X','Y']+myFiles[0:len(myFiles)]
    # print(column_names)
    
    with open(""+foldernames1[i]+".csv", 'w', newline='') as target:
        writer = csv.DictWriter(target, fieldnames=column_names)
        writer.writeheader() # if you want a header
        
        for path in glob.glob('*.txt'):
            with open(path, newline='') as source:
                reader = csv.DictReader(source, delimiter='\t', fieldnames=column_names)
                writer.writerows(reader)

Can anyone help me? Neither script delivers what I want. They read something out, but not the values I am interested in. I also have the feeling that my code has some issues with float numbers.

Many thanks and best regards, quester

pathlib and pandas should make the solution here relatively simple even without knowing the specific file names:

import pandas as pd
from pathlib import Path

p = Path("W:/certain directory/")
# recursively search for .txt files inside all sub directories
txt_files = sorted(p.rglob("*.txt"))  # use p.glob("*.txt") for non-recursive iteration
df = pd.DataFrame()
for path in txt_files:
    # use tab separator, read only 3rd column, name the column, read as floats
    current = pd.read_csv(path, 
                          sep="\t", 
                          usecols=[2], 
                          names=[path.name], 
                          dtype="float64")
    # add header=0 to pd.read_csv if there's a header row in the .txt files
    # pd.concat returns a new DataFrame, so the result must be assigned back
    df = pd.concat([df, current], axis=1)
df.to_csv("W:/certain directory/floats_third_column.csv", index=False)
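
The snippet above collects only the third column of every file it finds. The question also asks for all three columns of the first file and for one output file per folder, so here is a minimal sketch of that variant. It assumes each .txt file has exactly one header row and three tab-separated numeric columns; the function name `combine_folder` and the output name `ResultsCombined.csv` are made up for illustration and are not part of the original code.

```python
from pathlib import Path

import pandas as pd


def combine_folder(folder, out_name="ResultsCombined.csv"):
    """Combine the .txt files in one folder into a single wide table.

    Assumption: every file is tab-separated, has one header row,
    three float columns, and the same number of rows.
    """
    folder = Path(folder)
    txt_files = sorted(folder.glob("*.txt"))
    if not txt_files:
        raise FileNotFoundError(f"no .txt files found in {folder}")

    pieces = []
    for i, path in enumerate(txt_files):
        # header=0 drops the file's own header row; names= supplies ours.
        data = pd.read_csv(path, sep="\t", header=0,
                           names=["X", "Y", path.name], dtype="float64")
        # Keep all three columns for the first file, only the third otherwise.
        pieces.append(data if i == 0 else data[[path.name]])

    combined = pd.concat(pieces, axis=1)
    # Written as .csv so a later run does not pick it up via "*.txt".
    combined.to_csv(folder / out_name, index=False)
    return combined
```

Calling `combine_folder` once for the relevant subdirectory of each `scan...` folder would then give one combined file per folder, with the file names as column headers.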
    

Hope this helps!
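
If pandas is not an option, roughly the same layout can be built with the standard library alone. This is only a sketch under the same assumptions (one header row, three tab-separated columns, equal row counts), and `combine_folder_stdlib` is a hypothetical name. The values are copied as text, so no float conversion or rounding takes place.

```python
import csv
from pathlib import Path


def combine_folder_stdlib(folder, out_name="ResultsCombined.csv"):
    """Write X, Y and the third column of every .txt file side by side."""
    folder = Path(folder)
    txt_files = sorted(folder.glob("*.txt"))
    if not txt_files:
        raise FileNotFoundError(f"no .txt files found in {folder}")

    tables = []
    for path in txt_files:
        with open(path, newline="") as source:
            rows = list(csv.reader(source, delimiter="\t"))
        tables.append(rows[1:])  # skip the header row of each file

    header = ["X", "Y"] + [path.name for path in txt_files]
    with open(folder / out_name, "w", newline="") as target:
        writer = csv.writer(target)
        writer.writerow(header)
        # zip(*tables) yields the n-th row of every file together;
        # it stops at the shortest file if the lengths differ.
        for row_group in zip(*tables):
            first = row_group[0]
            writer.writerow(first[:2] + [row[2] for row in row_group])
    return header
```

Unlike the first attempt in the question, each output row ends with a newline (handled by `csv.writer`) rather than a tab, so the result has one line per data row.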

