
How to copy .txt files from many folders to Excel in Python

I scraped NLP data in .txt format and I need to copy it to Excel. The data is in a folder called 'article'. Inside this folder are subfolders covering 2012 to 2020, each named by date, e.g. '2012-04-18'. Each subfolder contains several .txt files.

I would like to copy the contents of each .txt file into an Excel file, matched to the date in the name of its subfolder. The Excel file should have a column of dates (taken from the subfolder names), with the corresponding rows containing the contents of the .txt files. For reference, see the image [1].
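As a minimal sketch of the matching step, assuming every subfolder name follows the `YYYY-MM-DD` pattern shown above: the helper `folder_name_to_date` (a hypothetical name, not part of any library) turns a folder name into a `datetime.date` and returns `None` for anything that is not a valid date, so stray folders can be skipped.

```python
from datetime import date, datetime
from typing import Optional


def folder_name_to_date(name: str) -> Optional[date]:
    """Parse a subfolder name such as '2012-04-18'; return None if it is not a date."""
    try:
        # Assumes the YYYY-MM-DD naming seen in the question's folders.
        return datetime.strptime(name, "%Y-%m-%d").date()
    except ValueError:
        # Non-date names (or impossible dates like month 13) are reported as None.
        return None
```

Writing a `date` object to a cell, rather than the raw folder-name string, should let Excel treat the column as real dates (sortable and filterable) instead of text.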

The code below could be a starting point, although I think it is only iterating over the subfolders without reading any .txt file, and it outputs an empty Excel file. Any help is appreciated.

import os
from typing import List
import openpyxl
from openpyxl.utils import get_column_letter

def text_into_spreadsheet():
    """Main logic for reading .txt files into a spreadsheet."""
    workbook = openpyxl.Workbook()
    sheet = workbook.active
    column: int = 1
    article: List[str] = os.listdir('../FolderA/FolderB/article/')
    for file in article:
        if file.endswith(".txt"):
            with open(file) as textfile:
                lines: List[str] = textfile.readlines()
                sheet[get_column_letter(column) + '1'] = file
                row: int = 2
                for line in lines:
                    sheet[get_column_letter(column) + str(row)] = line
                    row += 1  # advance to the next row for every line
                column += 1
    workbook.save('result.xlsx')


  [1]: https://i.stack.imgur.com/YZfVd.png

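Looking at your code, at minimum you are not iterating over the date subfolders such as "2012-04-18" that you described. `os.listdir` on the article folder only returns those subfolder names, none of which end with ".txt", so the loop body never runs and the workbook stays empty. You need an additional loop: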

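import os

root_dir = '../FolderA/FolderB/article/'
for subfolder in os.listdir(root_dir):
    subfolder_path = os.path.join(root_dir, subfolder)
    if not os.path.isdir(subfolder_path):
        continue  # skip stray files sitting directly in article/
    for filename in os.listdir(subfolder_path):
        if filename.endswith(".txt"):
            with open(os.path.join(subfolder_path, filename)) as fp:
                content = fp.read()
                # .... write `subfolder` (the date) and `content` to the sheet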
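Putting the pieces together, here is a minimal sketch that follows the nested loop above and writes one row per .txt file, with the subfolder name in column A and the file contents in column B. The function names `read_article_texts` and `write_rows_to_excel` are hypothetical, UTF-8 is an assumption about how the scraped files were saved, and the only openpyxl calls used are `Workbook()`, `Workbook.active`, `Worksheet.append` and `Workbook.save`.

```python
import os
from typing import List, Tuple


def read_article_texts(root_dir: str) -> List[Tuple[str, str]]:
    """Walk root_dir/<date>/*.txt and return (date_folder_name, file_text) pairs."""
    rows: List[Tuple[str, str]] = []
    for subfolder in sorted(os.listdir(root_dir)):
        subfolder_path = os.path.join(root_dir, subfolder)
        if not os.path.isdir(subfolder_path):
            continue  # ignore files sitting directly in root_dir
        for filename in sorted(os.listdir(subfolder_path)):
            if filename.endswith(".txt"):
                # Encoding is an assumption; change it if the files use another one.
                path = os.path.join(subfolder_path, filename)
                with open(path, encoding="utf-8") as fp:
                    rows.append((subfolder, fp.read()))
    return rows


def write_rows_to_excel(rows: List[Tuple[str, str]], out_path: str) -> None:
    """Write a header row plus one (date, content) row per text file."""
    import openpyxl  # imported here so read_article_texts works without openpyxl installed

    workbook = openpyxl.Workbook()
    sheet = workbook.active
    sheet.append(["date", "content"])
    for row in rows:
        sheet.append(list(row))  # column A: folder name, column B: file text
    workbook.save(out_path)
```

Usage, with the path from the question: `write_rows_to_excel(read_article_texts('../FolderA/FolderB/article/'), 'result.xlsx')`. Appending whole rows avoids tracking row and column counters by hand. Keep in mind that Excel caps a single cell at 32,767 characters, so very long articles may need to be split across cells.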
