I scraped NLP data in .txt format and I need to copy it to Excel. The data is in a folder called 'article'. Within this folder are subfolders covering 2012 to 2020, each named after a date, e.g. '2012-04-18'. Within each subfolder are several .txt files.
I would like to copy the contents of each .txt file into an Excel file and match them to the date shown in the name of the subfolder they came from. So the Excel file will have one column with the dates (copied from the subfolder names) and the corresponding rows will contain the contents of the .txt files. For reference, see [1].
The code below could be a starting point, although I think it's iterating over the subfolders without reading each .txt file, and it outputs an empty Excel file. Any help is appreciated.
```python
import os
from typing import List
import openpyxl
from openpyxl.utils import get_column_letter

def text_into_spreadsheet():
    """main logic for read .txt into spreadsheet"""
    workbook = openpyxl.Workbook()
    sheet = workbook.active
    column: int = 1
    article: List[str] = os.listdir('../FolderA/FolderB/article/')
    for file in article:
        if file.endswith(".txt"):
            with open(file) as textfile:
                lines: List[int] = textfile.readlines()
            sheet[get_column_letter(column) + '1'] = file
            row: int = 2
            for line in lines:
                sheet[get_column_letter(column) + str(row)]=line
                row += 1
            column += 1
    workbook.save('result.xlsx')
```
[1]: https://i.stack.imgur.com/YZfVd.png
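At a minimum, your code never iterates over the date subfolders such as "2012-04-18". `os.listdir` on the 'article' folder returns only the subfolder names, none of which end in ".txt", so the `if` branch never runs and the workbook is saved empty. You also need to open each file by its full path rather than by its bare name. So you will need to add an additional loop: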
```python
root_dir = '../FolderA/FolderB/article/'
for subfolder in os.listdir(root_dir):
    for filename in os.listdir(os.path.join(root_dir, subfolder)):
        if filename.endswith(".txt"):
            with open(os.path.join(root_dir, subfolder, filename)) as fp:
                # ....
```
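To show how the nested loop could feed the spreadsheet, here is a minimal sketch of one possible way to finish it. It is not necessarily what the original snippet had in mind: the helper names `collect_rows` and `write_rows_to_excel`, the "date"/"text" header row, and the UTF-8 encoding are all assumptions made for illustration. It writes one row per .txt file, with the subfolder name in the first column and the whole file content in the second.

```python
import os
from typing import List, Tuple


def collect_rows(root_dir: str) -> List[Tuple[str, str]]:
    """Return (date, text) pairs, one per .txt file under root_dir/<date>/."""
    rows = []
    for subfolder in sorted(os.listdir(root_dir)):
        folder_path = os.path.join(root_dir, subfolder)
        if not os.path.isdir(folder_path):
            continue  # skip stray files sitting next to the date folders
        for filename in sorted(os.listdir(folder_path)):
            if filename.endswith(".txt"):
                file_path = os.path.join(folder_path, filename)
                # Assumption: the scraped files are UTF-8 encoded.
                with open(file_path, encoding="utf-8") as fp:
                    rows.append((subfolder, fp.read()))
    return rows


def write_rows_to_excel(rows: List[Tuple[str, str]],
                        out_path: str = "result.xlsx") -> None:
    """Write a header plus one (date, text) row per file to out_path."""
    import openpyxl  # imported here so collect_rows works without it

    workbook = openpyxl.Workbook()
    sheet = workbook.active
    sheet.append(["date", "text"])  # hypothetical column headers
    for date, text in rows:
        sheet.append([date, text])
    workbook.save(out_path)
```

With the folder layout from the question, it could be called as `write_rows_to_excel(collect_rows('../FolderA/FolderB/article/'))`. Keeping the folder walk separate from the Excel writing makes it easy to print or inspect the collected rows before saving. Note that Excel limits a single cell to 32,767 characters, so very long articles may need to be split across cells.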