Content from multiple txt files into a single Excel file using Python
If I have, for example, 3 txt files that look as follows:
file1.txt:
a 10
b 20
c 30
file2.txt:
d 40
e 50
f 60
file3.txt:
g 70
h 80
i 90
I would like to read this data from the files and create a single Excel file that will look like this:
Specifically, in my case I have 100+ txt files that I read using glob and a loop.
Thank you
There's a bit of logic involved in getting the output you need.
First, process the input files into separate lists. You might need to adjust this logic depending on the actual contents of the files, because you need to be able to split each line into its columns. For the samples provided, splitting on whitespace works.
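Since the question mentions reading 100+ files with glob, here is a minimal sketch of building the file list that way. The pattern "file*.txt" and the helper names natural_key and find_input_files are my own assumptions, not something from the question. The natural sort is there because a plain sorted() would put file10.txt before file2.txt.

```python
import glob
import re


def natural_key(name: str) -> list:
    """Split a name into text and integer chunks so 'file10' sorts after 'file2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def find_input_files(pattern: str = "file*.txt") -> list:
    """Return the paths matching the pattern in natural (human) order."""
    # "file*.txt" is an assumed pattern; adjust it to the real file names
    return sorted(glob.glob(pattern), key=natural_key)
```

The returned list could be used in place of a hard-coded list of file names.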
I added a safety check to see if the input files all have the same number of rows. If they don't, the rows of the resulting Excel file will be misaligned. You'll need to add some logic to handle a length mismatch if one happens.
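One possible way to handle a mismatch, as a rough sketch: pad the shorter files with empty rows until every file has as many rows as the longest one. The function name pad_rows is hypothetical, and it assumes each file has already been parsed into a list of rows, where each row is a list of column values.

```python
def pad_rows(loaded_files: list) -> list:
    """Pad every file's row list with empty rows up to the longest file."""
    longest = max((len(rows) for rows in loaded_files), default=0)
    padded = []
    for rows in loaded_files:
        # filler rows get as many empty cells as the file's widest row,
        # so the columns of the following files stay aligned
        width = max((len(row) for row in rows), default=0)
        filler = [[""] * width for _ in range(longest - len(rows))]
        padded.append(rows + filler)
    return padded
```

Whether padding is the right fix depends on your data. If a short file means something went wrong upstream, stopping with an error may be the safer choice.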
Writing to the Excel file is very easy using pandas in combination with openpyxl. There are likely more elegant solutions, but I'll leave that to you.
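As one example of a more pandas-centric variant, the sketch below reads each file straight into a DataFrame and joins them side by side with pd.concat. The function name combine_side_by_side is made up for illustration. It assumes whitespace-separated columns, no header line in the files, and at least one input path.

```python
import pandas as pd


def combine_side_by_side(paths: list) -> pd.DataFrame:
    """Read whitespace-separated files and place them next to each other."""
    pieces = []
    for path in paths:
        frame = pd.read_csv(path, sep=r"\s+", header=None)
        pieces.append(frame)
        # a column of empty strings acts as a blank spacer between files
        pieces.append(pd.Series("", index=frame.index))
    # drop the trailing spacer and renumber the columns 0, 1, 2, ...
    return pd.concat(pieces[:-1], axis=1, ignore_index=True)
```

The result can be written with combined.to_excel("output.xlsx", index=False, header=False), which needs openpyxl installed. Note that read_csv converts numeric-looking columns to numbers, and that files of different lengths end up padded with NaN, which to_excel writes as empty cells by default.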
I reference some Stack Overflow answers in the code comments for further reading.
requirements.txt
pandas
openpyxl
main.py
# we use pandas for easy saving as XLSX
import pandas as pd

# adjust these names to match your input files
filelist = ["file1.txt", "file2.txt", "file3.txt"]


def load_file(filename: str) -> list:
    """Return the rows of a file, each split into a list of columns."""
    result = []
    with open(filename) as infile:
        # splitlines() is OS agnostic and removes EOL characters
        for line in infile.read().splitlines():
            # split() with no argument splits on any run of whitespace
            result.append(line.split())
    return result


loaded_files = []
for filename in filelist:
    loaded_files.append(load_file(filename))

# you will want to check if the files have the same number of rows
# it will break stuff if they don't, you could fix it by appending empty rows
# stolen from:
# https://stackoverflow.com/a/10825126/9267296
len_first = len(loaded_files[0]) if loaded_files else None
if not all(len(i) == len_first for i in loaded_files):
    # prints the message to stderr and exits with status 1
    raise SystemExit("length mismatch")

# generate empty list of lists so we don't get index error below
# stolen from:
# https://stackoverflow.com/a/33990699/9267296
result = [[] for _ in range(len(loaded_files[0]) if loaded_files else 0)]
for f in loaded_files:
    for index, row in enumerate(f):
        result[index].extend(row)
        # empty cell that becomes a blank separator column between files
        result[index].append('')

# trim the last empty column
result = [line[:-1] for line in result]

# write as excel file
# stolen from:
# https://stackoverflow.com/a/55511313/9267296
# note that there are some other options on this SO question, but this one
# is easily readable
df = pd.DataFrame(result)
# the context manager saves and closes the file on exit;
# writer.save() was removed in pandas 2.0
with pd.ExcelWriter("output.xlsx") as writer:
    # pass header=False as well to omit the 0, 1, 2, ... header row
    df.to_excel(writer, sheet_name="sheet_name_goes_here", index=False)
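To make the layout concrete, here is a small self-contained sketch of the merge step run on the sample data from the question, without touching any files. The function name merge_side_by_side is hypothetical. It uses zip, so unlike the code above it silently truncates to the shortest file instead of exiting on a mismatch.

```python
def merge_side_by_side(loaded_files: list) -> list:
    """Join row i of every file into one output row, with '' between files."""
    merged = []
    # zip stops at the shortest file, so all files should have equal length
    for rows in zip(*loaded_files):
        out = []
        for row in rows:
            out.extend(row)
            out.append("")
        merged.append(out[:-1])  # drop the trailing spacer
    return merged


sample = [
    [["a", "10"], ["b", "20"], ["c", "30"]],
    [["d", "40"], ["e", "50"], ["f", "60"]],
    [["g", "70"], ["h", "80"], ["i", "90"]],
]
for line in merge_side_by_side(sample):
    print(line)
# ['a', '10', '', 'd', '40', '', 'g', '70']
# ['b', '20', '', 'e', '50', '', 'h', '80']
# ['c', '30', '', 'f', '60', '', 'i', '90']
```

Each printed list is one row of the spreadsheet, and the empty strings are the blank columns between files.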