Python script to extract data from a txt file to Excel

Question

Here is my file:

example.txt

new names_a

    tim
    jeremy; 24 - age

next

new "names_b"

    jordan; 27 - age
    alex; 26 - age
    steven; 24 - age

next

new names_c

    johnny; 20 - age
    ron
    ;joe; 19 - age
    brian; 23 - age

next

Here is my code:


file=open("example.txt", "r")
data=file.read()
categories=data.split('new')
dict_format={}
for categor_data in categories:
    items=categor_data.split('\n')
    item_name=items[0].replace(" ", "")
    item_name=item_name.strip('"')
    dict_format[item_name]=items[1:]

for name in dict_format:
    print(name)

print("Which category to export?")
answer=input()

with open(answer+".csv",'w') as csv:
    for row in dict_format[answer][:-1]:
        if row != "":
            csv.write(row.replace(";",",")+"\n")

    csv.write(dict_format[answer][-1].replace(";",","))
    csv.close()

Sample output for category names_c:

	A	B	C
1	johnny	20 - age
2	ron
3		joe	19 - age
4	brian	23 - age
5
6	next

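Question 1:

Is there a way to either:

a) make the code not read the word "next" as part of the list, or

b) open the file, delete every occurrence of the word 'next', save the file, close it, reopen it, and then run the code?

Question 2:

Is there a way to not output entries that start with ';'? For example: ;joe; 19 - age

Question 3:

Is there a way to remove the empty lines?

Desired output:

	A	B
1	johnny	20 - age
2	ron
3	brian	23 - age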

Answer 1

Here is one way to do it:

import pandas as pd

# Read the file one line per row, trimming whitespace and skipping blank lines
with open('example.txt') as f:
    df = pd.DataFrame([line.strip() for line in f if line.strip()])
# Drop rows starting with ';' (e.g. ;joe; 19 - age) and 'next'
df = df.drop(df[df[0].str.startswith((';', 'next'))].index)
# Split on 'new ': header lines end up with the category name in column 1
df2 = df[0].str.replace('"', '').str.split('new ', expand=True)
# Forward-fill the category name, then group the dataframe by category
df2[1] = df2[1].ffill()
gp = df2.groupby(1)
# For each category: split entries on ';' and drop the header row
dfs = [gp.get_group(x).reset_index().drop('index', axis=1).set_index(1)[0].str.split(';', expand=True).iloc[1:] for x in gp.groups]
# Save one CSV file per category, named after the category
for df in dfs:
    df.to_csv(f"{df.index[0]}.csv", index=False)
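
If pandas is not available, the same three filters (skip 'next', skip entries starting with ';', skip empty lines) can probably be applied with the standard library alone. The following is only a minimal sketch, not part of the original answer: the helper names parse_categories and write_category are made up for illustration, and it assumes every category starts with a line of the form new <name> as in the sample file.

```python
import csv
import io


def parse_categories(text):
    """Group entry lines under the most recent 'new <name>' header."""
    categories = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the 'next' terminator, and entries starting with ';'
        if not line or line == "next" or line.startswith(";"):
            continue
        if line.startswith("new "):
            # Header line: take the category name and drop optional quotes
            current = line[len("new "):].strip().strip('"')
            categories[current] = []
        elif current is not None:
            # Split 'johnny; 20 - age' into ['johnny', '20 - age']
            categories[current].append([part.strip() for part in line.split(";")])
    return categories


def write_category(rows, out):
    """Write one category's rows as CSV to a file-like object."""
    csv.writer(out, lineterminator="\n").writerows(rows)


# Sample input copied from the names_c block in the question
sample = """new names_c

    johnny; 20 - age
    ron
    ;joe; 19 - age
    brian; 23 - age

next
"""

parsed = parse_categories(sample)
buffer = io.StringIO()
write_category(parsed["names_c"], buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with open("names_c.csv", "w", newline="") to write_category, and read the input with open("example.txt").read() in place of the sample string.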

Solution 1
0 2022-05-26 06:59:48