Python script for extracting data from a txt file to Excel

Question

Here is my file:

example.txt

new names_a

    tim
    jeremy; 24 - age

next

new "names_b"

    jordan; 27 - age
    alex; 26 - age
    steven; 24 - age

next

new names_c

    johnny; 20 - age
    ron
    ;joe; 19 - age
    brian; 23 - age

next

Here is my code:


file=open("example.txt", "r")
data=file.read()
categories=data.split('new')
dict_format={}
for categor_data in categories:
    items=categor_data.split('\n')
    item_name=items[0].replace(" ", "")
    item_name=item_name.strip('"')
    dict_format[item_name]=items[1:]

for name in dict_format:
    print(name)

print("Which category to export?")
answer=input()

with open(answer+".csv",'w') as csv:
    for row in dict_format[answer][:-1]:
        if row != "":
            csv.write(row.replace(";",",")+"\n")

    csv.write(dict_format[answer][-1].replace(";",","))
    csv.close()

Example output for the category names_c:

	A	B	C
1	johnny	20 - age
2	ron
3		joe	19 - age
4	brian	23 - age
5
6	next

Question 1:

Is there a way to either:

a) have the code not read the word 'next' as part of the list

b) open the file, delete all entries of the word 'next', save the file, close the file, reopen the file and then run the code

Question 2:

Is there a way to not output entries starting with ';'? ex: ;joe; 19 - age

Question 3:

Is there a way to delete empty rows?

Desired output:

	A	B
1	johnny	20 - age
2	ron
3	brian	23 - age

Answer 1

This is one way to do it:

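import pandas as pd
# Read data using Pandas: one line per row, blank lines are skipped
df = pd.read_csv('example.txt', sep='\n+', header=None, engine='python')
# Trim indentation so the prefix checks below see the first real character
df[0] = df[0].str.strip()
# Drop rows starting with ';' (e.g. ;joe; 19 - age) and 'next'
df = df.drop(df[df[0].str.startswith((';', 'next'))].index)
# Split categories: 'new names_a' becomes ['', 'names_a'], entries become [entry, None]
df2 = df[0].str.replace('"', '').str.split('new ', expand=True)
# Forward-fill the category name onto the entries that follow it
df2 = df2.ffill()
# Group dataframe by categories
gp = df2.groupby(1)
# For each category: drop the 'new ...' row (iloc[1:]) and split entries on ';'
dfs = [gp.get_group(x).set_index(1)[0].str.split(';', expand=True).iloc[1:] for x in gp.groups]
# Save dataframes, one CSV per category, without index or header row
for df in dfs:
    df.to_csv(f"{df.index[0]}.csv", index=False, header=False)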
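The same three filters (skip 'next', skip entries starting with ';', skip empty rows) can also be sketched with the standard library alone. This is a minimal sketch, not the answer's method: the function names `parse_categories` and `export_category` are hypothetical, and it assumes every category starts with a line beginning with `new ` and ends with a line containing only `next`.

```python
import csv


def parse_categories(text):
    """Hypothetical helper: map each category name to its list of rows."""
    categories = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:                  # Question 3: skip empty rows
            continue
        if line.startswith("new "):   # a new category block begins
            current = line[len("new "):].strip().strip('"')
            categories[current] = []
        elif line == "next":          # Question 1: terminator, not data
            current = None
        elif line.startswith(";"):    # Question 2: skip entries starting with ';'
            continue
        elif current is not None:
            # Split "name; age" into separate fields and trim each one
            categories[current].append([f.strip() for f in line.split(";")])
    return categories


def export_category(categories, name, path):
    """Hypothetical helper: write one category to a CSV file."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(categories[name])
```

Usage would look something like `export_category(parse_categories(open("example.txt").read()), "names_c", "names_c.csv")`. Filtering while parsing avoids having to rewrite the input file first, which is what option b) in Question 1 would require.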

Solution 1
0 2022-05-26 06:59:48
