Python script for pulling data from txt file to Excel

Here is my file:

example.txt

new names_a

    tim
    jeremy; 24 - age

next

new "names_b"

    jordan; 27 - age
    alex; 26 - age
    steven; 24 - age

next

new names_c

    johnny; 20 - age
    ron
    ;joe; 19 - age
    brian; 23 - age

next

Here is my code:


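# Read the whole file; "with" closes it automatically
with open("example.txt", "r") as file:
    data = file.read()

# Split the text into one chunk per "new ..." category
categories = data.split('new')
dict_format = {}
for categor_data in categories:
    items = categor_data.split('\n')
    # The first line of each chunk is the category name, e.g. ' "names_b"'
    item_name = items[0].replace(" ", "")
    item_name = item_name.strip('"')
    dict_format[item_name] = items[1:]

for name in dict_format:
    print(name)

print("Which category to export?")
answer = input()

# "out" avoids shadowing the standard csv module; "with" closes the file
with open(answer + ".csv", 'w') as out:
    for row in dict_format[answer][:-1]:
        if row != "":
            out.write(row.replace(";", ",") + "\n")

    out.write(dict_format[answer][-1].replace(";", ","))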

Example output for the category names_c:

    A        B          C
1   johnny   20 - age
2   ron
3            joe        19 - age
4   brian    23 - age
5
6   next

Question 1:

Is there a way to either:

a) have the code not read the word 'next' as part of the list, or

b) open the file, delete all entries of the word 'next', save the file, close the file, reopen the file and then run the code?
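A minimal sketch of option (b), assuming a line counts as a terminator when its content, ignoring surrounding whitespace, is exactly `next`. The helper names `drop_next_lines` and `remove_next_from_file` are hypothetical, not from the original code.

```python
from pathlib import Path


def drop_next_lines(lines):
    """Return only the lines whose stripped content is not exactly 'next'."""
    return [line for line in lines if line.strip() != "next"]


def remove_next_from_file(path):
    """Option (b): rewrite the file in place without its 'next' lines."""
    p = Path(path)
    # keepends=True preserves the original line endings when writing back
    lines = p.read_text().splitlines(keepends=True)
    p.write_text("".join(drop_next_lines(lines)))
```

Calling `remove_next_from_file("example.txt")` once before the export does what (b) describes. Option (a) is usually simpler because it leaves example.txt untouched: apply the same filter while building the dictionary, e.g. `dict_format[item_name] = drop_next_lines(items[1:])`.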

Question 2:

Is there a way to not output entries starting with ';'? For example: ;joe; 19 - age
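One possible approach, assuming the entries really are indented as they appear in example.txt: `row.startswith(';')` alone would miss `    ;joe; 19 - age`, so strip the leading whitespace first. `starts_with_semicolon` is a hypothetical helper name.

```python
def starts_with_semicolon(row):
    """True for entries such as '    ;joe; 19 - age' (leading whitespace ignored)."""
    return row.lstrip().startswith(";")


rows = ["    johnny; 20 - age", "    ron", "    ;joe; 19 - age", "    brian; 23 - age"]
# Keep every row except the ';'-prefixed ones
kept = [row for row in rows if not starts_with_semicolon(row)]
```

In the export loop, the same check can skip such a row with `continue` before it is written.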

Question 3:

Is there a way to delete empty rows?
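Two things in the code above can let empty rows through: `row != ""` does not catch rows that contain only whitespace, and the final write after the loop outputs the last row with no check at all. A sketch that filters every row the same way, using `row.strip()` as the emptiness test (`non_empty_rows` is a hypothetical name):

```python
def non_empty_rows(rows):
    """Drop rows that are empty or contain only whitespace."""
    return [row for row in rows if row.strip()]


rows = ["", "    johnny; 20 - age", "   ", "    ron", "\r", "    brian; 23 - age", ""]
cleaned = non_empty_rows(rows)
```

Writing `cleaned` in a single loop, instead of treating the last element separately, means no row bypasses the filter.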

Desired output:

    A        B          C
1   johnny   20 - age
2   ron
3   brian    23 - age

This is one way to do it, using pandas:

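import pandas as pd

# Read data using pandas: the separator '\n+' never matches inside a line,
# so each non-blank line becomes one row with its text in column 0
df = pd.read_csv('example.txt', sep='\n+', header=None, engine='python')
# Drop rows starting with ';' (e.g. ;joe; 19 - age) and 'next'
df = df.drop(df[df[0].str.startswith((';', 'next'))].index)
# Split categories: 'new names_a' -> ['', 'names_a']; entry rows get a missing value in column 1
df2 = df[0].str.replace('"', '').str.split('new ', expand=True)
# Forward-fill so each entry row carries the name of the category above it
df2 = df2.ffill()
# Group dataframe by categories; iloc[1:] drops each group's 'new ...' header row
gp = df2.groupby(1)
dfs = [gp.get_group(x).set_index(1)[0].str.split(';', expand=True).iloc[1:] for x in gp.groups]
# Save one CSV per category (e.g. names_c.csv) without pandas' index or header row
for part in dfs:
    part.to_csv(f"{part.index[0]}.csv", index=False, header=False)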
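For comparison, a sketch without pandas, using only the standard library `csv` module. It is an alternative, not part of the answer above, and it assumes the layout of example.txt: a line starting with `new ` opens a category (the name may be quoted) and a line reading `next` closes it. `parse_categories` and `write_category` are hypothetical names.

```python
import csv
import io


def parse_categories(text):
    """Map each category name to its cleaned entries (lists of fields)."""
    categories = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("new "):
            # 'new "names_b"' -> 'names_b'
            current = line[len("new "):].strip().strip('"')
            categories[current] = []
        elif not line or line == "next" or line.startswith(";"):
            continue  # skip blank rows, 'next' terminators and ';'-prefixed entries
        elif current is not None:
            # 'jeremy; 24 - age' -> ['jeremy', '24 - age']; 'tim' -> ['tim']
            categories[current].append([part.strip() for part in line.split(";")])
    return categories


def write_category(rows, out):
    """Write one category's rows as CSV to a file-like object."""
    csv.writer(out).writerows(rows)


sample = """new names_c

    johnny; 20 - age
    ron
    ;joe; 19 - age
    brian; 23 - age

next
"""
buffer = io.StringIO()
write_category(parse_categories(sample)["names_c"], buffer)
```

To export to a real file, open it with `open(name + ".csv", "w", newline="")` (the `csv` module documentation recommends `newline=""`) and pass that file object as `out`. Matching `new ` only at the start of a line also avoids the original `data.split('new')` splitting an entry that happens to contain the letters "new".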

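Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.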
