Python script for pulling data from txt file to excel
Here is my file:
example.txt
new names_a
tim
jeremy; 24 - age
next
new "names_b"
jordan; 27 - age
alex; 26 - age
steven; 24 - age
next
new names_c
johnny; 20 - age
ron
;joe; 19 - age
brian; 23 - age
next
Here is my code:
file=open("example.txt", "r")
data=file.read()
categories=data.split('new')
dict_format={}
for categor_data in categories:
    items=categor_data.split('\n')
    item_name=items[0].replace(" ", "")
    item_name=item_name.strip('"')
    dict_format[item_name]=items[1:]
for name in dict_format:
    print(name)
print("Which category to export?")
answer=input()
with open(answer+".csv",'w') as csv:
    for row in dict_format[answer][:-1]:
        if row != "":
            csv.write(row.replace(";",",")+"\n")
    csv.write(dict_format[answer][-1].replace(";",","))
    csv.close()
Example output for the category names_c:
 | A | B | C |
---|---|---|---|
1 | johnny | 20 - age | |
2 | ron | | |
3 | joe | 19 - age | |
4 | brian | 23 - age | |
5 | | | |
6 | next | | |
Question 1:
Is there a way to either:
a) have the code not read the word 'next' as part of the list
b) open the file, delete all entries of the word 'next', save the file, close the file, reopen the file and then run the code
Question 2:
Is there a way to not output entries starting with ';'? ex: ;joe; 19 - age
Question 3:
Is there a way to delete empty rows?
Desired output:
 | A | B | C |
---|---|---|---|
1 | johnny | 20 - age | |
2 | ron | | |
3 | brian | 23 - age | |
This is one way to do it:
import pandas as pd

# Read the file one line per row into a single column (column 0)
df = pd.read_csv('example.txt', sep='\n+', header=None, engine='python')
# Drop rows starting with ';' (e.g. ;joe; 19 - age) and the 'next' markers
df = df.drop(df[df[0].str.startswith((';', 'next'))].index)
# Split off the category name: column 1 holds it on 'new ...' rows, NaN elsewhere
df2 = df[0].str.replace('"', '', regex=False).str.split('new ', expand=True)
# Forward-fill the category name, then group the dataframe by category
df2 = df2.ffill()
gp = df2.groupby(1)
# For each category, split 'name; detail' into columns and drop the header row
dfs = [gp.get_group(x).reset_index().drop('index', axis=1).set_index(1)[0].str.split(';', expand=True).iloc[1:] for x in gp.groups]
# Save one CSV per category, named after the category
for df in dfs:
    df.to_csv(f"{df.index[0]}.csv", index=False)
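If pandas is not available, the same three filters (no 'next', no entries starting with ';', no empty rows) can also be applied with the standard library alone. The following is only a minimal sketch, not the method from the answer above: the function names `parse_categories` and `export_category` are made up for illustration, and it assumes that every category starts with a line beginning with `new ` and that a terminator line contains exactly the word `next`.

```python
import csv
from pathlib import Path


def parse_categories(text):
    """Group the data lines of the input text under their 'new <name>' header."""
    categories = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip empty rows, the 'next' terminator, and entries starting with ';'.
        if not line or line == "next" or line.startswith(";"):
            continue
        if line.startswith("new "):
            # Header line: drop the keyword and any surrounding double quotes.
            current = line[len("new "):].strip().strip('"')
            categories[current] = []
        elif current is not None:
            # Data line: split 'name; detail' into separate cells.
            categories[current].append([cell.strip() for cell in line.split(";")])
    return categories


def export_category(categories, name, out_dir="."):
    """Write one category to <out_dir>/<name>.csv and return the file path."""
    path = Path(out_dir) / f"{name}.csv"
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(categories[name])
    return path
```

To use it with the file from the question, read `example.txt` into a string, pass it to `parse_categories`, print the keys of the returned dictionary as the list of categories, and call `export_category` with the name the user types in. Because the filtering happens while parsing, the original file does not need to be edited and saved first.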