Converting a text file to CSV with columns
I want to convert a text file to a CSV file with columns such as name, date and description. I'm new to Python and haven't found a proper way to do this. Can someone guide me? Below is the sample text file.
================================================== ====
Title: Whole case
Location: oyuri
From: Aki
Date: 2018/11/30 (Friday) 11:55:29
================================================== =====
1: Aki
2018/12/05 (Wed) 17:33:17
An approval notice has been sent.
-------------------------------------------------- ------------------
2: Aki
2018/12/06 (Thursday) 17:14:30
I was notified by Mr. Id, the agent of the other party.
-------------------------------------------------- ------------------
3: kano, etc.
2018/12/07 (Friday) 11:44:45
Please call rito.
-------------------------------------------------- ------------------
I outline below a very simplistic approach to achieving your task. The general idea is to:
1. Read in your text file using open()
2. Split the text into a list
3. Isolate the information in each element of the list
4. Export the information to CSV using pandas
I would recommend using Jupyter Notebooks to get a better idea of what I have done here.
import pandas as pd

# open file and extract text
text_path = 'text.txt'
with open(text_path) as f:
    text = f.read()

# split text into a list of lines
lines = text.split('\n')

# remove the heading (the first six lines of the sample file)
len_heading = 6
lines = lines[len_heading:]

# separate the entries using the divider
divider = '-----'
data = []
start = 0
for i, line in enumerate(lines):
    # add elements to data if divider found
    if line.startswith(divider):
        data.append(lines[start:i])
        start = i + 1

# extract name, date and description from data
names, dates, description = [], [], []
for info in data:
    # this is a very simplistic approach, please add checks
    # to make sure you are getting the right data
    name = info[0].split(':', 1)[1].strip()  # drop the leading "1: " index
    date = info[1][:10]                      # keep only YYYY/MM/DD
    desc = info[2]
    names.append(name)
    dates.append(date)
    description.append(desc)

# create pandas dataframe
df = pd.DataFrame({'name': names, 'date': dates, 'description': description})

# export dataframe to csv
df.to_csv('converted_text.csv', index=False)
You should get a CSV file with the columns name, date and description.
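The comment in the code above asks for checks to make sure the right data is picked up. A minimal sketch of what such checks might look like, using only the standard library's re and csv modules. The function name parse_entries and the three regular expressions are assumptions derived from the sample file, not part of the original answer, and would likely need adjusting for real data.

```python
import csv
import io
import re

# Assumed patterns, derived from the sample file only.
DIVIDER = re.compile(r'^(?:-{5,}|={5,})')   # "-----..." or "=====..." lines
NAME = re.compile(r'^(\d+):\s*(.+)$')       # e.g. "1: Aki"
DATE = re.compile(r'^(\d{4}/\d{2}/\d{2})\b')  # e.g. "2018/12/05 (Wed) ..."


def parse_entries(lines):
    """Return (name, date, description) tuples, skipping malformed blocks."""
    rows, block = [], []
    for line in lines:
        if DIVIDER.match(line):
            # A valid entry has at least a name line, a date line and some text.
            if len(block) >= 3:
                name = NAME.match(block[0].strip())
                date = DATE.match(block[1].strip())
                if name and date:
                    desc = ' '.join(s.strip() for s in block[2:])
                    rows.append((name.group(2).strip(), date.group(1), desc))
            block = []
        else:
            block.append(line)
    return rows


sample = """=======================================================
Title: Whole case
Location: oyuri
From: Aki
Date: 2018/11/30 (Friday) 11:55:29
=======================================================
1: Aki
2018/12/05 (Wed) 17:33:17
An approval notice has been sent.
--------------------------------------------------
3: kano, etc.
2018/12/07 (Friday) 11:44:45
Please call rito.
--------------------------------------------------
""".splitlines()

rows = parse_entries(sample)

# Written to an in-memory buffer here; pass an open file object to save to disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['name', 'date', 'description'])
writer.writerows(rows)
print(buffer.getvalue())
```

The heading block is skipped automatically because its first line ("Title: ...") does not match the numbered-name pattern, so no hard-coded heading length is needed. The csv module also quotes fields that contain commas, such as "kano, etc.".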
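Another approach with pandas: read the file into a single column, then use np.where(cond, 1, 0).cumsum() to tag every separate message.

import numpy as np
import pandas as pd

file = 'text.txt'  # path to the text file

# read the file with only one col
# (newer pandas versions reject sep='\n' in read_csv, so read the lines directly)
with open(file) as f:
    df = pd.DataFrame(f.read().splitlines())

# locate the rows that contain ------ or ======
cond = df[0].str.contains('-----|======')
df['tag'] = np.where(cond, 1, 0).cumsum()

# keep only the lines that belong to a message
cond2 = df['tag'] >= 2
dfn = df[~cond & cond2].copy()

# output: one row per message, split into three columns
df_output = (dfn.groupby('tag')[0]
                .apply('\n'.join)
                .str.split('\n', n=2, expand=True))
df_output.columns = ['name', 'date', 'Description']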
Output:

              name                            date  \
tag
2.0         1: Aki       2018/12/05 (Wed) 17:33:17
3.0         2: Aki  2018/12/06 (Thursday) 17:14:30
4.0  3: kano, etc.    2018/12/07 (Friday) 11:44:45

                                           Description
tag
2.0                  An approval notice has been sent.
3.0  I was notified by Mr. Id, the agent of the oth...
4.0                                  Please call rito.
df:

                                                    0  tag
0   ==============================================...    1
1                                   Title: Whole case    1
2                                     Location: oyuri    1
3                                           From: Aki    1
4                  Date: 2018/11/30 (Friday) 11:55:29    1
5   ==============================================...    2
6                                              1: Aki    2
7                           2018/12/05 (Wed) 17:33:17    2
8                   An approval notice has been sent.    2
9   ----------------------------------------------...    3
10                                             2: Aki    3
11                     2018/12/06 (Thursday) 17:14:30    3
12  I was notified by Mr. Id, the agent of the oth...    3
13  ----------------------------------------------...    4
14                                      3: kano, etc.    4
15                       2018/12/07 (Friday) 11:44:45    4
16                                  Please call rito.    4
17  ----------------------------------------------...    5
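To see why the cumulative sum works as a message tag, here is a small sketch on a toy Series (the data and the variable names lines, is_sep, tag and grouped are made up for illustration). Every separator line adds 1 to the running total, so all lines between two separators share the same number.

```python
import numpy as np
import pandas as pd

# Toy input: a two-line heading followed by two messages.
lines = pd.Series(['=====', 'Title: t', '=====',
                   '1: Aki', 'note a', '-----',
                   '2: Aki', 'note b', '-----'])

# True on separator lines, False elsewhere.
is_sep = lines.str.contains('-----|=====')

# Each separator bumps the counter; the lines after it inherit the new value.
tag = np.where(is_sep, 1, 0).cumsum()
print(tag.tolist())  # [1, 1, 2, 2, 2, 3, 3, 3, 4]

# Drop separators and the heading (tag 1), then collect lines per message.
keep = ~is_sep & (tag >= 2)
grouped = lines[keep].groupby(tag[keep.to_numpy()]).apply(list)
print(grouped.to_dict())
```

Tag 1 covers the heading because the first separator is the very first line, which is why the answer filters on tag >= 2.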
You can go on to split the index from the name:

# a raw string avoids an invalid-escape warning for \s
obj = df_output['name'].str.strip().str.split(r':\s*')
df_output['name'] = obj.str[-1]
df_output['idx'] = obj.str[0]
df_output = df_output.set_index('idx')
           name                            date  \
idx
1           Aki       2018/12/05 (Wed) 17:33:17
2           Aki  2018/12/06 (Thursday) 17:14:30
3    kano, etc.    2018/12/07 (Friday) 11:44:45

                                           Description
idx
1                    An approval notice has been sent.
2    I was notified by Mr. Id, the agent of the oth...
3                                    Please call rito.
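The date column could be cleaned up in a similar way. This is only a sketch, assuming every date follows the "YYYY/MM/DD (weekday) HH:MM:SS" layout seen in the sample; the variable names dates, cleaned and parsed are illustrative and not from the answer above.

```python
import pandas as pd

# Example values copied from the sample file.
dates = pd.Series(['2018/12/05 (Wed) 17:33:17',
                   '2018/12/06 (Thursday) 17:14:30'])

# Remove the parenthesised weekday, leaving "YYYY/MM/DD HH:MM:SS".
cleaned = dates.str.replace(r'\s*\(.*?\)', '', regex=True)

# Parse into real timestamps; a mismatching row would raise an error.
parsed = pd.to_datetime(cleaned, format='%Y/%m/%d %H:%M:%S')
print(parsed)
```

Applied to the real data, the same two calls would be run on df_output['date'] instead of the toy Series.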
Add more header columns:

# heading lines carry tag 1 and have the form "Key: value"
cond = (df['tag'] == 1) & (df[0].str.contains(':'))
header_dict = dict(df.loc[cond, 0].str.split(': ', n=1).values)
# {'Title': 'Whole case',
#  'Location': 'oyuri',
#  'From': 'Aki ',
#  'Date': '2018/11/30 (Friday) 11:55:29'}

# add each heading field as a constant column
for k, v in header_dict.items():
    df_output[k] = v