Convert .txt file (data feed) to .csv file

Basically, the original data has no headers, only values (but I have the list of headers). The delimiter is '|'. What I am trying to do is convert the txt file to a csv file using Python. The csv file should contain my headers and the corresponding values.

For example, the txt file looks like:

sadasd|dsdads|adsasd
value 1|value 2|value 3|value 4|value 5|
value 100|value 101|value 102|value 103|value 104|value 105
value 200|value 201|value 202|value 203|value 204|value 205
sdasd|dsa|dsdad

After conversion, the .csv file should look like:

header 1,header 2, header 3, header 4, header 5,
value 1,value 2,value 3,value 4,value 5,
value 100,value 101,value 102,value 103,value 104,value 105
value 200,value 201,value 202,value 203,value 204,value 205

I have just started learning Python, and my idea is:

  • delete the first and last lines.

  • build a dictionary of lists: each column is a list keyed by its header (which I have), then turn it into a DataFrame.

  • convert to .csv

So the data looks like {'header 1': [value 1, value 100, value 200], 'header 2': [value 2, value 101, value 201], ...} and is then converted to .csv.

That's just my idea; if there is an easier way I would welcome it, as long as it only uses Python.
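The idea described above (drop the first and last lines, build a dictionary of lists keyed by header, then write CSV) could be sketched roughly as follows using only the standard library. This is a sketch under assumptions, not a definitive answer: it assumes each line holds exactly one record, and the function names `feed_to_columns` and `columns_to_csv` as well as the sample lines are made up for illustration.

```python
import csv
import io


def feed_to_columns(lines, headers, sep="|"):
    """Return {header: [values...]}, ignoring the first and last line."""
    columns = {h: [] for h in headers}
    n = len(headers)
    for line in lines[1:-1]:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        # Truncate extra fields (e.g. the empty one after a trailing '|')
        # and pad short rows so every column keeps the same length.
        fields = fields[:n] + [""] * (n - len(fields))
        for h, value in zip(headers, fields):
            columns[h].append(value)
    return columns


def columns_to_csv(columns, out):
    """Write the header row, then transpose the columns back into rows."""
    writer = csv.writer(out)
    writer.writerow(columns.keys())
    writer.writerows(zip(*columns.values()))


# Hypothetical sample data, modelled on the example in the question.
sample = [
    "sadasd|dsdads|adsasd",
    "value 1|value 2|value 3|value 4|value 5|",
    "value 100|value 101|value 102|value 103|value 104",
    "sdasd|dsa|dsdad",
]
headers = ["header 1", "header 2", "header 3", "header 4", "header 5"]

columns = feed_to_columns(sample, headers)
buf = io.StringIO()
columns_to_csv(columns, buf)
print(buf.getvalue())
```

For a real file, the lines would come from `open(...).readlines()` and the output would go to a file opened with `newline=""`, as the csv module documentation recommends. The dictionary built here has the same shape that `pandas.DataFrame` accepts, if a DataFrame is wanted as an intermediate step.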

Using the csv module

Example:

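import csv

in_path = "foo.txt"     # input feed (placeholder name)
out_path = "bar.csv"    # output CSV (placeholder name)
header = ["header 1", "header 2", "header 3", "header 4", "header 5"]

with open(in_path, "r") as infile:
    data = []
    for line in infile.readlines()[1:-1]:               # Skip the first and last line.
        # Drop empty fields left by a trailing '|' (assumes no real field is empty).
        data.extend(field for field in line.strip().split("|") if field)
data = [data[i:i+5] for i in range(0, len(data), 5)]    # Split the list into sub-lists of 5 elements
print(data)

with open(out_path, "w", newline="") as outfile:        # Output CSV file; newline="" avoids blank rows on Windows
    writer = csv.writer(outfile, delimiter=",")
    writer.writerow(header)                             # Write the header
    writer.writerows(data)                              # Write the content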
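If every line of the feed is already one complete record, a variant is to let `csv.reader` do the splitting with `delimiter="|"` instead of flattening everything and regrouping by 5. The sketch below is only an illustration of that alternative, not the approach of the answer above; `convert_feed` and the sample text are hypothetical names and data.

```python
import csv
import io


def convert_feed(infile, outfile, header, sep="|"):
    """Copy a '|'-delimited feed to CSV, dropping the first and last record."""
    # csv.reader yields [] for blank lines, so filter those out.
    rows = [row for row in csv.reader(infile, delimiter=sep) if row]
    body = rows[1:-1]
    writer = csv.writer(outfile)
    writer.writerow(header)
    for row in body:
        # Keep only as many fields as there are headers; this also drops
        # the empty field produced by a trailing '|'.
        writer.writerow(row[:len(header)])
    return len(body)


# Hypothetical sample data, modelled on the example in the question.
feed = io.StringIO(
    "sadasd|dsdads|adsasd\n"
    "value 1|value 2|value 3|value 4|value 5|\n"
    "value 100|value 101|value 102|value 103|value 104\n"
    "sdasd|dsa|dsdad\n"
)
out = io.StringIO()
header = ["header 1", "header 2", "header 3", "header 4", "header 5"]
count = convert_feed(feed, out, header)
print(out.getvalue())
```

With real files, both would be opened with `newline=""` and passed in place of the `io.StringIO` objects. Unlike the regrouping approach, this keeps rows aligned even when a line has more or fewer fields than expected, at the cost of not handling records that span several lines.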

Stitching together parts from Stack Overflow answers yields the following solution:

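import pandas as pd

mycolnames = ['col1','col2','col3','col4','col5']

# Use the sep argument to change your delimiter accordingly.
# header=None: the file has no header row, so supply the names directly.
# skiprows/skipfooter drop the first and last line (skipfooter needs engine="python").
# index_col=False stops pandas from treating the first field as an index
# when a line has more fields than names (e.g. because of a trailing '|').
df = pd.read_csv("foo.txt", sep="|", header=None, names=mycolnames,
                 skiprows=1, skipfooter=1, engine="python", index_col=False)

# Write your desired columns to csv (here: all of them), without the row index
df[mycolnames].to_csv("bar.csv", sep=",", index=False)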

Credits

@atomh33ls - How to read csv into record array in numpy?

@LangeHaare - set column names in pandas data frame from_dict with orient = 'index'
