
Split into two columns and convert txt text into a csv file

I have the following data:

Graudo. A selection of Pouteria caimito, a minor member...

TtuNextrecod. A selection of Pouteria caimito, a minor member of the Sapotaceae...

I want to split it into two columns:

Column1       Column2
------------------------------------------------------------------------------
Graudo        A selection of Pouteria caimito, a minor member...
TtuNextrecod  A selection of Pouteria caimito, a minor member of the Sapotaceae...

I need help with the code. Thanks.

import csv # convert
import itertools #function for a efficient looping

with open('Abiutxt.txt', 'r') as in_file:
    lines = in_file.read().splitlines() #returns a list with all the lines in string, including the line breaks

    test = [line.split('. ')for line in lines ] #split period....but...need work

    print(test)


    stripped = [line.replace('', '').split('. ')for line in lines ]

    grouped = itertools.izip(*[stripped]*1)
    with open('logtestAbiutxt.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('Column1', 'Column2'))

        for group in grouped:
            writer.writerows(group)

I am not sure you need zipping here at all. Simply iterate over every line of the input file, skip empty lines, split on the period and write to the csv file:

import csv


with open('Abiutxt.txt', 'r') as in_file:
    with open('logtestAbiutxt.csv', 'w') as out_file:
        writer = csv.writer(out_file, delimiter="\t")
        writer.writerow(['Column1', 'Column2'])

        for line in in_file:
            if not line.strip():
                continue

            writer.writerow(line.strip().split(". ", 1))

Notes:

  • I specified a tab as the delimiter, but you can change it as appropriate.
  • Thanks to @PatrickHaugh for the idea to split on the first occurrence of ". " only, since your second column may contain periods as well.
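
To illustrate the second note, here is a small sketch contrasting an unbounded split with `split(". ", 1)`. The sample line is made up for illustration (the second sentence is not from the original data); only `str.split` and its `maxsplit` argument are used.

```python
# Hypothetical sample line: the description itself contains ". ".
line = "Graudo. A selection of Pouteria caimito. It is a minor member of the Sapotaceae."

# Splitting on every ". " yields three fields, one too many for two columns.
all_parts = line.split(". ")

# Limiting the split to the first ". " keeps the description in one piece.
two_parts = line.split(". ", 1)

print(len(all_parts))
print(two_parts)
```

With the unbounded split, `csv.writer` would write a third column for such a line; with `maxsplit=1` every row has at most two fields.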

This should get you what you want. The csv module will handle all the escaping.

import csv
with open('Abiutxt.txt', 'r') as in_file:
    x = in_file.read().splitlines()
    x = [line.split('. ', 1) for line in x if line]
with open('logtestAbiutxt.csv', "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerow(['Column1', 'Column2']) 
    writer.writerows(x)
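
As a rough illustration of what "handle all the escaping" means, the sketch below writes two rows to an in-memory buffer instead of a file. It assumes Python 3 (for `io.StringIO` with text), and the second row is a hypothetical example added only to show how double quotes are treated.

```python
import csv
import io

# The first row mirrors the question's data; the second is a made-up example.
rows = [
    ["Graudo", "A selection of Pouteria caimito, a minor member"],
    ["Example", 'A name with "quotes" inside'],
]

# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Column1", "Column2"])
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original two-field rows.
round_trip = list(csv.reader(io.StringIO(csv_text)))
```

With the default settings, fields containing the delimiter or a quote character are wrapped in double quotes, and embedded quotes are doubled, so the comma inside the description does not create an extra column.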
