
Select columns of data from .txt to .csv

I am quite new to Python (I've only been using it for the past week). My task seems fairly simple, yet I am struggling. I have several large text files, each with many columns of data from different regions. I would like to take the data from one text file, extract only the columns I need, and write them into a new .csv file. Currently the files are tab delimited, but I would like the output to be comma delimited.

I have:

#YY  MM DD hh mm WVHT  SwH  SwP  WWH  WWP SwD WWD   MWD
#yr  mo dy hr mn    m    m  sec    m  sec  -  degT  degT
2010 07 16 17 00  0.5  0.5  5.0  0.3  4.0 SSE SSE   163
2010 07 16 16 00  0.6  0.5  5.9  0.3  3.8 SSE SSE   165
2010 07 16 15 00  0.5  0.5  6.7  0.3  3.6 SSE  SW   151
2010 07 16 14 00  0.6  0.5  5.6  0.3  3.8 SSE SSE   153

I only want to keep: DD, WVHT, and MWD

Thanks in advance, Harper

You need to format this question a little more legibly. :)

Take a look at the Python csv module for writing your csv files from your now-stored data: http://docs.python.org/library/csv.html

EDIT: Here's some better, more concise code, based on comments + csv module:

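import csv

# newline='' lets the csv module control line endings in the output file
with open('myfile.txt') as f, open('out.csv', 'w', newline='') as out:
    csv_out = csv.writer(out, delimiter=',')
    for line in f:
        # drop the trailing newline so it does not end up in the last field
        vals = line.rstrip('\n').split('\t')
        # DD, WVHT, MWD (writerow takes a single sequence)
        csv_out.writerow([vals[2], vals[5], vals[12]])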
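
One detail that is easy to trip over: csv.writer's writerow expects a single iterable holding the whole row, not one argument per field. The sketch below illustrates this with an in-memory buffer instead of a real file; the sample line is taken from the question and assumed to be tab delimited as the asker describes.

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buffer, delimiter=',')

# One data line from the question, assumed here to be tab delimited
line = "2010\t07\t16\t17\t00\t0.5\t0.5\t5.0\t0.3\t4.0\tSSE\tSSE\t163\n"
vals = line.rstrip('\n').split('\t')

# Pass the three fields as ONE list; writerow(a, b, c) raises TypeError
writer.writerow([vals[2], vals[5], vals[12]])

csv_text = buffer.getvalue()
```

Calling writerow with three separate arguments fails because the method accepts exactly one row argument.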

One easy way to achieve this is by using the csv module in the standard library.

First, create a csv.reader and a csv.writer object:

>>> import csv
>>> csv_in = csv.reader(open('eggs.txt', newline=''), delimiter='\t')
>>> csv_out = csv.writer(open('spam.csv', 'w', newline=''), delimiter=',')

Then just put the information you want into the new csv file.

>>> for line in csv_in:
...     csv_out.writerow([line[2], line[5], line[-1]])
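
If counting column positions by hand feels fragile, the header line can be used to look the positions up by name. This is only a sketch: select_columns is a hypothetical helper, and it assumes the first line starting with '#' is the header, that later '#' lines (such as the units row) can be skipped, and that fields are separated by any run of whitespace.

```python
import csv
import io

def select_columns(lines, wanted):
    """Yield the wanted columns from whitespace-delimited lines.

    The first line starting with '#' is treated as the header; any later
    '#' line (such as the units row) is skipped.
    """
    indices = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if line.startswith('#'):
            if indices is None:
                names = line.lstrip('#').split()
                indices = [names.index(name) for name in wanted]
                yield list(wanted)  # emit a header row for the csv
            continue
        if indices is None:
            raise ValueError("no header line found before the data")
        yield [fields[i] for i in indices]

sample = """\
#YY  MM DD hh mm WVHT  SwH  SwP  WWH  WWP SwD WWD   MWD
#yr  mo dy hr mn    m    m  sec    m  sec  -  degT  degT
2010 07 16 17 00  0.5  0.5  5.0  0.3  4.0 SSE SSE   163
2010 07 16 16 00  0.6  0.5  5.9  0.3  3.8 SSE SSE   165
"""

rows = list(select_columns(sample.splitlines(), ["DD", "WVHT", "MWD"]))

out = io.StringIO()  # stand-in for open('spam.csv', 'w', newline='')
csv.writer(out).writerows(rows)
```

For a real file, the same generator can be fed an open file object in place of sample.splitlines().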

One of the problems appears to be that all of your data is on a single line:

2010 07 16 17 00 0.5 0.5 5.0 0.3 4.0 SSE SSE 163 2010 07 16 16 00 0.6 0.5 5.9 0.3 3.8 SSE SSE 165 2010 07 16 15 00 0.5 0.5 6.7 0.3 3.6 SSE SW 151 2010 07 16 14 00 0.6 0.5 5.6 0.3 3.8 SSE SSE 153 2010 07 16 17 00 0.5 0.5 5.0 0.3 4.0 SSE SSE 163 2010 07 16 16 00 0.6 0.5 5.9 0.3 3.8 SSE SSE 165 2010 07 16 15 00 0.5 0.5 6.7 0.3 3.6 SSE SW 151 2010 07 16 14 00 0.6 0.5 5.6 0.3 3.8 SSE SSE 153

If this is the case, you will need to split the input line up. If you know your data are regular, then you could be sneaky and split on the 2010:

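f = open('data.txt')
for line in f:
    for portion in line.split(' 2010'):  # the leading space is significant
        # every portion after the first has lost its leading "2010"
        pass  # write portion to csv here
f.close()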
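
A side effect of splitting on ' 2010' is that the year is removed from every record after the first, so field positions shift by one in those records. One possible workaround, sketched here as an assumption rather than part of the original answer, is a regular expression with a lookahead, which splits on the whitespace before "2010" without consuming the year. It assumes "2010" never appears as a data value elsewhere on the line.

```python
import re

line = ("2010 07 16 17 00 0.5 0.5 5.0 0.3 4.0 SSE SSE 163 "
        "2010 07 16 16 00 0.6 0.5 5.9 0.3 3.8 SSE SSE 165")

# Plain split: the second portion no longer starts with the year
plain = line.split(' 2010')

# Lookahead split: break on whitespace that is followed by "2010",
# leaving the year at the start of each record
records = [part.split() for part in re.split(r'\s+(?=2010\s)', line)]

# DD, WVHT, MWD sit at the same positions in every record
selected = [[rec[2], rec[5], rec[12]] for rec in records]
```

With the lookahead version every record keeps all 13 fields, so the same indices work for each one.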

If your data span multiple years, then Python's itertools module can be very handy. I often find myself using the grouper recipe.

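import csv
from itertools import zip_longest  # named izip_longest in Python 2

def grouper(n, iterable, fillvalue=None):
    """
    >>> list(grouper(3, 'ABCDEFG', 'x'))
    [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
    """
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

with open('spam.txt') as f, open('eggs.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter=',')
    # 13 fields per record, matching the sample header (YY ... MWD)
    for record in grouper(13, f.read().split()):
        # DD, WVHT, MWD
        csv_writer.writerow([record[2], record[5], record[12]])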

Since this is a basic need that doesn't require extensive use of csv, here's a snippet without the csv module.

DD = 2
WVHT = 5
MWD = 12
INPUT = "input.txt"
OUTPUT = "output.csv"

def main():
    rows = []
    # Text mode translates "\n" to the platform line ending on write,
    # so os.linesep is not needed here.
    with open(INPUT) as fi, open(OUTPUT, "w") as fo:
        for line in fi:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            rows.append("%s,%s,%s" % (fields[DD], fields[WVHT], fields[MWD]))
        fo.write("\n".join(rows) + "\n")

if __name__ == "__main__":
    main()
