Deleting Columns Using the CSV Module in Python

I realize this has been asked (many) times before, but I've been trying different solutions and none of them are working for me - clearly I'm doing something wrong, but I'm not sure what.

We're learning how to scrub data in Python, so what I'm trying to do is take a text file (converted from Excel) as input and output my scrubbed data. The data is a mix of text and numbers, and each cell has either text or numbers but not both. I'm trying to delete certain columns, and I can't figure out how to. I would really appreciate it if I could get answers just using the csv package (or no package at all) - I know pandas is supposed to be helpful, but I'm trying to go by what we're using in class.

This is the code I currently have; when I run it, I just get a blank Excel sheet as my output.

import csv

def airbnb_csv():

    source = '/Users/(myname)/Desktop/airbnb.txt'
    target = 'scrubbed_airbnb2.csv'

    with open(source,'r') as fp_in:
        reader = csv.reader(fp_in, delimiter=',')
        with open(target,'w') as fp_out:
            writer = csv.writer(fp_out, delimiter=',')
            for r in reader:
                writer.writerow((r[2], r[3], r[5], r[7], r[8], r[9], 
                r[10], r[11], r[13]))

I have other code that did get me a filled-in Excel sheet as output. This was my original code, but it got weird fast.

for row in fp_in:
     if (row[:5].isdigit()):
         v = row.split()
         v = v[:9]
         writer.writerow(v)
         if row.startswith("room_id") and not header_written:
            header_written = True
            v = row.split()

Thank you so much for any and all help/advice/hints you can give me! (No need to correct my code if it's too messy to deal with; I just wanted to add it so I didn't look like I was trying to get my homework done for me.)

So, here are some directions.

First, not directly regarding your question: context managers can be chained, i.e.:

with open('input') as inp, open('output', 'w') as out:
    ...  # read from inp, write to out

This will save you some indentation pain.
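
As a rough sketch of how that chaining could apply to an index-based loop like the one in the question, the two nested `with` blocks collapse into one. The helper name `keep_columns` and its `indexes` parameter are made up for illustration and are not part of the csv module.

```python
import csv

def keep_columns(source, target, indexes):
    """Copy only the columns at the given positions from source to target."""
    # A single with statement opens both files; newline='' is what the
    # csv documentation recommends for files handed to reader/writer.
    with open(source, newline='') as fp_in, open(target, 'w', newline='') as fp_out:
        reader = csv.reader(fp_in)
        writer = csv.writer(fp_out)
        for row in reader:
            # Build the output row from the wanted positions only.
            writer.writerow([row[i] for i in indexes])
```

Calling it as `keep_columns(source, target, [2, 3, 5])` would keep the third, fourth, and sixth columns. Note that a row shorter than the largest index raises `IndexError` in this sketch.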

More on the question: most modern filesystems give you no way to "delete" columns from a file in place, so you need to read the data, process it, and write the result to another file (to be honest, there are ways around this, but they require far more work). In your case, processing means selecting what to write (or what to skip). The best way to do this while keeping your code readable and maintainable is to use DictReader and DictWriter. Once you read and write csv rows by named fields, everything becomes easy:

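import csv

fields_needed = ['price', 'rooms']  # names of the columns to keep
# newline='' lets the csv module handle line endings itself
with open(source, newline='') as fp_in, open(target, 'w', newline='') as fp_out:
    reader = csv.DictReader(fp_in)
    # extrasaction='ignore' drops every column not listed in fieldnames
    writer = csv.DictWriter(fp_out, fieldnames=fields_needed, extrasaction='ignore')

    writer.writeheader()
    for r in reader:
        writer.writerow(r)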
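
To try the idea without touching real files, here is a small self-contained sketch that runs the same DictReader/DictWriter selection on in-memory data. The function name `select_fields` and the sample rows (including the `host` column) are invented for illustration; only the column names `price` and `rooms` come from the snippet above.

```python
import csv
import io

def select_fields(fp_in, fp_out, fields_needed):
    """Write only the named columns from fp_in to fp_out."""
    reader = csv.DictReader(fp_in)  # the first row supplies the column names
    writer = csv.DictWriter(fp_out, fieldnames=fields_needed,
                            extrasaction='ignore')  # silently skip other columns
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# Hypothetical sample data standing in for the real input file.
sample = io.StringIO(
    "room_id,price,rooms,host\n"
    "101,80,2,alice\n"
    "102,120,3,bob\n",
    newline='',
)
out = io.StringIO(newline='')
select_fields(sample, out, ['price', 'rooms'])
result = out.getvalue()
print(result)
```

If the text file exported from Excel turns out to be tab-separated (an assumption, since the question does not say), passing `delimiter='\t'` to `csv.DictReader` should be enough to read it.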
