Python CSV file manipulation

Here's a small excerpt of a CSV file that I'm trying to manipulate. Each line in the CSV is a string:

"address,bathrooms,bedrooms,built,lot,saledate,sale price,squarefeet"
"1116 Fountain St, Ann Arbor, MI Real Estate",2,4,1949,0.62 ac,20140905,469900,"1,910"
"3277 Chamberlain Cir, Ann Arbor, MI Real Estate",3,3,2002,0.32 ac,20140905,315000,"1,401"
"2889 Walnut Ridge Dr, Ann Arbor, MI Real Estate",4,4,2005,0.50 ac,20140904,790000,"3,972"
"1336 Nottington Ct, Ann Arbor, MI Real Estate",3,3,2002,,20140904,332350,"1,521"
"344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate",,,,"6,534",20140904,345000,
"545 Allison Dr, Ann Arbor, MI Real Estate",2,2,,0.29 ac,20140904,159900,"1,400"

I would like to make each line a list, separated like so:

["1116 Fountain St, Ann Arbor, MI Real Estate", 2, 4, 1949, 0.62, 20140905, 469900, 1910]

I would like the first item (the address) to be a string and the rest to be ints and floats. I highlighted the 0.62 because I want to be able to replace "0.62 ac" with 0.62. I tried splitting each line, but line.split(',') won't work because the address itself contains two commas, so it would be split as well. Is there a simpler way to do this?

I'd appreciate any suggestions.

Thanks.

First of all, use the csv module. It will handle the quoted fields for you and won't break a field up if it contains embedded commas.

import csv

# newline='' lets the csv module handle line endings itself
with open('input.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)   # throw away the header
    for row in reader:
        print(row)

Produces:

['1116 Fountain St, Ann Arbor, MI Real Estate', '2', '4', '1949', '0.62 ac', '20140905', '469900', '1,910']
['3277 Chamberlain Cir, Ann Arbor, MI Real Estate', '3', '3', '2002', '0.32 ac', '20140905', '315000', '1,401']
['2889 Walnut Ridge Dr, Ann Arbor, MI Real Estate', '4', '4', '2005', '0.50 ac', '20140904', '790000', '3,972']
['1336 Nottington Ct, Ann Arbor, MI Real Estate', '3', '3', '2002', '', '20140904', '332350', '1,521']
['344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate', '', '', '', '6,534', '20140904', '345000', '']
['545 Allison Dr, Ann Arbor, MI Real Estate', '2', '2', '', '0.29 ac', '20140904', '159900', '1,400']

So you can see that the CSV reader handles the fields properly. Next, you need to convert the fields to ints and floats as appropriate.
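One possible way to do that conversion is sketched here. The helper names `to_number` and `convert_row` are made up for illustration, and two choices are assumptions rather than anything the question specifies: empty fields become `None`, and a lot value without the "ac" suffix (such as "6,534") is simply parsed as a plain number even though it may be in a different unit.

```python
import csv
import io

def to_number(text):
    """Convert a CSV field to int or float; return None for an empty field (assumption)."""
    # Drop thousands separators ("1,910") and the acre suffix ("0.62 ac").
    cleaned = text.replace(',', '').replace('ac', '').strip()
    if not cleaned:
        return None
    try:
        return int(cleaned)
    except ValueError:
        return float(cleaned)

def convert_row(row):
    # Keep the address (first field) as a string; convert everything else.
    return [row[0]] + [to_number(field) for field in row[1:]]

# Two rows from the question, read from a string instead of a file.
sample = '''"1116 Fountain St, Ann Arbor, MI Real Estate",2,4,1949,0.62 ac,20140905,469900,"1,910"
"344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate",,,,"6,534",20140904,345000,
'''

rows = [convert_row(row) for row in csv.reader(io.StringIO(sample))]
for row in rows:
    print(row)
```

With a real file, the same `convert_row` call could be applied to each row inside the `for row in reader:` loop. If `None` is not a convenient placeholder for missing values, returning `0` or `float('nan')` from `to_number` would be an equally reasonable choice.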
