Skipping certain characters in a CSV file
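I'm writing a script that parses the NASDAQ file listing every company under the Technology category. It's a CSV, comma-delimited. However, sometimes a company's name is listed as XXX, Inc. That comma throws off the list in my script, so it picks up the wrong values. I'm parsing out the company ticker symbols, so the ", Inc." shifts the positions.
I'm still very new to Python, so I don't have much experience with it, but I've been doing my best and have gotten it to read and write CSVs. This parsing problem, though, is a hard one for me. Here's what I currently have: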
try:
    # py3
    from urllib.request import Request, urlopen
    from urllib.parse import urlencode
except ImportError:
    # py2
    from urllib2 import Request, urlopen
    from urllib import urlencode
import csv
import urllib.request
import string

def _request():
    url = 'http://www.nasdaq.com/screening/companies-by-industry.aspx?industry=Technology&render=download'
    req = Request(url)
    resp = urlopen(req)
    content = resp.read().decode().strip()
    content1 = content.replace('"', '')
    return content1

def symbol_quote():
    counter = 1
    recursive = 9*counter
    values = _request().split(',')
    values2 = values[recursive]
    return values2
    counter += 1

def csvwrite():
    import csv
    path = "symbol_comp.csv"
    data = [symbol_quote()]
    parsing = False
    with open(path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=' ')
        for line in data:
            writer.writerow(line)
I haven't gotten it to loop and act on a counter yet, because there's no point right now. This parsing issue is more pressing.
Can anyone help a newbie?
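The symptom described in the question can be reproduced offline. The following is a minimal sketch, assuming a made-up sample row in the same quoted shape as the download (the row and the helper name `naive_split` are illustrative, not taken from the real feed). It strips the quotes and splits on every comma, as the script does:

```python
# Hypothetical sample row in the same shape as the download (not real feed data).
sample = '"VNET","21Vianet Group, Inc.","25.87"'

def naive_split(line):
    """Mimic the script in the question: drop the quotes, then split on every comma."""
    return line.replace('"', '').split(',')

fields = naive_split(sample)
print(fields)       # the company name is cut in two at its embedded comma
print(len(fields))  # one more field than the row really has
```

Once the quotes are removed, nothing distinguishes the comma inside the name from a field separator, so every index after the name is shifted by one.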
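Change _request() to use csv.reader() together with cStringIO.StringIO() (on Python 3, io.StringIO takes its place, since cStringIO no longer exists there) and return a csv.reader object that you can iterate over: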
try:
    # py3
    from urllib.request import Request, urlopen
    from urllib.parse import urlencode
    from io import StringIO
except ImportError:
    # py2 (cStringIO exists only on Python 2)
    from urllib2 import Request, urlopen
    from urllib import urlencode
    from cStringIO import StringIO
import csv

def _request():
    url = 'http://www.nasdaq.com/screening/companies-by-industry.aspx?industry=Technology&render=download'
    req = Request(url)
    resp = urlopen(req)
    # Wrap the downloaded text in a file-like object so csv.reader can parse it.
    # Quotes are left in place: they let csv.reader keep "XXX, Inc." as one field.
    sio = StringIO(resp.read().decode().strip())
    reader = csv.reader(sio)
    return reader
Usage:
data = _request()
# next(data) works on both Python 2 and 3; data.next() is Python 2 only
print('fields:\n{}\n'.format('|'.join(next(data))))
for n, row in enumerate(data):
    print('|'.join(row))
    if n == 5: break
# fields:
# Symbol|Name|LastSale|MarketCap|ADR TSO|IPOyear|Sector|Industry|Summary Quote|
#
# VNET|21Vianet Group, Inc.|25.87|1137471769.46|43968758|2011|Technology|Computer Software: Programming, Data Processing|http://www.nasdaq.com/symbol/vnet|
# TWOU|2U, Inc.|13.28|534023394.4|n/a|2014|Technology|Computer Software: Prepackaged Software|http://www.nasdaq.com/symbol/twou|
# DDD|3D Systems Corporation|54.4|5630941606.4|n/a|n/a|Technology|Computer Software: Prepackaged Software|http://www.nasdaq.com/symbol/ddd|
# JOBS|51job, Inc.|64.32|746633699.52|11608111|2004|Technology|Diversified Commercial Services|http://www.nasdaq.com/symbol/jobs|
# WUBA|58.com Inc.|37.25|2959078388.5|n/a|2013|Technology|Computer Software: Programming, Data Processing|http://www.nasdaq.com/symbol/wuba|
# ATEN|A10 Networks, Inc.|10.64|638979699.12|n/a|2014|Technology|Computer Communications Equipment|http://www.nasdaq.com/symbol/aten|
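Since the goal in the question was to collect the ticker symbols and write them out, here is a minimal sketch of that last step, assuming Python 3. The `SAMPLE` text and the helper names `extract_symbols` and `write_symbols` are hypothetical and stand in for the real download so the example runs offline.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded text (not real feed data).
SAMPLE = (
    '"Symbol","Name","LastSale"\n'
    '"VNET","21Vianet Group, Inc.","25.87"\n'
    '"DDD","3D Systems Corporation","54.4"\n'
)

def extract_symbols(rows):
    """Return the first column of every data row, skipping the header row."""
    rows = iter(rows)
    next(rows, None)  # discard the header row
    return [row[0] for row in rows if row]

def write_symbols(symbols, csv_file):
    """Write one symbol per row to an already-open text file object."""
    writer = csv.writer(csv_file)
    for symbol in symbols:
        # writerow expects a sequence of fields; a bare string would be
        # written one character per field, so wrap the symbol in a list.
        writer.writerow([symbol])

symbols = extract_symbols(csv.reader(io.StringIO(SAMPLE)))
out = io.StringIO()
write_symbols(symbols, out)
print(symbols)
```

With the real data, the reader returned by `_request()` could be passed to `extract_symbols` directly, and `out` could be replaced by a file opened with `open(path, 'w', newline='')` as in the question.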