
Python 3 CSV writer splitting lines which contain commas

I want to pull a CSV from the URL below. One column contains text with commas in some of its values, which is causing issues. For example, in the row below the last two items should be a single column but are being split:

"""SL""","""2019-09-29""","""88.6""","""-0.6986""","""5.8034""","""Josh Phegley""",572033,542914,"""field_out""","""hit_into_play_score""",,,,,14,"""Josh Phegley grounds out"," second baseman Donnie Walton to first baseman Austin Nola. Sean Murphy scores. """

My code is as follows

import requests
import csv

file_name = 'test.csv'

url = 'https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7C&hfC=&hfSea=2019%7C&hfSit=&player_type=&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&game_date_gt=&game_date_lt=&team=OAK&position=&hfRO=&home_road=&hfFlag=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=details&'

raw_data = requests.get(url)

with open(file_name, 'w') as f:
    writer = csv.writer(f, quotechar = '"')
    for line in raw_data.iter_lines():
        writer.writerow(line.decode('utf-8').split(','))

I've tried removing split(','), but this just results in each character being separated by a comma. I've tried various combinations of quotechar, quoting, and escapechar for the writer, but no luck. Is there a way of ignoring commas if they appear within quotes?

Your incoming data is already CSV; you shouldn't be using the csv module to write it (unless you need to change the dialect for some reason, but even then, you'd need to read it with the csv module in the original dialect, then write it in the new dialect).
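To see why the split(',') in the question breaks the last column, here is a small sketch on a made-up row (the sample text is an assumption for illustration, not the real feed). A plain split ignores the quotes, while csv.reader treats the quoted text as one field:

```python
import csv
import io

# Hypothetical sample row: the last field is quoted and contains a comma.
sample = 'SL,2019-09-29,"Josh Phegley grounds out, second baseman to first baseman."\n'

# Naive splitting ignores the quotes and cuts the last field in two.
naive = sample.rstrip('\n').split(',')

# csv.reader honours the quotes and keeps the field whole.
parsed = next(csv.reader(io.StringIO(sample)))

print(len(naive))   # 4
print(len(parsed))  # 3
```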

Just do:

# iter_lines() strips line endings, so add '\n' back to each line;
# newline='' stops Python from translating that '\n' on write
with open(file_name, 'w', newline='') as f:
    f.writelines(line.decode('utf-8') + '\n' for line in raw_data.iter_lines())

to perform the minimal decode to UTF-8 and otherwise dump the data raw. If your locale encoding is UTF-8 anyway (or you want to write as UTF-8 regardless of locale), you can simplify further by dumping the raw bytes:

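# newline='' not needed for binary mode, which doesn't translate line endings anyway
with open(file_name, 'wb') as f:
    # .content is the response body as bytes, with its line endings untouched
    # (iter_lines() would strip them and run the rows together)
    f.write(raw_data.content)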
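If the dialect really does need to change, the read-then-write approach mentioned at the start of this answer might look like the sketch below. The helper name convert_dialect, the tab delimiter, and the sample rows are all assumptions for illustration; the sample stands in for the downloaded text so the snippet runs without a network call.

```python
import csv
import io

def convert_dialect(text, out_file, delimiter='\t'):
    """Re-emit CSV text with a different delimiter (hypothetical helper)."""
    # Parse in the original dialect so quoted commas stay inside their field.
    reader = csv.reader(io.StringIO(text, newline=''))
    # Write in the new dialect; the csv module adds quotes only where needed.
    writer = csv.writer(out_file, delimiter=delimiter)
    writer.writerows(reader)

# Made-up sample standing in for the response body.
sample = 'pitch_type,des\nSL,"Josh Phegley grounds out, second baseman to first baseman."\n'

out = io.StringIO(newline='')
convert_dialect(sample, out)
print(out.getvalue())
```

With the real response, the same helper could presumably be called with raw_data.text and a file opened with newline='' in place of the in-memory buffer.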

