
How do I handle data from a CSV file that has a comma in one of the fields?

Commas within data in a CSV file

This is the data set download I'm working with, from the WHO official website.

This is my working code, but some records contain a comma inside a field. Such a field is wrapped in quotation marks at its start and end, but I don't know how to keep it as one string before it goes into the parallel arrays. I can't just import csv, because this is for my SQA Higher computing course (ages 15-16, for those who aren't from Scotland) and I have to do it manually.

One of the country names has a comma in it:

2020-09-30,BQ,"Bonaire, Sint Eustatius and Saba",AMRO,9,115,0,1

("Bonaire, Sint Eustatius and Saba")

whereas the rest have neither a comma nor quotation marks:

2020-09-08,AF,Afghanistan,EMRO,96,38494,3,1415

(Afghanistan)

Basically, it splits where I don't want it to, and I'm left with 9 fields when there should only be 8.
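To make the symptom concrete, here is a minimal sketch that applies a plain `split(",")` to the two sample lines above. The variable names are illustrative only and are not part of the original code.

```python
# The two sample lines quoted above
quoted = '2020-09-30,BQ,"Bonaire, Sint Eustatius and Saba",AMRO,9,115,0,1'
plain = '2020-09-08,AF,Afghanistan,EMRO,96,38494,3,1415'

# A plain split treats every comma as a separator, quoted or not
quoted_fields = quoted.split(",")
plain_fields = plain.split(",")

print(len(plain_fields))   # 8
print(len(quoted_fields))  # 9
print(quoted_fields[2:4])  # ['"Bonaire', ' Sint Eustatius and Saba"']
```

The country name ends up spread over two list items, so every later field is shifted one position to the right.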

def get_data():
    # One parallel list per CSV column
    date_reported, country_code, country, who_region = [], [], [], []
    new_cases, cumulative_cases, new_deaths, cumulative_deaths = [], [], [], []
    with open(r"data\csv\todays_data.csv") as f:
        next(f)  # skip the header row
        for line in f:
            # Problem: this also splits on the comma inside a quoted field
            field = line.split(",")
            date_reported.append(field[0])
            country_code.append(field[1])
            country.append(field[2])
            who_region.append(field[3])
            new_cases.append(int(field[4]))
            cumulative_cases.append(int(field[5]))
            new_deaths.append(int(field[6]))
            cumulative_deaths.append(int(field[7].strip("\n")))
    print("data successfully read")
    return date_reported, country_code, country, who_region, new_cases, cumulative_cases, new_deaths, cumulative_deaths

The code here handles only your example: at most one quoted field per line.

from itertools import repeat

lines = [
    '2020-09-30,BQ,"Bonaire, Sint Eustatius and Saba",AMRO,9,115,0,1',
    '2020-09-08,AF,Afghanistan,EMRO,96,38494,3,1415',
]

magic = 'a0b1c2d3e4'  # magic string won't be found in your csv file
result = []
for line in lines:
    if '"' in line:
        temp = line.split('"')
        temp[1] = temp[1].replace(',', magic)
        line = '"'.join(temp)
    result.append(list(map(str.replace, line.split(','), repeat(magic), repeat(','))))
>>> result
[['2020-09-30', 'BQ', '"Bonaire, Sint Eustatius and Saba"', 'AMRO', '9', '115', '0', '1'],
 ['2020-09-08', 'AF', 'Afghanistan', 'EMRO', '96', '38494', '3', '1415']]
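The same placeholder idea can be wrapped in a helper that is called once per line inside the reading loop. This is only a sketch under stated assumptions: the name `split_line` and the constant `PLACEHOLDER` are hypothetical, the placeholder text is assumed never to occur in the data, and escaped quotes (`""`) inside a field are not handled. Unlike the snippet above, it drops the surrounding quotation marks and copes with more than one quoted field per line.

```python
PLACEHOLDER = "a0b1c2d3e4"  # assumed never to occur in the real data


def split_line(line):
    """Split one CSV line on commas, keeping each quoted field in one piece."""
    line = line.rstrip("\n")
    parts = line.split('"')
    # Odd-indexed parts lie between a pair of quotes: hide their commas
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace(",", PLACEHOLDER)
    # Re-join without the quote characters, then split on the real separators
    fields = "".join(parts).split(",")
    # Put the hidden commas back
    return [field.replace(PLACEHOLDER, ",") for field in fields]


sample = '2020-09-30,BQ,"Bonaire, Sint Eustatius and Saba",AMRO,9,115,0,1\n'
print(split_line(sample))
```

In a loop over the file, `field = split_line(line)` would stand in for `field = line.split(",")`, and the rest of the parallel-array code could stay as it is.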

Given that this is an assignment for a computing course, it is likely intended for you to build a simple parser. The pseudo-code would be something like this:

data = []
record = []
is_in_quotes = false
cell = ''

for char in file:
  switch char:
    case ',':
      if is_in_quotes:
        # If we are inside a quoted string, treat the comma as part of the cell's value
        cell.append(char)
      else:
        # Otherwise, it denotes the end of a cell
        record.append(cell)
        cell = ''
    case '"':
      # We assume that
      # - there are only ever two double quotes within a cell
      # - the double quotes only appear right after the leading comma and right before the trailing comma
      # Because of that, we can simply toggle the state
      is_in_quotes = !is_in_quotes
    case '\n':
      # At line end, we first add the last cell's value to the record
      record.append(cell)
      cell = ''
      # Then we add the record to the data set
      data.append(record)
      record = []
    default:
      # Any other character is simply treated as part of the cell's value
      cell.append(char)
