How to read a CSV file without using external libraries (such as NumPy, Pandas)?

This is a question that often comes up in interviews.

I know how to read CSV files using Pandas.

However, I am struggling to find a way to read files without using external libraries.

Does Python come with any module that would help read CSV files?

You will most likely want a library to read a CSV file. While you could open and parse the data yourself, that would be tedious and time-consuming. Luckily, Python comes with a standard csv module that you won't have to pip install! You can read your file like this:

import csv

with open('file.csv', 'r') as file:
    my_reader = csv.reader(file, delimiter=',')
    for row in my_reader:
        print(row)

This will show you that each row is read in as a list of strings. You can then process it by index! There are other ways to read in data too, as described at https://docs.python.org/3/library/csv.html, one of which (csv.DictReader) will create a dictionary instead of a list!
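As a rough illustration of processing rows by index, here is a minimal sketch. It assumes a header row is present and uses an in-memory stand-in for the file; the variable names (sample, name_col, aisle_col, aisle_by_product) are made up for this example.

```python
import csv
import io

# In-memory stand-in for file.csv; the contents are illustrative only.
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "9327,Garlic Powder,104,13\n"
    "28985,Michigan Organic Kale,83,4\n"
)

reader = csv.reader(sample)
header = next(reader)  # the first row holds the column names
name_col = header.index("product_name")
aisle_col = header.index("aisle_id")

# Every field arrives as a string, so convert explicitly where needed.
aisle_by_product = {row[name_col]: int(row[aisle_col]) for row in reader}
print(aisle_by_product)
```

Looking up column positions from the header, rather than hard-coding 1 and 2, keeps the code working if the column order changes.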

Update

You linked your GitHub for the project, so I took this snippet from it:

product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12
17668,Unsweetened Chocolate Almond Breeze Almond Milk,91,16
28985,Michigan Organic Kale,83,4
32665,Organic Ezekiel 49 Bread Cinnamon Raisin,112,3
33120,Organic Egg Whites,86,16
45918,Coconut Butter,19,13
46667,Organic Ginger Root,83,4
46842,Plain Pre-Sliced Bagels,93,3

I saved it as file.csv and ran it with the code I posted above. Result:

['product_id', 'product_name', 'aisle_id', 'department_id']
['9327', 'Garlic Powder', '104', '13']
['17461', 'Air Chilled Organic Boneless Skinless Chicken Breasts', '35', '12']
['17668', 'Unsweetened Chocolate Almond Breeze Almond Milk', '91', '16']
['28985', 'Michigan Organic Kale', '83', '4']
['32665', 'Organic Ezekiel 49 Bread Cinnamon Raisin', '112', '3']
['33120', 'Organic Egg Whites', '86', '16']
['45918', 'Coconut Butter', '19', '13']
['46667', 'Organic Ginger Root', '83', '4']
['46842', 'Plain Pre-Sliced Bagels', '93', '3']

This does what you asked in your question. I am not going to do your project for you; you should be able to work it out from here.
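On the earlier point that parsing the data yourself gets tedious: one common stumbling block is a quoted field that contains the delimiter. The sketch below compares a plain str.split with csv.reader on a hypothetical line (the line itself is invented for illustration and is not from the project data).

```python
import csv
import io

# Hypothetical line: the product name contains a comma, so it is quoted.
line = '101,"Bread, Cinnamon Raisin",112,3\n'

naive = line.rstrip("\n").split(",")
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # the quoted field is split in two
print(parsed)  # the quoted field stays intact
```

The naive split yields five pieces, while csv.reader returns the expected four fields.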

I had a similar requirement and came up with this solution: a function that converts CSV to JSON (I needed JSON for readability and to make querying the data easier without having access to Pandas). If the headers argument of the function is True, the first row of the CSV is used as the keys in the JSON; otherwise, value indices are used as keys.

from csv import reader as csv_reader

def csv_to_json(csv_path: str, headers: bool) -> list:
  '''Convert data from a csv to json'''
  # store json data
  json_data = []
  
  try:
    with open(csv_path, 'r') as file:
      reader = csv_reader(file)
      # set column names using first row
      if headers:
        columns = next(reader)
      
      # convert csv to json
      for row in reader:
        row_data = {}
        for i in range(len(row)):
          # set key names
          if headers:
            row_key = columns[i].lower()
          else: 
            row_key = i
          # set key/value
          row_data[row_key] = row[i]
        # add data to json store 
        json_data.append(row_data)
        
  # error handling
  except Exception as e:
    print(repr(e))
    
  return json_data

Given a CSV containing the following

+------+-------+------+
| Year | Month | Week |
+------+-------+------+
| 2020 |    11 |   11 |
| 2020 |    12 |   12 |
+------+-------+------+

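The output with headers, shown as JSON, is as follows (csv.reader returns every field as a string, so the values are strings):

[
  {"year": "2020", "month": "11", "week": "11"},
  {"year": "2020", "month": "12", "week": "12"}
]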

The output without headers, for the same data saved without its header row and shown as JSON, is

[
  {"0": "2020", "1": "11", "2": "11"},
  {"0": "2020", "1": "12", "2": "12"}
]
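Note that the csv module returns every field as a string. If numeric values are wanted in the JSON, a conversion step can be added. The following is only a sketch: to_number is a hypothetical helper introduced here, and it uses csv.DictReader rather than the function above.

```python
import csv
import io
import json


def to_number(text):
    """Return an int or float when the text parses as one, else the text unchanged."""
    for cast in (int, float):
        try:
            return cast(text)
        except (TypeError, ValueError):
            pass
    return text


# In-memory stand-in for the CSV shown above.
sample = io.StringIO("Year,Month,Week\n2020,11,11\n2020,12,12\n")
rows = [
    {key.lower(): to_number(value) for key, value in row.items()}
    for row in csv.DictReader(sample)
]
print(json.dumps(rows))
```

Trying int before float keeps whole numbers as integers; anything that is not numeric is left as a string.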

When a production environment is limited by memory, being able to read and manage data without importing additional libraries may be helpful.

To achieve that, the built-in csv module does the work.

import csv

There are at least two ways to do that: using csv.reader() or using csv.DictReader().

csv.reader() allows you to access CSV data using indexes and is ideal for simple CSV files (Source).

csv.DictReader(), on the other hand, is friendlier and easier to use, especially when working with large CSV files (Source).

Here's how to do it with csv.reader()

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print(', '.join(row))
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam

Here's how to do it with csv.DictReader()

>>> import csv
>>> with open('names.csv', newline='') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         print(row['first_name'], row['last_name'])
...
Eric Idle
John Cleese

>>> print(row)
{'first_name': 'John', 'last_name': 'Cleese'}

For another example, check Real Python's page here.
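Both readers are iterators, so a file can be processed one row at a time instead of being loaded in full, which fits the memory-limited setting mentioned above. A minimal sketch, assuming a hypothetical helper count_by_column and an in-memory stand-in for the file:

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column):
    """Count rows per distinct value of `column`, reading one row at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# In-memory stand-in for a real file; with a file on disk, pass the
# object returned by open(path, newline='') instead.
sample = io.StringIO(
    "product_id,aisle_id,department_id\n"
    "9327,104,13\n"
    "28985,83,4\n"
    "46667,83,4\n"
)
print(count_by_column(sample, "department_id"))
```

Only the running counts are kept in memory, not the rows themselves.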

Recently I was asked a very similar but more complicated question about building a data structure without using pandas. This is the only relevant question I have found so far. In terms of this question's data, what I was asked was: use the product IDs as the keys of a dictionary, with a list of tuples of aisle and department IDs as the values (in Python). The dictionary stands in for the required dataframe. Of course, I could not do it in 15 minutes (it took me more like 2 hours). It is hard for me to think outside of NumPy and pandas.

I have the following solution, which also answers the original question. It is probably not ideal, but it got me what I needed.
Hopefully this helps too.

import csv

# Assumes the columns in data.csv are in this order:
# product_id, aisle_id, department_id, product_name
with open('data.csv', 'r', newline='') as file:
    items = list(csv.reader(file))  # put the rows of the csv in a list

aisle_dept_id = []  # tuples of (aisle id, department id)
mydict = {}  # product id as keys and a list of the above tuple as values

product_id, aisle_id, department_id, product_name = [], [], [], []

for i in range(1, len(items)):  # start at 1 to skip the header row
    product_id.append(items[i][0])
    aisle_id.append(items[i][1])
    department_id.append(items[i][2])
    product_name.append(items[i][3])

for item1, item2 in zip(aisle_id, department_id):
    aisle_dept_id.append((item1, item2))
for item1, item2 in zip(product_id, aisle_dept_id):
    mydict.update({item1: [item2]})

With the output:

mydict:
{'9327': [('104', '13')],
 '17461': [('35', '12')],
 '17668': [('91', '16')],
 '28985': [('83', '4')],
 '32665': [('112', '3')],
 '33120': [('86', '16')],
 '45918': [('19', '13')],
 '46667': [('83', '4')],
 '46842': [('93', '3')]}
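The same dictionary can probably be built in a single pass by looking columns up by name. This is only a sketch of one alternative, not the approach used above: it swaps in csv.DictReader and dict.setdefault, assumes the header names from the sample data in this thread, and uses an in-memory stand-in for the file.

```python
import csv
import io

# In-memory stand-in for the file, using two rows of the sample data.
sample = io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "9327,Garlic Powder,104,13\n"
    "17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12\n"
)

mydict = {}
for row in csv.DictReader(sample):
    pair = (row["aisle_id"], row["department_id"])
    # setdefault creates the list on first sight of a product id,
    # then appends, so repeated ids accumulate instead of overwriting.
    mydict.setdefault(row["product_id"], []).append(pair)
print(mydict)
```

Because columns are addressed by name, the column order in the file does not matter here.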
