
Efficiently Read custom CSV Files into Python

I am trying to learn Python and started with the task of importing specific CSV files from a given folder into a Python data type and then processing the data further. I am struggling with the part where the data needs to be imported into Python, and I need it to be efficient. I tried several approaches based on suggestions from forums and other web pages, and each of them ran into one problem or another. If anyone can help solve this, the help would be greatly appreciated.

Note: I have already imported pandas as pd.

Approach 1:

DF = pd.read_csv(FilePath)

This yields the following error:

Error tokenizing data. C error: Expected 1 fields in line 13, saw 2

Approach 2:

DF = pd.read_csv(FilePath, skiprows=3)

This yields the same error:

Error tokenizing data. C error: Expected 1 fields in line 13, saw 2

Approach 3:

data = pd.read_csv(FilePath, error_bad_lines=False)

This skips every row and reads one character per line, which makes any further processing of the data difficult.

https://dl.dropboxusercontent.com/u/32778128/Test.csv

If anyone has suggestions for fixing this problem, I would greatly appreciate the help.

Best U

When you say "DataFrame", what you should be using is the pandas library. pandas gives you the DataFrame, into which you can easily import CSV files and then start manipulating the data. Look at the pandas.read_csv function specifically; it will do what you're asking and more. See the skiprows argument if you need to skip rows at the top of the file.

You can get a DataFrame object by doing the following:

import pandas

df = pandas.read_csv('boing.csv')  # Creates dataframe from specified CSV file

If you need more than that, refer to the pandas.read_csv documentation; the function takes too many arguments to list here. I hope this helps.

Example with skiprows:

df = pandas.read_csv('boing.csv', skiprows=2)

This will give you a DataFrame that skips the first two rows of your CSV file. Change 2 to however many header lines your file has. When using skiprows, make sure you are not skipping a row that holds actual data from the file.
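As a rough illustration of choosing the skiprows value, the sketch below uses a small in-memory stand-in for the file (the contents of RAW are made up, not taken from Test.csv). It assumes the table starts at the first line that contains the separator; find_header_row is a hypothetical helper written for this example, not part of pandas. It also shows that reading such a file without skiprows raises the same kind of tokenizing error as in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: three preamble lines, then the table.
RAW = (
    "Report generated by instrument\n"
    "Operator: test\n"
    "Run: 42\n"
    "time,value\n"
    "0,1.5\n"
    "1,2.5\n"
)


def find_header_row(text, sep=","):
    """Return the 0-based index of the first line that contains `sep`.

    Assumption: the preamble lines never contain the separator.
    """
    for i, line in enumerate(text.splitlines()):
        if sep in line:
            return i
    raise ValueError("no line contains the separator")


# Without skiprows, pandas infers a single column from the first line
# and fails when it reaches a line with two fields.
try:
    pd.read_csv(io.StringIO(RAW))
    failed = False
except pd.errors.ParserError:
    failed = True

# Skip exactly the preamble so the header row becomes the first line read.
n_skip = find_header_row(RAW)
df = pd.read_csv(io.StringIO(RAW), skiprows=n_skip)
print(df)
```

For a file on disk, the same idea would apply by reading the first few lines with open() to locate the header and then passing that count to pandas.read_csv. If the preamble can itself contain commas, this heuristic will pick the wrong line and the count should be set by hand.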
