How to read a text file in Python using Pandas

I'm new to Pandas and I've been trying to make a scatter plot in Python 2.7. I have the dataset in a .txt file, comma separated, like this:

6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483



import pandas as pd
import matplotlib.pyplot as mplt

# Load the dataset with pandas
input_data = pd.read_csv('data.txt')
#input_data.head(5)

How can I plot the above data as a scatter plot when the dataset has no headers?
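As a minimal sketch of why the missing header matters: by default `pd.read_csv` treats the first line of the file as column names, so a headerless file loses its first data row. The snippet uses an inline copy of the first three rows through `io.StringIO`, which is an assumption made only so that it runs without `data.txt`.

```python
import io

import pandas as pd

# Inline stand-in for the first three lines of data.txt (assumption for the demo)
raw = "6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n"

# Default behaviour: the first line is consumed as the header row
default = pd.read_csv(io.StringIO(raw))
print(default.columns.tolist())  # ['6.1101', '17.592']
print(len(default))              # 2 rows of data remain

# header=None keeps every line as data and labels the columns 0 and 1
no_header = pd.read_csv(io.StringIO(raw), header=None)
print(no_header.columns.tolist())  # [0, 1]
print(len(no_header))              # 3
```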

I've seen in tutorials and examples that a scatter plot is possible when the dataset has column headings. I tried adding x and y as headers for the two columns in the .txt file and ran the code below.

input_data = pd.read_csv('data.txt')
#input_data.head(5)
x_value = input_data[['x']]
y_value = input_data[['y']]

mplt.scatter(x_value, y_value)

But I still get the error shown below:

Traceback (most recent call last):
  File "E:\IIT Madras\Research\Experiments\Machine Learning\Linear Regression\Linear_Regression.py", line 16, in <module>
    y_value = input_data[['y']]
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1791, in __getitem__
    return self._getitem_array(key)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1835, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)
  File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 1112, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['y'] not in index"
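The traceback fails on `'y'` rather than `'x'`, which suggests the `x` column was found. One possible cause, and this is only an assumption since the edited file is not shown, is a header line typed as `x, y` with a space after the comma, which makes the second column name `' y'`. A small sketch of that situation, using an inline string in place of the file:

```python
import io

import pandas as pd

# Hypothetical header line with a space after the comma (assumed, not from the post)
raw = "x, y\n6.1101,17.592\n5.5277,9.1302\n"

df = pd.read_csv(io.StringIO(raw))
print(df.columns.tolist())  # ['x', ' y'] so df[['y']] would raise KeyError

# Stripping whitespace from the column names makes 'y' addressable
df.columns = df.columns.str.strip()
print(df.columns.tolist())  # ['x', 'y']
```

Printing `input_data.columns.tolist()` on the real file would show whether this is what happened.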

Is there a better way to deal with this (with and without header names)?

EDIT:

The following worked for me after going through Ishan's reply:

input_data = pd.read_csv('data.txt', header=None)
x_value = input_data[[0]]
y_value = input_data[[1]]
mplt.scatter(x_value, y_value)
mplt.show()
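A short sketch of what the two indexing styles return when the file is read with `header=None`: `input_data[[0]]` (a list of labels) gives a one-column DataFrame, while `input_data[0]` (a single label) gives a Series. The inline data is a stand-in for `data.txt`, assumed here so the snippet is self-contained.

```python
import io

import pandas as pd

# Inline stand-in for the first three lines of data.txt (assumption for the demo)
raw = io.StringIO("6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n")
input_data = pd.read_csv(raw, header=None)

as_frame = input_data[[0]]  # list of labels: DataFrame with one column
as_series = input_data[0]   # single label: Series

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
print(type(as_series).__name__, as_series.shape)  # Series (3,)
```

The Series form is the one-dimensional shape usually passed as the x and y arguments of a scatter plot.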

Try importing the data without column headers and then naming the columns yourself:

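import pandas as pd
import matplotlib.pyplot as plt

# header=None: the file has no header row, so the first line is data
df = pd.read_csv(r'/home/ishan/Desktop/file', header=None)
df.columns = ['x', 'y']  # name the two columns
plt.scatter(df['x'], df['y'])
plt.show()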
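As a variation on the same idea, the column names can be supplied at read time with the `names` parameter of `pd.read_csv` instead of assigning `df.columns` afterwards. This is a sketch only; the inline data stands in for the file and is an assumption made so the snippet runs on its own.

```python
import io

import pandas as pd

# Inline stand-in for the first three lines of the data file (assumption for the demo)
raw = io.StringIO("6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n")

# names= labels the columns; header=None states explicitly that no header row exists
df = pd.read_csv(raw, header=None, names=["x", "y"])

print(df.columns.tolist())  # ['x', 'y']
print(df.shape)             # (3, 2)
```

The resulting `df['x']` and `df['y']` can then be passed to `plt.scatter` exactly as in the answer.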
