
Converting a pandas object to a NumPy array

I have some simple code to find similar rows in a dataset.

import numpy as np

h=0
count=0
#227690
similarIndexes=np.zeros((143,))
for i in np.arange(len(data)):
    # at i == 0, data[i-1,2] is data[-1,2], i.e. the last row of an ndarray
    if(data[i-1,2]==data[i,2]):
        similarIndexes[h]=int(i)
        h=h+1
        count=count+1
        print("similar found in -->", i," there are--->", count)

It works correctly when data is a numpy.ndarray, but if data is a pandas object, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in smilarData
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1658, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1665, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1005, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 2874, in get
    _, block = self._find_block(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3186, in _find_block
    self._check_have(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3193, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named (-1, 2)'

What should I do to use this code? If converting the pandas object to a numpy array would help, how can I do that?

To convert a pandas DataFrame to a numpy array:

import numpy as np
np.array(dataFrame)
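
As a minimal sketch of how this applies to the loop in the question: once the frame is converted, positional indexing such as `data[i, 2]` works as on any ndarray. The small `dataFrame` below and its column names are made-up illustration data, not the asker's dataset, and starting the loop at 1 is my own adjustment so that `data[i-1]` never wraps around to the last row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: three columns, compared on column 2.
dataFrame = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8], "z": [9, 9, 3, 3]})

data = np.array(dataFrame)  # 2-D ndarray holding the frame's values

# Start at 1 so that data[i - 1] never wraps around to the last row.
similar = [i for i in range(1, len(data)) if data[i - 1, 2] == data[i, 2]]
print(similar)  # [1, 3]
```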

I cannot comment on Adrienne's answer yet, so I would like to add that DataFrames have a built-in method to convert a df to an array, i.e. a matrix:

>>> import pandas as pd
>>> df = pd.DataFrame({"a":range(5),"b":range(5,10)})
>>> df
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
>>> mat = df.as_matrix()  # removed in pandas 1.0; use df.values or df.to_numpy() there
>>> mat
array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])
>>> col = [x[0] for x in mat] # to get certain columns
>>> col
[0, 1, 2, 3, 4]
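
For reference, here is a sketch of the same conversion using `DataFrame.to_numpy()` (available from pandas 0.24), with NumPy slicing to pull out a column instead of a list comprehension. The variable names simply mirror the session above.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

mat = df.to_numpy()  # 5x2 ndarray with the frame's values
col = mat[:, 0]      # first column as a 1-D ndarray
print(col.tolist())  # [0, 1, 2, 3, 4]
```

Slicing with `mat[:, 0]` returns an ndarray rather than a Python list, which is usually what you want if further NumPy operations follow.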

Also, to find duplicated rows you can do:

>>> df2
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
5  0  5
>>> df2[df2.duplicated()]
   a  b
5  0  5
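
A short sketch of a few variations on `duplicated`, in case they are closer to what is needed: `keep=False` flags every copy rather than only the later ones, `subset` limits the comparison to chosen columns, and `drop_duplicates` returns the frame without the repeats. The frame is rebuilt here to match `df2` above; which columns to compare on is an assumption for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({"a": [0, 1, 2, 3, 4, 0], "b": [5, 6, 7, 8, 9, 5]})

# keep=False marks every member of a duplicate group, not just the later copies.
all_copies = df2[df2.duplicated(keep=False)]
print(all_copies.index.tolist())  # [0, 5]

# subset restricts the comparison to the listed columns.
dup_on_a = df2[df2.duplicated(subset=["a"])]
print(dup_on_a.index.tolist())  # [5]

# drop_duplicates returns the frame with the later copies removed.
unique_rows = df2.drop_duplicates()
print(len(unique_rows))  # 5
```

Note that `duplicated` compares rows anywhere in the frame, whereas the loop in the question only compares each row with the one directly before it.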

I agree with the previous answers, but in case you want to work directly with pandas objects, accessing DataFrame items has its own special way. In your code you should write, e.g.:

if(data.iloc[i-1,2]==data.iloc[i,2]):

See the documentation for more.
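
To spell out why the original line fails: on a DataFrame, `data[i-1, 2]` is treated as a lookup of a column labelled with the tuple `(i-1, 2)`, which is where `KeyError: no item named (-1, 2)` comes from, while `.iloc` indexes by position. The sketch below uses made-up data and column names, and shows both the `.iloc` loop and a vectorized alternative based on `shift`. The `shift` version is my own suggestion, not something from the answers above.

```python
import pandas as pd

# Hypothetical data; the column at position 2 ("z") is the one being compared.
data = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8], "z": [9, 9, 3, 3]})

# Loop form with positional indexing, starting at 1 to avoid wrapping to the last row.
loop_hits = [i for i in range(1, len(data)) if data.iloc[i - 1, 2] == data.iloc[i, 2]]

# Vectorized form: compare the column with itself shifted down by one row.
z = data.iloc[:, 2]
mask = z == z.shift(1)  # first row compares against NaN, so it is False
vector_hits = [int(i) for i in mask.to_numpy().nonzero()[0]]

print(loop_hits, vector_hits)  # [1, 3] [1, 3]
```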
