
Modin library throws errors while doing a simple pandas operation

I came across the modin library, which is supposed to speed up some pandas operations, and started testing it.

Loading data with read_csv is noticeably faster, but simple conditional expressions that work perfectly in plain pandas, such as:

    df.loc[df['Score'] > 8,'Score_T2B'] = 1
    df.loc[df['Score'] < 9,'Score_T2B'] = 0

throw a long list of errors:

    Traceback (most recent call last):
      File "<ipython-input-21-0b842942ffac>", line 1, in <module>
        df.loc[df['Score'] > 8,'Score_T2B'] = 1
      File "C:\ProgramData\Anaconda3\lib\site-packages\modin\pandas\indexing.py", line 251, in __setitem__
        new_col[row_loc] = item
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 1244, in __setitem__
        setitem(key, value)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 1221, in setitem
        self.loc[key] = value
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 204, in __setitem__
        indexer = self._get_setitem_indexer(key)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 191, in _get_setitem_indexer
        return self._convert_to_indexer(key, axis=axis, is_setter=True)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1285, in _convert_to_indexer
        return self._get_listlike_indexer(obj, axis, **kwargs)[1]
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1092, in _get_listlike_indexer
        keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1177, in _validate_read_indexer
        key=key, axis=self.obj._get_axis_name(axis)
    KeyError: "None of [Index([False, False, False, False, False, False, False, False, False, False,\n       ...\n       False, False, False, False, False, False, False, False, False, False],\n      dtype='object', length=169815)] are in the [index]"

This should be a simple operation. Is there a workaround, or am I missing something beyond the loading step, which is just:

    import modin.pandas as pm
    df = pm.read_csv(input_file, sep='\t', encoding='utf-8', low_memory=False)

Many thanks!
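The last frames of the traceback suggest what goes wrong: modin hands the assignment to pandas' `Series.__setitem__`, and by then the boolean mask has become an `Index` with `dtype='object'`, which pandas appears to treat as a list of row labels rather than as a mask, hence the `KeyError`. One possible workaround is to build the flag column with a single vectorised comparison and skip `.loc` assignment altogether. This is a minimal sketch written against plain pandas with made-up toy data; it assumes `Score` is numeric with no missing values, and whether it avoids the error in a particular modin version has not been verified here.

```python
import pandas as pd  # swap for `import modin.pandas as pd` to try it under modin

# Toy data standing in for the real file (hypothetical values).
df = pd.DataFrame({"Score": [10, 9, 8, 3, 7, 9]})

# The two .loc statements end up with 1 where Score >= 9 and 0 where Score < 9
# (the second assignment overwrites anything strictly between 8 and 9).
# One boolean comparison cast to int yields the same flags without .loc.
df["Score_T2B"] = (df["Score"] >= 9).astype(int)

print(df["Score_T2B"].tolist())  # [1, 1, 0, 0, 0, 1]
```

One difference to keep in mind: if `Score` can be missing, the original `.loc` code leaves those rows as NaN, while this version turns them into 0.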

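I found that modin's read_csv function does not behave exactly like the pandas read_csv it replaces. In particular, it does not handle errors the same way, and raises exceptions in cases that would otherwise work with pandas.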

Perhaps it is better to import both versions and fall back to pandas on failure?

Here is my import block:

    try:
        import modin.pandas as pd # claims to be 4X faster loading csvs, using all processor cores
        print('modin.pandas active')
    except ImportError:
        import pandas as pd
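The import block above only falls back to pandas when modin is not installed. A minimal sketch of the "import both and fall back on failure" idea could wrap the read itself. `read_csv_with_fallback` is a hypothetical helper name, and catching every `Exception` is a deliberate assumption, made because modin may raise a different exception type than pandas for the same input.

```python
import io

import pandas

try:
    # modin is optional; if it is not installed we never try it.
    import modin.pandas as mpd
except ImportError:
    mpd = None


def read_csv_with_fallback(source, **kwargs):
    """Hypothetical helper: try modin's read_csv, retry with pandas on any error."""
    if mpd is not None:
        try:
            return mpd.read_csv(source, **kwargs)
        except Exception:
            # Rewind in-memory buffers so pandas reads them from the start.
            if hasattr(source, "seek"):
                source.seek(0)
    # Either modin is unavailable or it failed: use plain pandas.
    return pandas.read_csv(source, **kwargs)


df = read_csv_with_fallback(io.StringIO("a,b\n1,2\n3,4\n"))
print(list(df.columns))  # ['a', 'b']
```

The trade-off is that callers may receive either a modin or a pandas DataFrame, so code further down has to work with both.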

Here is an example where read_csv failed for me:

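The file being loaded does not contain a column named 'IlmnID'. With modin this raises a TypeError, which the except clause does not catch; pandas raises a ValueError instead, so the fallback runs and there is no error.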

    try:
        sample = pd.read_csv(part, index_col='IlmnID')
    except ValueError:
        sample = pd.read_csv(part)
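If you want to keep the try/except structure, one hedged adjustment is to catch both exception types: pandas raises `ValueError` when `index_col` names a column that is not in the file, while the `TypeError` under modin is as reported in this answer and may differ between modin versions. This sketch uses plain pandas and in-memory files; `read_sample` is a hypothetical helper name.

```python
import io

import pandas as pd  # or modin.pandas, as in the import block above


def read_sample(source):
    """Hypothetical helper: use IlmnID as the index when that column exists."""
    try:
        return pd.read_csv(source, index_col="IlmnID")
    except (ValueError, TypeError):
        # pandas raises ValueError for an unknown index_col; modin was reported
        # to raise TypeError instead, so both are treated as "column missing".
        if hasattr(source, "seek"):
            source.seek(0)  # rewind in-memory buffers before the second read
        return pd.read_csv(source)


with_id = read_sample(io.StringIO("IlmnID,beta\ncg01,0.5\ncg02,0.7\n"))
without_id = read_sample(io.StringIO("probe,beta\ncg01,0.5\n"))
print(with_id.index.name, list(without_id.columns))  # IlmnID ['probe', 'beta']
```

Catching `TypeError` this broadly can also hide unrelated bugs, which is one reason the column check that follows may be the safer choice.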

This version works with modin.pandas.read_csv, because it checks the columns instead of relying on a try/except wrapper:

    sample = pd.read_csv(part)
    if 'IlmnID' in sample.columns:
        sample.set_index('IlmnID', inplace=True)
    elif 'illumina_id' in sample.columns:
        sample.set_index('illumina_id', inplace=True)
        # rename the index itself; rename(index=...) would only relabel rows
        sample = sample.rename_axis('IlmnID')
    else:
        # assume the first column holds the probe IDs
        guess_index = sample.columns[0]
        sample.set_index(guess_index, inplace=True)
        sample = sample.rename_axis('IlmnID')
