
Replacing pandas dataframe variable values with a numpy array

I am doing a transformation on a variable from a pandas DataFrame, and then I would like to replace the column with the new values. The problem seems to be that after the transformation, the length of the array is not the same as the length of my DataFrame's index. I don't think that is true, though.

>>> df['variable'] = stats.boxcox(df.variable)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.py", line 2119, in __setitem__
    self._set_item(key, value)
  File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.py", line 2165, in _set_item
    value = self._sanitize_column(key, value)
  File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.py", line 2205, in _sanitize_column
    raise AssertionError('Length of values does not match '
AssertionError: Length of values does not match length of index

When I check the lengths, they seem to disagree: len() of the result says it is 2, but when I print the output of stats.boxcox it shows a length of 50000. What is going on here?

>>> len(df)
50000
>>> len(stats.boxcox(df.variable))
2
>>> stats.boxcox(df.variable)
(0    -0.079496
1    -0.117982
2    -0.104637

...
49985    -0.041300
49986     0.651771
49987    -0.115660
49988    -0.118034
49998    -0.118014
49999    -0.034076
Name: feat9, Length: 50000, dtype: float64, 8.4721358117221772)
>>> 
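The mismatch can be reproduced with a small, self-contained sketch. It assumes a recent SciPy and pandas, a hypothetical column named `variable`, and made-up strictly positive data (Box-Cox is only defined for positive values). The exact exception type and message depend on the pandas version: the traceback above shows an older `AssertionError`, while newer versions raise a `ValueError`.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical, strictly positive data (Box-Cox is undefined for values <= 0)
rng = np.random.default_rng(0)
df = pd.DataFrame({"variable": rng.lognormal(size=5)})

result = stats.boxcox(df["variable"])

# boxcox returns a tuple, so len() counts its two elements,
# not the number of transformed values
print(len(result))     # 2
print(len(result[0]))  # 5, one transformed value per row

# Assigning the whole tuple to a column therefore fails the length check
try:
    df["variable"] = result
except (ValueError, AssertionError) as exc:
    print("assignment failed:", exc)
```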

You can see in your example that the result of boxcox is a tuple. This is consistent with the documentation, which indicates that, when lmbda is not given, boxcox returns a tuple of the transformed data and the lambda value that maximizes the log-likelihood. Notice that the example on that page does:

xt, _ = stats.boxcox(x)

...showing again that boxcox returns a 2-tuple.
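Unpacking both elements makes the tuple explicit. The sketch below, again assuming a recent SciPy and made-up positive data, also shows a documented variation: when `lmbda` is passed explicitly, `stats.boxcox` skips the estimation step and returns only the transformed array, so there is nothing to unpack.

```python
import numpy as np
from scipy import stats

# Made-up, strictly positive sample data
x = np.array([0.5, 1.2, 2.0, 3.7, 8.1])

# With lmbda=None (the default), boxcox estimates lambda and returns a 2-tuple
xt, lam = stats.boxcox(x)
print(lam)  # the fitted lambda, a single float

# With an explicit lmbda, the transform is applied directly and one array comes back
xt_again = stats.boxcox(x, lmbda=lam)
print(np.allclose(xt, xt_again))  # True
```

If `alpha` is also given (with `lmbda` left as None), a confidence interval for lambda is appended, so the result becomes a 3-tuple.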

You should be doing df['variable'] = stats.boxcox(df.variable)[0], which keeps only the first element of the tuple: the transformed data.
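A slightly fuller sketch of the same fix, assuming a hypothetical DataFrame with a strictly positive column named `variable`. Instead of indexing with `[0]`, it unpacks the tuple so the fitted lambda is kept as well; `scipy.special.inv_boxcox` is used here only to illustrate why that lambda can be worth keeping.

```python
import numpy as np
import pandas as pd
from scipy import stats
from scipy.special import inv_boxcox

# Hypothetical DataFrame with a strictly positive column named "variable"
rng = np.random.default_rng(42)
df = pd.DataFrame({"variable": rng.lognormal(size=1000)})
original = df["variable"].copy()

# Take the transformed array (element 0) and keep lambda (element 1) as well
transformed, lam = stats.boxcox(df["variable"])
df["variable"] = transformed  # lengths now match: one value per row

# The stored lambda allows the transform to be undone later
recovered = inv_boxcox(df["variable"], lam)
print(np.allclose(recovered, original))  # True
```

Indexing with `[0]` and tuple unpacking assign the same values to the column; unpacking just avoids discarding lambda.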

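Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For questions, contact: yoyou2525@163.com.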

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM