
Why does a pandas DataFrame allow setting a column from a Series that is too long?

Is there a reason why pandas raises a ValueError when a DataFrame column is set from a list that is too long, but not when it is set from a Series? With a Series, the superfluous values are silently ignored (e.g. 7 in the example below).

>>> import pandas as pd
>>> df = pd.DataFrame([[1],[2]])
>>> df
   0
0  1
1  2
>>> df[0] = [5,6,7]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python310\lib\site-packages\pandas\core\frame.py", line 3655, in __setitem__
    self._set_item(key, value)
  File "D:\Python310\lib\site-packages\pandas\core\frame.py", line 3832, in _set_item
    value = self._sanitize_column(value)
  File "D:\Python310\lib\site-packages\pandas\core\frame.py", line 4529, in _sanitize_column
    com.require_length_match(value, self.index)
  File "D:\Python310\lib\site-packages\pandas\core\common.py", line 557, in require_length_match
    raise ValueError(
ValueError: Length of values (3) does not match length of index (2)
>>>
>>> df[0] = pd.Series([5,6,7])
>>> df
   0
0  5
1  6

Tested using Python 3.10.6 and pandas 1.5.3 on Windows 10.

You are right that the behaviour differs between a list and a Series, but this is expected.

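If you look at the source code in the frame.py module, you will see that when the value is a list, its length is checked against the index (the require_length_match call in the traceback above). A Series is not length-checked; it is instead aligned to the DataFrame's index by label, so, as you observed, values whose labels are not in that index (here label 2, holding 7) are dropped.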
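The difference can be reproduced in a short script. This is only a sketch of the behaviour described above, not a copy of the pandas internals; the column names "shifted" and "strict" are made up for illustration. It also shows one possible way to get the strict length check back for a Series, namely by converting it to a plain array first so that no index is left to align on.

```python
import pandas as pd

df = pd.DataFrame([[1], [2]])  # index labels are 0 and 1

# A Series is matched to df.index by label. Label 2 has no row in df,
# so the value 7 is dropped instead of raising an error.
df[0] = pd.Series([5, 6, 7])
aligned = df[0].tolist()

# The same alignment works the other way round: a row whose label is
# missing from the Series gets NaN.
df["shifted"] = pd.Series([5, 6, 7], index=[1, 2, 3])

# Converting to a plain array discards the index, so the usual
# length check applies again and the assignment fails.
length_error = None
try:
    df["strict"] = pd.Series([5, 6, 7]).to_numpy()
except ValueError as exc:
    length_error = str(exc)

print(aligned)
print(df)
print(length_error)
```

If silent truncation is not wanted, passing `series.to_numpy()` or `series.tolist()` instead of the Series itself makes pandas treat the value as positional data and compare its length with the index.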

NOTE: The details of the Series truncation are here:

[image: enter image description here]
