Why does a pandas DataFrame allow setting a column from a Series that is too long?

Question

Is there a reason why pandas raises a ValueError when a DataFrame column is set from a list that is too long, but does not do the same for a Series? With a Series, the superfluous values are silently ignored (e.g. the 7 in the example below).

>>> import pandas as pd
>>> df = pd.DataFrame([[1],[2]])
>>> df
   0
0  1
1  2
>>> df[0] = [5,6,7]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python310\lib\site-packages\pandas\core\frame.py", line 3655, in __setitem__
    self._set_item(key, value)
  File "D:\Python310\lib\site-packages\pandas\core\frame.py", line 3832, in _set_item
    value = self._sanitize_column(value)
  File "D:\Python310\lib\site-packages\pandas\core\frame.py", line 4529, in _sanitize_column
    com.require_length_match(value, self.index)
  File "D:\Python310\lib\site-packages\pandas\core\common.py", line 557, in require_length_match
    raise ValueError(
ValueError: Length of values (3) does not match length of index (2)
>>>
>>> df[0] = pd.Series([5,6,7])
>>> df
   0
0  5
1  6

Tested using Python 3.10.6 and pandas 1.5.3 on Windows 10.

Answer 1

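You are right that the behaviour differs between a list and a Series, but this is expected.

If you look at the source code in the frame.py module, you will see that a list has its length checked against the index (this is the require_length_match call in your traceback). A Series is handled differently: its length is not checked, and it is instead aligned to the DataFrame's index by label. In your example the Series has labels 0, 1 and 2 while the DataFrame only has labels 0 and 1, so the value under label 2 (the 7) is dropped, which looks like truncation.

NOTE: The details of the Series truncation are here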
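The label alignment described above can be checked with a short script. This is only a minimal sketch: the column name "short" and the variable names are made up for illustration, and the last step assumes you want the strict length check back, which converting the Series to a NumPy array with to_numpy() should give you because the array carries no index.

```python
import pandas as pd

df = pd.DataFrame([[1], [2]])  # index labels are 0 and 1

# A Series is aligned on index labels: label 2 has no matching row,
# so the value 7 is dropped instead of raising an error.
df[0] = pd.Series([5, 6, 7])
after_long = df[0].tolist()

# Alignment also works the other way: rows whose label is missing
# from the Series are filled with NaN ("short" is a hypothetical column).
df["short"] = pd.Series([9], index=[1])
after_short = df["short"].tolist()

# A NumPy array has no index to align on, so the length check applies again.
try:
    df[0] = pd.Series([5, 6, 7]).to_numpy()
    raised = False
except ValueError:
    raised = True

print(after_long, after_short, raised)
```

If silently dropped values are a concern, comparing len(series) with len(df) before assigning, or assigning the underlying array as in the last step, makes the mismatch explicit.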

1 answers

solution1
1 2023-01-20 12:38:29
