Adding a list with a different length as a new column to a DataFrame
I want to add the values of a list to a DataFrame as a new column. The DataFrame has 49 rows, whereas the list has 47 elements. I get the following error when I run this code:
print("Length of dataframe: ",datasetTest.open.count())
print("Length of array: ",len(test_pred_list))
datasetTest['predict_close'] = test_pred_list
The error is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-105-68114a4e9a82> in <module>()
5 # datasetTest = datasetTest.dropna()
6 # print(datasetTest.count())
----> 7 datasetTest['predict_close'] = test_pred_list
8 # test_shifted['color_predicted'] = test_shifted.apply(determinePredictedcolor, axis=1)
9 # test_shifted['color_original'] =
c:\python35\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
2517 else:
2518 # set column
-> 2519 self._set_item(key, value)
2520
2521 def _setitem_slice(self, key, value):
c:\python35\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
2583
2584 self._ensure_valid_index(value)
-> 2585 value = self._sanitize_column(key, value)
2586 NDFrame._set_item(self, key, value)
2587
c:\python35\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
2758
2759 # turn me into an ndarray
-> 2760 value = _sanitize_index(value, self.index, copy=False)
2761 if not isinstance(value, (np.ndarray, Index)):
2762 if isinstance(value, list) and len(value) > 0:
c:\python35\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy)
3119
3120 if len(data) != len(index):
-> 3121 raise ValueError('Length of values does not match length of ' 'index')
3122
3123 if isinstance(data, PeriodIndex):
ValueError: Length of values does not match length of index
How can I get rid of this error? Please help me.
If you convert the list to a Series, the assignment works, because pandas aligns the Series with the DataFrame's index and fills the rows that have no match with NaN:
datasetTest.loc[:,'predict_close'] = pd.Series(test_pred_list)
Example:
In[121]:
df = pd.DataFrame({'a':np.arange(3)})
df
Out[121]:
a
0 0
1 1
2 2
In[122]:
df.loc[:,'b'] = pd.Series(['a','b'])
df
Out[122]:
a b
0 0 a
1 1 b
2 2 NaN
The docs refer to this as setting with enlargement. That section talks about adding or expanding, but it also works when the Series is shorter than the pre-existing index.
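One caveat is that the assignment aligns on index labels, not on position. The following is a minimal sketch with made-up toy data (the non-default index 3, 4, 5 is an assumption chosen for illustration) showing that a plain Series, whose labels are 0 and 1, matches none of the rows, so the whole new column comes out as NaN:

```python
import numpy as np
import pandas as pd

# Toy DataFrame whose index labels are 3, 4, 5 rather than 0, 1, 2
df = pd.DataFrame({'a': np.arange(3)}, index=np.arange(3, 6))

# A plain Series gets the default labels 0 and 1, neither of which is in df.index
s = pd.Series(['a', 'b'])

# Assignment aligns on labels, so no row finds a matching value
df['b'] = s
n_missing = int(df['b'].isna().sum())
print(df)
print("missing values in 'b':", n_missing)
```

No error is raised in this case, so it is worth checking the result when the DataFrame has been filtered, sliced, or re-indexed beforehand.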
To handle the case where the index doesn't start at 0, or in fact is not an int:
In[126]:
df = pd.DataFrame({'a':np.arange(3)}, index=np.arange(3,6))
df
Out[126]:
a
3 0
4 1
5 2
In[127]:
s = pd.Series(['a','b'])
s.index = df.index[:len(s)]
s
Out[127]:
3 a
4 b
dtype: object
In[128]:
df.loc[:,'b'] = s
df
Out[128]:
a b
3 0 a
4 1 b
5 2 NaN
You can optionally replace the NaN values by calling fillna.
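As a small sketch of that last step, the snippet below rebuilds the toy frame from the first example and fills the gap in the new column. The placeholder string 'missing' is an arbitrary choice for illustration, not something from the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(3)})
df['b'] = pd.Series(['a', 'b'])  # last row has no matching value, so it becomes NaN

# Fill the missing entries in the new column only; 'missing' is an arbitrary placeholder
df['b'] = df['b'].fillna('missing')
print(df)
```

Calling fillna on the single column rather than on the whole DataFrame leaves NaN values in other columns untouched.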
You can pad your list with an arbitrary filler scalar so that it matches the length of the DataFrame.
Data from @EdChum.
filler = 0
lst = ['a', 'b']
df.loc[:, 'b'] = lst + [filler]*(len(df.index) - len(lst))
print(df)
a b
0 0 a
1 1 b
2 2 0
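The same padding idea can be wrapped in a small function. This is only a sketch: `pad_to_length` is a hypothetical helper name, and using NaN as the default filler and raising when the list is too long are assumptions rather than part of the answer above. Because a plain list is assigned by position, this approach does not depend on the index labels:

```python
import numpy as np
import pandas as pd

def pad_to_length(values, length, filler=np.nan):
    """Return a new list padded with `filler` up to `length` items (hypothetical helper)."""
    values = list(values)
    if len(values) > length:
        raise ValueError("list is longer than the target length")
    return values + [filler] * (length - len(values))

# Non-default index on purpose: list assignment is positional, so labels don't matter
df = pd.DataFrame({'a': np.arange(3)}, index=np.arange(3, 6))
df['b'] = pad_to_length(['a', 'b'], len(df))
print(df)
```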
You can still assign it by using loc (data from Ed):
l = ['a','b']
df.loc[range(len(l)),'b'] = l
df
Out[546]:
a b
0 0 a
1 1 b
2 2 NaN
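Note that `range(len(l))` is interpreted as index labels, so the snippet above assumes the default 0-based index. A hedged variant for other indexes, sketched here on toy data with an assumed index of 3, 4, 5, is to take the first `len(l)` labels from `df.index` instead:

```python
import numpy as np
import pandas as pd

# Toy DataFrame with a non-default index (assumption for illustration)
df = pd.DataFrame({'a': np.arange(3)}, index=np.arange(3, 6))
l = ['a', 'b']

# Select the first len(l) row labels, whatever they are, and assign the list to them
df.loc[df.index[:len(l)], 'b'] = l
print(df)
```

The rows beyond the length of the list are left as NaN, as in the other approaches.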