Appending series data with repeated index to pandas dataframe column
I have a series named result, whose data is repeated 5 times using the numpy repeat function.
result=np.repeat(rating_df['RESULT'],5)
The result series looks like this, with a repeated index. I want to add the result series data as a new column in the feature_file_df_trans dataframe:
feature_file_df_trans['result']=result
I am getting this error:
ValueError Traceback (most recent call last)
<ipython-input-150-cffb056edf1a> in <module>()
----> 1 feature_file_df_trans['result']=result
/home/jayashree/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
2427 else:
2428 # set column
-> 2429 self._set_item(key, value)
2430
2431 def _setitem_slice(self, key, value):
/home/jayashree/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
2493
2494 self._ensure_valid_index(value)
-> 2495 value = self._sanitize_column(key, value)
2496 NDFrame._set_item(self, key, value)
2497
/home/jayashree/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _sanitize_column(self, key, value, broadcast)
2643
2644 if isinstance(value, Series):
-> 2645 value = reindexer(value)
2646
2647 elif isinstance(value, DataFrame):
/home/jayashree/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in reindexer(value)
2635 # duplicate axis
2636 if not value.index.is_unique:
-> 2637 raise e
2638
2639 # other
ValueError: cannot reindex from a duplicate axis
How can I add a column to the dataframe so that it looks like this?
I think you can convert the Series to a numpy array with values and then assign that array to the column:
Note: the output numpy array must have the same length as the dataframe column it is assigned to.
feature_file_df_trans['result']=np.repeat(rating_df['RESULT'].values,5)
Sample:
import numpy as np
import pandas as pd

rating_df = pd.DataFrame({'RESULT':[1,2,3]})
feature_file_df_trans = pd.DataFrame({'a':range(15)})
feature_file_df_trans['result']=np.repeat(rating_df['RESULT'].values,5)
print (feature_file_df_trans)
     a  result
0    0       1
1    1       1
2    2       1
3    3       1
4    4       1
5    5       2
6    6       2
7    7       2
8    8       2
9    9       2
10  10       3
11  11       3
12  12       3
13  13       3
14  14       3
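To see why the original assignment fails while the values version works, here is a minimal sketch using the same sample data. It assumes that repeating a Series keeps its original labels (it uses Series.repeat directly rather than np.repeat on the Series), and the column name result2 is only for illustration. The exact error message may differ between pandas versions.

```python
import numpy as np
import pandas as pd

rating_df = pd.DataFrame({'RESULT': [1, 2, 3]})
feature_file_df_trans = pd.DataFrame({'a': range(15)})

# Series.repeat keeps the original labels, so each label now appears 5 times.
result = rating_df['RESULT'].repeat(5)
print(result.index.is_unique)  # False

# Assigning a Series aligns it to the dataframe by index label;
# duplicate labels make that alignment ambiguous, so pandas raises.
try:
    feature_file_df_trans['result'] = result
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# A plain numpy array carries no index, so it is assigned by position.
feature_file_df_trans['result'] = result.values

# Alternative (illustrative): replace the duplicated labels with a fresh 0..n-1 index.
feature_file_df_trans['result2'] = result.reset_index(drop=True)
```

Both variants sidestep label alignment; the values form is shorter, while reset_index keeps the data as a Series if that is needed later.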
A more general solution for when the lengths differ: take the minimum of the two lengths and use it to slice both the array and the dataframe index in the Series constructor:
rating_df = pd.DataFrame({'RESULT':[1,2,3,5,6,7]})
feature_file_df_trans = pd.DataFrame({'a':range(15)}, index = range(3, 18))
result = np.repeat(rating_df['RESULT'].values,5)
len1 = len(feature_file_df_trans.index)
print (len1)
15
len2 = len(result)
print (len2)
30
len_min = min(len1, len2)
feature_file_df_trans['result'] = pd.Series(result[:len_min],
                                            index=feature_file_df_trans.index[:len_min])
print (feature_file_df_trans)
     a  result
3    0       1
4    1       1
5    2       1
6    3       1
7    4       1
8    5       2
9    6       2
10   7       2
11   8       2
12   9       2
13  10       3
14  11       3
15  12       3
16  13       3
17  14       3
rating_df = pd.DataFrame({'RESULT':[1,2]})
feature_file_df_trans = pd.DataFrame({'a':range(15)})
result = np.repeat(rating_df['RESULT'].values,5)
len1 = len(feature_file_df_trans.index)
print (len1)
15
len2 = len(result)
print (len2)
10
len_min = min(len1, len2)
feature_file_df_trans['result'] = pd.Series(result[:len_min],
                                            index=feature_file_df_trans.index[:len_min])
print (feature_file_df_trans)
     a  result
0    0     1.0
1    1     1.0
2    2     1.0
3    3     1.0
4    4     1.0
5    5     2.0
6    6     2.0
7    7     2.0
8    8     2.0
9    9     2.0
10  10     NaN
11  11     NaN
12  12     NaN
13  13     NaN
14  14     NaN
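The two cases above could be wrapped into one small function. This is only a sketch: add_repeated_column is a hypothetical helper name, not part of pandas, and it assumes the dataframe index has no duplicate labels.

```python
import numpy as np
import pandas as pd

def add_repeated_column(df, values, repeats, name):
    """Hypothetical helper: repeat `values` and assign them to df[name] by position."""
    repeated = np.repeat(np.asarray(values), repeats)
    n = min(len(df.index), len(repeated))
    # Label the first n values with the first n index labels of df.
    # Extra values are truncated; rows without a value become NaN.
    df[name] = pd.Series(repeated[:n], index=df.index[:n])
    return df

# 30 repeated values, 15 rows: the surplus values are dropped.
long_df = pd.DataFrame({'a': range(15)}, index=range(3, 18))
add_repeated_column(long_df, [1, 2, 3, 5, 6, 7], 5, 'result')

# 10 repeated values, 15 rows: the last 5 rows are filled with NaN.
short_df = pd.DataFrame({'a': range(15)})
add_repeated_column(short_df, [1, 2], 5, 'result')

print(long_df)
print(short_df)
```

Note that when rows are padded with NaN, the column is stored as float, which matches the 1.0 and 2.0 values in the output above.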