Cumulative Collection of String values in pandas column
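I have a table that looks like the screenshot below.
I am trying to add a column at the end of the table that will contain all of the previous lead_id values. This is what I have tried so far: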
total = pd.Series()
test = pd.concat([test, total], axis=1)
test.rename(columns={0: 'total'}, inplace=True)
test.loc[0, 'total'] = test.loc[0, 'lead_id']
for i in range(1, 2):
    test.loc[i, 'total'] = test.loc[i-1, 'total'] + test.loc[i, 'lead_id']
However, this does not work and gives me the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-245-0e11e468a37a> in <module>()
      1 for i in range(1, 2):
----> 2     test.loc[i, 'total'] = test.loc[i-1, 'total'] + test.loc[i, 'lead_id']
/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
188 key = com.apply_if_callable(key, self.obj)
189 indexer = self._get_setitem_indexer(key)
--> 190 self._setitem_with_indexer(indexer, value)
191
192 def _validate_key(self, key, axis):
/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
609
610 if len(labels) != len(value):
--> 611 raise ValueError('Must have equal len keys and value '
612 'when setting with an iterable')
613
ValueError: Must have equal len keys and value when setting with an iterable
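Effectively, I need to collect all of the previous lead_id values into a kind of cumulative collection of lead_ids. If possible, this collection should also be de-duplicated. I know the sample data below does not contain any duplicates, but the real data I apply this to will.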
Expected output (apologies for the poor image quality)
Data:
[{'final_repayment_date_month': Period('2016-01', 'M'), 'lead_id': [21293]},
{'final_repayment_date_month': Period('2016-02', 'M'),
'lead_id': [39539, 38702, 39448]},
{'final_repayment_date_month': Period('2016-03', 'M'),
'lead_id': [39540, 39527, 39474]}]
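A minimal sketch of the question's own loop approach, made to work. The ValueError most likely comes from assigning a list to a single cell with `.loc[i, 'total']`: pandas treats the list as an iterable of values to spread over several positions, and the lengths do not match. Building the running, de-duplicated lists in plain Python and then assigning the whole `total` column in one step avoids setting list values cell by cell. `cumulative_unique` is a hypothetical helper name; the data is the sample above.

```python
import pandas as pd

# Sample frame mirroring the question's data
test = pd.DataFrame([
    {'final_repayment_date_month': pd.Period('2016-01', 'M'), 'lead_id': [21293]},
    {'final_repayment_date_month': pd.Period('2016-02', 'M'), 'lead_id': [39539, 38702, 39448]},
    {'final_repayment_date_month': pd.Period('2016-03', 'M'), 'lead_id': [39540, 39527, 39474]},
])

def cumulative_unique(lists):
    """Hypothetical helper: for each row, return the sorted unique IDs seen so far."""
    seen = set()
    out = []
    for ids in lists:
        seen.update(ids)           # add this row's IDs; duplicates collapse in the set
        out.append(sorted(seen))   # store a new list so later rows cannot change it
    return out

# Assign the whole column at once instead of writing lists into single cells
test['total'] = cumulative_unique(test['lead_id'])
print(test)
```

Because each row gets a fresh `sorted(...)` list, the stored totals stay independent of one another even though the running set keeps growing.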
The code below handles duplicates by using a set():
from collections import namedtuple
import pprint

# Lightweight stand-in for pandas.Period so the example runs without pandas
Period = namedtuple('Period', 'data other')

data = [{'final_repayment_date_month': Period('2016-01', 'M'), 'lead_id': [21293, 21293]},
        {'final_repayment_date_month': Period('2016-02', 'M'),
         'lead_id': [39539, 38702, 39448]},
        {'final_repayment_date_month': Period('2016-03', 'M'),
         'lead_id': [39540, 39527, 39474]}]

grand_total = set()  # every lead_id seen so far; a set ignores duplicates
for entry in data:
    for l in entry['lead_id']:
        grand_total.add(l)
    # Stores the sum of the unique IDs seen so far (not the IDs themselves)
    entry['total'] = sum(grand_total)
    pprint.pprint(entry)
Output
{'final_repayment_date_month': Period(data='2016-01', other='M'),
'lead_id': [21293, 21293],
'total': 21293}
{'final_repayment_date_month': Period(data='2016-02', other='M'),
'lead_id': [39539, 38702, 39448],
'total': 138982}
{'final_repayment_date_month': Period(data='2016-03', other='M'),
'lead_id': [39540, 39527, 39474],
'total': 257523}
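The answer above stores `sum(grand_total)`, which adds the IDs together rather than collecting them. If the goal is the cumulative collection described in the question, a small variant is to store a sorted copy of the set instead. This sketch keeps the answer's namedtuple stand-in for `Period` and uses `set.update` in place of the inner loop.

```python
from collections import namedtuple
import pprint

# Same lightweight stand-in for pandas.Period as in the answer above
Period = namedtuple('Period', 'data other')

data = [{'final_repayment_date_month': Period('2016-01', 'M'), 'lead_id': [21293, 21293]},
        {'final_repayment_date_month': Period('2016-02', 'M'),
         'lead_id': [39539, 38702, 39448]},
        {'final_repayment_date_month': Period('2016-03', 'M'),
         'lead_id': [39540, 39527, 39474]}]

grand_total = set()  # running, de-duplicated set of every lead_id seen so far
for entry in data:
    grand_total.update(entry['lead_id'])  # add this row's IDs; repeats collapse
    entry['total'] = sorted(grand_total)  # store a sorted snapshot of the IDs, not their sum
    pprint.pprint(entry)
```

The duplicate `21293` in the first entry appears only once in every `total`.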
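import pandas as pd
import itertools as it

test = pd.DataFrame([
    {'final_repayment_date_month': pd.Period('2016-01', 'M'),
     'lead_id': [21293]},
    {'final_repayment_date_month': pd.Period('2016-02', 'M'),
     'lead_id': [39539, 38702, 39448]},
    {'final_repayment_date_month': pd.Period('2016-03', 'M'),
     'lead_id': [39540, 39527, 39474]},
])

# De-duplicate each row up front, because accumulate() passes the first row
# through without calling the lambda.
rows = (sorted(set(ids)) for ids in test['lead_id'])
# Running merge: combine the IDs collected so far with the current row's IDs;
# set() drops duplicates and sorted() puts them in ascending order.
test['total'] = list(it.accumulate(rows, lambda x, y: sorted(set(x + y))))
print(test)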
You were taking the long way round.
Output
final_repayment_date_month lead_id total
0 2016-01 [21293] [21293]
1 2016-02 [39539, 38702, 39448] [21293, 38702, 39448, 39539]
2 2016-03 [39540, 39527, 39474] [21293, 38702, 39448, 39474, 39527, 39539, 39540]
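`sorted()` reorders the IDs numerically. If the order in which IDs first appear matters, one variant (not taken from either answer) is to de-duplicate with `dict.fromkeys`, which keeps the first occurrence of each key because dicts preserve insertion order in Python 3.7+. `dedupe` is a hypothetical helper name; the frame is the same sample data.

```python
import itertools as it
import pandas as pd

test = pd.DataFrame([
    {'final_repayment_date_month': pd.Period('2016-01', 'M'), 'lead_id': [21293]},
    {'final_repayment_date_month': pd.Period('2016-02', 'M'), 'lead_id': [39539, 38702, 39448]},
    {'final_repayment_date_month': pd.Period('2016-03', 'M'), 'lead_id': [39540, 39527, 39474]},
])

def dedupe(ids):
    """Hypothetical helper: drop repeated IDs, keeping each one's first occurrence."""
    return list(dict.fromkeys(ids))

# De-duplicate every row first (accumulate passes the first row through untouched),
# then append each row to the running total in first-seen order.
rows = (dedupe(ids) for ids in test['lead_id'])
test['total'] = list(it.accumulate(rows, lambda acc, ids: dedupe(acc + ids)))
print(test)
```

This trades the numeric ordering for chronological ordering; the de-duplication behaviour is the same.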