Cumulative set of string values in a pandas column

Question

I have a table that looks like the screenshot below.

I'm trying to add a column at the end of the table that will contain all of the previous lead_id values. Here is what I have tried so far:

total = pd.Series()
test = pd.concat([test, total], axis=1)
test.rename(columns={0: 'total'}, inplace=True)
test.loc[0, 'total'] = test.loc[0, 'lead_id']

for i in range(1, 2):
    test.loc[i, 'total'] = test.loc[i-1, 'total'] + test.loc[i, 'lead_id']

However, this doesn't work and gives me the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-245-0e11e468a37a> in <module>()
      1 for i in range(1, 2):
----> 2     test.loc[i, 'total'] = test.loc[i-1, 'total'] + test.loc[i, 'lead_id']

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    188             key = com.apply_if_callable(key, self.obj)
    189         indexer = self._get_setitem_indexer(key)
--> 190         self._setitem_with_indexer(indexer, value)
    191 
    192     def _validate_key(self, key, axis):

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    609 
    610                     if len(labels) != len(value):
--> 611                         raise ValueError('Must have equal len keys and value '
    612                                          'when setting with an iterable')
    613 

ValueError: Must have equal len keys and value when setting with an iterable

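Effectively, I need to collect all of the previous lead_id values into a kind of cumulative set of lead_ids. If possible, these values should also be deduplicated. I know the sample data below has no duplicates, but the real data I apply this to will.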

Expected output (apologies for the poor quality)

Data:

[{'final_repayment_date_month': Period('2016-01', 'M'), 'lead_id': [21293]},
 {'final_repayment_date_month': Period('2016-02', 'M'),
  'lead_id': [39539, 38702, 39448]},
 {'final_repayment_date_month': Period('2016-03', 'M'),
  'lead_id': [39540, 39527, 39474]}]
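The ValueError in the traceback appears to come from assigning a multi-element list to a single cell with .loc: pandas tries to spread the list over the selected labels, and one target cell cannot take several values (the len(labels) != len(value) check in the traceback). A minimal sketch of one way around it, assuming the sample data above: keep the running set in plain Python and assign the finished column in one step. The variable names seen and totals are illustrative, and the repeated 21293 in the first row is added only to show deduplication.

```python
import pandas as pd

# Sample data from the question; the repeated 21293 in the first row is
# added for illustration only, to show that duplicates are dropped
test = pd.DataFrame([
    {'final_repayment_date_month': pd.Period('2016-01', 'M'), 'lead_id': [21293, 21293]},
    {'final_repayment_date_month': pd.Period('2016-02', 'M'), 'lead_id': [39539, 38702, 39448]},
    {'final_repayment_date_month': pd.Period('2016-03', 'M'), 'lead_id': [39540, 39527, 39474]},
])

seen = set()   # every lead_id encountered so far, without duplicates
totals = []    # one sorted snapshot of `seen` per row
for ids in test['lead_id']:
    seen.update(ids)
    totals.append(sorted(seen))

# Assign the whole column at once instead of writing lists into single cells with .loc
test['total'] = pd.Series(totals, index=test.index)
print(test)
```

Appending sorted(seen) creates a new list for each row, so later updates to seen do not change earlier rows.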

Answer 1

The code is below. Duplicates are handled by using set().

from collections import namedtuple
import pprint

# Stand-in for pandas.Period so the example runs without pandas
Period = namedtuple('Period', 'data other')

# The duplicate 21293 in the first row shows that set() drops repeats
data = [{'final_repayment_date_month': Period('2016-01', 'M'), 'lead_id': [21293, 21293]},
        {'final_repayment_date_month': Period('2016-02', 'M'),
         'lead_id': [39539, 38702, 39448]},
        {'final_repayment_date_month': Period('2016-03', 'M'),
         'lead_id': [39540, 39527, 39474]}]

grand_total = set()  # all unique lead_ids seen so far
for entry in data:
    grand_total.update(entry['lead_id'])
    # Note: this stores the numeric sum of the unique ids, not the ids themselves
    entry['total'] = sum(grand_total)
    pprint.pprint(entry)

Output

{'final_repayment_date_month': Period(data='2016-01', other='M'),
 'lead_id': [21293, 21293],
 'total': 21293}
{'final_repayment_date_month': Period(data='2016-02', other='M'),
 'lead_id': [39539, 38702, 39448],
 'total': 138982}
{'final_repayment_date_month': Period(data='2016-03', other='M'),
 'lead_id': [39540, 39527, 39474],
 'total': 257523}
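The loop above stores sum(grand_total), the numeric total of the unique ids, which is what the output shows. If the goal is the collection of ids itself, as the question asks, a variant of the same loop could store a sorted copy of the set instead. This is a sketch under that reading of the question, reusing the namedtuple stand-in for Period:

```python
from collections import namedtuple

# Stand-in for pandas.Period, as in the answer above
Period = namedtuple('Period', 'data other')

data = [
    {'final_repayment_date_month': Period('2016-01', 'M'), 'lead_id': [21293, 21293]},
    {'final_repayment_date_month': Period('2016-02', 'M'), 'lead_id': [39539, 38702, 39448]},
    {'final_repayment_date_month': Period('2016-03', 'M'), 'lead_id': [39540, 39527, 39474]},
]

grand_total = set()  # all unique lead_ids seen so far
for entry in data:
    grand_total.update(entry['lead_id'])
    # Store a sorted snapshot: assigning grand_total itself would make every
    # entry share the same set object, which keeps growing
    entry['total'] = sorted(grand_total)

for entry in data:
    print(entry['total'])
```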

Answer 2

import itertools as it

import pandas as pd

test = pd.DataFrame([
    {'final_repayment_date_month': pd.Period('2016-01', 'M'),
     'lead_id': [21293]},
    {'final_repayment_date_month': pd.Period('2016-02', 'M'),
     'lead_id': [39539, 38702, 39448]},
    {'final_repayment_date_month': pd.Period('2016-03', 'M'),
     'lead_id': [39540, 39527, 39474]},
])

# Running union of the lead_id lists: set() drops duplicates, sorted() fixes the order.
# accumulate() passes the first row through unchanged.
test['total'] = list(it.accumulate(test['lead_id'],
                                   lambda x, y: sorted(set(x) | set(y))))
print(test)

You were taking the long way around. Please give me 5 stars :)

Output

  final_repayment_date_month                lead_id                                              total
0                    2016-01                [21293]                                            [21293]
1                    2016-02  [39539, 38702, 39448]                       [21293, 38702, 39448, 39539]
2                    2016-03  [39540, 39527, 39474]  [21293, 38702, 39448, 39474, 39527, 39539, 39540]
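Because itertools.accumulate yields the first element unchanged, the combining function never processes the first row on its own, so duplicates inside the first lead_id list would not be removed by it. A minimal sketch of one alternative, assuming the same columns as above (the repeated 21293 is added for illustration): convert every row to a set first, then take a running union with set.union.

```python
import itertools as it

import pandas as pd

test = pd.DataFrame([
    # The repeated 21293 is added for illustration only
    {'final_repayment_date_month': pd.Period('2016-01', 'M'), 'lead_id': [21293, 21293]},
    {'final_repayment_date_month': pd.Period('2016-02', 'M'), 'lead_id': [39539, 38702, 39448]},
    {'final_repayment_date_month': pd.Period('2016-03', 'M'), 'lead_id': [39540, 39527, 39474]},
])

# Turn each row's list into a set, so the first row is deduplicated too
row_sets = [set(ids) for ids in test['lead_id']]

# set.union(a, b) returns a new set, so each yielded value is independent
running = it.accumulate(row_sets, set.union)

# Sort once per row at the end; the running value stays a set while accumulating
test['total'] = pd.Series([sorted(s) for s in running], index=test.index)
print(test)
```

Keeping the accumulated value as a set until the end avoids re-sorting and re-converting a list at every step.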


2 answers

Answer 1: 1 vote, 2019-02-21 12:14:20

Answer 2: 1 vote, accepted, 2019-02-21 12:27:07
