How to sum the numeric values of identical strings?

Question

I have a list of licences and their associated licence counts, for example:

1 Third Party SIP Device Seat   
1 Third Party SIP Device Seat   
1 Third Party SIP Device Seat   
3 Station   
3 Station   
3 Station   
20 Station

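The list will never be in the same order. I just need to add up the total for each licence type, so for the example above I would like to get back: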

3 Third Party SIP Device Seat
29 Station

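The data is pasted into an unsaved Notepad file and then moved into a database. Excel doesn't work for this, because the number and the name are separated by a space rather than a tab.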
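Because only the first space separates the count from the licence name, one possible workaround for the Excel problem is to replace that first space with a tab before pasting. A minimal sketch: `to_tab_separated` is a hypothetical helper name, and it assumes every non-blank line starts with an integer count followed by a name.

```python
def to_tab_separated(text):
    """Replace the first run of whitespace on each non-blank line with a tab.

    Hypothetical helper: assumes every line looks like '<count> <licence name>'.
    """
    out = []
    for line in text.splitlines():
        line = line.strip()          # drop trailing spaces copied from Notepad
        if not line:
            continue                 # skip blank lines
        count, name = line.split(maxsplit=1)  # split once: count, then the whole name
        out.append(count + "\t" + name)
    return "\n".join(out)


sample = "1 Third Party SIP Device Seat   \n3 Station   \n20 Station"
print(to_tab_separated(sample))
```

Splitting only once keeps multi-word names such as "Third Party SIP Device Seat" in a single column.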

What is the simplest way to do this?

Answer 1

Here is a pretty ugly solution:

from functools import reduce
from collections import defaultdict

lines = [       # replace with e.g: with open('input.txt', 'r') as f: lines = f.readlines()
  "1 Third Party SIP Device Seat",   
  "1 Third Party SIP Device Seat",  
  "1 Third Party SIP Device Seat", 
  "3 Station",
  "3 Station",  
  "3 Station",  
  "20 Station"
]

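def f(acc, x):
  # first token is the count, everything after it is the licence name (the key);
  # strip() first so a trailing newline or trailing spaces don't end up in the key
  count, name = x.strip().split(maxsplit=1)
  acc[name] += int(count)
  return acc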

r = dict(reduce(f, lines, defaultdict(int)))

print(r)
# {'Third Party SIP Device Seat': 3, 'Station': 29}

# to write to file (one licence per line):
with open("output.txt", "w") as out:
  for k, v in r.items():
    out.write(str(v) + " " + k + "\n")
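Splitting on a literal `" "` is sensitive to trailing whitespace: lines returned by `f.readlines()` keep their `"\n"`, and the sample data in the question has trailing spaces. A small sketch comparing that approach with stripping first and splitting once on any whitespace (the input line is made up for illustration):

```python
# One raw input line with trailing spaces and a newline, as readlines() could return it
raw = "3 Station   \n"

# Splitting on a single literal space keeps the trailing whitespace in the key
naive_key = " ".join(raw.split(" ")[1:])

# Stripping first and splitting once on any whitespace gives a clean key
count, clean_key = raw.strip().split(maxsplit=1)

print(repr(naive_key))  # 'Station   \n'
print(repr(clean_key))  # 'Station'
print(int(count))       # 3
```

With the naive key, `"3 Station   \n"` and `"20 Station"` would be summed under two different dictionary keys.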

Answer 2

You want a groupby. Fortunately, itertools has one:

from itertools import groupby

text = """1 Third Party SIP Device Seat
1 Third Party SIP Device Seat
1 Third Party SIP Device Seat
3 Station
3 Station
3 Station
20 Station"""

# clean stuff up and split on the first space: [count, name]
lines = [line.strip().split(" ", 1) for line in text.split("\n")]

# groupby only merges *adjacent* equal keys and the input order varies, so sort by name first
lines.sort(key=lambda x: x[1])

# groupby
result = []
for k, g in groupby(lines, lambda x: x[1]):
    total = 0
    for i in g:
        total += int(i[0])
    result.append([k, total])
print(result)
# [['Station', 29], ['Third Party SIP Device Seat', 3]]
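`itertools.groupby` only merges consecutive items that share a key, and the question says the list order varies. A small sketch showing what happens with unsorted input and how sorting by the key first fixes it (the rows and the `totals` helper are hypothetical, for illustration only):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted rows: [count, licence name]
rows = [["1", "Station"], ["2", "Seat"], ["3", "Station"]]


def totals(pairs):
    """Sum the counts within each run of identical names produced by groupby."""
    return [(name, sum(int(c) for c, _ in group))
            for name, group in groupby(pairs, key=itemgetter(1))]


print(totals(rows))                             # [('Station', 1), ('Seat', 2), ('Station', 3)]
print(totals(sorted(rows, key=itemgetter(1))))  # [('Seat', 2), ('Station', 4)]
```

Without the sort, "Station" shows up twice with partial totals; after sorting, each licence forms exactly one group.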

Answer 3

A complete solution, assuming the data is in a saved Notepad file named licences.txt:

from collections import Counter

counter = Counter()
with open('licences.txt', 'r') as f:
    for line in f:
        count, *words = line.split()            # first token is the count, the rest is the name
        counter[" ".join(words)] += int(count)  # add the count to that licence's total

with open('grouped_licences.txt', 'w') as f:
    for licence, total in counter.items():
        f.write(str(total) + " " + licence + "\n")

The result is then in the file grouped_licences.txt:

    3 Third Party SIP Device Seat 
    29 Station
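One caveat with the loop above: a blank line (for example, a trailing empty line in the Notepad file) makes `count, *words = line.split()` raise `ValueError`, because there is nothing to assign to `count`. A minimal sketch that skips blank lines and accepts any iterable of lines; `sum_licences` is a hypothetical name, and `io.StringIO` stands in for the real file so the example runs on its own:

```python
import io
from collections import Counter


def sum_licences(lines):
    """Sum licence counts per name; `lines` is any iterable of text lines (e.g. an open file)."""
    counter = Counter()
    for line in lines:
        if not line.strip():  # skip blank lines instead of failing to unpack
            continue
        count, *words = line.split()
        counter[" ".join(words)] += int(count)
    return counter


# io.StringIO simulates open('licences.txt'), including an empty line
sample = io.StringIO("1 Third Party SIP Device Seat\n3 Station\n\n20 Station\n")
totals = sum_licences(sample)
for licence, total in totals.most_common():  # largest totals first
    print(total, licence)
```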

Another solution with pandas:

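import pandas
# column 0 is the count; columns 1.. are the words of the name (missing cells filled with "")
df = pandas.read_csv('licences.txt', sep=" ", header=None).fillna("")
df["licence"] = df.iloc[:, 1:].apply(" ".join, axis=1).str.strip()  # strip padding from empty cells
print(df.groupby("licence")[0].sum())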

Output:

licence
Station                           29
Third Party SIP Device Seat        3


3 answers

Answer 1
Score 2, accepted, 2019-05-30 16:03:04

Answer 2
Score 1, 2019-05-30 16:07:09

Answer 3
Score 1, 2019-05-30 16:21:11
