
How to count occurrences of a value match from a text file

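Here is my problem: each employee is uniquely identified by an ID (e.g. KCUTD_41). From one file, I have created a dictionary that collects the employee IDs of each company, like this: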

{
    'Company 1': ['KCUTD_41',
                  'KCTYU_48',
                  'VTSYC_48',
                  ...],
    'Company 2': ['PORUH_21',
                  'PUSHB_10',
                  ...],
    'Company 3': ['STEYRU_69'],
}

I have several companies in total.

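In parallel, in another file, I have several lines, where each line corresponds to a collaboration group between companies made up of several employees and PhD students (d215485, etc.).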

The file looks like this:

PORUH_21 d215487 d215489 d213654 KCTYU_48 d154225 ...
d25548 d89852 VTSYC_48 d254548 d121154 d258774 PUSHB_10 ...
etc ....
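A minimal sketch of reading that file, assuming (as in the sample lines) one group per line with whitespace-separated IDs, where PhD-student IDs are `d` followed only by digits and every other token is treated as an employee ID. The function name `parse_groups` is hypothetical, and `io.StringIO` stands in for a real file opened with `open(...)`.

```python
import io


def parse_groups(lines):
    """Split each group line into (employee_ids, phd_ids).

    Assumption: PhD-student IDs are 'd' followed only by digits
    (e.g. d215487); any other token is treated as an employee ID.
    """
    groups = []
    for line in lines:
        employees, phd = [], []
        for token in line.split():
            if token[0] == "d" and token[1:].isdigit():
                phd.append(token)
            else:
                employees.append(token)
        if employees or phd:  # skip blank lines
            groups.append((employees, phd))
    return groups


# io.StringIO stands in for open("groups.txt") (hypothetical file name)
sample = io.StringIO(
    "PORUH_21 d215487 d215489 d213654 KCTYU_48 d154225\n"
    "d25548 d89852 VTSYC_48 d254548 d121154 d258774 PUSHB_10\n"
)
for employees, phd in parse_groups(sample):
    print(employees, len(phd))
```

Classifying tokens by their shape avoids needing a regex for each ID type; if the real IDs follow different rules, only the `if` condition needs to change.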

What I want is the number of employees and the number of groups (the lines in which they appear), to get something like this:

OUTPUT:

Company 1 : (number of employees from Company 1 per line) : total number of groups (lines) in which it appears
Company 2 : (number of employees from Company 2 per line) : total number of groups (lines) in which employees from Company 2 appear
Company 3 : ......

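I want to use a condition to check whether the values of each key in my dictionary (the employee IDs) match IDs in a line, and if so, count the occurrences.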
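One way to express that condition, sketched under the assumption that every employee ID belongs to exactly one company: invert the dictionary once into an ID-to-company lookup, then check each token of a line against it and tally the matches with `collections.Counter`. The names `build_index` and `employees_per_line` are hypothetical, and the data is the sample above with the `...` elisions dropped.

```python
from collections import Counter

companies = {
    'Company 1': ['KCUTD_41', 'KCTYU_48', 'VTSYC_48'],
    'Company 2': ['PORUH_21', 'PUSHB_10'],
    'Company 3': ['STEYRU_69'],
}


def build_index(companies):
    """Invert {company: [ids]} into {id: company} for constant-time lookups."""
    return {emp_id: company
            for company, ids in companies.items()
            for emp_id in ids}


def employees_per_line(line, index):
    """Count, per company, how many of its employees appear on one line."""
    # `token in index` is the "does it match?" condition; tokens that are
    # not known employee IDs (e.g. d215487) are simply ignored.
    return Counter(index[token] for token in line.split() if token in index)


index = build_index(companies)
print(employees_per_line("PORUH_21 d215487 d215489 d213654 KCTYU_48 d154225", index))
```

Building the reverse index once means each line is scanned a single time, instead of looping over every company's list for every token.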

I hope it's clearer now ^^'

Thanks if you can help me ^^

I'm not clear on how you want the output to look, but this code might help you get where you want to go...

import re

companies = {
    'Company 1': ['KCUTD_41', 'KCTYU_48', 'VTSYC_48'],
    'Company 2': ['PORUH_21', 'PUSHB_10'],
    'Company 3': ['STEYRU_69'],
}

# Record how many employee IDs each company has
finalout = {}
for company, ids in companies.items():
    finalout[company] = {"number_in_company": len(ids)}
print(finalout)

lines_from_file = [
    "PORUH_21 d215487 d215489 d213654 KCTYU_48 d154225",
    "d25548 d89852 VTSYC_48 d254548 d121154 d258774 PUSHB_10",
]

# Raw strings so that \d reaches the regex engine unchanged
pattern_groups = r"\bd\d+\b"           # d-prefixed IDs such as d215487
pattern_employees = r"\b[A-Z]+_\d+\b"  # employee IDs such as PORUH_21
for line in lines_from_file:
    print("---------------------")
    print(line)
    # re.findall returns every non-overlapping match; len() counts them
    print("Groups per line:", len(re.findall(pattern_groups, line)))
    print("Employees per line:", len(re.findall(pattern_employees, line)))

OUTPUT:

{'Company 1': {'number_in_company': 3}, 'Company 2': {'number_in_company': 2}, 'Company 3': {'number_in_company': 1}}
---------------------
PORUH_21 d215487 d215489 d213654 KCTYU_48 d154225
Groups per line: 4
Employees per line: 2
---------------------
d25548 d89852 VTSYC_48 d254548 d121154 d258774 PUSHB_10
Groups per line: 5
Employees per line: 2
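To tie those per-line counts back to each company, in the shape of the OUTPUT the question asks for, a sketch could combine the reverse lookup with a per-line `Counter`. This assumes "per line" means one count per line (0 where the company does not appear) and that each employee ID belongs to exactly one company; `company_summary` is a hypothetical name.

```python
from collections import Counter

companies = {
    'Company 1': ['KCUTD_41', 'KCTYU_48', 'VTSYC_48'],
    'Company 2': ['PORUH_21', 'PUSHB_10'],
    'Company 3': ['STEYRU_69'],
}

lines_from_file = [
    "PORUH_21 d215487 d215489 d213654 KCTYU_48 d154225",
    "d25548 d89852 VTSYC_48 d254548 d121154 d258774 PUSHB_10",
]


def company_summary(companies, lines):
    """Return {company: (per_line_counts, lines_with_company)}.

    per_line_counts[i] is how many of the company's employees appear on
    line i; lines_with_company is how many lines contain at least one.
    """
    index = {emp_id: company
             for company, ids in companies.items()
             for emp_id in ids}
    per_line = [Counter(index[t] for t in line.split() if t in index)
                for line in lines]
    summary = {}
    for company in companies:
        counts = [c[company] for c in per_line]  # Counter yields 0 for missing keys
        summary[company] = (counts, sum(1 for n in counts if n > 0))
    return summary


for company, (counts, n_lines) in company_summary(companies, lines_from_file).items():
    print(f"{company} : {counts} : {n_lines}")
# Company 1 : [1, 1] : 2
# Company 2 : [1, 1] : 2
# Company 3 : [0, 0] : 0
```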


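Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.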

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM