Counting and storing values in a dictionary using Python

def prodInfo():
    from collections import Counter
    prodHolder = {}
    tempdict = {}
    try:
        os.chdir(copyProd)
        for root, dirs, files in os.walk('.'):
            for data in files:

                fullpath = os.path.join(root, data)
                with open(fullpath, 'rt') as fp:
                    for info in fp:
                        info = info.strip()
                        if info.startswith('prodType'):
                            info0 = info.split('=')[1]
                            info0 = info0.replace(';','')
                            info0 = info0.replace('"','')
                        if info.startswith('acq'):
                            info1 = info.split('=')[1]  
                            info1 = info1.replace(';','')
                            info1 = info1.replace('"','')
                        if info.startswith('ID_num'):
                            info2 = info.split('=')[1]
                            info2 = info2.replace(';','')
                            info2 = info2.replace('"','')

                    print info0 + info1 + info2

This produces the following result:

SD Acq645467 356788
SD Acq645467 356788
SD Acq645467 356788
SD Acq645467 356788
SD Acq645467 356788
SD Acq645467 356788
SD Acq645467 356788
SD Acq645467 356788
SD Acq645467 356788
Image Acq645467 356788
Image Acq645467 356788
Image Acq645467 356788
Image Acq645467 356788

SD Acq644869 356849
SD Acq644869 356849
Image Acq644869 356849

SD Acq644247 356851
SD Acq644247 356851
Image Acq644247 356851

I would like to store the results and be able to count how many times 'SD' occurs for each ID number (356788, 356849, 356851) and how many times 'Image' occurs for each ID number.

The results would be as follows:

9 - SD / 4 - Image for 356788

2 - SD / 1 - Image for 356849

2 - SD / 1 - Image for 356851

I thought it would be best to store the items in a dictionary, but I have not been able to count the values successfully. This is the code I have used to store the info in a dictionary:

prodHolder[info2] = {'SD/Image': info0, 'Acq' : info1}
total_Acq = prodHolder
print prodHolder

The results are:

{'356788': {'SD/Image': 'SD', 'Acq': 'Acq645467'}} ...

Every time the function is run, a different set of values will be entered, thus producing a different result.

So there are two questions here.

1) How to write the results into a file:

I'd use CSV (comma-separated values). Python has a great module for that (csv).

You can modify your code so that, while it reads from each file (as it already does), it also writes info0, info1 and info2 to a .csv file:

import csv
import os

def prodInfo():
    from collections import Counter
    prodHolder = {}
    tempdict = {}
    # Open the output file once, before changing directory, so that it
    # is neither truncated for every input file nor picked up by os.walk.
    with open('./stack59.write.csv', 'w') as fw:
        writer = csv.writer(fw)
        os.chdir(copyProd)
        for root, dirs, files in os.walk('.'):
            for data in files:
                fullpath = os.path.join(root, data)
                with open(fullpath, 'r') as fp:
                    # [ . . . ]
                    # Same "for info in fp:" parsing as in the question
                    print info0 + info1 + info2
                    writer.writerow([info0, info1, info2])

This will create a file stack59.write.csv that looks like this:

SD,Acq645467,356788
SD,Acq645467,356788
SD,Acq645467,356788
[ . . . ]
SD,Acq644247,356851
SD,Acq644247,356851
Image,Acq644247,356851
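
To see the csv.writer call in isolation, here is a minimal sketch that writes a few rows to an in-memory buffer instead of a file. It is written for Python 3 (print as a function), whereas the code in this thread is Python 2. The sample rows are copied from the output in the question, and the helper name rows_to_csv is made up for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Return the given rows as CSV text (hypothetical helper)."""
    buf = io.StringIO()
    # lineterminator='\n' keeps the output identical on every platform;
    # csv.writer would otherwise end each row with '\r\n'.
    writer = csv.writer(buf, lineterminator='\n')
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


# Sample rows in the same [type, acquisition, id] order as the answer.
sample_rows = [
    ['SD', 'Acq645467', '356788'],
    ['Image', 'Acq645467', '356788'],
    ['SD', 'Acq644869', '356849'],
]

csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Swapping the buffer for a real file opened once, outside the loop, gives the same rows on disk.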

2) How to count common results:

For that, itertools.groupby would probably suit your needs. You might also want to read up on how iterators work.

First, I'd store the data in a matrix (a list of rows):

def prodInfo():
    from collections import Counter
    prodHolder = {}
    tempdict = {}
    data_matrix = []   # NEW !
    os.chdir(copyProd)
    for root, dirs, files in os.walk('.'):
        for data in files:
            # [ . . . ]
            # Same reading and parsing as before
            print info0 + info1 + info2
            data_matrix.append([info0, info1, info2])  # NEW!
    return data_matrix  # NEW! Lets the caller group the rows

You can then group your data_matrix as you please. For instance:

import itertools

# groupby only merges consecutive rows that share a key, so this relies
# on data_matrix already being ordered by id and type (as in the output
# shown in the question); sort it first if that is not the case.

# First, group by picture id (356788, 356849...), which is
# the third column of the data
for pic_id, rows_for_id in itertools.groupby(data_matrix, lambda x: x[2]):
    # Now, within those groups, group by type, the first column
    # of the data (SD, Image...)
    for prod_type, rows_for_type in itertools.groupby(rows_for_id,
                                                      lambda y: y[0]):
        print "%s: %s %s" % (pic_id, len(list(rows_for_type)), prod_type)
    print ''

This outputs:

356788: 9 SD
356788: 4 Image

356849: 2 SD
356849: 1 Image

356851: 2 SD
356851: 1 Image
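
Since the question asks about a dictionary, the same counts can also be kept in a dictionary of counters, which does not depend on the order of the rows. This is an alternative sketch rather than part of the approach above: it assumes rows shaped like [type, acquisition, id], rebuilds the sample data from the output in the question, and uses a made-up function name, count_types_by_id. It is written for Python 3.

```python
from collections import Counter, defaultdict


def count_types_by_id(rows):
    """Map each id (third column) to a Counter of types (first column)."""
    counts = defaultdict(Counter)
    for prod_type, _acq, id_num in rows:
        counts[id_num][prod_type] += 1
    return counts


# Sample data rebuilt from the output shown in the question.
data_matrix = (
    [['SD', 'Acq645467', '356788']] * 9
    + [['Image', 'Acq645467', '356788']] * 4
    + [['SD', 'Acq644869', '356849']] * 2
    + [['Image', 'Acq644869', '356849']]
    + [['SD', 'Acq644247', '356851']] * 2
    + [['Image', 'Acq644247', '356851']]
)

counts = count_types_by_id(data_matrix)
for id_num in sorted(counts):
    per_id = counts[id_num]
    # A Counter returns 0 for a type that never occurred for this id.
    print("%d - SD / %d - Image for %s"
          % (per_id['SD'], per_id['Image'], id_num))
```

With the sample data this prints one line per id in the "9 - SD / 4 - Image for 356788" format requested in the question.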
