python - how to count the occurrences of each value in columns?

I have a file with the following input data:

       IN   OUT
data1  2.3  1.3
data2  0.1  2.1
data3  1.5  2.8
dataX  ...  ...

There are thousands of such files, and each has the same rows data1, data2, data3, ..., dataX. I'd like to count how many times each value occurs for each data row and column across all files. Example:

In file 'data1-IN' (filename)

2.3 - 50    (times)
0.1 - 233   (times)
... - ...   (times)

In file 'data1-OUT' (filename)

2.1 - 1024 (times)
2.8 - 120  (times)
... - ...  (times)

In file 'data2-IN' (filename)

0.4 - 312   (times)
0.3 - 202   (times)
... - ...   (times)

In file 'data2-OUT' (filename)

1.1 - 124 (times)
3.8 - 451  (times)
... - ...  (times)

In file 'data3-IN' ...

Which Python data structure would be best for counting such data? I wanted to use a multidimensional dictionary, but I keep running into KeyErrors and similar problems.

You really want to use collections.Counter, perhaps contained in a collections.defaultdict:

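import collections
import glob

# counts['data1-IN'] maps each value seen in data1's IN column to its count
counts = collections.defaultdict(collections.Counter)

files = glob.glob('*.txt')  # assumption: adjust the pattern to match your files

for filename in files:
    with open(filename) as f:
        next(f, None)  # skip the "IN OUT" header line
        for line in f:
            fields = line.split()  # columns are whitespace-separated
            if len(fields) != 3:
                continue  # ignore blank or malformed lines
            # key by row label (data1, data2, ...) so counts add up across files
            counts[fields[0] + '-IN'][fields[1]] += 1
            counts[fields[0] + '-OUT'][fields[2]] += 1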
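
To get the per-name output files described in the question, the counters can then be written out in "value - count" form. This is only a minimal sketch: write_counts and its out_dir parameter are hypothetical names, not part of the original answer, and the small in-memory counts below stands in for the result of the loop over real files.

```python
import collections
import os

def write_counts(counts, out_dir):
    """Write one file per key (e.g. 'data1-IN'), one 'value - count' line per value."""
    os.makedirs(out_dir, exist_ok=True)
    for name, counter in counts.items():
        with open(os.path.join(out_dir, name), 'w') as f:
            # most_common() yields (value, count) pairs, most frequent first
            for value, count in counter.most_common():
                f.write('%s - %d\n' % (value, count))

# Small in-memory stand-in for the parsed files (hypothetical sample rows).
# Missing keys are created on first use, so no KeyError is raised.
counts = collections.defaultdict(collections.Counter)
rows = [('data1', '2.3', '1.3'), ('data1', '2.3', '2.1'), ('data2', '0.1', '2.1')]
for name, in_value, out_value in rows:
    counts[name + '-IN'][in_value] += 1
    counts[name + '-OUT'][out_value] += 1
```

Calling write_counts(counts, 'results') would create files such as results/data1-IN. Because defaultdict builds an empty Counter for every new key and Counter treats unseen values as zero, the nested `+= 1` never needs an explicit existence check.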

I have recently started using the Pandas data frame. It has a CSV reader and makes slicing and dicing data very simple.
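
As a rough sketch of how that could look for this problem (not taken from the answer itself): the helper load_table, the column name "name", and the two in-memory samples are assumptions for illustration. Values are read as strings so they are counted exactly as written in the files.

```python
import io
import pandas as pd

def load_table(source):
    """Read one whitespace-separated table, skipping its 'IN OUT' header line."""
    return pd.read_csv(source, sep=r'\s+', skiprows=1,
                       names=['name', 'IN', 'OUT'], dtype=str)

# Two tiny in-memory stand-ins for real files (hypothetical sample data)
sample_a = io.StringIO("       IN   OUT\ndata1  2.3  1.3\ndata2  0.1  2.1\n")
sample_b = io.StringIO("       IN   OUT\ndata1  2.3  2.1\ndata2  0.4  2.1\n")

# Stack all files into one frame
table = pd.concat([load_table(s) for s in (sample_a, sample_b)],
                  ignore_index=True)

# Long format: one row per (name, column, value) triple
long_form = table.melt(id_vars='name', var_name='column', value_name='value')

# Count how often each value occurs per data name and column
counts = long_form.groupby(['name', 'column'])['value'].value_counts()
```

Here counts is a Series indexed by (name, column, value), so counts.loc[('data1', 'IN', '2.3')] gives the number of times 2.3 appeared in the IN column of data1 across all loaded tables.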
