Best way to count unique values from a CSV in Python?

Question

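I need a quick way to count the unique values in a CSV. The file is very large (> 100 MB) and cannot be opened in Excel, so I want to write a Python script.

The CSV looks like this:

I only need the script to return how many distinct values the file contains. For example, for the sample above the desired output would be: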

6

This is what I have so far:

import csv
input_file = open(r'C:\Users\guill\Downloads\uu.csv')
csv_reader = csv.reader(input_file, delimiter=',')
thisdict = {
  "UserId": 1
}

for row in csv_reader:
    if row[0] not in thisdict:
        thisdict[row[0]] = 1

print(len(thisdict)-1)

It seems to work fine, but I'm wondering whether there is a better, more efficient, or more elegant way to do this?

Answer 1

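A set is a better fit than a dictionary here, since only the distinct keys matter and duplicates are discarded automatically:

import csv

# Do all the reading inside the with block, while the file is still open
with open(r'C:\Users\guill\Downloads\uu.csv', newline='') as f:
    csv_reader = csv.reader(f, delimiter=',')
    next(csv_reader, None)  # skip the header row ("UserId") so it is not counted
    uniqueIds = set()

    for row in csv_reader:
        uniqueIds.add(row[0])

print(len(uniqueIds))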
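
The loop above can also be collapsed into a set comprehension. The following is only a minimal sketch, assuming the IDs sit in the first column and the first row is a header; `count_unique_first_column` and its `has_header` parameter are hypothetical names introduced for illustration, not part of the original answer.

```python
import csv


def count_unique_first_column(path, has_header=True):
    """Count the distinct values in the first column of a CSV file.

    Hypothetical helper: assumes a comma-delimited file whose IDs are
    in column 0. Blank lines are ignored.
    """
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter=',')
        if has_header:
            next(reader, None)  # discard the header row, if any
        # The set keeps one copy of each value; rows are streamed one at
        # a time, so only the distinct values are held in memory.
        return len({row[0] for row in reader if row})
```

Because the reader is consumed lazily, memory use grows with the number of distinct values rather than with the size of the file, which should suit a file of 100 MB or more.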

Answer 2

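Use a set instead of a dictionary, like this:

import csv

# The with block closes the file once reading is done
with open(r'C:\Users\guill\Downloads\uu.csv', newline='') as input_file:
    csv_reader = csv.reader(input_file, delimiter=',')
    next(csv_reader, None)  # skip the header row so it is not counted
    aa = set()
    for row in csv_reader:
        aa.add(row[0])
print(len(aa))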
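
If the column is easier to identify by name than by position, `csv.DictReader` consumes the header row itself. This is a rough sketch, assuming the header is `UserId` (the key the question's code subtracts from its count); `count_unique_by_name` is a hypothetical helper name, not something from either answer.

```python
import csv


def count_unique_by_name(path, column="UserId"):
    """Count the distinct values in a named CSV column.

    Hypothetical helper: assumes the first row of the file is a header
    that contains `column`. A missing column name raises KeyError.
    """
    with open(path, newline='') as f:
        # DictReader uses the first row as field names, so the header
        # is never counted as a value.
        reader = csv.DictReader(f)
        return len({row[column] for row in reader})
```

Note that a row that is too short to reach the named column yields `None` for it, and that `None` would be counted as one extra distinct value.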

2 answers

Answer 1
2, accepted, 2020-11-16 12:33:37

Answer 2
0, 2020-11-16 12:35:31
