Unique elements of all columns of a CSV file in Python, without using Pandas

Question

I am trying to get the unique values of every column in a CSV file. I read the number of columns, create one set per column, and then loop over the CSV data to collect each column's unique values. But the second (inner) loop executes only once.

decoded_file = data_file.read().decode('utf-8')
reader = csv.reader(decoded_file.splitlines(),
                            delimiter=',')
list_reader = list(reader)
data = iter(list_reader)
next(data) #skipping the header
col_number = len(next(data))
col_sets = [set() for i in range(col_number)]

for col in range(col_number):
   for new_row in data:
       col_sets[col].add(new_row[col])
   print(col_sets[col])

I need to get all the unique values for each column and add them to col_sets so I can access them later. What is the best way to do this?

Answer 1

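The approach is fine; you just need to swap the order of the loops. data is an iterator, so the inner loop consumes it completely while handling the first column, and nothing is left for the remaining columns. Iterate over the rows once and update every column's set for each row. Note also that col_number = len(next(data)) consumes the first data row, so that row never reaches the sets; len(list_reader[0]) would give the column count without advancing the iterator.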


for new_row in data:
    for col in range(col_number):
        col_sets[col].add(new_row[col])
print(col_sets)
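
For completeness, here is a minimal self-contained sketch of the same idea using only the standard library. The function name unique_values_per_column and the sample data are made up for illustration; the original code reads from an uploaded data_file, which is replaced here by an in-memory string. The header row is read once and reused for the column count, so no data row is skipped.

```python
import csv
import io


def unique_values_per_column(csv_text, delimiter=","):
    """Return (header, col_sets): one set of unique values per column.

    Hypothetical helper for illustration; not part of the original post.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    header = next(reader, [])            # header row; also defines the column count
    col_sets = [set() for _ in header]
    for row in reader:                   # single pass over the data rows
        # Ignore extra cells beyond the header width; short rows just add fewer values.
        for col, value in enumerate(row[:len(header)]):
            col_sets[col].add(value)
    return header, col_sets


# Made-up sample data, only to show the call.
sample = "name,city\nAnn,Oslo\nBob,Oslo\nAnn,Rome\n"
header, col_sets = unique_values_per_column(sample)
for name, values in zip(header, col_sets):
    print(name, sorted(values))
```

With the decoded upload from the question, the call would be unique_values_per_column(decoded_file), assuming decoded_file is the str produced by data_file.read().decode('utf-8').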
