Unique elements of all the columns of a CSV file in Python without using Pandas

I am trying to get the unique values of every column in a CSV file. I take the column count, create one set per column, and then loop over the CSV data to collect the unique values of each column. But the second (inner) loop runs only once.

decoded_file = data_file.read().decode('utf-8')
reader = csv.reader(decoded_file.splitlines(),
                            delimiter=',')
list_reader = list(reader)
data = iter(list_reader)
next(data) #skipping the header
col_number = len(next(data))
col_sets = [set() for i in range(col_number)]

for col in range(col_number):
   for new_row in data:
       col_sets[col].add(new_row[col])
   print(col_sets[col])

I need to collect all the unique values of each column in col_sets so that I can access them later. What is the best way to do this?

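Everything is fine; you just need to swap the order of the loops. data is an iterator, so the inner loop consumes every row while handling the first column, and nothing is left for the remaining columns.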


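# Walk the rows once (data is an iterator, so it can only be consumed once)
for new_row in data:
    # add each field to the set for its column
    for col in range(col_number):
        col_sets[col].add(new_row[col])
print(col_sets)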
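
For reference, here is a minimal self-contained sketch of the same row-first approach. The function name unique_values_per_column and the sample data are hypothetical, not from the question. It assumes the header row has one entry per column and takes the column count from it. That matters because in the question's code col_number = len(next(data)) consumes the first data row, so its values never reach the sets. Like the original, it splits the text with splitlines(), which would not handle quoted fields containing line breaks.

```python
import csv


def unique_values_per_column(csv_text, delimiter=","):
    """Return one set of unique values per column, skipping the header row."""
    reader = csv.reader(csv_text.splitlines(), delimiter=delimiter)
    header = next(reader, None)  # header row; None if the input is empty
    if header is None:
        return []
    # Assumption: the header has exactly one entry per column.
    col_sets = [set() for _ in header]
    for row in reader:
        # zip stops at the shorter sequence, so short rows do not raise IndexError
        for col_set, value in zip(col_sets, row):
            col_set.add(value)
    return col_sets


# Hypothetical sample data, for illustration only
sample = "name,city\nAnn,Oslo\nBob,Oslo\nAnn,Rome\n"
col_sets = unique_values_per_column(sample)
print(col_sets)
```

Each row is visited exactly once, so it does not matter that the reader is a one-shot iterator. With the uploaded file from the question, the call would be unique_values_per_column(data_file.read().decode('utf-8')).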
