
python: creating nested lists within for loops

I want to work on a CSV file. The outputs I want are the number of distinct values per column (in unique_list) and the datatype of each column (in types_list). What I have so far is a nested loop that does the following:

  1. for unique_list: it returns a list with all the unique values. I tried to solve this by creating another list that is filled, in each iteration, with the unique items of the respective column as a sub-list, so that in a later step I could count the items in each sub-list. So far I have failed to implement that.

  2. for types_list: here I want to achieve pretty much the same thing, a list of lists where each sub-list contains the datatypes of one column. I tried this, as can be seen in the code, but what I get is a list of lists in which the sub-list does contain the datatypes of one column but is repeated multiple times instead of appearing just once. In the next step I want to loop over each sub-list to check whether its datatypes are all the same and, if so, append that type to a list (and if they are not the same, append 'object' to this list).

I know this might be easier using pandas etc., but I want to use pure Python for this.


with open(filePath,'r') as f:
        reader = csv.reader(f)
      
l=list(reader)
rows = len(l)-1 #counts how many rows there are in the CSV, -1 to exclude the header 
columns = len(l[0]) #the number of columns is given by the number of objects in the header list, at least in a clean CSV
without_header = l[1:] #returns the csv list without the header
        
unique_list = []
types_list = []
looping_list = []
for x in range(0,columns):
    looping_list = [item[x] for item in without_header]
    worklist = []
    for b in looping_list: 
        try: #here i'm trying if the value in the CSV file could be an integer just in case it isn't recognised as one
            int(b)
            worklist.append('int')
            types_list.append(worklist)
        except: 
            worklist.append(type(b))
            types_list.append(worklist)

    
    for n in looping_list: 
        if n not in unique_list:
            unique_list.append(n)

As an example, for this CSV:

Position,Experience in Years,Salary
Middle Management,5,5000
Lower Management,2,3000
Upper Management,1,7000
Middle Management,5,5000
Middle Management,7,7000
Upper Management,10,12000
Lower Management,2,2000
Middle Management,5,500
Upper Management,7, NoAnswer

I want unique_list to return [3,5,7] and types_list to return [str,int,object].

The file must be read inside the with statement. Otherwise the file is already closed, and reading from it raises an exception.

import csv

with open(filePath, 'r') as f:
    reader = csv.reader(f)
    l = list(reader)  # consume the reader while the file is still open
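
To see why the placement matters, here is a minimal sketch that reproduces the failure on a throwaway file. The temporary file and the names demo_path and error_message are made up for this illustration and are not part of the original code.

```python
import csv
import os
import tempfile

# Write a tiny throwaway CSV so the example is self-contained.
# demo_path is a hypothetical stand-in for filePath.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('a,b\n1,2\n')
    demo_path = tmp.name

with open(demo_path, 'r', newline='') as f:
    reader = csv.reader(f)

# The with block has ended, so f is closed. The reader is lazy and
# only touches the file when it is consumed, which now fails.
try:
    rows_after_close = list(reader)
except ValueError as exc:
    error_message = str(exc)

os.remove(demo_path)
print(error_message)
```

Because csv.reader reads lazily, creating it inside the with block is not enough; the list(reader) call has to happen there too.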

For types_list: you use the string 'int' to represent an int, but the type class str to represent a string. I think you should consistently use one or the other, i.e. use the type class int to represent an int object.

In the nested loop you append your worklist on every iteration over a column item. Shouldn't you do that only once you are done looping over the column, that is, after the nested loop has finished?

for x in range(0, columns):
    looping_list = [item[x] for item in without_header]
    worklist = []
    for b in looping_list:
        try:
            int(b)  # raises ValueError if b is not an integer literal
            worklist.append(int)
        except ValueError:
            worklist.append(type(b))  # values read by csv.reader are str
    types_list.append(worklist)

To reduce each sub-list to a single value, we can convert it to a set. A set removes duplicate items, so if its length is 1 we know the sub-list contained only one distinct item.

# uniting the sublist into 1 value
new_types_list = []
for sub_list in types_list:
    if len(set(sub_list)) == 1:
        # if all items in the sublist are the same
        # use the first value in the list
        new_types_list.append(sub_list[0])
    else:
        # they are not all the same
        new_types_list.append(object)
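
Putting the pieces together, the following is a rough end-to-end sketch run against the sample CSV from the question. It is one possible arrangement, not the only one: the data is inlined through io.StringIO so no file is needed, and the helper column_type and the names SAMPLE and column_types are made up for this illustration.

```python
import csv
import io

# The sample data from the question, inlined so the sketch runs without a file.
SAMPLE = """Position,Experience in Years,Salary
Middle Management,5,5000
Lower Management,2,3000
Upper Management,1,7000
Middle Management,5,5000
Middle Management,7,7000
Upper Management,10,12000
Lower Management,2,2000
Middle Management,5,500
Upper Management,7, NoAnswer
"""


def column_type(values):
    """Return int or str if every value shares that type, else object."""
    kinds = set()
    for value in values:
        try:
            int(value)
            kinds.add(int)
        except ValueError:
            kinds.add(str)
    # Exactly one kind means the column is homogeneous.
    return kinds.pop() if len(kinds) == 1 else object


rows = list(csv.reader(io.StringIO(SAMPLE)))
without_header = rows[1:]

# zip(*rows) transposes the row lists into one tuple per column.
column_types = [column_type(col) for col in zip(*without_header)]
print(column_types)  # [<class 'str'>, <class 'int'>, <class 'object'>]
```

Collecting the kinds in a set from the start skips the intermediate list of lists; whether that is preferable to the two-step version above is a matter of taste.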

For unique_list: you are trying to use a variable that was created inside the loop in which you iterate over the columns, so it only contains the items from the last column.
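
One way to get a count per column, sketched here under the assumption that only the number of distinct values is needed, is to build a set for each column inside the column loop and record its length. The names SAMPLE, column_values and unique_counts are made up for this illustration.

```python
import csv
import io

# The sample data from the question, inlined so the sketch runs without a file.
SAMPLE = """Position,Experience in Years,Salary
Middle Management,5,5000
Lower Management,2,3000
Upper Management,1,7000
Middle Management,5,5000
Middle Management,7,7000
Upper Management,10,12000
Lower Management,2,2000
Middle Management,5,500
Upper Management,7, NoAnswer
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
without_header = rows[1:]
columns = len(rows[0])

unique_counts = []
for x in range(columns):
    # Gather the values of column x, then count the distinct ones.
    column_values = [row[x] for row in without_header]
    unique_counts.append(len(set(column_values)))

print(unique_counts)  # [3, 5, 7]
```

The count is appended once per column, inside the column loop, so every column contributes its own entry.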
