
Python: creating nested lists within for loops

I am working with a CSV file. The outputs I want are the number of distinct values per column (in unique_list) and the datatype of each column (in types_list). What I have so far is a nested loop that does the following:

  1. for unique_list: it returns a single list with all the unique values. I tried to solve this by creating another list that is filled, in each iteration, with the unique items of the respective column as a sub-list, so that in a later step I could count the items in each sub-list, but so far I have failed to implement that.

  2. for types_list: here I want to achieve pretty much the same thing, a list of lists where each sub-list contains the datatypes of one column. I tried this, as can be seen in the code, but what I get is a list of lists where each sub-list does contain the datatypes of one column, yet it is repeated multiple times instead of appearing just once. In the next step I want to loop over each sub-list to check whether its datatypes are all the same and, if so, append that type to a list (and if they are not the same, append 'object' to this list).

I know this might be easier with pandas etc., but I want to use pure Python for this.


with open(filePath,'r') as f:
        reader = csv.reader(f)
      
l=list(reader)
rows = len(l)-1 #counts how many rows there are in the CSV, -1 to exclude the header 
columns = len(l[0]) #the number of columns is given by the number of objects in the header list, at least in a clean CSV
without_header = l[1:] #returns the csv list without the header
        
unique_list = []
types_list = []
looping_list = []
for x in range(0,columns):
    looping_list = [item[x] for item in without_header]
    worklist = []
    for b in looping_list: 
        try: #here i'm trying if the value in the CSV file could be an integer just in case it isn't recognised as one
            int(b)
            worklist.append('int')
            types_list.append(worklist)
        except: 
            worklist.append(type(b))
            types_list.append(worklist)

    
    for n in looping_list: 
        if n not in unique_list:
            unique_list.append(n)

As an example, for this CSV:

Position,Experience in Years,Salary
Middle Management,5,5000
Lower Management,2,3000
Upper Management,1,7000
Middle Management,5,5000
Middle Management,7,7000
Upper Management,10,12000
Lower Management,2,2000
Middle Management,5,500
Upper Management,7, NoAnswer

I want unique_list to return [3,5,7] and types_list to return [str,int,object]

Reading from the file must happen inside the with statement. Once the block exits, the file is closed, and reading from it raises an exception (ValueError: I/O operation on closed file).

import csv

with open(filePath, 'r') as f:
    reader = csv.reader(f)
    l = list(reader)  # consume the reader while the file is still open
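
To see why the placement matters, here is a minimal sketch. It assumes a throwaway temporary file (demo_path is a made-up name, not part of the question) and relies on csv.reader being lazy: it reads nothing until you iterate over it.

```python
import csv
import os
import tempfile

# Write a tiny throwaway CSV so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("a,b\n1,2\n")
    demo_path = tmp.name  # hypothetical path, only for this demonstration

# Wrong: the reader is created inside the block but consumed after it.
with open(demo_path, "r", newline="") as f:
    reader = csv.reader(f)  # nothing has been read yet

try:
    rows_after_close = list(reader)  # the file is already closed here
    error_message = None
except ValueError as exc:
    rows_after_close = None
    error_message = str(exc)

# Right: consume the reader while the file is still open.
with open(demo_path, "r", newline="") as f:
    rows_inside = list(csv.reader(f))

os.remove(demo_path)
print(error_message)
print(rows_inside)
```

The first attempt ends in the except branch, while the second one yields the header row and the data row as lists of strings.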

For types_list: you use the string 'int' to represent an integer, but the type class str to represent a string. You should consistently use one or the other, e.g. use the type class int to represent an int value.

In the nested loop you append worklist on every iteration over a column item. You should only do that once, after the nested loop has finished looping over the column.

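for x in range(0, columns):
    # all values of column x, without the header
    looping_list = [item[x] for item in without_header]
    worklist = []
    for b in looping_list:
        try:
            int(b)
            worklist.append(int)
        except ValueError:
            # csv.reader yields strings, so type(b) is str here
            worklist.append(type(b))
    # append once per column, after the inner loop has finished
    types_list.append(worklist)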

To reduce each sub-list to a single value, we can convert it to a set. A set removes duplicate items, so if its length is 1 we know the sub-list contained only one distinct item.

# uniting the sublist into 1 value
new_types_list = []
for sub_list in types_list:
    if len(set(sub_list)) == 1:
        # if all items in the sublist are the same
        # use the first value in the list
        new_types_list.append(sub_list[0])
    else:
        # they are not all the same
        new_types_list.append(object)
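
Putting the two steps together, here is a minimal sketch run against the example CSV from the question. The function name infer_column_types and the use of io.StringIO in place of a real file are assumptions made for illustration. It also assumes every row has the same number of fields, since zip stops at the shortest row.

```python
import csv
import io

# The example data from the question, inlined so the sketch runs on its own.
SAMPLE_CSV = """Position,Experience in Years,Salary
Middle Management,5,5000
Lower Management,2,3000
Upper Management,1,7000
Middle Management,5,5000
Middle Management,7,7000
Upper Management,10,12000
Lower Management,2,2000
Middle Management,5,500
Upper Management,7, NoAnswer
"""


def infer_column_types(rows):
    """Return one type per column: int, str, or object for mixed columns."""
    column_types = []
    for column in zip(*rows):  # transpose the rows into columns
        seen = set()
        for value in column:
            try:
                int(value)
                seen.add(int)
            except ValueError:
                seen.add(str)  # csv.reader only ever yields strings
        # exactly one distinct type: use it; otherwise fall back to object
        column_types.append(seen.pop() if len(seen) == 1 else object)
    return column_types


rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
without_header = rows[1:]
types_list = infer_column_types(without_header)
print(types_list)  # [<class 'str'>, <class 'int'>, <class 'object'>]
```

Collecting the types in a set right away skips the intermediate list of lists. The Salary column comes out as object because " NoAnswer" cannot be converted to an int while the other values can.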

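For unique_list: looping_list is created inside the loop that iterates over the columns, so if you use it after that loop has finished, it only contains the items from the last column. To get one count per column, the unique values have to be collected separately for each column inside that loop.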
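
One way to get a count per column is to build a set for each column and take its length. This is a minimal sketch, not the only approach. The function name count_unique_per_column is made up for illustration, and the data is the example CSV from the question read through io.StringIO instead of a file.

```python
import csv
import io

# The example data from the question, inlined so the sketch runs on its own.
SAMPLE_CSV = """Position,Experience in Years,Salary
Middle Management,5,5000
Lower Management,2,3000
Upper Management,1,7000
Middle Management,5,5000
Middle Management,7,7000
Upper Management,10,12000
Lower Management,2,2000
Middle Management,5,500
Upper Management,7, NoAnswer
"""


def count_unique_per_column(rows):
    """Return the number of distinct values in each column."""
    # zip(*rows) transposes rows into columns; a set keeps one copy of each value
    return [len(set(column)) for column in zip(*rows)]


rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
without_header = rows[1:]
unique_list = count_unique_per_column(without_header)
print(unique_list)  # [3, 5, 7]
```

The values are compared as raw strings, so "5" and " 5" would count as two different values. Strip whitespace first if that is not what you want.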

