
How to generate variables automatically from the elements of a list using a for loop?

Suppose I want to generate the indexes for a large header row automatically using a for loop, so that I don't have to write out an index for each header.

In a file, I have a header row with lots of fruit names. Each column holds data that I have to access by index for downstream parsing. Rather than writing out an index for each fruit name by hand, I want to run a for loop that creates the index values on the fly to save time.

data = 

      apple                     banana              orange
      genus:x,species:b    genus:x,species:b     genus:x,species:b
      genus:x,species:b    genus:x,species:b     genus:x,species:b
      variety:gala,pinklady,...  variety:wild,hybrid...   variety:florida,venz,
      flavors:tangy,tart,sweet..
      global_consumption:....
      pricePerUnit:...
      seedstocks:.....
      insect_resistance:.....
      producer:....


# first I convert the header into list like this:

for lines in data:
    if 'apple' in lines:
        fruits = lines.split('\t')
        # this will give me header as list:
        # ['apple', 'banana', 'orange']

        # then create the index as:
        for x in fruits:
            str(x) + '_idx' = fruits.index(x)
            # this is where the problem is for me:
            # the line above is not valid Python (you cannot assign to an expression)
            print(x)

            # if this were possible, new variables would be created as
            # apple_idx = 0, banana_idx = 1 ... and so on

# Now, start mining your data for interested fruits
     data = lines.split('\t')
     apple_values = data[apple_idx]
     for values in apple_values:
          do something ......

     same for others. I also need to do several other things.

Does that make sense?

How can this be done in a very simple way?

Post Edit: After doing a lot of reading, I realized that it is possible to create a variable name from the (string) value of another variable in bash:

how to use a variable's value as other variable's name in bash

https://unix.stackexchange.com/questions/98419/creating-variable-using-variable-value-as-part-of-new-variable-name

But this is not possible in Python the way I had thought. My gut feeling is that it could be made possible within the Python language (if hacked, or if the author had decided to allow it), but it is also possible that the author of Python knew about the possible dangers of this approach.

  • The danger is that you always want a variable name to be visible in the written Python script. Creating variable names dynamically would have been nice, but it could make problems hard to trace back.
  • Since the variable name was never typed in, it would be a nightmare to track and debug any problem that arose (especially in a large program), say when the value was something like 2BetaTheta or *ping^pong, which is not a valid variable name. This is my thought; other people are welcome to chime in on why this capability was not introduced in Python.
  • The dict method overcomes this issue, since we keep a record of where each name came from, but the issue of valid vs. invalid variable names still doesn't go away.
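
The dict approach from the last bullet can be sketched as follows. This is only an illustration: the `header` list is made up for the example (it reuses the two odd names mentioned above), and `str.isidentifier()` is used merely to show which names could never have been variable names. As dict keys, they work without any special handling.

```python
# Hypothetical header; '2BetaTheta' and '*ping^pong' are not valid identifiers.
header = ['apple', '2BetaTheta', '*ping^pong']

# Map each header name to its column position.
index_of = {name: position for position, name in enumerate(header)}

for name in header:
    # Dict keys can be any string, whether or not it is a valid identifier.
    print(name, index_of[name], name.isidentifier())
```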

I am going to take some of the provided answers that use the dict method and see if I can work out a simple, comprehensive way of making this possible.

Thanks everyone !

Hopefully the code below will give you some ideas on ways that you might move forward. There are actually better ways than these to do some of these things, but for a beginner it is best to learn the basics first. Mind you: there's nothing really WRONG with the code below, but it could be a lot shorter and even more usable if we used some more advanced concepts.

# get the headers from the first line out of the data
# this won't work if the headers are not on the first line
fruits = data[0].split('\t')

# now you have this list, as before
>>> ['apple', 'banana', 'orange']

# make a dictionary that will hold a data list
# for each fruit; these lists will be empty to start
# each fruit's list will hold the data appearing on 
# each line in the data file under each header
data_dict = dict()
for fruit in fruits:
    data_dict[fruit] = [] # an empty list

# now you have a dictionary that looks like this
>>> {'apple': [], 'banana': [], 'orange': []}

# you can access the (now empty) lists this way
>>> data_dict['apple']
[]

# now use a for loop to go through the data, but skip the 
# first line which you already handled
for lines in data[1:]:
    values = lines.split('\t')
    # append the values to the end of the list for each 
    # fruit. use enumerate so you know the index number
    for idx,fruit in enumerate(fruits):
        data_dict[fruit].append(values[idx])

# now you have the data dictionary that looks like this
>>> {'apple': ['genus:x,species:b', 'genus:x,species:b'], 
     'banana': ['genus:x,species:b', 'genus:x,species:b'], 
     'orange': ['genus:x,species:b', 'genus:x,species:b']}

print("<<here's some interesting data about apples>>")
# Mine the data_dict for interesting fruits this way
data_list = data_dict['apple']
for data_line in data_list:
    genus_and_species = data_line.split(',')
    genus = genus_and_species[0].split(':')[1] 
    species = genus_and_species[1].split(':')[1] 
    print("\tGenus: ",genus,"\tSpecies: ",species)
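
The splitting done in the loop above can also be wrapped in a small helper that returns a dict per cell. This is a minimal sketch: `parse_fields` is a hypothetical name, and it assumes simple cells of the form `key:value,key:value`. Cells whose values themselves contain commas (such as the `variety:gala,pinklady,...` line in the question) would need different handling.

```python
def parse_fields(data_line):
    """Turn 'genus:x,species:b' into {'genus': 'x', 'species': 'b'}."""
    fields = {}
    for pair in data_line.split(','):
        # partition splits on the first ':' only
        key, _, value = pair.partition(':')
        fields[key] = value
    return fields

record = parse_fields('genus:x,species:b')
print(record['genus'], record['species'])
```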

If you want to have a look at ALL the fruits (in the original order as before), you can do that this way:

for fruit in fruits:
    data_list = data_dict[fruit]
    for data_line in data_list:
        print(data_line)

If you don't care about the order (dicts do not have order*), you can forget about your fruits list and just loop over the data dictionary itself:

for fruit in data_dict:
    print(fruit)

OR to get the values (the data lists), use values (viewvalues in Python 2.7):

for data_list in data_dict.values():
    print(data_list)

OR to get both the keys (fruits) and the values, use items (viewitems in Python 2.7):

for fruit,data_list in data_dict.items():
    print(fruit, data_list)

TIP: if you want to mutate (change) the dictionary while looping, DO NOT use for fruit in data_dict: directly. In Python 3, keys, values, and items return live views (as do viewkeys, viewvalues, and viewitems in Python 2.7), so adding or removing keys while iterating over them raises a RuntimeError. Instead, loop over a copy of the keys:

for fruit in list(data_dict):
    # remove it
    data_dict.pop(fruit)

* Quick note: dicts have been undergoing some changes; as of Python 3.7 they are guaranteed to remember insertion order.
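
As mentioned at the start of this answer, the walkthrough can be written more compactly. One possible sketch uses `zip` to turn rows into columns. The `data` list here is an assumed stand-in for the tab-separated lines of the file, and the approach assumes every row has the same number of columns (`zip` silently truncates to the shortest row).

```python
# Small stand-in for the tab-separated lines of the file (assumed sample).
data = [
    'apple\tbanana\torange',
    'genus:x,species:b\tgenus:x,species:b\tgenus:x,species:b',
    'genus:x,species:b\tgenus:x,species:b\tgenus:x,species:b',
]

rows = [line.split('\t') for line in data]
fruits, body = rows[0], rows[1:]

# zip(*body) turns the rows into columns, one per header name.
data_dict = {fruit: list(column) for fruit, column in zip(fruits, zip(*body))}
print(data_dict['apple'])
```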

EDIT: now that the question has been edited I'll provide a much more useful answer later if I have time.

I don't fully understand what it is you actually are trying to do, but here are some things that might help.

The thing to recognize is you already have an object that has all the information you are after in it: a list with all the object names. By its very nature, your list of names already has the indexes in it. The data exists; it is there. What you need to do is learn to access this information the right way.

What you probably need is the enumerate function. It generates 2-tuples (pairs of objects), each containing a list index and the item at that index, as you go:

for idx,fruit in enumerate(fruits): 
    print(fruit+'_idx: ', idx)

There is no reason to STORE these indexes in some other data structure; THEY ARE ALREADY IN your list.

If you insist that you want to access some arbitrary value by some name (a string), you should do that with a dictionary, or dict :

fruit_dict = dict()
fruit_dict['apple'] = 1

However, since you are after the index values, this seems a little bit odd to do because a dict by its very nature is intended to be un-ordered. And as I have said, you already KNOW the indexes in your list. Storing indexes with the names a second time most likely makes little sense, although there may be situations where you'd want to do it.
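
One situation where storing the indexes by name can be convenient is when the same header is used to pick columns out of many rows. A minimal sketch, assuming a hypothetical `row` that has already been split on tabs:

```python
fruits = ['apple', 'banana', 'orange']

# Build the name -> index lookup once.
idx = {fruit: i for i, fruit in enumerate(fruits)}

# Hypothetical data row, already split on tabs.
row = ['genus:x,species:b', 'genus:y,species:c', 'genus:z,species:d']

apple_value = row[idx['apple']]
orange_value = row[idx['orange']]
print(apple_value, orange_value)
```

Compared with calling `fruits.index(name)` each time, the dict avoids rescanning the list on every lookup. Note that if a header name appears twice, the dict keeps the last position, whereas `list.index` returns the first.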

The built-in functions exec and eval are relevant here.

From the Python documentation :

  • eval : "The expression argument is parsed and evaluated as a Python expression"
  • exec : "This function supports dynamic execution of Python code"

Really, you only need exec for your problem, as follows:

for fruit in fruits: exec('{0}_idx = fruits.index("{0}")'.format(fruit))

(Notice that we need quotes around the second {0}, since otherwise Python will think that you are trying to get the index of some variable named apple, rather than passing it the string 'apple'.)

If you now type apple_idx (for example) into your console, it should return 0 .
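
The one-liner above works at module level or in the console. Inside a function, names assigned by exec do not become ordinary local variables, so a somewhat safer variant is to pass exec an explicit namespace dict and read the results back from it. The sketch below is an illustration only: `build_indexes` is a hypothetical helper, and skipping names that fail `str.isidentifier()` is an assumption about how invalid names should be handled.

```python
fruits = ['apple', 'banana', 'orange']

def build_indexes(names):
    """Create '<name>_idx' entries in a private namespace via exec."""
    namespace = {}
    for position, name in enumerate(names):
        # Skip names that could not be Python variable names at all.
        if not name.isidentifier():
            continue
        exec('{0}_idx = {1}'.format(name, position), namespace)
    # exec adds '__builtins__' to the dict; drop it from the result.
    namespace.pop('__builtins__', None)
    return namespace

indexes = build_indexes(fruits)
print(indexes['apple_idx'], indexes['orange_idx'])
```

At this point the result is just a dict keyed by name, which is what the dict-based answers build directly without exec.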

