
Looping over a string list in Python

I have a list of different colours, all stored as strings.

Preferredcolours = ['red','yellow','green', 'blue']

I have a pandas DataFrame that contains information about cars. One of its columns, DfCar['colour'], holds the colour of each car. I want to create a new column in my DataFrame, named PreferredMathcing, which equals 1 if the value in the colour column matches one of the colours in the list, and 0 otherwise. How can I use a for loop to solve this?

Ideally, I would like a result like this:

+=================+============================+
| DfCar['colour'] | DfCar['PreferredMathcing'] |
+=================+============================+
| white           |                          0 |
+-----------------+----------------------------+
| yellow          |                          1 |
+-----------------+----------------------------+
| black           |                          0 |
+-----------------+----------------------------+
| purple          |                          0 |
+-----------------+----------------------------+
| green           |                          1 |
+-----------------+----------------------------+

You can use .isin(), which returns a Series of True/False values indicating, for each row, whether its value is in a given list of values. Then use .astype(int) to get 1/0 instead.

Try this:

import pandas as pd

df = pd.DataFrame.from_dict({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red','yellow','green', 'blue']

df["PreferredMathcing"] = df['colour'].isin(Preferredcolours).astype(int)

print(df)

Output:

   colour  PreferredMathcing
0   white                  0
1  yellow                  1
2   black                  0
3  purple                  0
4   green                  1

NOTE:

Choosing a solution built on a pure library function will likely outperform a solution that uses apply with custom Python logic.

Benchmarking the two against each other on my machine suggests .isin() is almost 8x faster:

with '.isin()': 1.0591506958007812
with '.apply()': 8.234664678573608
ratio: 7.774780974248154
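A comparison of this kind can be reproduced with a minimal sketch like the one below. It is not the script that produced the timings above: the data size (n_repeats), the number of timing runs, and the helper names with_isin and with_apply are assumptions made for illustration, so the absolute numbers and the ratio will differ from machine to machine.

```python
import timeit

import pandas as pd

Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Hypothetical test data: the five sample colours repeated many times.
n_repeats = 2_000  # assumed size, not the one behind the timings above
df = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green'] * n_repeats})


def with_isin():
    # Vectorised membership test, then booleans converted to 1/0.
    return df['colour'].isin(Preferredcolours).astype(int)


def with_apply():
    # One Python-level function call per row.
    return df.apply(lambda row: 1 if row['colour'] in Preferredcolours else 0, axis=1)


# Both approaches should agree before their speed is compared.
assert with_isin().tolist() == with_apply().tolist()

t_isin = timeit.timeit(with_isin, number=3)
t_apply = timeit.timeit(with_apply, number=3)

print("with '.isin()':", t_isin)
print("with '.apply()':", t_apply)
print("ratio:", t_apply / t_isin)
```

Checking that both functions return the same values first keeps the timing comparison honest: a faster variant is only interesting if it produces the same column.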

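The following will also give you the desired output:

def check_colour(x, Preferredcolours):
    # x is one row of the DataFrame (axis=1); return 1 if its colour is preferred
    return 1 if x['colour'] in Preferredcolours else 0

DfCar['PreferredMathcing'] = DfCar.apply(check_colour, args=(Preferredcolours,), axis=1)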

You can use np.where like below:

import pandas as pd
import numpy as np

DfCar = pd.DataFrame.from_dict({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red','yellow','green', 'blue']

DfCar['PreferredMathcing'] = np.where(DfCar['colour'].isin(Preferredcolours), 1, 0)
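For plain 1/0 flags, np.where gives the same result as .isin(...).astype(int); it becomes more useful when the two outcomes are something other than 1 and 0. The sketch below illustrates that with text labels. The column name PreferredLabel and the labels 'preferred' and 'other' are hypothetical and only serve as an example.

```python
import numpy as np
import pandas as pd

DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Boolean mask: True where the colour is in the preferred list.
is_preferred = DfCar['colour'].isin(Preferredcolours)

# Same 1/0 flag as in the answer above.
DfCar['PreferredMathcing'] = np.where(is_preferred, 1, 0)

# Hypothetical variant: arbitrary labels instead of 1/0.
DfCar['PreferredLabel'] = np.where(is_preferred, 'preferred', 'other')

print(DfCar)
```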

Assuming DfCar is your DataFrame:

Preferredcolours = ['red','yellow','green', 'blue']
DfCar['PreferredMatching'] = DfCar['colour'].apply(lambda x: x in Preferredcolours)

This applies the lambda function to every element of your "colour" column. It simply checks whether the element is in Preferredcolours and returns True or False. If you need 1/0 as in the question, append .astype(int) to the result.
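Since the question asks specifically about a for loop, here is a minimal sketch of that approach for comparison. It rebuilds the sample data from the question, collects the flags in a plain Python list (the name matches is an assumption for illustration), and assigns the list as the new column. It should give the same column as the .isin() approach, but it runs the membership test in Python for every row, so it is likely to be slower on large frames.

```python
import pandas as pd

DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Walk over the colour column and record 1 for a preferred colour, 0 otherwise.
matches = []
for colour in DfCar['colour']:
    matches.append(1 if colour in Preferredcolours else 0)

# The list has one entry per row, so it can be assigned directly as a column.
DfCar['PreferredMathcing'] = matches

print(DfCar)
```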
