I have a list of different colours, all stored as strings.
Preferredcolours = ['red','yellow','green', 'blue']
I have a pandas DataFrame that contains information about cars. One of its columns, DfCar['colour'], holds the colour of each car. I want to create a new column in my DataFrame, named PreferredMathcing, which equals 1 if the value in the colour column matches one of the colours in the list, and 0 otherwise. How can I use a for loop to solve this?
Ideally, I want a result like this:
+=================+============================+
| DfCar['colour'] | DfCar['PreferredMathcing'] |
+=================+============================+
| white | 0 |
+-----------------+----------------------------+
| yellow | 1 |
+-----------------+----------------------------+
| black | 0 |
+-----------------+----------------------------+
| purple | 0 |
+-----------------+----------------------------+
| green | 1 |
+-----------------+----------------------------+
You can use .isin(), which returns a Series with True / False for each row depending on whether its value is in a list of values. Then use .astype(int) to get 1 / 0 instead.
Try this:
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red','yellow','green', 'blue']
df["PreferredMathcing"] = df['colour'].isin(Preferredcolours).astype(int)
print(df)
Output:
   colour  PreferredMathcing
0   white                  0
1  yellow                  1
2   black                  0
3  purple                  0
4   green                  1
NOTE:
A solution built on a pure library function will likely outperform one that uses apply with custom Python logic.
Benchmarking the two against each other on my machine suggests .isin() is almost 8x faster:
with '.isin()': 1.0591506958007812
with '.apply()': 8.234664678573608
ratio: 7.774780974248154
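The benchmark code itself is not shown above, so here is a minimal sketch of how such a comparison could be run with timeit. The data size, the number of repetitions and the helper names with_isin / with_apply are assumptions made for illustration; the timings will differ from machine to machine.

```python
import timeit

import pandas as pd

# Hypothetical test data: the five example colours repeated many times.
colours = ['white', 'yellow', 'black', 'purple', 'green'] * 20_000
df = pd.DataFrame({'colour': colours})
Preferredcolours = ['red', 'yellow', 'green', 'blue']


def with_isin():
    # Vectorised membership test, then True/False -> 1/0.
    return df['colour'].isin(Preferredcolours).astype(int)


def with_apply():
    # Python-level membership test called once per element.
    return df['colour'].apply(lambda x: 1 if x in Preferredcolours else 0)


# Sanity check: both approaches should produce the same values.
same_result = bool((with_isin() == with_apply()).all())

# Time each approach over a few repetitions (assumed count: 5).
t_isin = timeit.timeit(with_isin, number=5)
t_apply = timeit.timeit(with_apply, number=5)
print("same result:", same_result)
print("with '.isin()':", t_isin)
print("with '.apply()':", t_apply)
print("ratio:", t_apply / t_isin)
```

The exact ratio depends on the pandas version, the data size and the hardware, so treat the figure as indicative only.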
The following will give you the desired output:
def check_colour(x, Preferredcolours):
    # x is one row of the DataFrame (axis=1)
    return 1 if x['colour'] in Preferredcolours else 0
DfCar['PreferredMathcing'] = DfCar.apply(check_colour, args=(Preferredcolours,), axis=1)
You can use np.where like below:
import pandas as pd
import numpy as np
DfCar = pd.DataFrame.from_dict({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red','yellow','green', 'blue']
DfCar['PreferredMathcing'] = np.where(DfCar['colour'].isin(Preferredcolours), 1, 0)
Assuming DfCar is your DataFrame:
Preferredcolours = ['red','yellow','green', 'blue']
DfCar['PreferredMatching'] = DfCar['colour'].apply(lambda x: x in Preferredcolours)
This applies the lambda function to every element in your "colour" column. It simply checks whether the element is in Preferredcolours and returns True or False. If you need 1 / 0 rather than booleans, append .astype(int) to the result.
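As a small sketch of the lambda approach with the question's 1 / 0 output, assuming the same five example colours used in the other answers:

```python
import pandas as pd

DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

# The lambda yields booleans; astype(int) maps True/False to 1/0.
DfCar['PreferredMatching'] = (
    DfCar['colour'].apply(lambda x: x in Preferredcolours).astype(int)
)
print(DfCar['PreferredMatching'].tolist())  # [0, 1, 0, 0, 1]
```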