I have a df like this:
Casa | Name | Clase_jfs | Categoria |
---|---|---|---|
Just_For_Sports | mochila reebok active | ACCESORIOS | mochila |
Just_For_Sports | tubo lejopi de pelotas softee | ACCESORIOS | tubo |
Just_For_Sports | pack de medias puma x2 | ACCESORIOS | pack |
Just_For_Sports | gorro adidas de natación 3 rayas | ACCESORIOS | natacion |
And 27 different lists like these:
MODA=['mochila','wear', 'urban', 'pack']
TENIS=['tubo', 'raqueta','red']
NATACION=['natacion', 'pileta','tapon']
On the other hand, I have an empty list:
intermedia1=[]
This is my current script:
for element in df_JFS['Categoria']:
    if element in VOLEY:
        intermedia1.append('VOLEY')
    elif element in UNIFORMES:
        intermedia1.append('UNIFORMES')
    elif element in TREKKING_OUTDOOR_ADVENTURE:
        intermedia1.append('TREKKING_OUTDOOR_ADVENTURE')
    elif element in TRAINING:
        intermedia1.append('TRAINING')
    elif element in TENIS:
        intermedia1.append('TENIS')
    elif element in SURF:
        intermedia1.append('SURF')
    elif element in SQUASH:
        intermedia1.append('SQUASH')
    elif element in SKATEBOARD:
        intermedia1.append('SKATEBOARD')
    elif element in RUNNING:
        intermedia1.append('RUNNING')
    elif element in RUGBY:
        intermedia1.append('RUGBY')
    elif element in PING_PONG:
        intermedia1.append('PING_PONG')
    elif element in PESAS:
        intermedia1.append('PESAS')
    elif element in PADDLE:
        intermedia1.append('PADDLE')
    elif element in NATACION:
        intermedia1.append('NATACION')
    elif element in MODA:
        intermedia1.append('MODA')
    elif element in INFANTIL:
        intermedia1.append('INFANTIL')
    elif element in HOCKEY:
        intermedia1.append('HOCKEY')
    elif element in HANDBALL:
        intermedia1.append('HANDBALL')
    elif element in GOLF:
        intermedia1.append('GOLF')
    elif element in FUTBOL:
        intermedia1.append('FUTBOL')
    elif element in FRONTON:
        intermedia1.append('FRONTON')
    elif element in CICLISMO:
        intermedia1.append('CICLISMO')
    elif element in BASQUET:
        intermedia1.append('BASQUET')
    elif element in BASICOS:
        intermedia1.append('BASICOS')
    elif element in BASEBALL_SOFTBALL:
        intermedia1.append('BASEBALL_SOFTBALL')
    elif element in ARTES_MARCIALES_Y_BOX:
        intermedia1.append('ARTES_MARCIALES_Y_BOX')
    elif element in AEROBICS_Y_FITNESS:
        intermedia1.append('AEROBICS_Y_FITNESS')
    else:
        intermedia1.append('OTROS')
df_JFS['Categoria']=intermedia1
How can this be done efficiently?
The output should look like this:
Casa | Name | Clase_jfs | Categoria |
---|---|---|---|
Just_For_Sports | mochila reebok active | ACCESORIOS | MODA |
Just_For_Sports | tubo lejopi de pelotas softee | ACCESORIOS | TENIS |
Just_For_Sports | pack de medias puma x2 | ACCESORIOS | MODA |
Just_For_Sports | gorro adidas de natación 3 rayas | ACCESORIOS | NATACION |
The value of df['Categoria'] should be the name of the list in which the word was found.
Thanks!
I'm not sure about the time efficiency, but if you want to avoid the boilerplate code, you can use the apply function along with a few other steps:
import pandas as pd
# Defining the lists of data(rest of the code)
# .
# .
myDict = {'MODA': MODA, "TENIS": TENIS, "NATACION": NATACION}
def search(valueToSearch):
    for key, valuesList in myDict.items():
        if valueToSearch in valuesList:
            return key
    return "Not Found"
df["Categoria"] = df["Categoria"].apply(search)
df
 | Casa | Name | Clase_jfs | Categoria |
---|---|---|---|---|
0 | Just_For_Sports | mochila reebok active | ACCESORIOS | MODA |
1 | Just_For_Sports | tubo lejopi de pelotas softee | ACCESORIOS | TENIS |
2 | Just_For_Sports | pack de medias puma x2 | ACCESORIOS | MODA |
3 | Just_For_Sports | gorro adidas de natación 3 rayas | ACCESORIOS | NATACION |
Note that you should define myDict as shown above. If you have other lists, add them to myDict in the same way.
There are a few approaches I would suggest.
Finding a value in a list is O(n). To optimise that, you can use a set instead, where a membership test is O(1) on average.
MODA = set(['mochila', 'wear', 'urban', 'pack'])
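A small sketch of that idea, assuming the category sets are collected in a dict. The names `categories` and `find_category` are illustrative and not from the question, and only three of the 27 lists are shown:

```python
# Hypothetical container: category name -> set of keywords (3 of the 27 lists).
categories = {
    "MODA": {"mochila", "wear", "urban", "pack"},
    "TENIS": {"tubo", "raqueta", "red"},
    "NATACION": {"natacion", "pileta", "tapon"},
}


def find_category(word, default="OTROS"):
    """Return the first category whose keyword set contains `word`."""
    for name, keywords in categories.items():
        if word in keywords:  # set membership test, O(1) on average
            return name
    return default


print([find_category(w) for w in ["mochila", "tubo", "pack", "natacion", "gorro"]])
# ['MODA', 'TENIS', 'MODA', 'NATACION', 'OTROS']
```

This still loops over every category for each word, so it only speeds up the membership test inside each category, not the scan across categories.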
If every value is unique across all of the lists, you can create a dict that maps each value to the name of its list. You can write a loop to build that mapping; the result should look like this:
{
'mochila': "MODA",
'wear': "MODA",
'urban': "MODA",
'pack': "MODA",
...
}
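One possible way to build that mapping is sketched below. The names `lists_by_name`, `word_to_category` and `categorize` are assumptions made for illustration, only three of the 27 lists are shown, and the `'OTROS'` fallback is taken from the `else` branch in the question:

```python
MODA = ["mochila", "wear", "urban", "pack"]
TENIS = ["tubo", "raqueta", "red"]
NATACION = ["natacion", "pileta", "tapon"]

# Hypothetical helper dict: list name -> list of keywords.
lists_by_name = {"MODA": MODA, "TENIS": TENIS, "NATACION": NATACION}

# Invert it: keyword -> list name.
word_to_category = {}
for name, words in lists_by_name.items():
    for word in words:
        # setdefault keeps the first category if a word appears in several lists
        word_to_category.setdefault(word, name)


def categorize(values, default="OTROS"):
    """Look up each value with a single dict access; unknown values get `default`."""
    return [word_to_category.get(value, default) for value in values]


print(categorize(["mochila", "tubo", "pack", "natacion", "gorro"]))
# ['MODA', 'TENIS', 'MODA', 'NATACION', 'OTROS']
```

With pandas, the same dict can be passed to `Series.map`, for example `df_JFS['Categoria'] = df_JFS['Categoria'].map(word_to_category).fillna('OTROS')`; values missing from the dict become NaN and are then replaced by the fallback.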