Check if elements from different lists are in df column and append to another column
I have a df like this:
| Casa | Name | Clase_jfs | Categoria |
|---|---|---|---|
| Just_For_Sports | mochila reebok active | ACCESORIOS | mochila |
| Just_For_Sports | tubo lejopi de pelotas softee | ACCESORIOS | tubo |
| Just_For_Sports | pack de medias puma x2 | ACCESORIOS | pack |
| Just_For_Sports | gorro adidas de natación 3 rayas | ACCESORIOS | natacion |
And 27 different lists like these:
MODA=['mochila','wear', 'urban', 'pack']
TENIS=['tubo', 'raqueta','red']
NATACION=['natacion', 'pileta','tapon']
On the other hand, I have an empty list:
intermedia1=[]
This is my current script:
for element in df_JFS['Categoria']:
    if element in VOLEY:
        intermedia1.append('VOLEY')
    elif element in UNIFORMES:
        intermedia1.append('UNIFORMES')
    elif element in TREKKING_OUTDOOR_ADVENTURE:
        intermedia1.append('TREKKING_OUTDOOR_ADVENTURE')
    elif element in TRAINING:
        intermedia1.append('TRAINING')
    elif element in TENIS:
        intermedia1.append('TENIS')
    elif element in SURF:
        intermedia1.append('SURF')
    elif element in SQUASH:
        intermedia1.append('SQUASH')
    elif element in SKATEBOARD:
        intermedia1.append('SKATEBOARD')
    elif element in RUNNING:
        intermedia1.append('RUNNING')
    elif element in RUGBY:
        intermedia1.append('RUGBY')
    elif element in PING_PONG:
        intermedia1.append('PING_PONG')
    elif element in PESAS:
        intermedia1.append('PESAS')
    elif element in PADDLE:
        intermedia1.append('PADDLE')
    elif element in NATACION:
        intermedia1.append('NATACION')
    elif element in MODA:
        intermedia1.append('MODA')
    elif element in INFANTIL:
        intermedia1.append('INFANTIL')
    elif element in HOCKEY:
        intermedia1.append('HOCKEY')
    elif element in HANDBALL:
        intermedia1.append('HANDBALL')
    elif element in GOLF:
        intermedia1.append('GOLF')
    elif element in FUTBOL:
        intermedia1.append('FUTBOL')
    elif element in FRONTON:
        intermedia1.append('FRONTON')
    elif element in CICLISMO:
        intermedia1.append('CICLISMO')
    elif element in BASQUET:
        intermedia1.append('BASQUET')
    elif element in BASICOS:
        intermedia1.append('BASICOS')
    elif element in BASEBALL_SOFTBALL:
        intermedia1.append('BASEBALL_SOFTBALL')
    elif element in ARTES_MARCIALES_Y_BOX:
        intermedia1.append('ARTES_MARCIALES_Y_BOX')
    elif element in AEROBICS_Y_FITNESS:
        intermedia1.append('AEROBICS_Y_FITNESS')
    else:
        intermedia1.append('OTROS')
df_JFS['Categoria'] = intermedia1
How can it be done efficiently?
The output should look like this:
| Casa | Name | Clase_jfs | Categoria |
|---|---|---|---|
| Just_For_Sports | mochila reebok active | ACCESORIOS | MODA |
| Just_For_Sports | tubo lejopi de pelotas softee | ACCESORIOS | TENIS |
| Just_For_Sports | pack de medias puma x2 | ACCESORIOS | MODA |
| Just_For_Sports | gorro adidas de natación 3 rayas | ACCESORIOS | NATACION |
The df['Categoria'] value should be the name of the list where the word was found.
Thanks!
Not sure about the time efficiency, but if you want to avoid boilerplate code, you can use the `apply` function along with a few other steps:
import pandas as pd
# Defining the lists of data (rest of the code)
# .
# .
myDict = {'MODA': MODA, 'TENIS': TENIS, 'NATACION': NATACION}
def search(valueToSearch):
    for key, valuesList in myDict.items():
        if valueToSearch in valuesList:
            return key
    return "Not Found"
df["Categoria"] = df["Categoria"].apply(search)
df
| | Casa | Name | Clase_jfs | Categoria |
|---|---|---|---|---|
| 0 | Just_For_Sports | mochila reebok active | ACCESORIOS | MODA |
| 1 | Just_For_Sports | tubo lejopi de pelotas softee | ACCESORIOS | TENIS |
| 2 | Just_For_Sports | pack de medias puma x2 | ACCESORIOS | MODA |
| 3 | Just_For_Sports | gorro adidas de natación 3 rayas | ACCESORIOS | NATACION |
Note that you should define `myDict` as shown above. If you have any other lists, add them to `myDict` in the same way.
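As a self-contained sketch of this approach, the following runs the `apply` step against a small sample frame built from the question's data. A few things here are assumptions rather than part of the answer: the `'OTROS'` fallback mirrors the `else` branch of the question's script (the answer itself returns "Not Found"), the name `category_lists` is only illustrative, the last sample row is made up to exercise the fallback, and only three of the 27 lists are shown.

```python
import pandas as pd

# Three of the question's 27 lists; the rest would be added the same way.
MODA = ['mochila', 'wear', 'urban', 'pack']
TENIS = ['tubo', 'raqueta', 'red']
NATACION = ['natacion', 'pileta', 'tapon']

# Illustrative name: maps each category label to its keyword list.
category_lists = {'MODA': MODA, 'TENIS': TENIS, 'NATACION': NATACION}


def search(value, default='OTROS'):
    """Return the name of the first list that contains value, else default."""
    for name, keywords in category_lists.items():
        if value in keywords:
            return name
    return default


df = pd.DataFrame({
    'Name': [
        'mochila reebok active',
        'tubo lejopi de pelotas softee',
        'pack de medias puma x2',
        'gorro adidas de natación 3 rayas',
        'producto sin categoria',  # made-up row, only to exercise the fallback
    ],
    'Categoria': ['mochila', 'tubo', 'pack', 'natacion', 'otro'],
})

# apply calls search once per value in the column.
df['Categoria'] = df['Categoria'].apply(search)
print(df['Categoria'].tolist())
```

Because `search` returns on the first match, the order of the entries in the dict decides which category wins if a keyword ever appears in more than one list.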
There are a few approaches I would suggest.
Finding something in a list is `O(n)`. To optimise that, you can use a set instead, where a lookup is `O(1)` on average.
MODA = set(['mochila', 'wear', 'urban', 'pack'])
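To apply the same idea to every list at once, one option is to keep the keyword collections in a dict and convert each one to a set. This is only a sketch with three of the lists; `category_lists` and `category_sets` are illustrative names, not something from the question.

```python
# Three of the question's lists; the remaining ones would follow the same pattern.
category_lists = {
    'MODA': ['mochila', 'wear', 'urban', 'pack'],
    'TENIS': ['tubo', 'raqueta', 'red'],
    'NATACION': ['natacion', 'pileta', 'tapon'],
}

# Convert every list to a set so that `in` becomes a hash lookup
# instead of a scan over the whole list.
category_sets = {name: set(words) for name, words in category_lists.items()}

print('pack' in category_sets['MODA'])
print('pack' in category_sets['TENIS'])
```

With lists this short the speed difference is likely negligible; the gain shows up mainly when the keyword collections or the number of rows grow.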
If all the values across all the lists are unique, you can create a `dict` that maps each value to its key (the list name). You can just write a loop to map each value to its key; the result should look like this:
{
'mochila': "MODA",
'wear': "MODA",
'urban': "MODA",
'pack': "MODA",
...
}
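The loop mentioned above could look roughly like this. It is a sketch that assumes no keyword appears in more than one list (otherwise a later list silently overwrites an earlier one), shows only three of the lists, and uses `'OTROS'` as the fallback as in the question's script. The names `category_lists` and `word_to_category` and the last sample value are illustrative.

```python
import pandas as pd

# Three of the question's lists; the others would be added the same way.
category_lists = {
    'MODA': ['mochila', 'wear', 'urban', 'pack'],
    'TENIS': ['tubo', 'raqueta', 'red'],
    'NATACION': ['natacion', 'pileta', 'tapon'],
}

# Invert the structure: keyword -> category name.
word_to_category = {}
for name, words in category_lists.items():
    for word in words:
        word_to_category[word] = name

# 'otro' is a made-up value, included only to show the fallback.
df = pd.DataFrame({'Categoria': ['mochila', 'tubo', 'pack', 'natacion', 'otro']})

# Series.map looks each value up in the dict; values that are not found
# become NaN, which fillna then replaces with the fallback label.
df['Categoria'] = df['Categoria'].map(word_to_category).fillna('OTROS')
print(df['Categoria'].tolist())
```

This replaces the per-row chain of membership tests with a single dict lookup per value, and adding a category only means adding an entry to `category_lists`.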