
Replace values in a list of lists with values from another pandas DataFrame column when they match a key column

I have a list of lists of word tokens of the following form:

[['java_developer'],
['ETL', 'database_administrator'],
...
['web-developer', 'c#', 'ms_sql']]

I also have a key-value pandas DataFrame, where the first column is the key and the second is the value. For example:

     Key                      Value
0    java_developer           java
1    web-developer            web
2    database_administrator   database
3    ETL                      ETL
4    ms_sql                   database
... ... ...
100  c#                       c#

I want to get a list of lists of the following form:

[['java'],
['ETL', 'database'],
...
['web', 'c#', 'database']]

How can this be implemented?

Build a dictionary from the DataFrame and look up each token with dict.get, which returns a default (here None) for tokens that have no matching key:

# added 'val' to the last sublist to show a token with no matching key
L = [['java_developer'],
['ETL', 'database_administrator'],
['web-developer', 'c#', 'ms_sql', 'val']]

#create dictionary from DataFrame
d = df.set_index('Key')['Value'].to_dict()
print (d)
{'java_developer': 'java', 'web-developer': 'web', 
 'database_administrator': 'database', 'ETL': 'ETL', 
 'ms_sql': 'database', 'c#': 'c#'}

# replace each token via the dict in a nested list comprehension
L1 = [[d.get(y, None) for y in x] for x in L]
print (L1)
[['java'], ['ETL', 'database'], ['web', 'c#', 'database', None]]
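The snippet above relies on a df that the answer does not define. A minimal self-contained sketch, assuming the sample rows shown in the question (the elided rows are left out), might look like this. The dict(zip(...)) call is an alternative way to build the same lookup, not something the original answer uses.

```python
import pandas as pd

# Sample rows reconstructed from the question; the elided rows are omitted.
df = pd.DataFrame({
    "Key": ["java_developer", "web-developer", "database_administrator",
            "ETL", "ms_sql", "c#"],
    "Value": ["java", "web", "database", "ETL", "database", "c#"],
})

# 'val' has no matching key, so it shows what happens to unmatched tokens.
L = [["java_developer"],
     ["ETL", "database_administrator"],
     ["web-developer", "c#", "ms_sql", "val"]]

# Build the lookup; equivalent here to df.set_index('Key')['Value'].to_dict().
d = dict(zip(df["Key"], df["Value"]))

# dict.get returns None for tokens that are not keys of d.
L1 = [[d.get(token) for token in sublist] for sublist in L]
print(L1)
# [['java'], ['ETL', 'database'], ['web', 'c#', 'database', None]]
```

If Key contains duplicates, both ways of building the dictionary keep the value from the last matching row.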

Or drop the unmatched values by adding a filter:

L1 = [[d[y] for y in x if y in d] for x in L]
print (L1)
[['java'], ['ETL', 'database'], ['web', 'c#', 'database']]
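Because the filter drops tokens silently, it can help to check which tokens would be lost before applying it. This is a small sketch using the sample data from this answer, with the dictionary written out literally so the snippet runs on its own. The name unmatched is only illustrative.

```python
# Lookup dictionary as printed earlier in the answer.
d = {"java_developer": "java", "web-developer": "web",
     "database_administrator": "database", "ETL": "ETL",
     "ms_sql": "database", "c#": "c#"}

L = [["java_developer"],
     ["ETL", "database_administrator"],
     ["web-developer", "c#", "ms_sql", "val"]]

# Collect every distinct token that has no key in d.
unmatched = sorted({token for sublist in L for token in sublist if token not in d})
print(unmatched)
# ['val']
```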

And if unmatched tokens should be kept unchanged, pass the token itself as the default:

L1 = [[d.get(y, y) for y in x] for x in L]
print (L1)
[['java'], ['ETL', 'database'], ['web', 'c#', 'database', 'val']]
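If the lists are stored in a pandas column rather than a plain Python list, the same replacement can be sketched with Series.explode and Series.map. This is an alternative to the list comprehensions above, not part of the original answer. The Series s is a hypothetical stand-in for such a column, and the sketch assumes no sublist is empty (an empty list explodes to NaN).

```python
import pandas as pd

# Lookup dictionary as printed earlier in the answer.
d = {"java_developer": "java", "web-developer": "web",
     "database_administrator": "database", "ETL": "ETL",
     "ms_sql": "database", "c#": "c#"}

# Hypothetical column holding one list of tokens per row.
s = pd.Series([["java_developer"],
               ["ETL", "database_administrator"],
               ["web-developer", "c#", "ms_sql", "val"]])

# One row per token; the original row label is repeated in the index.
exploded = s.explode()

# Replace matched tokens and keep unmatched ones unchanged.
mapped = exploded.map(lambda token: d.get(token, token))

# Collect the tokens back into one list per original row.
result = mapped.groupby(level=0).agg(list)
print(result.tolist())
# [['java'], ['ETL', 'database'], ['web', 'c#', 'database', 'val']]
```

For a plain list of lists the nested comprehension is simpler. The pandas version mainly pays off when the data already lives in a DataFrame.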
