I have a list:
list = ['United Kingdom', 'Berlin', 'italy']
and a DataFrame:
location
0 London, United Kingdom
1 BerlinGerman
2 Rome,Italy
What I need to do is create a new column in the DataFrame that contains only the matching word from the list. The new column should look like this:
                 location         new_col
0  London, United Kingdom  United Kingdom
1            BerlinGerman          Berlin
2              Rome,Italy           italy
How can I do that?
You could define a function that searches each 'long name' for a shorter name from the list, then apply it to the dataframe to fill a new column:
def search(row):
    mylist = ['United Kingdom', 'Berlin', 'italy']
    # Return the first list entry found in the location, ignoring case
    for name in mylist:
        if name.lower() in row['location'].lower():
            return name
    return ""

df['new_col'] = df.apply(search, axis=1)
Original dataframe:
location
0 London, United Kingdom
1 BerlinGerman
2 Rome,Italy
3 Singapore
Resulting dataframe:
                 location         new_col
0  London, United Kingdom  United Kingdom
1            BerlinGerman          Berlin
2              Rome,Italy           italy
3               Singapore
Note that the function returns an empty string if the search yields no results, in this case, for the "Singapore" row.
I don't know of any library that does this, so I would just write it myself. I'll let you try to develop your own program (the goal is to learn :P); here is some advice if you get stuck:
Try first to get the substring (from list) matching a given location, for example by implementing a function getWord(location: str, mylist: list) such that:
getWord('London, United Kingdom', list) # Gives 'United Kingdom'
getWord('BerlinGerman', list) # Gives 'Berlin'
# and so on...
Once this is done, you simply need to create a new column containing the result of this function.
To write this function, you have to check, for each element of the list, whether it is a substring of the location. You can use a list comprehension for this, for example. Here is an example of usage:
matches = [x for x in mylist if x < 2] # filter all elements of mylist that are < 2
Just by replacing the if x < 2 with something a bit smarter, most of your function is done ;-)
Note that if you want italy to match Italy (even though one has a capital letter), it is a good idea to use .lower().
You may run into problems if no string in the list matches, or if several match. If that can happen with your data, decide how to handle it. For example, you can store a list of all matching substrings in the second column instead of a single string, or return a default string when there is no match and the first match when there are several.
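If you get stuck, here is one way the hints above could fit together. This is only a sketch: getWord is the function name suggested above, while the default parameter, the variable name words and the extra "Singapore" row are assumptions added for illustration. It returns the first match and falls back to a default string when nothing matches.

```python
import pandas as pd


def getWord(location: str, mylist: list, default: str = "") -> str:
    """Return the first entry of mylist contained in location, ignoring case."""
    # Keep every list entry that occurs as a substring of the location
    matches = [w for w in mylist if w.lower() in location.lower()]
    # First match if there is one, otherwise the (assumed) default string
    return matches[0] if matches else default


words = ['United Kingdom', 'Berlin', 'italy']
df = pd.DataFrame({'location': ['London, United Kingdom', 'BerlinGerman',
                                'Rome,Italy', 'Singapore']})

# Build the new column by calling getWord on each location
df['new_col'] = df['location'].apply(lambda loc: getWord(loc, words))
print(df)
```

To store all matches instead of the first one, return matches itself rather than matches[0].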
Assuming that you forgot the capital letter I in Italy, you can create new_col with:
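import pandas as pd
import re

# Named "names" rather than "list" to avoid shadowing the built-in
names = ['United Kingdom', 'Berlin', 'Italy']
df = pd.DataFrame({'location': ['London, United Kingdom', 'BerlinGerman', 'Rome,Italy']})
# Escape each name so that any regex metacharacters are matched literally
pattern = '|'.join(re.escape(name) for name in names)
# Take the first match; this raises IndexError if a row matches nothing
df['new_col'] = df['location'].apply(lambda x: re.findall(pattern, x)[0])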
Output
                 location         new_col
0  London, United Kingdom  United Kingdom
1            BerlinGerman          Berlin
2              Rome,Italy           Italy
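If the lowercase italy in the question's list was intentional, the same regex idea can probably be made case-insensitive. The sketch below is one possible variant, not part of the answer above: the names words, pattern and canonical, and the extra "Singapore" row, are assumptions for illustration. It uses Series.str.extract with re.IGNORECASE, then maps each match back to the spelling used in the list, leaving an empty string where nothing matches.

```python
import re

import pandas as pd

words = ['United Kingdom', 'Berlin', 'italy']
df = pd.DataFrame({'location': ['London, United Kingdom', 'BerlinGerman',
                                'Rome,Italy', 'Singapore']})

# One capture group with all words as alternatives, escaped to match literally
pattern = '(' + '|'.join(re.escape(w) for w in words) + ')'
# Lookup from lowercased text back to the spelling used in the list
canonical = {w.lower(): w for w in words}

# First case-insensitive match per row; rows without a match become NaN
matched = df['location'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
# Normalise to the list's spelling and replace missing values with ''
df['new_col'] = matched.str.lower().map(canonical).fillna('')
print(df)
```

Compared with apply and re.findall, this avoids the IndexError on rows that match nothing.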
import pandas as pd

list1 = ['United Kingdom', 'Berlin', 'italy']
data = {'location': [['London', 'United Kingdom'], ['Berlin', 'Germany'], ['Rome', 'italy']]}
df = pd.DataFrame(data=data)
df['new_col'] = 'mutual'  # placeholder kept when nothing matches
for i in range(len(df['location'])):
    for ele in list1:
        # Each location is a list here, so this is an exact element match
        if ele in df['location'][i]:
            # .loc avoids chained assignment, which may not update df
            df.loc[i, 'new_col'] = ele
print(df)