
Searching for Multiple Words in a Python DataFrame/List

I have a list:

list = ['United Kingdom', 'Berlin', 'italy']

and a DataFrame:

   location
0  London, United Kingdom
1  BerlinGerman
2  Rome,Italy

I need to create a new column in the DataFrame that contains only the matching word from the list. The new column should look like this:

   location               new_col
0  London, United Kingdom United Kingdom
1  BerlinGerman           Berlin
2  Rome,Italy             italy

How can I do that?

You could define a function that searches each location for a name from the list and returns the matching name, then apply it to create a new column in the DataFrame.

def search(row):
    # Return the first list entry found in the row's location, ignoring case.
    mylist = ['United Kingdom', 'Berlin', 'italy']
    location = row['location'].lower()
    for name in mylist:
        if name.lower() in location:
            return name
    return ""  # no entry matched

df['new_col'] = df.apply(search, axis=1)

Original dataframe:

                 location
0  London, United Kingdom
1            BerlinGerman
2              Rome,Italy
3               Singapore

Resulting dataframe:

                 location         new_col
0  London, United Kingdom  United Kingdom
1            BerlinGerman          Berlin
2              Rome,Italy           italy
3               Singapore

Note that the function returns an empty string when the search yields no result, as it does here for the "Singapore" row.
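For larger frames, the same case-insensitive substring test can be run once per list entry instead of once per row. The following is only a sketch of that idea, not part of the answer above; the variable name words and the reversed loop order (used so that earlier list entries win, as in search) are assumptions made for illustration.

```python
import pandas as pd

words = ['United Kingdom', 'Berlin', 'italy']
df = pd.DataFrame({'location': ['London, United Kingdom', 'BerlinGerman',
                                'Rome,Italy', 'Singapore']})

# Start with the "no match" value used by search() above.
df['new_col'] = ''

# Walk the list backwards so that, if several entries match a row,
# the entry that comes first in the list is written last and wins.
for w in reversed(words):
    # Literal (non-regex), case-insensitive substring test on the whole column.
    mask = df['location'].str.contains(w, case=False, regex=False, na=False)
    df.loc[mask, 'new_col'] = w

print(df)
```

Rows that match nothing, such as "Singapore", keep the empty string.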

I don't know of any library that does this directly, so I would just write it myself. I'll let you try to develop your own program (the goal is to learn :P), but here is some advice in case you get stuck:


First, try to get the substring (from the list) that matches a given location, for example by implementing a function getWord(location: str, mylist: list) such that:

getWord('London, United Kingdom', mylist) # Gives 'United Kingdom'
getWord('BerlinGerman', mylist) # Gives 'Berlin'
# and so on...

Once this is done, you simply need to create a new column containing the result of this function.


To write this function, you have to check, for each element of the list, whether it is a substring of the location. You can use a list comprehension for this, for example. Here is an example of usage:

matches = [x for x in mylist if x < 2]  # keep only the elements of mylist that are < 2

Just by replacing the condition x < 2 with something a bit smarter, most of your function is done ;-)
Note that if you want italy to match Italy (even though one has a capital letter), it is a good idea to use .lower().


You may run into problems if no string in the list matches, or if several match. If that can happen, decide how to handle it. For example, you can store a list of all matching substrings in the new column instead of a single string, or return a default string when there is no match and the first match when there are several.
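For readers who want to check their attempt, here is one possible sketch of the hints above. The names get_words and get_word, and the default parameter, are hypothetical choices for illustration (the answer only suggests a getWord function); other designs are equally valid.

```python
def get_words(location, mylist):
    """Return every entry of mylist that occurs in location, ignoring case."""
    loc = location.lower()
    return [w for w in mylist if w.lower() in loc]


def get_word(location, mylist, default=''):
    """Return the first matching entry, or default when nothing matches."""
    matches = get_words(location, mylist)
    return matches[0] if matches else default


mylist = ['United Kingdom', 'Berlin', 'italy']
locations = ['London, United Kingdom', 'BerlinGerman', 'Rome,Italy', 'Singapore']

# One result per location; 'Singapore' falls back to the default.
new_col = [get_word(loc, mylist) for loc in locations]
print(new_col)  # ['United Kingdom', 'Berlin', 'italy', '']
```

With a DataFrame, the same function could be applied as df['location'].apply(lambda s: get_word(s, mylist)). Using get_words instead would store the list of all matches in each cell.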

Assuming that you forgot the capital letter I in Italy, you can create new_col with:

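import pandas as pd
import re

# Named words rather than list, to avoid shadowing the built-in list type
words = ['United Kingdom', 'Berlin', 'Italy']
df = pd.DataFrame({'location': ['London, United Kingdom', 'BerlinGerman', 'Rome,Italy']})

# Escape each word so it is matched literally, then join into one alternation pattern
pattern = '|'.join(re.escape(w) for w in words)
# [0] takes the first match and raises IndexError if a location matches nothing
df['new_col'] = df['location'].apply(lambda x: re.findall(pattern, x)[0])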

Output

                 location         new_col
0  London, United Kingdom  United Kingdom
1            BerlinGerman          Berlin
2              Rome,Italy           Italy
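The regex approach can be made a little more forgiving. The sketch below is an assumption-laden variant, not the answer's own code: it ignores case, returns a default instead of failing when nothing matches, and tries longer terms first so that a short term cannot shadow a longer one beginning with the same text. The helper name first_match and the extra "Singapore" row are hypothetical.

```python
import re
import pandas as pd

words = ['United Kingdom', 'Berlin', 'italy']
df = pd.DataFrame({'location': ['London, United Kingdom', 'BerlinGerman',
                                'Rome,Italy', 'Singapore']})

# Longest terms first: regex alternation tries the alternatives left to right.
ordered = sorted(words, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(w) for w in ordered), flags=re.IGNORECASE)


def first_match(text, default=''):
    """Return the matched text as it appears in the location, or default."""
    m = pattern.search(text)
    return m.group(0) if m else default


df['new_col'] = df['location'].apply(first_match)
print(df)
```

Note that this returns the text as spelled in the location ("Italy"), not as spelled in the list ("italy").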
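
If each location is already split into a list of strings, a plain loop also works:

import pandas as pd

list1 = ['United Kingdom', 'Berlin', 'italy']
data = {'location': [['London', 'United Kingdom'], ['Berlin', 'Germany'], ['Rome', 'italy']]}
df = pd.DataFrame(data=data)
df['new_col'] = 'mutual'  # placeholder kept for rows with no match

for i in range(len(df)):
    for ele in list1:
        # Exact, case-sensitive membership test against the row's list
        if ele in df['location'][i]:
            # .loc avoids chained assignment, which may not write to df
            df.loc[i, 'new_col'] = ele
print(df)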

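You can simply assign the list to the column. This only works when the list has exactly one entry per row, in the same order as the rows.

Original data frame: (image not included)

After assigning to the new column:

a = ['United Kingdom', 'Berlin', 'italy']
df['new_col'] = a

After the update: (image not included)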
