Why does my if/elif/else return the same value for every row?

Question

I have a column that I inspect like this:

df['Location']

Running it gives:

0           New York, NY
1          Chantilly, VA
2             Boston, MA
3             Newton, MA
4           New York, NY
             ...        
667         Fort Lee, NJ
668    San Francisco, CA
669        Irwindale, CA
670    San Francisco, CA
671         New York, NY
Name: Location, Length: 659, dtype: object

I want to simplify it: if a value contains New York, NY it should become NY; if it contains Boston, MA it should become MA; and so on.

So I wrote this code:

def clean_location_1(x):
    if 'CA':
        return 'CA'
    elif 'NY':
        return 'NY'
    elif 'DC':
        return 'DC'
    elif 'MA':
        return 'MA'
    elif 'IL':
        return 'IL'
    elif 'VA':
        return 'VA'
    else:
        return 'others'


df['Location'] = df['Location'].apply(clean_location_1)

But when I run my script, every Location becomes CA.

How can I solve this?

Answer 1

One possible solution that keeps your approach is the following:

import pandas as pd

data = pd.DataFrame([{'location': 'New York, NY'},
                     {'location': 'Chantilly, VA'},
                     {'location': 'Boston, MA'},
                     {'location': 'Newton, MA'},
                     {'location': 'San Francisco, CA'}])

def clean_location_1(x):
    if 'CA' in x:
        return 'CA'
    elif 'NY' in x:
        return 'NY'
    elif 'DC' in x:
        return 'DC'
    elif 'MA' in x:
        return 'MA'
    elif 'IL' in x:
        return 'IL'
    elif 'VA' in x:
        return 'VA'
    else:
        return 'others'

data['location'].apply(clean_location_1)

Your problem was that the conditions in the if/elif block never tested x.

Another way of doing this:

list_states = ['CA', 'NY', 'DC', 'MA', 'IL', 'VA']
data['location'].apply(lambda x: x.split(' ')[-1] if x.split(' ')[-1] in list_states else 'others')

Then you won't need a huge if/else block.

Answer 2

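When you write if 'CA' you are not checking the value at all; you have to test it against x.

pd.Series.str.contains can do that, but it works on a whole Series rather than on the single strings that apply passes in, so call the function on the column directly:

import pandas as pd

def clean_location_1(s):
    # s is the whole column (a Series), not a single string
    result = pd.Series('others', index=s.index)
    # Assign in reverse priority so earlier states win, as in the if/elif chain
    for state in ['VA', 'IL', 'MA', 'DC', 'NY', 'CA']:
        result[s.str.contains(state, na=False)] = state
    return result

df['Location'] = clean_location_1(df['Location'])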

Answer 3

The problem is simple: you are not comparing the string with x. The expression 'CA' on its own is always truthy, because every non-empty string is truthy, so the first branch always runs. That is why everything changes to CA.
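
To make the truthiness point concrete, here is a minimal sketch. The variable names are illustrative and not taken from the question.

```python
# A non-empty string literal is truthy no matter what any other variable holds.
location = 'New York, NY'

literal_is_truthy = bool('CA')      # True: 'CA' is a non-empty string
empty_is_truthy = bool('')          # False: only the empty string is falsy
ca_in_location = 'CA' in location   # False: this actually tests the value
ny_in_location = 'NY' in location   # True

print(literal_is_truthy, empty_is_truthy, ca_in_location, ny_in_location)
```

So if 'CA': never looks at x, while if 'CA' in x: does.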

Answer 4

Doing

if "<str>":

is always true for a non-empty string, which means your code always returns CA. Instead, check whether <word> is in x:

def clean_location_1(x):
    if 'CA' in x:
        return 'CA'
    elif 'NY' in x:
        return 'NY'
    elif 'DC' in x:
        return 'DC'
    elif 'MA' in x:
        return 'MA'
    elif 'IL' in x:
        return 'IL'
    elif 'VA' in x:
        return 'VA'
    else:
        return 'others'
df['Location'] = df['Location'].apply(clean_location_1)

Or you can try this, which is easy, clean and simple:

check=["CA","NY","DC","MA","IL","VA"]
def clean_location_1(x):
    y=x.rsplit(", ",1)[1]
    if y in check:
        return y
    else:
        return "others"

df['Location'] = df['Location'].apply(clean_location_1)

Here we store the state abbreviations that you tested in each if/elif branch in the list check, take the part of x after the last ", " as y, and test whether y is in check.

Or a one-liner, the same as the second approach but in a single line:

check=["CA","NY","DC","MA","IL","VA"]
df['Location'] = df['Location'].apply(lambda x: x.rsplit(", ",1)[1] if x.rsplit(", ",1)[1] in check else "others")
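
One caveat with indexing [1] after rsplit: a value without ", " produces a one-element list and raises IndexError. The sketch below is a more defensive variant, assuming such rows (or missing values) might exist in the data; clean_location_safe and the sample values are hypothetical.

```python
check = {"CA", "NY", "DC", "MA", "IL", "VA"}

def clean_location_safe(x):
    # Hypothetical helper: treat non-string values (e.g. None or NaN) as "others".
    if not isinstance(x, str):
        return "others"
    # rsplit always returns at least one element, so [-1] cannot raise IndexError.
    state = x.rsplit(", ", 1)[-1].strip()
    return state if state in check else "others"

# Made-up sample values, including one without a separator and one missing value.
samples = ["New York, NY", "Fort Lee, NJ", "Remote", None]
cleaned = [clean_location_safe(s) for s in samples]
print(cleaned)
```

The same function can be passed to df['Location'].apply(...) in place of clean_location_1.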

Answer 5

You can do:

states = ['CA', 'NY', 'DC', 'MA', 'IL', 'VA']
df['State'] = df['Location'].str.split(', ', expand=True)[1] \
                             .rename('State').to_frame().query('State in @states')
df['State'] = df['State'].fillna('other')

>>> df
            Location  State
0       New York, NY     NY
1      Chantilly, VA     VA
2         Boston, MA     MA
3         Newton, MA     MA
4       New York, NY     NY
5       Fort Lee, NJ  other
6  San Francisco, CA     CA
7      Irwindale, CA     CA
8  San Francisco, CA     CA
9       New York, NY     NY
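
A possibly simpler way to get the same kind of result, sketched here under the assumption that the state code is whatever follows the last ", ", is to combine Series.isin with Series.where. The small DataFrame below is made up for illustration.

```python
import pandas as pd

# Illustrative data, not the asker's full DataFrame.
df = pd.DataFrame({'Location': ['New York, NY', 'Chantilly, VA',
                                'Fort Lee, NJ', 'San Francisco, CA']})
states = ['CA', 'NY', 'DC', 'MA', 'IL', 'VA']

# Take the text after the last ", " in each row.
code = df['Location'].str.rsplit(', ', n=1).str[-1]
# Keep codes that are in the list; replace everything else with 'other'.
df['State'] = code.where(code.isin(states), 'other')

print(df['State'].tolist())
```

This avoids the rename/to_frame/query/fillna chain while staying vectorized.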

5 answers

solution1
1 ACCEPTED 2021-07-25 12:56:37

solution2
0 2021-07-25 12:50:12

solution3
0 2021-07-25 12:51:00

solution4
0 2021-07-25 12:51:49

solution5
0 2021-07-25 13:04:21