
Python: x in list and x == list[28] give different results

I'm trying to find out whether a string is in a list. When I use 'if string in list' I get False, but when I use 'if string == list[28]' I get True.

How come? The string is definitely in the list.

import pandas as pd
import numpy as np
import scipy.stats as stats
import re

nba_df=pd.read_csv("assets/nba.csv")
cities=pd.read_html("assets/wikipedia_data.html")[1]
cities=cities.iloc[:-1,[0,3,5,6,7,8]]
nba_df = nba_df[(nba_df['year'] == 2018)]
nba_df['team'] = nba_df['team'].apply(lambda x: x.split('*')[0])
nba_df['team'] = nba_df['team'].apply(lambda x: x.split('(')[0])
nba_df['team'] = nba_df['team'].str.strip()

cityList = cities['Metropolitan area'].str.strip()

actualCities = []
for idx, city in enumerate(nba_df['team']):
    if city == 'New Orleans Pelicans':
        print('string: ', city.split()[0] + ' ' + city.split()[1])
        print('cityList[28]: ', cityList[28])
        print('is string in list: ', (city.split()[0] + ' ' + city.split()[1]) in cityList)
        print('is string == list[28]: ', (city.split()[0] + ' ' + city.split()[1]) == cityList[28])

Output:

string:  New Orleans
cityList[28]:  New Orleans
is string in list:  False
is string == list[28]:  True

It looks like your issue comes from how membership testing with the in operator works on pandas "containers" such as DataFrames and Series. Keep in mind, when you say:

How come? The string is definitely in the list.

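This is not quite accurate. Your cityList is a Series object, not a list. This creates some quirks we have to work around, since we cannot treat a Series the same as a list. For membership tests, a Series behaves more like a dictionary than a list: x in series checks whether x is one of the Series' index labels (here 0 to 28), not whether it is one of its values. That is why 'New Orleans' in cityList is False while 28 in cityList would be True.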
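A minimal sketch of the index-vs-values behaviour, using a small made-up Series (city_series and by_name are hypothetical names, not from the question):

```python
import pandas as pd

# Small stand-in for cityList: default RangeIndex labels 0, 1, 2
city_series = pd.Series(["Boston", "Brooklyn", "New Orleans"])

# `in` on a Series checks the index labels, like `key in dict`
in_labels = "New Orleans" in city_series          # False: not an index label
label_two = 2 in city_series                      # True: 2 is an index label
in_values = "New Orleans" in city_series.values   # True: searches the stored values

# With a string index, the city names ARE the labels, so `in` finds them
by_name = pd.Series([1, 2, 3], index=["Boston", "Brooklyn", "New Orleans"])
name_label = "New Orleans" in by_name             # True

print(in_labels, label_two, in_values, name_label)
```

The last case shows the dictionary analogy directly: once the names are the index, the plain in operator works on them.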

I've created a truncated test example for your code, using the setup here:

import pandas as pd

data = {
    "Teams": [ "Boston Celtics", "Brooklyn Nets", "New York Knicks", "Philadelphia 76ers", "Toronto Raptors", "Chicago Bulls", "Cleveland Cavaliers", "Detroit Pistons", "Indiana Pacers", "Milwaukee Bucks", "Atlanta Hawks", "Charlotte Hornets", "Miami Heat", "Orlando Magic", "Washington Wizards", "Denver Nuggets", "Minnesota Timberwolves", "Oklahoma City Thunder", "Portland Trail Blazers", "Utah Jazz", "Golden State Warriors", "Los Angeles Clippers", "Los Angeles Lakers", "Phoenix Suns", "Sacramento Kings", "Houston Rockets", "Memphis Grizzlies", "San Antonio Spurs", "New Orleans Pelicans" ],
    "Cities": [ "Boston", "Brooklyn", "New York", "Philadelphia", "Toronto", "Chicago", "Cleveland", "Detroit", "Indiana", "Milwaukee", "Atlanta", "Charlotte", "Miami", "Orlando", "Washington", "Denver", "Minnesota", "Oklahoma City", "Portland", "Utah", "Golden", "Los Angeles", "Los Angeles", "Phoenix", "Sacramento", "Houston", "Memphis", "San Antonio", "New Orleans" ]
}

nba_df = pd.DataFrame(data, columns = ['Teams', 'Cities'])
# doing this to mimic your code of storing the Series to cityList
cityList = nba_df['Cities'].str.strip()

print(cityList)
print(type(cityList))

Output:

0            Boston
1          Brooklyn
2          New York
...
28      New Orleans
<class 'pandas.core.series.Series'>

The key is to use cityList.values rather than just cityList. However, I encourage you to read the Series.values documentation, as pandas no longer recommends using this property: it suggests Series.array (added in 0.24) or Series.to_numpy() instead. Both the PandasArray returned by Series.array (renamed NumpyExtensionArray in newer pandas versions) and the numpy.ndarray returned by Series.values behave more like a list for membership tests, because in compares against the stored values rather than the index labels. Again, reading the Series.array documentation is highly encouraged.

Example from the terminal:

>>> cityList[28]
'New Orleans'
>>> 'New Orleans' in cityList
False
>>> 'New Orleans' in cityList.values
True
>>> 'New Orleans' in cityList.array
True

You could also just create a list from your cityList (which, again, is a Series):

>>> list(cityList)
['Boston', 'Brooklyn', ..., 'New Orleans']
>>> 'New Orleans' in list(cityList)
True
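If you would rather stay within pandas, a vectorized comparison avoids both the index lookup and the list conversion. This is a sketch on a made-up three-element Series; (s == x).any() and Series.isin are standard pandas, but the names here are hypothetical:

```python
import pandas as pd

city_series = pd.Series(["Boston", "Brooklyn", "New Orleans"])
target = "New Orleans"

# Element-wise comparison yields a boolean Series; .any() collapses it to one bool
found_eq = bool((city_series == target).any())

# isin() accepts a list-like of candidates, handy when checking several names at once
found_isin = bool(city_series.isin([target]).any())

# Filtering with the same mask also tells you WHERE the match is (its index label)
match_labels = city_series[city_series == target].index.tolist()

print(found_eq, found_isin, match_labels)
```

The last line recovers the label of the match, which is the 28 you were indexing with in your own data.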

Side Note

I would probably rename your cityList to citySeries or something similar, to make a note in your code that you are not dealing with a list, but a "special" container from the pandas library.

Alternatively, you could just create your cityList like so (note: I'm using your code now, not my example):

cityList = list(cities['Metropolitan area'].str.strip())
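Series.tolist() produces the same plain list. Below is a sketch of your check rewritten that way; the three-row cities DataFrame is a hypothetical stand-in, since the real one comes from read_html on your HTML file:

```python
import pandas as pd

# Hypothetical stand-in for the scraped `cities` table (real data comes from read_html)
cities = pd.DataFrame({"Metropolitan area": ["Boston ", " New Orleans", "Denver"]})

# Series.tolist() builds the same plain Python list as list(...)
city_list = cities["Metropolitan area"].str.strip().tolist()

team = "New Orleans Pelicans"
city = " ".join(team.split()[:2])  # "New Orleans", same as the question's split()[0] + ' ' + split()[1]

# Plain list membership compares against the values, so this is True
print(city in city_list)
```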

I did have to do a bit of research for this answer as I am by no means a pandas expert, so here are the three questions that helped me figure this out:
