I'm trying to find out whether a string is in a list. When I use 'if string in list', I get False, but when I try 'if string == list[28]', I get True.
how come? the string is definitely in the list.
import pandas as pd
import numpy as np
import scipy.stats as stats
import re
nba_df=pd.read_csv("assets/nba.csv")
cities=pd.read_html("assets/wikipedia_data.html")[1]
cities=cities.iloc[:-1,[0,3,5,6,7,8]]
nba_df = nba_df[(nba_df['year'] == 2018)]
nba_df['team'] = nba_df['team'].apply(lambda x: x.split('*')[0])
nba_df['team'] = nba_df['team'].apply(lambda x: x.split('(')[0])
nba_df['team'] = nba_df['team'].str.strip()
cityList = cities['Metropolitan area'].str.strip()
actualCities = []
for idx, city in enumerate(nba_df['team']):
    if city == 'New Orleans Pelicans':
        print('string: ', city.split()[0] + ' ' + city.split()[1])
        print('cityList[28]: ', cityList[28])
        print('is string in list: ', (city.split()[0] + ' ' + city.split()[1]) in cityList)
        print('is string == list[28]: ', (city.split()[0] + ' ' + city.split()[1]) == cityList[28])
output:
string: New Orleans
cityList[28]: New Orleans
is string in list: False
is string == list[28]: True
It looks like your issue is related to membership testing with the in operator, particularly as it relates to pandas "containers" such as DataFrames and Series. Keep in mind when you say:
how come? the string is definitely in the list.
This is not quite accurate. Your cityList is a Series object, not a list. This creates some quirks we have to work around, since we cannot treat a Series the same as a list. In general, a Series behaves a bit more like a dictionary than a list: the in operator checks the index labels (the "keys"), not the values.
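To see the dictionary-like behaviour concretely, here is a minimal sketch. The three-element Series `s` and the dict `d` are made up for illustration and are not part of your data:

```python
import pandas as pd

# Hypothetical three-element Series, used only for illustration.
s = pd.Series(["Boston", "New York", "New Orleans"])

# `in` on a Series tests the index labels (0, 1, 2 here), like the keys of a dict.
print(2 in s)              # 2 is an index label
print("New Orleans" in s)  # "New Orleans" is a value, not a label

# A plain dict behaves the same way: `in` looks at keys, not values.
d = {0: "Boston", 1: "New York", 2: "New Orleans"}
print("New Orleans" in d)
print("New Orleans" in d.values())
```

The first and last checks should come back True and the middle two False, which mirrors what you saw with cityList.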
I've created a truncated test example for your code, using the setup here:
import pandas as pd
data = {
"Teams": [ "Boston Celtics", "Brooklyn Nets", "New York Knicks", "Philadelphia 76ers", "Toronto Raptors", "Chicago Bulls", "Cleveland Cavaliers", "Detroit Pistons", "Indiana Pacers", "Milwaukee Bucks", "Atlanta Hawks", "Charlotte Hornets", "Miami Heat", "Orlando Magic", "Washington Wizards", "Denver Nuggets", "Minnesota Timberwolves", "Oklahoma City Thunder", "Portland Trail Blazers", "Utah Jazz", "Golden State Warriors", "Los Angeles Clippers", "Los Angeles Lakers", "Phoenix Suns", "Sacramento Kings", "Houston Rockets", "Memphis Grizzlies", "San Antonio Spurs", "New Orleans Pelicans" ],
"Cities": [ "Boston", "Brooklyn", "New York", "Philadelphia", "Toronto", "Chicago", "Cleveland", "Detroit", "Indiana", "Milwaukee", "Atlanta", "Charlotte", "Miami", "Orlando", "Washington", "Denver", "Minnesota", "Oklahoma City", "Portland", "Utah", "Golden", "Los Angeles", "Los Angeles", "Phoenix", "Sacramento", "Houston", "Memphis", "San Antonio", "New Orleans" ]
}
nba_df = pd.DataFrame(data, columns = ['Teams', 'Cities'])
# doing this to mimic your code of storing the Series to cityList
cityList = nba_df['Cities'].str.strip()
print(cityList)
print(type(cityList))
Output:
0 Boston
1 Brooklyn
2 New York
...
28 New Orleans
<class 'pandas.core.series.Series'>
The key is to use cityList.values, rather than just cityList. However, I encourage you to read the Series.values documentation, as pandas no longer recommends using this property (it looks like Series.array was added in 0.24, and they recommend using that instead). Both PandasArray and numpy.ndarray appear to behave a bit more like a list, at least in this example when it comes to membership tests. Again, reading the Series.array documentation is highly encouraged.
Example from the terminal:
>>> cityList[28]
'New Orleans'
>>> 'New Orleans' in cityList
False
>>> 'New Orleans' in cityList.values
True
>>> 'New Orleans' in cityList.array
True
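If you would rather not rely on .values, a couple of other value-based checks should give the same answer. This is only a sketch: `city_series` is a hypothetical stand-in for your cityList, and I am assuming pandas 0.24 or later so that Series.to_numpy() is available:

```python
import pandas as pd

# Hypothetical Series standing in for cityList.
city_series = pd.Series(["Boston", "San Antonio", "New Orleans"])

# Series.to_numpy() returns a NumPy array; `in` then compares against the values.
print("New Orleans" in city_series.to_numpy())

# Staying inside pandas: compare element-wise, then ask whether any element matched.
print((city_series == "New Orleans").any())
print(city_series.isin(["New Orleans"]).any())
```

All three lines test the values rather than the index labels, so they do not depend on where "New Orleans" happens to sit in the Series.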
You could also just create a list from your cityList (which, again, is a Series):
>>> list(cityList)
['Boston', 'Brooklyn', ..., 'New Orleans']
>>> 'New Orleans' in list(cityList)
True
Side Note
I would probably rename your cityList to citySeries or something similar, to make it clear in your code that you are not dealing with a list, but with a "special" container from the pandas library.
Alternatively, you could just create your cityList like so (note: I'm using your code now, not my example):
cityList = list(cities['Metropolitan area'].str.strip())
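Put together with a loop like the one in your question, that might look roughly like the sketch below. The tiny `cities` frame and `teams` Series are invented stand-ins for your real data, and `candidate` is just a name I picked:

```python
import pandas as pd

# Hypothetical stand-ins for the question's `cities` and `nba_df['team']`.
cities = pd.DataFrame({"Metropolitan area": [" San Antonio", "New Orleans "]})
teams = pd.Series(["San Antonio Spurs", "New Orleans Pelicans", "Utah Jazz"])

# Convert once, outside the loop, so `in` compares against the values.
cityList = list(cities["Metropolitan area"].str.strip())

for team in teams:
    # First two words of the team name, as in the question's code.
    candidate = " ".join(team.split()[:2])
    print(candidate, candidate in cityList)
```

Because cityList is now a real list, the membership test inside the loop compares against the city names themselves.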
I did have to do a bit of research for this answer, as I am by no means a pandas expert, so here are the three questions that helped me figure this out: