Python: x in list and x == list[28] deliver different results
I am trying to find out whether a certain string is in a list. When I use 'if string in list' I get False, but when I try 'if string == list[28]' I get True.
How come? The string is definitely in the list.
import pandas as pd
import numpy as np
import scipy.stats as stats
import re
nba_df=pd.read_csv("assets/nba.csv")
cities=pd.read_html("assets/wikipedia_data.html")[1]
cities=cities.iloc[:-1,[0,3,5,6,7,8]]
nba_df = nba_df[(nba_df['year'] == 2018)]
nba_df['team'] = nba_df['team'].apply(lambda x: x.split('*')[0])
nba_df['team'] = nba_df['team'].apply(lambda x: x.split('(')[0])
nba_df['team'] = nba_df['team'].str.strip()
cityList = cities['Metropolitan area'].str.strip()
actualCities = []
for idx, city in enumerate(nba_df['team']):
    if city == 'New Orleans Pelicans':
        print('string: ', city.split()[0] + ' ' + city.split()[1])
        print('cityList[28]: ', cityList[28])
        print('is string in list: ', (city.split()[0] + ' ' + city.split()[1]) in cityList)
        print('is string == list[28]: ', (city.split()[0] + ' ' + city.split()[1]) == cityList[28])
output:
string: New Orleans
cityList[28]: New Orleans
is string in list: False
is string == list[28]: True
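It looks like your problem has to do with membership testing using the in operator, specifically with pandas "containers" such as DataFrames and Series. When you say:
"How come? The string is definitely in the list."
that is not quite accurate. Your cityList is a Series of dtype object, not a list. This leads to some quirks that have to be worked around, because a Series cannot be treated like a list. In general, a Series behaves more like a dictionary than a list: the in operator tests its index labels, not its values.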
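As a minimal sketch of that dictionary-like behaviour (the toy Series s below is made up for illustration and is not taken from your data), compare a lookup by label with a lookup by value:

```python
import pandas as pd

# Hypothetical toy Series, for illustration only; default index is 0, 1
s = pd.Series(["Boston", "New Orleans"])

label_hit = 0 in s                      # 0 is an index label
value_miss = "New Orleans" in s         # the values are not searched
value_hit = "New Orleans" in s.values   # searches the underlying array

print(label_hit, value_miss, value_hit)
```

This is the same kind of lookup a dict performs, where in looks at the keys and never at the stored values.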
Using the setup here, I created a truncated test example of your code:
import pandas as pd
data = {
"Teams": [ "Boston Celtics", "Brooklyn Nets", "New York Knicks", "Philadelphia 76ers", "Toronto Raptors", "Chicago Bulls", "Cleveland Cavaliers", "Detroit Pistons", "Indiana Pacers", "Milwaukee Bucks", "Atlanta Hawks", "Charlotte Hornets", "Miami Heat", "Orlando Magic", "Washington Wizards", "Denver Nuggets", "Minnesota Timberwolves", "Oklahoma City Thunder", "Portland Trail Blazers", "Utah Jazz", "Golden State Warriors", "Los Angeles Clippers", "Los Angeles Lakers", "Phoenix Suns", "Sacramento Kings", "Houston Rockets", "Memphis Grizzlies", "San Antonio Spurs", "New Orleans Pelicans" ],
"Cities": [ "Boston", "Brooklyn", "New York", "Philadelphia", "Toronto", "Chicago", "Cleveland", "Detroit", "Indiana", "Milwaukee", "Atlanta", "Charlotte", "Miami", "Orlando", "Washington", "Denver", "Minnesota", "Oklahoma City", "Portland", "Utah", "Golden", "Los Angeles", "Los Angeles", "Phoenix", "Sacramento", "Houston", "Memphis", "San Antonio", "New Orleans" ]
}
nba_df = pd.DataFrame(data, columns = ['Teams', 'Cities'])
# doing this to mimic your code of storing the Series to cityList
cityList = nba_df['Cities'].str.strip()
print(cityList)
print(type(cityList))
Output:
0 Boston
1 Brooklyn
2 New York
...
28 New Orleans
<class 'pandas.core.series.Series'>
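The key is to use cityList.values rather than just cityList. However, I encourage you to read the Series.values documentation, because pandas no longer recommends this attribute (it looks like Series.array was added in 0.24, and they recommend using that instead). Both PandasArray and numpy.ndarray appear to behave more like a list, at least in this example, when it comes to membership testing. Again, reading the Series.array documentation is strongly recommended.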
Example from the terminal:
>>> cityList[28]
'New Orleans'
>>> 'New Orleans' in cityList
False
>>> 'New Orleans' in cityList.values
True
>>> 'New Orleans' in cityList.array
True
You can also create a list from your cityList (which, again, is a Series):
>>> list(cityList)
['Boston', 'Brooklyn', ..., 'New Orleans']
>>> 'New Orleans' in list(cityList)
True
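If you would rather stay within pandas and not convert at all, one option (my own addition, sketched under the assumption that an element-wise comparison is acceptable for your data) is to compare the values and reduce the result with any(), or to use Series.isin for several candidates at once. The name city_series is a hypothetical stand-in for your cityList:

```python
import pandas as pd

# Hypothetical stand-in for cityList, for illustration only
city_series = pd.Series(["Boston", "Brooklyn", "New Orleans"])

# Compare every value to the target, then reduce to a single bool
found = bool((city_series == "New Orleans").any())

# Series.isin tests several candidates at once and returns a boolean mask
mask = city_series.isin(["New Orleans", "Chicago"])

print(found)
print(mask.tolist())
```

The mask can also be used to filter the Series, which a plain list membership test cannot do.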
Side note
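I would probably rename your cityList to citySeries or something similar, to make it clear in your code that you are not dealing with a list but with a "special" container from the pandas library.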
Alternatively, you can create your cityList like this (note: I am now using your code, not my example):
cityList = list(cities['Metropolitan area'].str.strip())
I did have to do some research for this answer, since I am by no means a pandas expert, so here are three questions that helped me work this out: