Python: x in list and x == list[28] give different results

I'm trying to find out whether a string is in a list. With 'if string in list' I get False, but with 'if string == list[28]' I get True.

How come? The string is definitely in the list.

import pandas as pd
import numpy as np
import scipy.stats as stats
import re

nba_df=pd.read_csv("assets/nba.csv")
cities=pd.read_html("assets/wikipedia_data.html")[1]
cities=cities.iloc[:-1,[0,3,5,6,7,8]]
nba_df = nba_df[(nba_df['year'] == 2018)]
nba_df['team'] = nba_df['team'].apply(lambda x: x.split('*')[0])
nba_df['team'] = nba_df['team'].apply(lambda x: x.split('(')[0])
nba_df['team'] = nba_df['team'].str.strip()

cityList = cities['Metropolitan area'].str.strip()

actualCities = []
for idx, city in enumerate(nba_df['team']):
    if city == 'New Orleans Pelicans':
        print('string: ', city.split()[0] + ' ' + city.split()[1])
        print('cityList[28]: ', cityList[28])
        print('is string in list: ', (city.split()[0] + ' ' + city.split()[1]) in cityList)
        print('is string == list[28]: ', (city.split()[0] + ' ' + city.split()[1]) == cityList[28])

Output:

string:  New Orleans
cityList[28]:  New Orleans
is string in list:  False
is string == list[28]:  True

It looks like your issue is related to membership testing with the in operator, particularly as it applies to pandas "containers" such as DataFrames and Series. Keep in mind that when you say:

How come? The string is definitely in the list.

This is not quite accurate. Your cityList is a Series object, not a list. That creates some quirks to work around, because a Series cannot be treated the same as a list. In general, a Series behaves more like a dictionary than a list: the in operator checks the index labels (the "keys"), not the values. That is why 'New Orleans' in cityList is False while cityList[28] == 'New Orleans' is True.
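
As a small illustration of the dictionary comparison, here is a minimal sketch. The three-row city_series and its labels are made up for the demonstration and are not the asker's data.

```python
import pandas as pd

# Hypothetical three-row Series whose index labels are 26, 27 and 28
city_series = pd.Series(
    ["Memphis", "San Antonio", "New Orleans"], index=[26, 27, 28]
)

# `in` on a Series looks at the index labels, like `in` on a dict looks at keys
label_hit = 28 in city_series                      # 28 is a label
value_hit = "New Orleans" in city_series           # a value, not a label
dict_hit = "New Orleans" in {28: "New Orleans"}    # same behaviour for a dict

# Indexing by label and comparing the value works as expected
equality_hit = city_series[28] == "New Orleans"

print(label_hit, value_hit, dict_hit, equality_hit)
```

The membership test and the equality test answer different questions, which is why they can disagree.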

I've created a truncated test example for your code, using this setup:

import pandas as pd

data = {
    "Teams": [ "Boston Celtics", "Brooklyn Nets", "New York Knicks", "Philadelphia 76ers", "Toronto Raptors", "Chicago Bulls", "Cleveland Cavaliers", "Detroit Pistons", "Indiana Pacers", "Milwaukee Bucks", "Atlanta Hawks", "Charlotte Hornets", "Miami Heat", "Orlando Magic", "Washington Wizards", "Denver Nuggets", "Minnesota Timberwolves", "Oklahoma City Thunder", "Portland Trail Blazers", "Utah Jazz", "Golden State Warriors", "Los Angeles Clippers", "Los Angeles Lakers", "Phoenix Suns", "Sacramento Kings", "Houston Rockets", "Memphis Grizzlies", "San Antonio Spurs", "New Orleans Pelicans" ],
    "Cities": [ "Boston", "Brooklyn", "New York", "Philadelphia", "Toronto", "Chicago", "Cleveland", "Detroit", "Indiana", "Milwaukee", "Atlanta", "Charlotte", "Miami", "Orlando", "Washington", "Denver", "Minnesota", "Oklahoma City", "Portland", "Utah", "Golden", "Los Angeles", "Los Angeles", "Phoenix", "Sacramento", "Houston", "Memphis", "San Antonio", "New Orleans" ]
}

nba_df = pd.DataFrame(data, columns = ['Teams', 'Cities'])
# doing this to mimic your code of storing the Series to cityList
cityList = nba_df['Cities'].str.strip()

print(cityList)
print(type(cityList))

Output:

0            Boston
1          Brooklyn
2          New York
...
28      New Orleans
<class 'pandas.core.series.Series'>

The key is to use cityList.values rather than just cityList. However, I encourage you to read the Series.values documentation, as pandas no longer recommends using this property (it looks like Series.array was added in 0.24, and they recommend using that instead). Both PandasArray and numpy.ndarray appear to behave more like a list, at least in this example, when it comes to membership testing. Again, reading the Series.array documentation is highly encouraged.

Example from the terminal:

>>> cityList[28]
'New Orleans'
>>> 'New Orleans' in cityList
False
>>> 'New Orleans' in cityList.values
True
>>> 'New Orleans' in cityList.array
True

You could also just create a list from your cityList (which, again, is a Series):

>>> list(cityList)
['Boston', 'Brooklyn', ..., 'New Orleans']
>>> 'New Orleans' in list(cityList)
True
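
If you would rather not build a list, a few other value-based checks should give the same answer. This is only a sketch on a made-up three-row city_series; Series.to_numpy(), Series.isin() and the == comparison are standard pandas calls, but whether they suit your data is for you to judge.

```python
import pandas as pd

# Hypothetical stand-in for cityList
city_series = pd.Series(["Boston", "New York", "New Orleans"])

# Test against the underlying NumPy array of values
in_numpy = "New Orleans" in city_series.to_numpy()

# Element-wise comparison, then ask whether any row matched
in_eq = bool((city_series == "New Orleans").any())

# isin() takes a list of candidates and returns a boolean Series
in_isin = bool(city_series.isin(["New Orleans"]).any())

# A city that is not among the values
missing = bool(city_series.isin(["Seattle"]).any())

print(in_numpy, in_eq, in_isin, missing)
```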

Side Note

I would probably rename your cityList to citySeries or something similar, to make it clear in your code that you are not dealing with a list but with a "special" container from the pandas library.

Alternatively, you could just create your cityList like so (note: I'm using your code now, not my example):

cityList = list(cities['Metropolitan area'].str.strip())
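
To show how that might fit into the loop from the question, here is a rough sketch. The cities and teams data and the first_two_words helper are hypothetical stand-ins, not your actual files or code.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's `cities` table and team names
cities = pd.DataFrame(
    {"Metropolitan area": [" New York ", "New Orleans", "Boston "]}
)
teams = pd.Series(["New Orleans Pelicans", "Boston Celtics", "Utah Jazz"])

# Convert once to a plain list so that `in` compares against the values
city_list = cities["Metropolitan area"].str.strip().tolist()


def first_two_words(team):
    """Return the first two words of a team name, as the question's code does."""
    return " ".join(team.split()[:2])


# Keep the teams whose first two words match a city name
matches = [team for team in teams if first_two_words(team) in city_list]
print(matches)
```

With this toy data only the Pelicans match, because "Boston Celtics" and "Utah Jazz" are not city names; matching one-word cities would need extra logic that is outside this sketch.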

I did have to do a bit of research for this answer, as I am by no means a pandas expert, so here are the three questions that helped me figure this out:
