
Python list comprehension with loop over dataframe

I am looking for something quite specific that I can't find an answer to.

I have two dataframes. One contains an ID, latitude and longitude. The other has just an ID.

I want to use a list comprehension to store the latitude and longitude in a list whenever the ID from Dataframe A also exists in Dataframe B. I can get the first part working fine, but matching the IDs appears to be causing a problem. This is what I have so far:

heat_data = [[row['latitude'],row['longitude']] for index, row in extract.iterrows() if row['NBN Location Id'] in closed['SP Order Location ID']]

To me, that says: store 'latitude' and 'longitude' from extract as long as the ID exists in the other dataframe (closed). However, this returns no data. Can anyone tell me where I'm going wrong? If I leave out the final 'if' clause, it works fine. So how should I write this 'if' condition?

Thanks!

I think a list comprehension is not necessary. A better and faster option is a vectorized solution: filter by boolean indexing with isin, then convert the result to lists:

mask = extract['NBN Location Id'].isin(closed['SP Order Location ID'])
cheat_data = extract.loc[mask, ['latitude', 'longitude']].values.tolist()

Sample:

import pandas as pd

closed = pd.DataFrame({'SP Order Location ID':list('ace')})
print (closed)
  SP Order Location ID
0                    a
1                    c
2                    e

extract = pd.DataFrame({'NBN Location Id':list('abcde'),
                       'latitude':['lat1','lat2','lat3','lat4','lat4'],
                       'longitude':['long1','long2','long3','long4','long4']})

print (extract)
  NBN Location Id latitude longitude
0               a     lat1     long1
1               b     lat2     long2
2               c     lat3     long3
3               d     lat4     long4
4               e     lat4     long4

mask = extract['NBN Location Id'].isin(closed['SP Order Location ID'])
cheat_data = extract.loc[mask, ['latitude', 'longitude']].values.tolist()
print (cheat_data)
[['lat1', 'long1'], ['lat3', 'long3'], ['lat4', 'long4']]
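
If the same filtering is needed in more than one place, the two steps above can be wrapped in a small helper. This is only a sketch: the function name coords_for_matching_ids and its parameters are made up for illustration and are not part of pandas. It also uses to_numpy(), which the pandas documentation recommends over .values; the result should be the same here.

```python
import pandas as pd


def coords_for_matching_ids(extract, closed_ids, id_col="NBN Location Id"):
    """Return [latitude, longitude] pairs for rows whose ID is in closed_ids.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # Boolean mask: True where the row's ID appears among the closed IDs.
    mask = extract[id_col].isin(closed_ids)
    # Select the two coordinate columns and convert to a list of lists.
    return extract.loc[mask, ["latitude", "longitude"]].to_numpy().tolist()


closed = pd.DataFrame({"SP Order Location ID": list("ace")})
extract = pd.DataFrame({
    "NBN Location Id": list("abcde"),
    "latitude": ["lat1", "lat2", "lat3", "lat4", "lat4"],
    "longitude": ["long1", "long2", "long3", "long4", "long4"],
})

pairs = coords_for_matching_ids(extract, closed["SP Order Location ID"])
print(pairs)
```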

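Your solution failed because the 'in' operator on a pandas Series checks the index values, not the values of the Series. With the default integer index (0, 1, 2, ...), none of the IDs matches an index label, so the result is empty. You need to convert the Series to a list: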

cheat_data = [[row['latitude'],row['longitude']] for index, row in extract.iterrows() 
              if row['NBN Location Id'] in closed['SP Order Location ID'].tolist()]
print (cheat_data)
[['lat1', 'long1'], ['lat3', 'long3'], ['lat4', 'long4']]

# With changed index values, 'in' matches the IDs against the index labels (d, b, w)
closed = pd.DataFrame({'SP Order Location ID':list('ace')}, index=list('dbw'))
print (closed)
  SP Order Location ID
d                    a
b                    c
w                    e

cheat_data = [[row['latitude'],row['longitude']] for index, row in extract.iterrows() 
              if row['NBN Location Id'] in closed['SP Order Location ID']]
print (cheat_data)

[['lat2', 'long2'], ['lat4', 'long4']]
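
The difference between testing against the index and testing against the values can be seen on a single Series. A minimal sketch, using a throwaway Series s that is not part of the original question:

```python
import pandas as pd

s = pd.Series(list("ace"))  # values 'a', 'c', 'e' with default index 0, 1, 2

label_hit = 0 in s               # membership is tested against the index labels
value_as_label = "a" in s        # 'a' is a value, not an index label
value_hit = "a" in s.tolist()    # plain Python list: tested against the values
isin_hit = s.isin(["a"]).any()   # vectorized test against the values

print(label_hit, value_as_label, value_hit, bool(isin_hit))
```

This is why the original condition matched nothing with the default index and matched the wrong rows once the index contained letters.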

Using @jezrael's data:

ids = {*closed['SP Order Location ID']}
cols = ['latitude', 'longitude', 'NBN Location Id']
[p for *p, i in zip(*map(extract.get, cols)) if i in ids]

[['lat1', 'long1'], ['lat3', 'long3'], ['lat4', 'long4']]

closed = pd.DataFrame({'SP Order Location ID':list('ace')})

extract = pd.DataFrame({'NBN Location Id':list('abcde'),
                       'latitude':['lat1','lat2','lat3','lat4','lat4'],
                       'longitude':['long1','long2','long3','long4','long4']})
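
The one-liner above packs several idioms together: map(extract.get, cols) fetches the three columns, zip(*...) walks them row by row, and the *p, i target puts the last item (the ID) in i and the rest in the list p. An unrolled sketch of the same logic, with variable names chosen here for readability:

```python
import pandas as pd

closed = pd.DataFrame({"SP Order Location ID": list("ace")})
extract = pd.DataFrame({
    "NBN Location Id": list("abcde"),
    "latitude": ["lat1", "lat2", "lat3", "lat4", "lat4"],
    "longitude": ["long1", "long2", "long3", "long4", "long4"],
})

# A set gives fast membership tests and compares against the values,
# not the index. Equivalent to {*closed['SP Order Location ID']}.
ids = set(closed["SP Order Location ID"])

heat_data = []
# Iterate over the three columns in parallel, one row at a time.
for lat, lon, loc_id in zip(extract["latitude"],
                            extract["longitude"],
                            extract["NBN Location Id"]):
    if loc_id in ids:
        heat_data.append([lat, lon])

print(heat_data)
```

Iterating over the columns with zip avoids building a Series for every row, which is what iterrows() does.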
