简体   繁体   中英

Python list comprehension with loop over dataframe

I am looking for something quite specific that I can't quite find the answer to.

I have two dataframes. One that contains an ID, latitude and longitude. The other has just an ID.

I want to store in a list the latitude and longitude as long as the ID from Dataframe A exists in Dataframe B using list comprehension. I can get the first part working fine, but matching the IDs appears to be causing a problem. This is what I have so far:

heat_data = [[row['latitude'],row['longitude']] for index, row in extract.iterrows() if row['NBN Location Id'] in closed['SP Order Location ID']]

To me, that says store 'latitude' and 'longitude' from extract as long as the ID exists in the other dataframe (closed). However this causes no data to be retrieved. Can anyone guide me as to where I'm going wrong? If I exclude the last 'if' statement, it works fine. So how else am I supposed to be doing this if statement?

Thanks!

I think list comprehension is not necessary, better and faster is use vectorized solution by filter by boolean indexing with isin and then convert to lists:

mask = extract['NBN Location Id'].isin(closed['SP Order Location ID'])
cheat_data = extract.loc[mask, ['latitude', 'longitude']].values.tolist()

Sample :

closed = pd.DataFrame({'SP Order Location ID':list('ace')})
print (closed)
  SP Order Location ID
0                    a
1                    c
2                    e

extract = pd.DataFrame({'NBN Location Id':list('abcde'),
                       'latitude':['lat1','lat2','lat3','lat4','lat4'],
                       'longitude':['long1','long2','long3','long4','long4']})

print (extract)
  NBN Location Id latitude longitude
0               a     lat1     long1
1               b     lat2     long2
2               c     lat3     long3
3               d     lat4     long4
4               e     lat4     long4

mask = extract['NBN Location Id'].isin(closed['SP Order Location ID'])
cheat_data = extract.loc[mask, ['latitude', 'longitude']].values.tolist()
print (cheat_data)
[['lat1', 'long1'], ['lat3', 'long3'], ['lat4', 'long4']]

Your solution failed, because in in pandas check index values , not values of Series , so need convert to list:

cheat_data = [[row['latitude'],row['longitude']] for index, row in extract.iterrows() 
              if row['NBN Location Id'] in closed['SP Order Location ID'].tolist()]
print (cheat_data)
[['lat1', 'long1'], ['lat3', 'long3'], ['lat4', 'long4']]

#changed index values
closed = pd.DataFrame({'SP Order Location ID':list('ace')}, index=list('dbw'))
print (closed)
  SP Order Location ID
d                    a
b                    c
w                    e

cheat_data = [[row['latitude'],row['longitude']] for index, row in extract.iterrows() 
              if row['NBN Location Id'] in closed['SP Order Location ID']]
print (cheat_data)

[['lat2', 'long2'], ['lat4', 'long4']]

Using @jezrael's data

ids = {*closed['SP Order Location ID']}
cols = ['latitude', 'longitude', 'NBN Location Id']
[p for *p, i in zip(*map(extract.get, cols)) if i in ids]

[['lat1', 'long1'], ['lat3', 'long3'], ['lat4', 'long4']]

closed = pd.DataFrame({'SP Order Location ID':list('ace')})

extract = pd.DataFrame({'NBN Location Id':list('abcde'),
                       'latitude':['lat1','lat2','lat3','lat4','lat4'],
                       'longitude':['long1','long2','long3','long4','long4']})

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM