How to get a value from a pandas core series?

I have a dataframe df in which are the timezones for particular ip numbers:

ip1         ip2           timezone
0           16777215          0
16777216    16777471       +10:00
16777472    16778239       +08:00
16778240    16779263       +11:00
16779264    16781311       +08:00
16781312    16785407       +09:00
...

The first row is valid for the ip numbers from 0 to 16777215, the second from 16777216 to 16777471, and so on. Now I go through a folder and want to know the timezone for every file (after I calculate the ip_number of the file). I use:

time=df.loc[(df['ip1'] <= ip_number) & (ip_number <= df['ip2']), 'timezone']

and get my expected output:

1192    +05:30
Name: timezone, dtype: object

But this is a pandas Series (pandas.core.series.Series), and I just want the string "+05:30". How do I get it? Or is there another way, instead of df.loc[...], to get the value of the timezone column in df directly?

To pull the only value out of a Series of size 1, use the Series.item() method:

time = df.loc[(df['ip1'] <= ip_number) & (ip_number <= df['ip2']), 'timezone'].item()

Note that this raises a ValueError if the Series does not contain exactly one item, for example if no row matches or if more than one row matches.
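As a minimal sketch of guarding against the no-match case before calling .item(), assuming a shortened copy of the table from the question. The helper name lookup_timezone and its default parameter are hypothetical, not part of pandas:

```python
import pandas as pd

# Shortened sample of the question's table (assumed data).
df = pd.DataFrame({
    'ip1': [0, 16777216, 16777472],
    'ip2': [16777215, 16777471, 16778239],
    'timezone': ['0', '+10:00', '+08:00'],
})

def lookup_timezone(df, ip_number, default=None):
    """Hypothetical helper: timezone for ip_number, or default if no range matches."""
    match = df.loc[(df['ip1'] <= ip_number) & (ip_number <= df['ip2']), 'timezone']
    if match.empty:
        return default
    # Still raises ValueError if more than one row matched (overlapping ranges).
    return match.item()

tz_found = lookup_timezone(df, 16777300)    # falls inside the second range
tz_missing = lookup_timezone(df, 99999999)  # beyond every range in the sample
print(tz_found)
print(tz_missing)
```

Returning a default for an unmatched ip_number is only one possible choice; letting the ValueError propagate may be preferable if a missing range indicates bad data.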


Usually, pulling single values out of a Series is an anti-pattern. NumPy and pandas are built around the idea that applying vectorized functions to large arrays is much faster than using a Python loop that processes single values one at a time.

Given your df and a list of IP numbers, here is a way to find the corresponding timezone offsets for all the IP numbers with just one call to pd.merge_asof:

import pandas as pd

df = pd.DataFrame({'ip1': [0, 16777216, 16777472, 16778240, 16779264, 16781312],
                   'ip2': [16777215, 16777471, 16778239, 16779263, 16781311, 16785407],
                   'timezone': ['0', '+10:00', '+08:00', '+11:00', '+08:00', '+09:00']})

# Stack both range endpoints into a single sorted 'ip' column, keeping the timezone.
df1 = df.melt(id_vars=['timezone'], value_name='ip').sort_values(by='ip').drop('variable', axis=1)
ip_nums = [16777473, 16777471, 16778238, 16785406]
# merge_asof requires both frames to be sorted on the key.
df2 = pd.DataFrame({'ip': ip_nums}).sort_values(by='ip')
# For each ip, take the last endpoint that is <= ip and use its timezone.
result = pd.merge_asof(df2, df1, on='ip')
print(result)

yields

         ip timezone
0  16777471   +10:00
1  16777473   +08:00
2  16778238   +08:00
3  16785406   +09:00
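The merge above works because the ranges in df are contiguous. If the table could contain gaps, an IP number falling in a gap would silently pick up the timezone of the preceding range. A minimal sketch of one way to guard against that, using small made-up numbers (the frames ranges, queries and checked are illustrative names, and the data is an assumption, not from the question):

```python
import pandas as pd

# Assumed sample data with a deliberate gap between 200 and 300.
ranges = pd.DataFrame({
    'ip1': [0, 100, 300],
    'ip2': [99, 200, 399],
    'timezone': ['0', '+10:00', '+08:00'],
}).sort_values('ip1')

queries = pd.DataFrame({'ip': [50, 150, 250, 350]}).sort_values('ip')

# Match each ip to the last range whose start (ip1) is <= ip ...
checked = pd.merge_asof(queries, ranges, left_on='ip', right_on='ip1')
# ... then blank out the timezone where ip lies beyond that range's end (ip2).
checked['timezone'] = checked['timezone'].where(checked['ip'] <= checked['ip2'])
print(checked[['ip', 'timezone']])
```

Here the row for 250 ends up with a missing timezone, since 250 lies between the second and third ranges.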

Ideally, your next step would be to apply more NumPy/pandas vectorized functions to process the whole DataFrame at once. But if you must, you could iterate through the result DataFrame row by row. Even then, your code will look a little cleaner, since you can read off each ip and its corresponding offset directly (without calling .item()).

for row in result.itertuples():
    print('{} --> {}'.format(row.ip, row.timezone))
# 16777471 --> +10:00
# 16777473 --> +08:00
# 16778238 --> +08:00
# 16785406 --> +09:00

Just convert it to a list:

list(time)

If you are expecting only one value:

list(time)[0]

Or you can do the conversion earlier, when selecting:

# as a NumPy array
time = df.loc[(df['ip1'] <= ip_number) & (ip_number <= df['ip2']), 'timezone'].values

# as a list
time = list(df.loc[(df['ip1'] <= ip_number) & (ip_number <= df['ip2']), 'timezone'].values)
