
Want to find list index of element corresponding to pandas dataframe (np.where with .index())

I want to find the index of a dictionary or list item where a condition is met and write it to a new column in a dataframe.

I start with the following setup:

import pandas as pd
import numpy as np
df = pd.DataFrame(data = {'col1': ['2018_08', '2008_02','2019_01','2017_04']})

dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
         -1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
         -2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
         -3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
         -4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
         -5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}

I want to check whether the values of the column col1 in the dataframe df are included in the dictionary dates or not. If yes, then give back the key or the last entry of the corresponding list in the dictionary. If not, then return NaT or NaN. I've tried:

df['month_seq'] = np.where(df.col1.isin([dates[i][1] for i in range(0,-6,-1)]), '?' ,pd.NaT)

which identifies the correct entries but does not return the corresponding negative numbers. The output reads as:

    col1    month_seq
0   2018_08     ?
1   2008_02     NaT
2   2019_01     ?
3   2017_04     NaT

I have also tried something with

[dates[i][1] for i in range(0,-6,-1)].index(df.col1)

returning an error.

Thanks in advance for your help.

Use map with a dictionary created by a dictionary comprehension that maps each list's second entry to its key:

df = pd.DataFrame(data = {'col1': ['2018_08', '2008_02','2019_01','2017_04']})

dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
         -1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
         -2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
         -3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
         -4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
         -5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}

d = {v[1]:k for k, v in dates.items()}
print (d)
{'2019_01': 0, '2018_12': -1, '2018_11': -2, '2018_10': -3, '2018_09': -4, '2018_08': -5}

df['new'] = df['col1'].map(d)
print (df)
      col1  new
0  2018_08 -5.0
1  2008_02  NaN
2  2019_01  0.0
3  2017_04  NaN
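
The new column comes out as float because the unmatched rows are NaN. If integer values are preferred, one option is to cast to pandas' nullable Int64 dtype. This is only a sketch of that idea, assuming a pandas version that supports nullable integers; the lookup dictionary d is written out by hand here so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['2018_08', '2008_02', '2019_01', '2017_04']})

# Same lookup as produced by the dictionary comprehension above
d = {'2019_01': 0, '2018_12': -1, '2018_11': -2,
     '2018_10': -3, '2018_09': -4, '2018_08': -5}

# map leaves unmatched rows as NaN, which forces float64;
# casting to the nullable 'Int64' dtype keeps the matches as integers
# and shows the missing rows as <NA>
df['new'] = df['col1'].map(d).astype('Int64')
print(df)
```

Whether the float or the nullable integer column is more convenient depends on what is done with it afterwards.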

You could use apply with a suitable function (locate in this case):

import pandas as pd
import numpy as np
df = pd.DataFrame(data = {'col1': ['2018_08', '2008_02','2019_01','2017_04']})

dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
         -1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
         -2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
         -3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
         -4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
         -5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}


def locate(e, d=dates):
    # iterate over the dictionary passed in as d, not the global dates
    for k, values in d.items():
        if e in values:
            return k
    return np.nan


result = df['col1'].apply(locate)
print(result)

Output

0   -5.0
1    NaN
2    0.0
3    NaN
Name: col1, dtype: float64
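
Note that "e in values" matches any element of a list, so a value such as 2018 would also be "found". If only the 'YYYY_MM' label at position 1 should count, a stricter variant might look like the following. The name locate_label is made up for this illustration and is not part of the answer above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['2018_08', '2008_02', '2019_01', '2017_04']})

dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
         -1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
         -2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
         -3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
         -4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
         -5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}


def locate_label(e, d=dates):
    # hypothetical variant of locate: compare only against the
    # 'YYYY_MM' label stored at position 1 of each list
    for k, values in d.items():
        if values[1] == e:
            return k
    return np.nan


result = df['col1'].apply(locate_label)
print(result)
```

For the string values in col1 this should give the same keys as locate; the difference only shows up for inputs that happen to equal one of the other list entries.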
