
Want to find list index of element corresponding to pandas dataframe (np.where with .index())

I want to find the index of a dictionary or list item where a condition is met, and write it to a new column in a dataframe.

I start with the following setup:

import pandas as pd
import numpy as np
df = pd.DataFrame(data = {'col1': ['2018_08', '2008_02','2019_01','2017_04']})

dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
         -1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
         -2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
         -3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
         -4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
         -5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}

I want to check whether the values of the column col1 in the dataframe df are included in the dictionary dates. If yes, return the key, or the last entry of the corresponding list in the dictionary. If not, return NaT or NaN. I've tried:

df['month_seq'] = np.where(df.col1.isin([dates[i][1] for i in range(0,-6,-1)]), '?' ,pd.NaT)

which identifies the correct entries but does not return the corresponding negative numbers. The output reads:

    col1    month_seq
0   2018_08     ?
1   2008_02     NaT
2   2019_01     ?
3   2017_04     NaT

I have also tried something with

[dates[i][1] for i in range(0,-6,-1)].index(df.col1)

which raises an error.

Thanks in advance for your help.

Use map with a dictionary created by a dictionary comprehension:

df = pd.DataFrame(data = {'col1': ['2018_08', '2008_02','2019_01','2017_04']})

dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
         -1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
         -2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
         -3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
         -4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
         -5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}

d = {v[1]:k for k, v in dates.items()}
print (d)
{'2019_01': 0, '2018_12': -1, '2018_11': -2, '2018_10': -3, '2018_09': -4, '2018_08': -5}

df['new'] = df['col1'].map(d)
print (df)
      col1  new
0  2018_08 -5.0
1  2008_02  NaN
2  2019_01  0.0
3  2017_04  NaN
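
Note that the mapped column comes back as float (-5.0, 0.0) because NaN forces a float dtype. If whole numbers are preferred, one option is pandas' nullable integer dtype. The following is a minimal sketch under that assumption; the name month_seq is only illustrative, and missing entries show up as <NA> instead of NaN:

```python
import pandas as pd

df = pd.DataFrame(data={'col1': ['2018_08', '2008_02', '2019_01', '2017_04']})

dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
         -1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
         -2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
         -3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
         -4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
         -5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}

# Reverse lookup: 'YYYY_MM' label -> dictionary key
d = {v[1]: k for k, v in dates.items()}

# map() yields float64 with NaN for unmatched labels;
# casting to the nullable 'Int64' dtype keeps integers and marks gaps as <NA>
month_seq = df['col1'].map(d).astype('Int64')
print(month_seq)
```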

You could use apply with a suitable function (locate in this case):

import pandas as pd
import numpy as np
df = pd.DataFrame(data = {'col1': ['2018_08', '2008_02','2019_01','2017_04']})

dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
         -1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
         -2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
         -3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
         -4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
         -5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}


def locate(e, d=dates):
    # Return the first key whose list contains e, or NaN if none does
    for k, values in d.items():
        if e in values:
            return k
    return np.nan


result = df['col1'].apply(locate)
print(result)

Output

0   -5.0
1    NaN
2    0.0
3    NaN
Name: col1, dtype: float64
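
One thing to keep in mind: the membership test `e in values` matches any element of the list, not only the 'YYYY_MM' label. If only the label should count, a stricter variant could compare against the second element alone. This is a sketch rather than part of the answer above; the helper name locate_label is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': ['2018_08', '2008_02', '2019_01', '2017_04']})

dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
         -1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
         -2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
         -3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
         -4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
         -5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}


def locate_label(e, d):
    # Hypothetical helper: match only on the 'YYYY_MM' label (list position 1)
    for k, values in d.items():
        if values[1] == e:
            return k
    return np.nan


# args passes the dictionary explicitly instead of relying on a default argument
result = df['col1'].apply(locate_label, args=(dates,))
print(result)
```

For this data it gives the same values as the map approach; for large frames, map with a prebuilt dictionary avoids looping over the dictionary for every row.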
