I want to find the key of a dictionary (or the index of a list item) where a condition is met, and write it to a new column in a dataframe.
I start with the following setup:
import pandas as pd
import numpy as np
df = pd.DataFrame(data = {'col1': ['2018_08', '2008_02','2019_01','2017_04']})
dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
-1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
-2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
-3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
-4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
-5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}
I want to check whether the values of the column col1 in the dataframe df appear in the dictionary dates. If a value does appear, I want to get back the key (which equals the last entry of the corresponding list in the dictionary). If it does not, the result should be NaT or NaN. I've tried:
df['month_seq'] = np.where(df.col1.isin([dates[i][1] for i in range(0,-6,-1)]), '?' ,pd.NaT)
which identifies the correct entries but does not return the corresponding negative numbers. The output reads as:
col1 month_seq
0 2018_08 ?
1 2008_02 NaT
2 2019_01 ?
3 2017_04 NaT
I have also tried something like
[dates[i][1] for i in range(0,-6,-1)].index(df.col1)
but that raises an error.
Thanks in advance for your help.
Use map with a dictionary created by a dictionary comprehension:
df = pd.DataFrame(data = {'col1': ['2018_08', '2008_02','2019_01','2017_04']})
dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
-1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
-2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
-3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
-4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
-5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}
d = {v[1]:k for k, v in dates.items()}
print (d)
{'2019_01': 0, '2018_12': -1, '2018_11': -2, '2018_10': -3, '2018_09': -4, '2018_08': -5}
df['new'] = df['col1'].map(d)
print (df)
col1 new
0 2018_08 -5.0
1 2008_02 NaN
2 2019_01 0.0
3 2017_04 NaN
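The new column comes out as float because NaN forces a float dtype. If integer keys are preferred, one option is to cast the mapped result to pandas' nullable integer dtype. This is only a sketch on top of the answer above; the column name new and the lookup dictionary d are taken from it, and missing values will then show as <NA> instead of NaN.

```python
import pandas as pd

df = pd.DataFrame(data={'col1': ['2018_08', '2008_02', '2019_01', '2017_04']})

# Lookup table from the month string to the dictionary key, as built above
d = {'2019_01': 0, '2018_12': -1, '2018_11': -2,
     '2018_10': -3, '2018_09': -4, '2018_08': -5}

# map gives float64 with NaN for unmatched rows; Int64 (capital I) is the
# nullable integer dtype, so matched rows stay integers
df['new'] = df['col1'].map(d).astype('Int64')
print(df)
```

Whether Int64 is the right choice depends on what happens to the column afterwards; some older code expects plain float NaN.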
You could use apply with a suitable function (locate in this case):
import pandas as pd
import numpy as np
df = pd.DataFrame(data = {'col1': ['2018_08', '2008_02','2019_01','2017_04']})
dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
-1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
-2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
-3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
-4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
-5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}
def locate(e, d=dates):
    # Return the key of the first entry whose list contains e, else NaN
    for k, values in d.items():
        if e in values:
            return k
    return np.nan
result = df['col1'].apply(locate)
print(result)
Output
0 -5.0
1 NaN
2 0.0
3 NaN
Name: col1, dtype: float64
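Note that e in values matches any element of the list, not only the month string. If col1 could ever contain a value such as 2018 or -5, a stricter check against the second list entry may be safer. The following is a sketch of that variant; the name locate_month is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': ['2018_08', '2008_02', '2019_01', '2017_04']})
dates = {0: ['2019-01-15 00:00:00', '2019_01', 1, 2019, 0],
         -1: ['2018-12-15 00:00:00', '2018_12', 12, 2018, -1],
         -2: ['2018-11-15 00:00:00', '2018_11', 11, 2018, -2],
         -3: ['2018-10-15 00:00:00', '2018_10', 10, 2018, -3],
         -4: ['2018-09-15 00:00:00', '2018_09', 9, 2018, -4],
         -5: ['2018-08-15 00:00:00', '2018_08', 8, 2018, -5]}

def locate_month(e, d):
    # Compare only against the month string at position 1 of each list
    for k, values in d.items():
        if values[1] == e:
            return k
    return np.nan

# Extra positional arguments for the function are passed through args
result = df['col1'].apply(locate_month, args=(dates,))
print(result)
```

For larger dataframes the map approach is likely to be faster, since apply calls the Python function once per row and loops over the dictionary each time.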