np.where() conversion failing

I'm trying to map an integer that represents Year-Month (the 'Period' column in the code below) into a new column that represents Year-Quarter (the 'DIST_PERIOD' column).

For instance,

202101, 202102, and 202103 become '20211'

202104, 202105, and 202106 become '20212'

etc.

My code below runs, but not the way I expected.

df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '01') | (str(df['Period'])[4:] == '02') | (str(df['Period'])[4:] == '03'), str(df['Period'])[:4]+'1', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '04') | (str(df['Period'])[4:] == '05') | (str(df['Period'])[4:] == '06'), str(df['Period'])[:4]+'2', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '07') | (str(df['Period'])[4:] == '08') | (str(df['Period'])[4:] == '09'), str(df['Period'])[:4]+'3', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '10') | (str(df['Period'])[4:] == '11') | (str(df['Period'])[4:] == '12'), str(df['Period'])[:4]+'4', df['DIST_PERIOD'])

I'm not sure how to correct my str() calls so that I correctly capture the last two characters of each row.

A better way is to convert the column to datetime and then combine the datetime properties year and quarter.

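# Parse the integer Year-Month column ('Period'); cast to str so "%Y%m" is read as year + month
month_year = pd.to_datetime(df['Period'].astype(str), format="%Y%m")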
df['DIST_PERIOD'] = month_year.dt.year.astype(str) + month_year.dt.quarter.astype(str)
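As for why the original np.where version does nothing useful: str(df['Period']) turns the whole Series into a single string (its printed form), so slicing it never looks at individual rows and every comparison is one scalar False. Below is a minimal sketch of a per-row version that keeps np.where, assuming 'Period' always holds six-digit integers. The sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's frame.
df = pd.DataFrame({"Period": [202101, 202105, 202109, 202112]})

# str() on a Series yields ONE string for the whole Series, not one per row.
whole = str(df["Period"])
print(type(whole))

# Convert element-wise, then slice each row with the .str accessor.
period = df["Period"].astype(str)
year = period.str[:4]
month = period.str[4:]

# Nested np.where, one branch per quarter; Q4 is the fallback.
df["DIST_PERIOD"] = np.where(
    month.isin(["01", "02", "03"]), year + "1",
    np.where(
        month.isin(["04", "05", "06"]), year + "2",
        np.where(month.isin(["07", "08", "09"]), year + "3", year + "4"),
    ),
)
print(df)
```

Nesting the calls also removes the need to pre-create 'DIST_PERIOD' as the fallback value, which the four separate statements in the question relied on.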

If you want to automate it, you can look at a solution like this. You may have to play around with the datatype depending on your dataframe. Here I'm just passing each value of the column into a function and returning a new value. This also assumes you'll always have a four-digit year and a two-digit month.

import io
import pandas as pd

data='''yrmo
202101
202102
202103
202104
202105
202106
202109
202111'''
# Raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')

def get_quarter(x):
    mo = str(x)[-2:]
    yr = str(x)[0:4]
    if mo in ['01', '02', '03']:
        return yr + '1'
    elif mo in ['04', '05', '06']:
        return yr + '2'
    elif mo in ['07', '08', '09']:
        return yr + '3'
    else:
        return yr + '4'

df['yrmo'].apply(get_quarter)

0    20211
1    20211
2    20211
3    20212
4    20212
5    20212
6    20213
7    20214
Name: yrmo, dtype: object
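If the column is already an integer dtype, the same result can probably be had without apply by using integer arithmetic. This is a different technique from the function above, sketched under the same assumption of a four-digit year followed by a two-digit month. The variable names are illustrative only.

```python
import pandas as pd

# Same sample values as above, built directly as integers.
df = pd.DataFrame(
    {"yrmo": [202101, 202102, 202103, 202104, 202105, 202106, 202109, 202111]}
)

year = df["yrmo"] // 100          # 202104 -> 2021
month = df["yrmo"] % 100          # 202104 -> 4
quarter = (month - 1) // 3 + 1    # months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4

result = year.astype(str) + quarter.astype(str)
print(result)
```

This operates on whole columns at once, so it tends to scale better than calling a Python function per row.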

I'm not sure np.where is a good choice here; instead, use the map function from pandas.

Create a dictionary for the mapping, for example: di = {'202101': '20211', '202102': '20211', '202103': '20211'}. The keys must have the same type as the values in the column, so if the column holds integers, use integer keys or convert the column with .astype(str) first.

In the same way, add the remaining entries to the dictionary to complete the mapping.

After that, do this:

df['yourcolumnnametobemapped'] = df['yourcolumnnametobemapped'].map(di)

Note: map produces NaN for any value it cannot find in the dictionary. If you want to map only a few values and leave the others untouched, use:

df['yourcolumnnametobemapped'] = df['yourcolumnnametobemapped'].map(di).fillna(df['yourcolumnnametobemapped'])
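Typing out twelve entries per year gets tedious, so the dictionary can be generated instead. The following is a rough sketch, assuming an integer 'Period' column and a known set of years. The sample frame and the year range are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; 202201 is deliberately outside the mapping.
df = pd.DataFrame({"Period": [202101, 202104, 202108, 202112, 202201]})

# Integer keys, to match the integer dtype of the column.
di = {
    year * 100 + month: f"{year}{(month - 1) // 3 + 1}"
    for year in (2021,)            # assumed range of years
    for month in range(1, 13)
}

mapped = df["Period"].map(di)                   # unmapped values become NaN
kept = mapped.fillna(df["Period"].astype(str))  # keep the original value instead
print(kept)
```

Filling with the string form of the original column keeps the result a single type, which avoids mixing strings and integers in one column.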
