np.where() conversion failing
I'm trying to map an integer that represents Year-Month (the 'Period' column in the code below) into a new column that represents the Year-Quarter (the 'DIST_PERIOD' column).
For instance,
202101, 202102, and 202103 become '20211'
202104, 202105, and 202106 become '20212'
etc.
My code below runs, but not the way I expected.
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '01') | (str(df['Period'])[4:] == '02') | (str(df['Period'])[4:] == '03'), str(df['Period'])[:4]+'1', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '04') | (str(df['Period'])[4:] == '05') | (str(df['Period'])[4:] == '06'), str(df['Period'])[:4]+'2', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '07') | (str(df['Period'])[4:] == '08') | (str(df['Period'])[4:] == '09'), str(df['Period'])[:4]+'3', df['DIST_PERIOD'])
df['DIST_PERIOD'] = np.where((str(df['Period'])[4:] == '10') | (str(df['Period'])[4:] == '11') | (str(df['Period'])[4:] == '12'), str(df['Period'])[:4]+'4', df['DIST_PERIOD'])
Not sure how to correct my str() so that I am correctly capturing the last two characters for each row.
A better way is to convert the column to datetime and then access and combine the datetime properties year and quarter.
month_year = pd.to_datetime(df['Period'].astype(str), format="%Y%m")
df['DIST_PERIOD'] = month_year.dt.year.astype(str) + month_year.dt.quarter.astype(str)
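The original attempt fails because str(df['Period']) turns the whole Series into one string (its printed representation) instead of converting each row, so the slices never equal '01', '02', and so on. Below is a minimal self-contained sketch of the datetime approach. It assumes 'Period' holds integers such as 202101, and the sample values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; 'Period' holds YYYYMM integers.
df = pd.DataFrame({"Period": [202101, 202104, 202109, 202112]})

# Convert each value to a string first so "%Y%m" is parsed row by row.
month_year = pd.to_datetime(df["Period"].astype(str), format="%Y%m")

# Combine the year and the quarter, both cast to strings.
df["DIST_PERIOD"] = (
    month_year.dt.year.astype(str) + month_year.dt.quarter.astype(str)
)

print(df["DIST_PERIOD"].tolist())
# ['20211', '20212', '20213', '20214']
```

The key difference from the question's code is .astype(str), which works element-wise, whereas the built-in str() is applied to the Series object as a whole.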
If you want to automate it, you can look at a solution like this. You may have to adjust the datatype depending on your dataframe. Here I'm just passing each value of the column into a function and returning a new value. This also assumes you'll always have a 4-digit year and a 2-digit month.
import io
import pandas as pd

data = '''yrmo
202101
202102
202103
202104
202105
202106
202109
202111'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')

def get_quarter(x):
    # Assumes a 4-digit year followed by a 2-digit month, e.g. 202104
    mo = str(x)[-2:]
    yr = str(x)[0:4]
    if mo in ['01', '02', '03']:
        return yr + '1'
    elif mo in ['04', '05', '06']:
        return yr + '2'
    elif mo in ['07', '08', '09']:
        return yr + '3'
    else:
        return yr + '4'

df['yrmo'].apply(get_quarter)
0 20211
1 20211
2 20211
3 20212
4 20212
5 20212
6 20213
7 20214
Name: yrmo, dtype: object
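Since apply calls the Python function once per row, a vectorized variant may be preferable on large frames. The following is only a sketch of one alternative, not part of the answer above: it assumes the column holds YYYYMM integers and derives the quarter with integer arithmetic. The 'DIST_PERIOD' column name is borrowed from the question.

```python
import pandas as pd

# Same sample values as in the answer above.
df = pd.DataFrame(
    {"yrmo": [202101, 202102, 202103, 202104, 202105, 202106, 202109, 202111]}
)

year = df["yrmo"] // 100        # 202104 -> 2021
month = df["yrmo"] % 100        # 202104 -> 4
quarter = (month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4

# Concatenate year and quarter as strings, e.g. '2021' + '2' -> '20212'.
df["DIST_PERIOD"] = year.astype(str) + quarter.astype(str)

print(df["DIST_PERIOD"].tolist())
# ['20211', '20211', '20211', '20212', '20212', '20212', '20213', '20214']
```

If the column holds strings rather than integers, convert it first with astype(int).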
Not sure if np.where is a good choice here; instead, use the map function from pandas.
Create a dictionary for this mapping, for example:
di= {'202101':'20211', '202102':'20211', '202103':'20211'}
In the same way, add the remaining entries to the dictionary to complete the mapping. The keys must have the same type as the values in the column: string keys like these only match if the column holds strings.
After that, do this:
df['yourcolumnnametobemapped'] = df['yourcolumnnametobemapped'].map(di)
Note: map returns NaN for any value that has no entry in the dictionary. If you want to map only a few values and leave the others untouched, use:
df['yourcolumnnametobemapped'] = df['yourcolumnnametobemapped'].map(di).fillna(df['yourcolumnnametobemapped'])
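Typing every key by hand gets tedious, so the dictionary can also be generated. The sketch below is one possible way to do that, assuming the column is named 'Period' and holds YYYYMM integers (hence integer keys rather than the string keys shown above). The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with integer YYYYMM values.
df = pd.DataFrame({"Period": [202101, 202105, 202108, 202112]})

# Build one entry per month for every year that occurs in the column.
years = (df["Period"] // 100).unique()
di = {
    int(y) * 100 + m: f"{int(y)}{(m - 1) // 3 + 1}"
    for y in years
    for m in range(1, 13)
}

# Integer keys match the integer column, so no value is left unmapped.
df["DIST_PERIOD"] = df["Period"].map(di)

print(df["DIST_PERIOD"].tolist())
# ['20211', '20212', '20213', '20214']
```

Because the dictionary covers all twelve months of each year present, the fillna fallback is not needed in this case.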