I'm doing data wrangling in Python, using the package dfply.
I want to create a new variable "a06" from 'FC06' in the dataset data_a, so that a06 is 1 when FC06 starts with "1", 2 when it starts with "2", and NaN when FC06 is missing.
For instance, with the input:
import pandas as pd
df = pd.DataFrame({'FC06':[173,170,220,float('nan'),110,230,float('nan')]})
I want to get the output:
df1 = pd.DataFrame({'a06':[1,1,2,float('nan'),1,2,float('nan')]})
In R, it would be obtained with:
data_a %>% mutate(a06 = ifelse(substr(FC06,1,1)=="1",1,ifelse(substr(FC06,1,1)=="2",2,NaN)))
but I can't find how to do this in Python.
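For reference, the intended nested-ifelse logic (1 for a leading "1", 2 for a leading "2", otherwise NaN) can be sketched in plain pandas/NumPy with numpy.select. This is a swapped-in technique, not dfply's API, and it assumes FC06 is a numeric column in which missing values are NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'FC06': [173, 170, 220, float('nan'), 110, 230, float('nan')]})

# First character of each value as text; missing values become 'nan' -> 'n'
first = df['FC06'].astype(str).str[0]

# Same as the nested ifelse: the first matching condition wins, and anything
# else (including a missing FC06) falls through to the default NaN
df['a06'] = np.select([first == '1', first == '2'], [1, 2], default=np.nan)
print(df)
```

More branches only require extending the two lists, which keeps deeper nesting readable.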
I got a first version working with just two alternatives, NaN or 1:
data_a >> mutate(a06=if_else(X['FC06'].apply(pd.isnull), float('nan'), 1))
but I can't find how to differentiate the result according to the first character of FC06.
I tried things like:
(data_a >> mutate(a06=if_else(X['FC06'].apply(pd.isnull),float('nan'),if_else(X['FC06'].apply(str)[0]=='1',1,2))))
but without success:
- [0] doesn't work there to get the first character, and/or
- str can't be used with apply (nor can str.startswith('1')).
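A likely explanation for the [0] problem, shown in plain pandas with dfply's X left aside: on a Series, [0] selects the element whose index label is 0, not the first character of each string. Per-element string operations go through the .str accessor instead. A minimal sketch:

```python
import pandas as pd

s = pd.Series([173, 170, 220, float('nan'), 110, 230, float('nan')])
as_text = s.apply(str)  # '173.0', '170.0', ..., 'nan'

# [0] on a Series picks the whole element labelled 0, not a character
whole_first_value = as_text[0]

# The .str accessor works element-wise
first_chars = as_text.str[0]                  # first character of every value
starts_with_1 = as_text.str.startswith('1')   # boolean Series
```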
Does anybody know how to handle this kind of situation?
Or is there another Python package that can do this?
Thank you!
If you only have 3-digit numbers, you can use floor division:
df['FC06'] //= 100
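The line above overwrites FC06 in place. To keep the original column and store the result in a new a06 column instead, a minimal sketch, assuming every non-missing value is a positive 3-digit number:

```python
import pandas as pd

df = pd.DataFrame({'FC06': [173, 170, 220, float('nan'), 110, 230, float('nan')]})

# Floor division keeps only the hundreds digit; NaN // 100 is still NaN
df['a06'] = df['FC06'] // 100
```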
If you have strings, you can use pd.Series.mask:
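# first character of each value as a number; 'nan' -> 'n' -> NaN thanks to errors='coerce'
ints = pd.to_numeric(df['FC06'].astype(str).str[:1], errors='coerce')
# replace only the non-missing values; assign back rather than using inplace=True on a column
df['FC06'] = df['FC06'].mask(df['FC06'].notnull(), ints)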
print(df)
   FC06
0   1.0
1   1.0
2   2.0
3   NaN
4   1.0
5   2.0
6   NaN
You will notice that your integers become floats. This is forced by the NaN values: NaN is a float, so a numeric column that contains it is stored as float64. In general, this shouldn't be a problem.
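If integer values are required, one option is pandas' nullable integer dtype "Int64", which stores whole numbers alongside missing values (shown as <NA>). A minimal sketch, assuming pandas 1.0 or later and the floor-division approach:

```python
import pandas as pd

df = pd.DataFrame({'FC06': [173, 170, 220, float('nan'), 110, 230, float('nan')]})

# Floor division gives float64 (because of NaN); casting to the nullable
# "Int64" dtype keeps the whole numbers as integers and NaN as <NA>
df['a06'] = (df['FC06'] // 100).astype('Int64')
```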