How to use a conditional statement with startswith() in Python with dfply?

Question

I'm doing data wrangling in Python, using the package dfply.

I want to create a new variable "a06" from the column 'FC06' of the dataset data_a, so that:

a06 = 1 if FC06[i] starts with the character "1" (e.g. FC06[i] = 173)
a06 = 2 if FC06[i] starts with the character "2"
a06 = NaN if FC06[i] = NaN

For instance, with the input:

df = pd.DataFrame({'FC06': [173, 170, 220, float('nan'), 110, 230, float('nan')]})

I want to get the output:

df1 = pd.DataFrame({'a06': [1, 1, 2, float('nan'), 1, 2, float('nan')]})

In R, this would be obtained with:

data_a %>% mutate(a06 = ifelse(substr(FC06,1,1)=="1",1,ifelse(substr(FC06,1,1)=="2",2,NaN)))

but I can't find how to do this in Python.

I got a first version working with just 2 alternatives, NaN or 1, with:

data_a >> mutate(a06=if_else(X['FC06'].apply(pd.isnull), float('nan'), 1))

but I can't find how to differentiate the result according to the first character of FC06.

I tried things like:

(data_a >> mutate(a06=if_else(X['FC06'].apply(pd.isnull),float('nan'),if_else(X['FC06'].apply(str)[0]=='1',1,2))))

but without success: [0] doesn't work there to get the first character, and/or str() can't be used with apply (nor can str.startswith('1')).

Does anybody know how to solve such situations?

Or is there another package to do this in Python?

Thank you!

Answer 1

If you only have 3-digit numbers, you can use floor division:

df['FC06'] //= 100
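As a minimal sketch of the floor-division idea on the sample data from the question, the result can also be written to a new column instead of overwriting FC06. The column name a06 comes from the question; the assumption that every value has exactly three digits is the same one stated above.

```python
import pandas as pd

df = pd.DataFrame({'FC06': [173, 170, 220, float('nan'), 110, 230, float('nan')]})

# Floor division by 100 keeps only the leading digit of a 3-digit number.
# NaN // 100 is still NaN, so missing values pass through unchanged.
df['a06'] = df['FC06'] // 100

print(df)
```

This only works while every value has three digits; a value such as 1730 would give 17 rather than 1.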

If you have strings, you can use pd.Series.mask:

ints = pd.to_numeric(df['FC06'].astype(str).str[:1], errors='coerce')
df['FC06'] = df['FC06'].mask(df['FC06'].notnull(), ints)

print(df)

   FC06
0   1.0
1   1.0
2   2.0
3   NaN
4   1.0
5   2.0
6   NaN

You will notice that your integers become floats. This is forced by the presence of NaN values, which are floats. In general, this shouldn't be a problem.
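If you would rather spell out the conditions from the question with startswith(), one possible sketch uses the .str accessor together with numpy.select. This is plain pandas/numpy rather than dfply, and the extra column a06_int is a hypothetical name used only for illustration. The last step assumes a pandas version that provides the nullable 'Int64' dtype, which keeps whole numbers alongside missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'FC06': [173, 170, 220, float('nan'), 110, 230, float('nan')]})

# Element-wise string view of the column; NaN becomes the text 'nan',
# which starts with neither '1' nor '2'.
text = df['FC06'].astype(str)

# np.select picks the first matching choice per row and falls back to
# `default` (NaN here) when no condition holds.
df['a06'] = np.select(
    [text.str.startswith('1').to_numpy(), text.str.startswith('2').to_numpy()],
    [1, 2],
    default=np.nan,
)

# Optional: the nullable integer dtype stores missing values as <NA>
# instead of forcing the whole column to float.
df['a06_int'] = df['a06'].astype('Int64')

print(df)
```

Note that .str[0] or .str.startswith() works on each element, whereas .apply(str)[0] selects the row labelled 0 of the resulting Series, which is why the attempt in the question did not return first characters.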

solution1
0 ACCEPTED 2018-08-13 13:24:25
