Split pandas dataframe column based on number of digits

Question

I have a pandas dataframe with two columns, key and value, where the value is always an 8-digit number, something like:

>df1
key value
10  10000100
20  10000000
30  10100000
40  11110000

Now I need to take the value column and split it into its individual digits, so that my result is a new data frame:

>df_res
key 0 1 2 3 4 5 6 7
10  1 0 0 0 0 1 0 0
20  1 0 0 0 0 0 0 0
30  1 0 1 0 0 0 0 0
40  1 1 1 1 0 0 0 0

I cannot change the input data format. The most conventional approach I could think of was to convert the value to a string, loop through each digit character, and put it in a list. However, I am looking for something more elegant and faster. Kindly help.

EDIT: The input is not a string; it is an integer.

Answer 1

This should work:

df.value.astype(str).apply(list).apply(pd.Series).astype(int)
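
As a rough illustration, here is the one-liner above applied to the sample data from the question, with the key column put back in front so the result resembles df_res. The names digits and df_res and the pd.concat step are assumptions added for this sketch, not part of the original answer.

```python
import pandas as pd

# Sample frame from the question (value holds 8-digit integers).
df = pd.DataFrame({
    "key": [10, 20, 30, 40],
    "value": [10000100, 10000000, 10100000, 11110000],
})

# int -> str -> list of characters -> one column per character -> int
digits = df["value"].astype(str).apply(list).apply(pd.Series).astype(int)

# Assumption for illustration: put the key column back in front of the digit columns.
df_res = pd.concat([df[["key"]], digits], axis=1)
print(df_res)
```

Note that apply(pd.Series) builds one Series per row, which is why this version is slow on large frames.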

Answer 2

Assuming your input is stored as strings and all have the same length (8, as posed), then the following works:

df1 = pd.concat([df1,pd.DataFrame(columns=range(8))])
df1[list(range(8))] = df1['value'].apply(lambda x: pd.Series(list(str(x)), index=range(8)))

Answer 3

One approach could be -

arr = df.value.values.astype('S8')
df = pd.DataFrame(np.fromstring(arr, dtype=np.uint8).reshape(-1,8)-48)

Sample run -

In [58]: df
Out[58]: 
   key     value
0   10  10000100
1   20  10000000
2   30  10100000
3   40  11110000

In [59]: arr = df.value.values.astype('S8')

In [60]: pd.DataFrame(np.fromstring(arr, dtype=np.uint8).reshape(-1,8)-48)
Out[60]: 
   0  1  2  3  4  5  6  7
0  1  0  0  0  0  1  0  0
1  1  0  0  0  0  0  0  0
2  1  0  1  0  0  0  0  0
3  1  1  1  1  0  0  0  0
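
One caveat: newer NumPy releases deprecate np.fromstring for binary input, so the call above may emit a warning or fail depending on the installed version. A minimal sketch of the same idea using np.frombuffer, assuming every value has exactly 8 digits; the names digits and df_res and the use of key as the index are assumptions made for this example:

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "key": [10, 20, 30, 40],
    "value": [10000100, 10000000, 10100000, 11110000],
})

# Each integer becomes an 8-byte ASCII string, e.g. b"10000100".
arr = df["value"].to_numpy().astype("S8")

# Reinterpret the raw bytes as uint8 ASCII codes, one row per value,
# then subtract the code of "0" (48) to get the digit values.
digits = np.frombuffer(arr.tobytes(), dtype=np.uint8).reshape(-1, 8) - ord("0")

# Assumption for illustration: use key as the index of the result.
df_res = pd.DataFrame(digits, index=df["key"])
print(df_res)
```

This only holds while all values have the same number of digits; shorter values would be padded with null bytes and produce wrong digits.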

Answer 4

A vectorized version would be:

df['value'].astype(str).str.join(' ').str.split(' ', expand=True)

This first inserts a space between each pair of characters and then splits on those spaces. It's just a workaround to be able to use str.split (maybe not necessary, not sure). But it is considerably faster:

df = pd.DataFrame({'value': np.random.randint(10**7, 10**8, 10**4)})

%timeit df['value'].astype(str).str.join(' ').str.split(' ', expand=True)
10 loops, best of 3: 25.5 ms per loop

%timeit df.value.astype(str).apply(list).apply(pd.Series).astype(int)
1 loop, best of 3: 1.27 s per loop

%timeit df['value'].apply(lambda x: pd.Series(list(str(x)),index=range(8)))
1 loop, best of 3: 1.33 s per loop


%%timeit
arr = df.value.values.astype('S8')
pd.DataFrame(np.fromstring(arr, dtype=np.uint8).reshape(-1,8)-48)

1000 loops, best of 3: 1.14 ms per loop

Update: Divakar's solution (the NumPy approach from Answer 3, timed last above) seems to be the fastest.
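
For completeness, a small sketch of the join/split approach from this answer on the question's sample data. One thing worth noting is that str.split returns strings, so a final astype(int) is needed if integer digits are wanted. The name digits and the added astype(int) are assumptions for this example and were not part of the timed expression.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "key": [10, 20, 30, 40],
    "value": [10000100, 10000000, 10100000, 11110000],
})

digits = (
    df["value"]
    .astype(str)                   # 10000100 -> "10000100"
    .str.join(" ")                 # "10000100" -> "1 0 0 0 0 1 0 0"
    .str.split(" ", expand=True)   # one column per digit, still strings
    .astype(int)                   # convert the digit strings to integers
)
print(digits)
```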

4 answers

solution1
9 2016-07-13 16:46:37

solution2
3 2016-07-13 16:45:37

solution3
3 ACCEPTED 2016-07-13 16:53:21

solution4
2 2016-07-13 16:53:42
