Pandas DataFrame: how to parse integers into strings of 0s and 1s?

Question

I have the following pandas DataFrame.

import pandas as pd
df = pd.read_csv('filename.csv')

print(df)

      sample      column_A         
0     sample1        6/6    
1     sample2        0/4
2     sample3        2/6    
3     sample4       12/14   
4     sample5       15/21   
5     sample6       12/12   
..    ....

The values in column_A are not fractions. The data must be manipulated so that each value is converted into a string of 0s and 1s (not by converting the integers into their binary counterparts).

The "numerator" above gives the total number of 1s, while the "denominator" gives the total number of 0s and 1s together.

So, the table should actually be in the following format:

      sample      column_A         
0     sample1     111111    
1     sample2     0000
2     sample3     110000    
3     sample4     11111111111100    
4     sample5     111111111111111000000 
5     sample6     111111111111  
..    ....

I've never parsed an integer to output strings of 0s and 1s like this. How does one do this? Is there a "pandas method" to use with lambda expressions? Pythonic string parsing or regex?

Answer 1

First, suppose you write a function:

def to_binary(s):
    n_d = s.split('/')
    n, d = int(n_d[0]), int(n_d[1])
    return '1' * n + '0' * (d - n)

So that:

>>> to_binary('4/5')
'11110'

Now you just need to use pandas.Series.apply:

df.column_A.apply(to_binary)
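For reference, here is a minimal end-to-end sketch of this approach. It assumes the data from the question is built inline rather than read from filename.csv, and it writes the result back into column_A, which the answer itself does not show.

```python
import pandas as pd


def to_binary(s):
    # 'n/d' -> n ones followed by (d - n) zeros
    n, d = map(int, s.split('/'))
    return '1' * n + '0' * (d - n)


# Sample data copied from the question (built inline here as an assumption,
# instead of reading 'filename.csv')
df = pd.DataFrame({
    'sample': ['sample1', 'sample2', 'sample3', 'sample4', 'sample5', 'sample6'],
    'column_A': ['6/6', '0/4', '2/6', '12/14', '15/21', '12/12'],
})

# Apply the function to every value and overwrite the column
df['column_A'] = df['column_A'].apply(to_binary)
print(df)
```

Series.apply calls to_binary once per row, so the cost grows linearly with the number of rows.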

Answer 2

An alternative:

df2 = df['column_A'].str.split('/', expand=True).astype(int)\
                    .assign(ones='1').assign(zeros='0')

df2
Out: 
    0   1 ones zeros
0   6   6    1     0
1   0   4    1     0
2   2   6    1     0
3  12  14    1     0
4  15  21    1     0
5  12  12    1     0

(df2[0] * df2['ones']).str.cat((df2[1]-df2[0])*df2['zeros'])
Out: 
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object

Note: I was actually trying to find a faster alternative, thinking apply would be slow, but this one turns out to be slower.
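To check the timing on your own data, something like the rough sketch below could be used. The function names with_apply and with_split_multiply, the 6000-row test column, and the explicit dtype=object on the helper Series are all assumptions made for this illustration. No timings are quoted here because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd


def to_binary(s):
    # 'n/d' -> n ones followed by (d - n) zeros
    n, d = map(int, s.split('/'))
    return '1' * n + '0' * (d - n)


def with_apply(col):
    # Answer 1: one Python function call per row
    return col.apply(to_binary)


def with_split_multiply(col):
    # Answer 2: split into two integer columns, then multiply by '1' and '0'.
    # dtype=object is set explicitly (an assumption of this sketch) so the
    # integer * string multiplication is done element by element.
    parts = col.str.split('/', expand=True).astype(int)
    one = pd.Series('1', index=col.index, dtype=object)
    zero = pd.Series('0', index=col.index, dtype=object)
    return (parts[0] * one).str.cat((parts[1] - parts[0]) * zero)


# The six values from the question, repeated to get a measurable workload
col = pd.Series(['6/6', '0/4', '2/6', '12/14', '15/21', '12/12'] * 1000)

# Both approaches should produce the same strings
assert with_apply(col).tolist() == with_split_multiply(col).tolist()

timings = {}
for fn in (with_apply, with_split_multiply):
    seconds = timeit.timeit(lambda: fn(col), number=10)
    timings[fn.__name__] = seconds / 10
    print(f'{fn.__name__}: {timings[fn.__name__] * 1000:.2f} ms per call')
```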

Answer 3

Here are some alternative solutions using the .str.extract() and .str.repeat() methods:

In [187]: x = df.column_A.str.extract(r'(?P<ones>\d+)/(?P<len>\d+)', expand=True).astype(int).assign(o='1', z='0')

In [188]: x
Out[188]:
   ones  len  o  z
0     6    6  1  0
1     0    4  1  0
2     2    6  1  0
3    12   14  1  0
4    15   21  1  0
5    12   12  1  0

In [189]: x.o.str.repeat(x.ones) + x.z.str.repeat(x.len-x.ones)
Out[189]:
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object
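Written as a plain script rather than an IPython session, the extract() plus .str.repeat() approach might look like the sketch below. The sample values come from the question. The group name total (used here instead of len) and the variable names counts, ones, zeros and result are choices made for this illustration.

```python
import pandas as pd

# Sample values copied from the question
df = pd.DataFrame({'column_A': ['6/6', '0/4', '2/6', '12/14', '15/21', '12/12']})

# Pull the two integers out of each 'n/d' string into named columns
counts = (df['column_A']
          .str.extract(r'(?P<ones>\d+)/(?P<total>\d+)', expand=True)
          .astype(int))

# Repeat '1' and '0' a different number of times per row;
# str.repeat accepts a sequence of per-row repeat counts
ones = pd.Series('1', index=df.index).str.repeat(counts['ones'])
zeros = pd.Series('0', index=df.index).str.repeat(counts['total'] - counts['ones'])

result = ones + zeros
print(result)
```

Building the constant '1' and '0' Series directly avoids attaching helper columns to the extracted frame; otherwise the logic is the same as in the session above.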

Or a slow one-liner (it calls apply() twice):

In [190]: %paste
(df.column_A.str.extract(r'(?P<one>\d+)/(?P<len>\d+)', expand=True)
   .astype(int)
   .apply(lambda x: ['1'] * x.one + ['0'] * (x.len-x.one), axis=1)
   .apply(''.join)
)
## -- End pasted text --
Out[190]:
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object

3 answers

Answer 1: score 6, accepted, 2016-07-25 15:16:12
Answer 2: score 4, 2016-07-25 15:35:17
Answer 3: score 1, 2016-07-25 18:20:21
