
How to extract particular bits of a specific column of a Python pandas dataframe

One column of my pandas dataframe actually holds 16-bit data converted to BCD. I want to extract only bits 14-8 of each row and convert them to BCD. The formula below works for a small dataframe like this one:

import pandas as pd

df = pd.DataFrame({'Value': [128, 128, 436, 465], 'Minutes': [1280, 16384, 1792, 1536]})

df['Minutes_1']=df.Minutes.apply(int).apply(bin).str[2:].str[:-8].apply(int, base=2)
df

But when I apply

df['Minutes_1']=df.Minutes.apply(int).apply(bin).str[2:].str[:-8].apply(int, base=2)

to a bigger dataframe of 688126 rows, I get this error:

invalid literal for int() with base 2: ''

Note: a few of the values in the column are:
0, 256,512,768,1024,1280,1536,1792,2048,2304,4096,4352,4608,4864,
5120,5276,5632,5888,6144,6400,8192,8448,8704,8960,9216,9472,9728,9984,10240,10496,12288,
12544,12800,13056,13312,13568,13824,14080,14336,14592,16384,16640,16896,17152,17408,17920,
18176,18432,18688,20480,20736,20992,21248,21504,21760,22016,22272,22528,22784

The full traceback is below:

ValueError                                Traceback (most recent call last)
in
      1 df.LO_TIME_0_J2_0
----> 2 df['Minutes_1']=df.LO_TIME_0_J2_0.apply(int).apply(bin).str[2:].str[:-8].apply(int, base=2)
      3 df.LO_TIME_0_J2_0

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3192         else:
   3193             values = self.astype(object).values
-> 3194             mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in (x)
   3179             # handle ufuncs and lambdas
   3180             if kwds or args and not isinstance(func, np.ufunc):
-> 3181                 f = lambda x: func(x, *args, **kwds)
   3182             else:
   3183                 f = func

ValueError: invalid literal for int() with base 2: ''

Please help.

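Your column contains the value 0. bin(0) returns '0b0', so str[2:] leaves '0', and str[:-8] then strips the last eight characters and leaves an empty string, which int() cannot parse. The same happens for any value below 256, because its binary form has at most 8 digits.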
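To see where the empty string comes from, here is a small sketch that repeats the same slicing on plain Python integers. The helper names high_bits_str and high_bits_str_padded are made up for this illustration and are not part of pandas.

```python
def high_bits_str(value):
    """Binary digits of `value` with the 8 low-order digits removed (no padding)."""
    digits = bin(value)[2:]   # strip the '0b' prefix
    return digits[:-8]        # empty when there are 8 digits or fewer


def high_bits_str_padded(value):
    """Same slicing, but left-padded to 16 digits first."""
    digits = bin(value)[2:].zfill(16)
    return digits[:-8]        # always exactly 8 characters for 16-bit input


for value in (0, 255, 256, 1280):
    print(value, repr(high_bits_str(value)), repr(high_bits_str_padded(value)))
```

For 0 and 255 the unpadded slice is '', and int('', base=2) is what raises the ValueError. The padded version always leaves 8 characters for int() to parse.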

I suggest adding str.zfill(16) between the two slices, so the binary string is left-padded with zeros to 16 digits:

df['Minutes_1'] = df.Minutes.apply(int).apply(bin).str[2:].str.zfill(16).str[:-8].apply(int, base=2)

Using astype(int) may be faster than apply(int):


df['Minutes_1'] = df.Minutes.astype(int).apply(bin).str[2:].str.zfill(16).str[:-8].apply(int, base=2)

Example:

df = pd.DataFrame( {'Minutes': [1280, 16384, 1792, 1536, 0, 256]})                                    
df['Minutes_1'] = df.Minutes.apply(int).apply(bin).str[2:].str.zfill(16).str[:-8].apply(int, base=2)  

output:

   Minutes  Minutes_1  
0     1280          5  
1    16384         64  
2     1792          7  
3     1536          6  
4        0          0  
5      256          1  

Without zfill, you get the error:

ValueError: invalid literal for int() with base 2: ''
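As an alternative to the string approach (not what the answer above uses), the same field can be extracted with integer bit operations on the underlying NumPy array, which avoids the empty-string problem entirely and is likely faster on a large dataframe. This is a sketch under two assumptions: the column holds non-negative integers that fit in 16 bits, and the 7-bit field is packed BCD with the tens digit in bits 14-12 and the ones digit in bits 11-8. The Minutes_bcd column name is only illustrative. Note that the mask 0x7F keeps bits 14-8 only, whereas zfill(16) followed by str[:-8] also keeps bit 15; the two agree whenever bit 15 is 0.

```python
import pandas as pd

df = pd.DataFrame({'Minutes': [1280, 16384, 1792, 1536, 0, 256]})

# Work on the NumPy array so that >> and & are applied element-wise.
raw = df['Minutes'].astype('int64').values

# Shift out the low byte, then mask to keep only bits 14-8 (7 bits).
field = (raw >> 8) & 0x7F
df['Minutes_1'] = field

# Assumed BCD layout: upper 3 bits = tens digit, lower 4 bits = ones digit.
df['Minutes_bcd'] = ((field >> 4) & 0x7) * 10 + (field & 0xF)

print(df)
```

With this layout, 16384 (0x4000) gives the raw field 64 and decodes to 40, while 1280 (0x0500) gives 5 in both columns.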
