How to extract specific bits from hexadecimal numbers in a given text file

Question

This is the input file, input.txt:

PS name         above bit      below bit      original            1_info           2_info            new      
PS_AS_0         PS_00[31]      PS_00[00]      0x00000000          0x156A17[00]     0x156A17[31]      0x0003F4a1 
PS_RST_D2       PS_03[05]      PS_03[00]      0x00000003          0x1678A1[00]     0x1678A1[05]      0x0a56F001
PS_N_YD_C       PS_03[06]      PS_03[06]      0x00000000          0x1678A1[06]     0x1678A1[06]      0x0a56F001
PS_1_FG         PS_03[31]      PS_03[07]      0x000000FF          0x1678A1[07]     0x1678A1[31]      0x0a56F001
PS_F_23_ASD     PS_04[07]      PS_03[00]      0x00000000          0x18C550[00]     0x18C550[07]      0x00000000
PS_A_0_STR      PS_04[15]      PS_04[08]      0x00000FFF          0x18C550[08]     0x18C550[15]      0x00000000
PS_AD_0         PS_04[31]      PS_04[16]      0x00000000          0x18C550[16]     0x18C550[31]      0x00000000

Here I need to extract the bits in this way:

If the value of new is 0x0a56F001, I first need to convert it to binary: 0000 1010 0101 0110 1111 0000 0000 0001.

Then check the above bit and below bit columns.

For example, with PS_03[05] and PS_03[00], take bits 0 to 5 of the binary value of new, which is 000001, i.e. 0x1, then convert this to a 32-bit value, i.e. 0x00000001, and replace the new column of that row with this value:

PS_RST_D2       PS_03[05]      PS_03[00]      0x00000003          0x1678A1[00]     0x1678A1[05]      0x00000001
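
The rule described above amounts to a right shift followed by a mask. Here is a minimal sketch, assuming bit 0 is the least significant bit; the function name `extract_bits` is illustrative and does not come from the original post:

```python
def extract_bits(value: int, above: int, below: int) -> int:
    """Return bits below..above (inclusive) of value, moved down to bit 0."""
    width = above - below + 1      # number of bits in the field
    mask = (1 << width) - 1        # e.g. width 6 -> 0b111111
    return (value >> below) & mask


new = int("0x0a56F001", 16)
print(f"0x{extract_bits(new, 5, 0):08x}")    # PS_RST_D2, bits 0..5  -> 0x00000001
print(f"0x{extract_bits(new, 31, 7):08x}")   # PS_1_FG, bits 7..31   -> 0x0014ade0
```

Both printed values agree with the expected output rows for PS_RST_D2 and PS_1_FG.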

Similarly for all rows; finally, the output file should look like this:

PS name         above bit      below bit      original            1_info           2_info            new      
PS_AS_0         PS_00[31]      PS_00[00]      0x00000000          0x156A17[00]     0x156A17[31]      0x0003F4a1 
PS_RST_D2       PS_03[05]      PS_03[00]      0x00000003          0x1678A1[00]     0x1678A1[05]      0x00000001
PS_N_YD_C       PS_03[06]      PS_03[06]      0x00000000          0x1678A1[06]     0x1678A1[06]      0x00000000
PS_1_FG         PS_03[31]      PS_03[07]      0x000000FF          0x1678A1[07]     0x1678A1[31]      0x0014ADE0
PS_F_23_ASD     PS_04[07]      PS_03[00]      0x00000000          0x18C550[00]     0x18C550[07]      0x00000000
PS_A_0_STR      PS_04[15]      PS_04[08]      0x00000FFF          0x18C550[08]     0x18C550[15]      0x00000000
PS_AD_0         PS_04[31]      PS_04[16]      0x00000000          0x18C550[16]     0x18C550[31]      0x00000000

Is this possible in Python? This is my current attempt:

with open("input.txt") as fin:
    with open("output.txt", "w") as fout:
         for line in fin:
             if line.strip():
                 line = line.strip("\n' '")
                 cols = l.split(" ")
                 cols[6] = int(cols[6],16)

I tried selecting a specific column, but it is not working.

Answer 1

For reading input data like this I like to use pandas (see the update at the end of this answer).

To get the number of the above and below bits, you can index into the string like this:

sAboveBit ="PS_03[05]"
iAboveBit = int(sAboveBit[-3:-1])

Or much safer:

iAboveBit = int(sAboveBit.split("[")[-1].split("]")[0])

To create the new value, you can use a bitwise AND with an integer mask that you calculate from your aboveBit and belowBit.

The first way I think of is a for loop:

iSumUp = 0
for i in range(iBelowBit,iAboveBit+1):
    iSumUp+=2**i
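
The same mask can also be computed without a loop. The sketch below compares the loop with a closed-form expression; the function names `mask_loop` and `mask_closed_form` are made up for illustration:

```python
def mask_loop(below: int, above: int) -> int:
    """Sum 2**i for every bit position from below to above (inclusive)."""
    total = 0
    for i in range(below, above + 1):
        total += 2 ** i
    return total


def mask_closed_form(below: int, above: int) -> int:
    """Build a run of (above - below + 1) one-bits, then move it up to 'below'."""
    return ((1 << (above - below + 1)) - 1) << below


print(hex(mask_loop(0, 5)))           # 0x3f
print(hex(mask_closed_form(7, 31)))   # 0xffffff80
```

Both functions return the same integer for any range with below <= above.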

To convert your hex string to an integer you can use the bitstring package.

import bitstring as bs
sOldNew = "0x0a56F001"
iOldNew = bs.BitArray(sOldNew).uint

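Now you can use a bitwise AND, then shift the result right by iBelowBit so that the extracted field starts at bit 0 (this is what gives 0x0014ADE0 for PS_1_FG in the expected output):

iNewNew = (iOldNew & iSumUp) >> iBelowBit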

Then create your new hex string with an f-string.

sNewNew = f"0x{iNewNew:08x}"

Finally, save your data to your (new) file, for which I also prefer using pandas.

Update:

For reading your data with pandas:

import pandas as pd
df =pd.read_csv(r'input.txt',delimiter="\t")
print(df)

Answer 2

You can use split to split the lines, then a regex to extract the above and below values.

To compute the new value, keep only the (above_bit + 1) least significant bits with a bitwise AND with 2**n - 1 (where n = above_bit + 1), and then right shift the result by below_bit.

Possible code:

import re

# compile the regex
bit_re = re.compile(r'.*\[(\d{2})\]')

with open("input.txt") as fin, open("output.txt", "w") as fout:
    line = next(fin)          # skip header line
    fout.write(line)
    for line in fin:
        row = line.split()    # extract fields
        # print(row)          # uncomment for traces
        # extract above and below values
        above = int(bit_re.match(row[1]).group(1))
        below = int(bit_re.match(row[2]).group(1))
        mask = (1 << (above + 1)) - 1    # keeps bits 0..above
        val = (int(row[6], 16) & mask) >> below
        row[6] = format(val, '#010x')    # format the result as a 32 bits hex number
        print(*row, file=fout)

With the sample data it gives the expected result:

PS name         above bit      below bit      original            1_info           2_info            new      
PS_AS_0 PS_00[31] PS_00[00] 0x00000000 0x156A17[00] 0x156A17[31] 0x0003f4a1
PS_RST_D2 PS_03[05] PS_03[00] 0x00000003 0x1678A1[00] 0x1678A1[05] 0x00000001
PS_N_YD_C PS_03[06] PS_03[06] 0x00000000 0x1678A1[06] 0x1678A1[06] 0x00000000
PS_1_FG PS_03[31] PS_03[07] 0x000000FF 0x1678A1[07] 0x1678A1[31] 0x0014ade0
PS_F_23_ASD PS_04[07] PS_03[00] 0x00000000 0x18C550[00] 0x18C550[07] 0x00000000
PS_A_0_STR PS_04[15] PS_04[08] 0x00000FFF 0x18C550[08] 0x18C550[15] 0x00000000
PS_AD_0 PS_04[31] PS_04[16] 0x00000000 0x18C550[16] 0x18C550[31] 0x00000000

You could get better formatting by replacing only the end of each line with the new value...
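
One way to keep the original column alignment is to replace only the last field of each line. This is a rough sketch, not part of the original answer. It assumes the new value is the last whitespace-separated field of every data line; `rewrite_line` is a hypothetical helper name:

```python
import re

BIT_RE = re.compile(r"\[(\d+)\]")   # captures the bit index inside [..]


def rewrite_line(line: str) -> str:
    """Replace the last field of a data line, leaving the spacing before it untouched."""
    row = line.split()
    above = int(BIT_RE.search(row[1]).group(1))
    below = int(BIT_RE.search(row[2]).group(1))
    width = above - below + 1
    val = (int(row[6], 16) >> below) & ((1 << width) - 1)
    # everything before the last occurrence of the old value keeps its spacing
    head, _, _ = line.rstrip().rpartition(row[6])
    return head + format(val, "#010x") + "\n"


sample = ("PS_RST_D2       PS_03[05]      PS_03[00]      0x00000003          "
          "0x1678A1[00]     0x1678A1[05]      0x0a56F001\n")
print(rewrite_line(sample), end="")
```

In the file loop, `fout.write(rewrite_line(line))` would then take the place of `print(*row, file=fout)`. Note that this sketch drops any trailing spaces after the last field.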

Answer 3

The first problem is that you have many spaces. When splitting at a single space, you get a lot of empty columns. Replace runs of spaces with a single one first:

import re
line = re.sub(' +', ' ', line)

Then, 0x0a56F001 is a hexadecimal number. To read it from the text file, use int(cols[6], 16), not int(cols[6], 2), which attempts to read it as binary.

You can then get a 32-digit binary string like this:

number = int(cols[6],16)
binary_string = f"{number:032b}"

Now do the slicing, then convert it back with

sliced_number = int( ..., 2)
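
The answer leaves the slicing step open. One possible way to do it is sketched below. It assumes the leftmost character of the 32-character string is bit 31 and the rightmost is bit 0; `slice_bits` is an illustrative name:

```python
def slice_bits(hex_text: str, above: int, below: int) -> int:
    """Extract bits below..above (inclusive) by slicing a 32-character binary string."""
    number = int(hex_text, 16)
    binary_string = f"{number:032b}"    # index 0 is bit 31, index 31 is bit 0
    field = binary_string[31 - above : 32 - below]
    return int(field, 2)


sliced_number = slice_bits("0x0a56F001", 5, 0)
print(f"0x{sliced_number:08x}")   # 0x00000001
```

String slicing is easy to follow but does the same job as shifting and masking the integer directly.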

Question description

3 solutions

Solution 1
1 2022-11-15 07:42:09

Solution 2
1 Accepted 2022-11-15 09:24:17

Solution 3
0 2022-11-15 07:07:12