['0' '58699' '443' '55420' '53' '1900' '80' '0xb058' '0xacd9' '0xc0a8'
'0x1432' '0x0000' '123' '67' '5353' '2104' '547' '1' '53290' '4805'
'2151' '58767' '27643' '58652' '64416' '62529' '55952' '57286' '64466'
'50497' '0xa29f' '0x2d8e' '0x5b79' '0xb0eb' '0x87b5' '0x8efa' '0xd83a'
'52142' '52138' '52920' '60162' '54214' '50848' '56986' '50367' '49460'
'55963' '53327' '52022' '57400' '51755' '52834' '54183' '62724' '54871'
'59845' '56309' '61878' '58326' '56686']
The column's unique values look like this. When I run:
df[df.DstPort.apply(lambda x: x.isnumeric())].set_index('DstPort')
It takes too long to process because the DataFrame has 250k rows, and I was not able to see the result. My concern is that the values are not all numeric: they are strings such as '443' and '80' instead of the integers 443 and 80, and some are hexadecimal, such as 0xb0eb. How can I get rid of values like 0xb0eb and convert this column to an int datatype?
Those are actually integers, just represented in a different base (base 16, also known as hexadecimal). The int() function takes an optional second argument for the base. We can check whether a string consists only of numeric characters, and if so use 10 as the base, and 16 otherwise:
df.DstPort.apply(lambda x: int(x, 10 if x.isnumeric() else 16))
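To make the base selection easier to check in isolation, here is a minimal sketch of the same logic as a named function, run on a few of the values from the question. The helper name parse_port and the sample list are assumptions for illustration only; they are not part of the original data or code.

```python
def parse_port(value: str) -> int:
    """Convert a decimal or hexadecimal port string to an int.

    Hypothetical helper: strings made up only of digits are read as
    base 10, anything else (e.g. '0xb058') is read as base 16.
    """
    base = 10 if value.isnumeric() else 16
    # int() accepts an optional '0x' prefix when the base is 16.
    return int(value, base)


# A handful of the unique values shown in the question.
sample = ["0", "443", "0xb058", "80", "0x0000", "5353"]
ports = [parse_port(v) for v in sample]
print(ports)
```

The same function can be passed straight to the column, for example df['DstPort'] = df.DstPort.apply(parse_port), which keeps the hexadecimal rows rather than dropping them. Any value that is neither decimal nor valid hexadecimal would raise a ValueError, so this assumes the column contains only the two kinds of strings shown above.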