

How do I remove non-numeric values from a specific column in pandas?

['0' '58699' '443' '55420' '53' '1900' '80' '0xb058' '0xacd9' '0xc0a8'
 '0x1432' '0x0000' '123' '67' '5353' '2104' '547' '1' '53290' '4805'
 '2151' '58767' '27643' '58652' '64416' '62529' '55952' '57286' '64466'
 '50497' '0xa29f' '0x2d8e' '0x5b79' '0xb0eb' '0x87b5' '0x8efa' '0xd83a'
 '52142' '52138' '52920' '60162' '54214' '50848' '56986' '50367' '49460'
 '55963' '53327' '52022' '57400' '51755' '52834' '54183' '62724' '54871'
 '59845' '56309' '61878' '58326' '56686']

The column's unique values look like this. When I run:

df[df.DstPort.apply(lambda x: x.isnumeric())].set_index('DstPort')

It takes too long to process because the DataFrame has 250k rows, and I was not able to see the result either. My concern is that not all of the values are numeric: they are strings like '443' and '80' instead of the integers 443 and 80, and there are also values like 0xb0eb. How can I get rid of values like 0xb0eb and convert this column to an int datatype?
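If the goal is literally to drop the hexadecimal entries rather than convert them, a minimal sketch is to filter with the pandas `.str.isnumeric()` accessor and then cast the survivors with `astype(int)`. The small DataFrame below is a hypothetical stand-in for the real 250k-row data, and the sketch assumes every `DstPort` value is a string (`.str.isnumeric()` yields NaN for non-string entries, which would break the boolean mask).

```python
import pandas as pd

# Hypothetical sample standing in for the real 250k-row DataFrame;
# assumes every DstPort value is a string.
df = pd.DataFrame({"DstPort": ["0", "58699", "443", "0xb058", "80", "0x0000"]})

# Keep only rows whose DstPort consists solely of numeric characters.
# Hex strings such as "0xb058" contain "x", so they are filtered out.
numeric_only = df[df["DstPort"].str.isnumeric()].copy()

# The remaining strings are plain decimal digits, so casting to int is safe.
numeric_only["DstPort"] = numeric_only["DstPort"].astype(int)

print(numeric_only["DstPort"].tolist())  # [0, 58699, 443, 80]
```

Note that this throws the hex rows away entirely; if those rows should be kept, they need to be parsed as base-16 integers instead.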

Those are actually integers, just represented in a different base (base 16, also known as hexadecimal). The int() function takes an optional second argument for the base. We can check whether a string consists only of numeric characters and, if so, use 10 as the base, otherwise 16:

df.DstPort.apply(lambda x: int(x, 10 if x.isnumeric() else 16))
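A self-contained sketch of the same idea, assigning the result back so the column actually becomes integers. The sample data and the helper name `parse_port` are hypothetical; the sketch assumes every non-decimal value is a `0x`-prefixed hex string, which `int()` accepts when the base is 16.

```python
import pandas as pd

# Hypothetical sample mixing decimal and hexadecimal port strings.
df = pd.DataFrame(
    {"DstPort": ["0", "58699", "443", "0xb058", "0xacd9", "0x0000", "80"]}
)

def parse_port(s: str) -> int:
    """Hypothetical helper: parse decimal strings in base 10, everything else as hex.

    int() tolerates an optional "0x" prefix when base=16.
    """
    return int(s, 10 if s.isnumeric() else 16)

# Assign the converted values back so the column itself becomes int64.
df["DstPort"] = df["DstPort"].apply(parse_port)

print(df["DstPort"].tolist())  # [0, 58699, 443, 45144, 44249, 0, 80]
```

As a design note, `int(s, 0)` would also auto-detect the `0x` prefix, but it rejects decimal strings with leading zeros such as `'0443'`, so the explicit base check is slightly more forgiving.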
