
How to get raw hex values from pcap file?

I've been playing around with scapy and want to read through and analyse every hex byte. So far I've been using scapy simply because I don't know another way currently. Before just writing tools myself to go through the pcap files I was wondering if there was an easy way to do it. Here's what I've done so far.

from scapy.all import rdpcap, TCP

packets = rdpcap('file.pcap')
tcpPackets = []
for packet in packets:
    # Keep only the packets that contain a TCP layer.
    if packet.haslayer(TCP):
        tcpPackets.append(packet)

When I run type(tcpPackets[0]) the type I get is:

<class 'scapy.layers.l2.Ether'>

Then when I try to convert the Ether object into a string, it gives me a mix of hex and ASCII (as shown by the stray parentheses and brackets).

str(tcpPackets[0])
"b'$\\xa2\\xe1\\xe6\\xee\\x9b(\\xcf\\xe9!\\x14\\x8f\\x08\\x00E\\x00\\x00[:\\xc6@\\x00@\\x06\\x0f\\xb9\\n\\x00\\x01\\x04\\xc6)\\x1e\\xf1\\xc0\\xaf\\x07[\\xc1\\xe1\\xff0y<\\x11\\xe3\\x80\\x18 1(\\xb8\\x00\\x00\\x01\\x01\\x08\\n8!\\xd1\\x888\\xac\\xc2\\x9c\\x10%\\x00\\x06MQIsdp\\x03\\x02\\x00\\x05\\x00\\x17paho/34AAE54A75D839566E'"

I have also tried using hexdump but I can't find a way to parse through it.

I can't find the proper dupe now, but this is just a misuse/misunderstanding of str(). The original data is in a bytes format, for instance x = b'moo'.

When str() receives your bytes string, it calls the __str__ method of the bytes class, which returns a representation of the object. The representation keeps the b at the beginning, presumably so that humans can tell it is a bytes object and to avoid encoding issues (although that's speculation on my part).
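To see the difference between str() and the underlying bytes, here is a small sketch using only built-ins. The sample value is the first 14 bytes (the Ethernet header) copied from the output in the question; the variable names are just for illustration.

```python
# First 14 bytes of the packet shown in the question (the Ethernet header).
data = b'$\xa2\xe1\xe6\xee\x9b(\xcf\xe9!\x14\x8f\x08\x00'

as_text = str(data)        # the representation, including the leading b' and trailing '
hex_string = data.hex()    # every byte rendered as two hex digits
byte_values = list(data)   # iterating over bytes yields ints in the range 0..255

print(as_text)
print(hex_string)
print([f'{b:02x}' for b in byte_values])
```

Printable bytes such as $ and ( show up as ASCII in the str() form, which is where the "mix of hex and ascii" comes from; .hex() and plain iteration treat every byte the same way.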

In the same way, if you evaluated tcpPackets[0] at an interactive prompt, Python would call its __repr__ and show whatever representation Scapy defines for the packet, most likely a summary of the Ether layer and its fields.

Here's some example code you can experiment with:

class YourEther(bytes):
    def __str__(self):
        return '<Made Up Representation>'

print(YourEther())

Obviously Scapy returns a different representation, not just a static string that says "made up representation", but you probably get the idea.

So in the case of the Ether object, its __str__ function probably returns b'$\\xa2\\....... instead of just the default class representation (some correction might be in order here, as I don't remember all the technical terminology for these behaviors).
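A toy class makes the three hooks easier to tell apart. ToyPacket below is a hypothetical stand-in written for this illustration, not Scapy's actual implementation; it only assumes that a packet-like object can define __bytes__, __repr__ and __str__ separately.

```python
class ToyPacket:
    """Hypothetical stand-in for a packet object; not Scapy's real class."""

    def __init__(self, payload: bytes):
        self.payload = payload

    def __bytes__(self):
        # bytes(obj) calls this and gets the raw data back.
        return self.payload

    def __repr__(self):
        # Used when the object is evaluated at the interactive prompt.
        return f'<ToyPacket len={len(self.payload)}>'

    def __str__(self):
        # str(obj) calls this; here it returns the repr of the raw bytes.
        return str(self.payload)


pkt = ToyPacket(b'\x08\x00E\x00')
print(repr(pkt))
print(str(pkt))
print(bytes(pkt).hex())
```

The point is that str() gives you text describing the bytes, while bytes() gives you the bytes themselves, which is what you want for hex analysis.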

As a workaround, this might fix your issue:

from binascii import hexlify
hexlify(bytes(tcpPackets[0]))

Note that hexlify needs a bytes object, so convert the packet with bytes() rather than str(). If you go through str() instead, you have to account for the prepended b' as well as the trailing ' and remove them accordingly. (Note: the " at the beginning and end are not part of the data; they are just a second layer of representation added by your console when printing.)
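Rather than slicing off the b' and ' by hand, one option is to let Python parse the text back into bytes. The following is a minimal sketch using only the standard library; the sample value is the first six bytes from the question's output, and it assumes the string really is an unmodified str() of a bytes object.

```python
import ast
from binascii import hexlify

raw = b'$\xa2\xe1\xe6\xee\x9b'   # first six bytes from the question's packet

# hexlify takes bytes and returns bytes; decode it to get an ordinary str.
hex_text = hexlify(raw).decode('ascii')

# If all you have is the str() form, literal_eval parses it back into bytes.
stringified = str(raw)
recovered = ast.literal_eval(stringified)

print(hex_text)
print(recovered == raw)
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for this kind of round trip.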

Scapy is probably intended to be used through fields such as tcpPackets[0].dst rather than by grabbing the raw data. I have very little experience with Scapy, but it's an abstraction layer for a reason, so it is probably hiding the raw data, or the way to get at it is somewhere in the core docs that I can't find right now.

More info on the __str__ description: Does python `str()` function call `__str__()` function of a class?

One last note: if you actually want to access the raw data, it seems you can do so through the Raw class: Raw load found, how to access?

You can put the bytes of each packet into a numpy array as follows:

import numpy as np

for p in tcpPackets:
    # p.load is the Raw payload only; use bytes(p) for the whole frame.
    raw_pack_data = np.frombuffer(p.load, dtype=np.uint8)
    # Manipulate the bytes stored in raw_pack_data as you like.

This is fast. In my case, rdpcap takes ~20 times longer than putting all the packets into a big array in a similar for loop for a 1.5GB file.
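If parsing overhead is the concern, the classic pcap layout is simple enough to walk with the standard library alone. The sketch below is an assumption-laden illustration, not the method used for the timing above: iter_pcap_records is a hypothetical helper, it handles only the classic microsecond pcap format (not pcapng or the nanosecond variant), and it expects the whole capture in memory.

```python
import struct


def iter_pcap_records(data: bytes):
    """Yield the raw bytes of each packet in a classic pcap capture."""
    magic = data[:4]
    if magic == b'\xd4\xc3\xb2\xa1':
        endian = '<'   # capture written on a little-endian machine
    elif magic == b'\xa1\xb2\xc3\xd4':
        endian = '>'   # capture written on a big-endian machine
    else:
        raise ValueError('not a classic microsecond pcap capture')
    offset = 24        # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, incl_len, orig_len (4 bytes each).
        _, _, incl_len, _ = struct.unpack_from(endian + 'IIII', data, offset)
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


# Demo on a tiny in-memory capture so the sketch runs without a file.
header = struct.pack('<IHHiIII', 0xa1b2c3d4, 2, 4, 0, 0, 65535, 1)
record = struct.pack('<IIII', 0, 0, 4, 4) + b'\x08\x00E\x00'
frames = list(iter_pcap_records(header + record))
print([frame.hex() for frame in frames])
```

For a real capture you would read the file in binary mode and pass its contents in; for a multi-gigabyte file, a streaming or memory-mapped variant would be a better fit than reading everything at once.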

