How do I extract certain substrings (IP addresses) from a string (Wireshark output)?

Question

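I'm reading a Wireshark dump from a text file line by line. One thing I can pick out easily is the protocol used on a given line of the Wireshark output (as in the code below). The problem I'm having is pulling the IP addresses out of the line. As you can see from the sample output and my code below, extracting the protocol is fairly easy because it is always uppercase with a space on either side. The IP addresses, however, are not uniform, and I'm not sure how to pull them out, mainly because I don't really understand how all the parts of re.match() work. Can someone help me solve this, and perhaps explain how the re.match() parameters work?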

import re

with open('tcpdump.txt', 'r') as file:
    for line in file:
        matchObj = re.match(r'(.*) TCP (.*?) .*', line, re.M)

Sample Wireshark output:

604 1820.381625 10.200.59.77 -> 114.113.226.43 TCP 54 ssh > 47820 [FIN, ACK] Seq=1848 Ack=522 Win=16616 Len=0

Answer 1

The first regex group, (.*), is greedy and matches everything; you can make it non-greedy by appending ?, i.e.:

import re

with open('tcpdump.txt', 'r') as file:
    for line in file:
        # re.search scans the whole line; re.match only tries the start of it.
        matchObj = re.search(r"->\s(.*?)\s(\w+)\s(.*?)\s", line)

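The example above captures three groups: the remote address 114.113.226.43, the protocol TCP, and the number 54 (in this output that appears to be the frame length column rather than a port; the ports are in the "ssh > 47820" part).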

Regex101 demo
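
To make the greedy versus non-greedy difference concrete, and to touch on the re.match() parameters the question asks about, here is a minimal sketch run against the sample line. The variable names (anchored, found, greedy, lazy) are made up for illustration and are not part of the original answer.

```python
import re

line = ("604 1820.381625 10.200.59.77 -> 114.113.226.43 TCP 54 "
        "ssh > 47820 [FIN, ACK] Seq=1848 Ack=522 Win=16616 Len=0")

# re.match(pattern, string, flags=0) only tries to match at the start of
# the string, so a pattern that begins with "->" finds nothing here.
anchored = re.match(r"->\s(.*?)\s", line)

# re.search takes the same arguments but scans the whole string.
found = re.search(r"->\s(.*?)\s(\w+)\s(.*?)\s", line)

# Greedy (.*) takes as much as it can: everything up to " TCP ".
greedy = re.match(r"(.*) TCP (.*?) .*", line)

# Non-greedy (.*?) takes as little as it can: it stops at the first " -> ".
lazy = re.match(r"(.*?) -> (.*?) TCP", line)

print(anchored)          # None
print(found.groups())    # ('114.113.226.43', 'TCP', '54')
print(greedy.group(1))   # 604 1820.381625 10.200.59.77 -> 114.113.226.43
print(lazy.group(2))     # 114.113.226.43
```

The third argument, flags, is optional. re.M (re.MULTILINE) only changes how ^ and $ behave, so it has no effect on patterns like these that use neither.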

Answer 2

Start with the regular expression documentation; for Python, it is here:

https://docs.python.org/3/library/re.html

There are also many sites that offer good tutorials, examples, and interactive testers, for example:

http://regexr.com/

I don't know Wireshark's output format, but I imagine it is documented somewhere.

This should get you the IP addresses:

\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b
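
As a rough sketch of how that pattern could be applied in Python, the snippet below runs it over the sample line with findall. The names IP_PATTERN and is_valid_ipv4 are hypothetical helpers introduced here. The pattern only checks the shape of an address (four groups of one to three digits), so the standard ipaddress module is used as an optional second check on the value range.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits. The group is non-capturing,
# so findall returns the whole matched address rather than a fragment.
IP_PATTERN = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")


def is_valid_ipv4(text):
    """Return True if text is a well-formed IPv4 address (each octet 0-255)."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


line = ("604 1820.381625 10.200.59.77 -> 114.113.226.43 TCP 54 "
        "ssh > 47820 [FIN, ACK] Seq=1848 Ack=522 Win=16616 Len=0")

# The timestamp 1820.381625 has only one dot, so it is not picked up.
addresses = IP_PATTERN.findall(line)
print(addresses)  # ['10.200.59.77', '114.113.226.43']

# The regex alone accepts out-of-range octets; the ipaddress check does not.
print(IP_PATTERN.findall("999.1.1.1"))  # ['999.1.1.1']
print(is_valid_ipv4("999.1.1.1"))       # False
```

In this output the first address found is the source and the second is the destination, which assumes every line follows the "source -> destination" layout of the sample.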

Answer 3

As others have already answered, regular expressions are the way to go. Sample code for this purpose:

import re
import unittest
from collections import namedtuple

# One parsed capture line: source IP, destination IP, and protocol name.
Protocol = namedtuple('Protocol', ['source', 'destination', 'protocol'])


def parse_wireshark(line):
    # findall returns a list of (source, destination, protocol) tuples;
    # take the first one. This raises IndexError if the line does not match.
    pat = re.findall(r"(\d+\.\d+\.\d+\.\d+) -> (\d+\.\d+\.\d+\.\d+) (\w+)", line)[0]
    return Protocol(source=pat[0], destination=pat[1], protocol=pat[2])


class TestWireShark(unittest.TestCase):

    def test_sample(self):
        self.assertEqual(
            parse_wireshark("604 1820.381625 10.200.59.77 -> 114.113.226.43 TCP 54 ssh > 47820 [FIN, ACK] Seq=1848 Ack=522 Win=16616 Len=0"),
            Protocol(source='10.200.59.77',
                     destination='114.113.226.43',
                     protocol='TCP'))


if __name__ == '__main__':
    unittest.main()
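
A real dump will probably contain lines that do not match (headers, blank lines, non-IPv4 traffic), and indexing findall(...)[0] raises IndexError on those. One possible variation, sketched below, uses search and skips lines where it returns None. Packet, LINE_PATTERN, and parse_lines are hypothetical names for this illustration, and the "Capturing on eth0" line is an invented example of a non-matching line.

```python
import re
from collections import namedtuple

Packet = namedtuple("Packet", ["source", "destination", "protocol"])

# Same pattern as in the answer above, compiled once for reuse.
LINE_PATTERN = re.compile(r"(\d+\.\d+\.\d+\.\d+) -> (\d+\.\d+\.\d+\.\d+) (\w+)")


def parse_lines(lines):
    """Yield a Packet for every line that matches; silently skip the rest."""
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match is None:
            continue
        yield Packet(*match.groups())


sample = [
    "Capturing on eth0",  # invented non-matching line
    "604 1820.381625 10.200.59.77 -> 114.113.226.43 TCP 54 "
    "ssh > 47820 [FIN, ACK] Seq=1848 Ack=522 Win=16616 Len=0",
    "",
]

packets = list(parse_lines(sample))
print(packets)
```

Because parse_lines accepts any iterable of strings, an open file object such as the tcpdump.txt handle from the question could be passed in directly.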
