Regex processing of input data and subsequent visualization as a histogram using Python
I currently have thousands of records of the following form:
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000000 82557
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000001 128805
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000002 94990
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000003 121020
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000004 58111390
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000005 167079
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000006 130795
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000007 236926
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000008 24754217
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000009 75407
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000010 136461
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000011 136748
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000012 146258
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000013 381091
0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000014 129815
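Visualizing a few of these records in a simple spreadsheet program is easy.
I have been trying to adapt the following code to visualize the data, so far without success: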
# Call like this:
#
# python opcode-farmer.py 'tst21' '6005600401'
#
import re
import numpy as np
import matplotlib.pyplot as plt
import csv
import sys
import pprint
import itertools
import subprocess
import collections

def my_test_func(filename, data):
    with open(filename, 'w') as fd:
        fd.write(data)
        fd.write('\n')
    return subprocess.check_output(['evm', 'disasm', filename])

if '__main__' == __name__:
    file_name = sys.argv[1]
    byte_code = sys.argv[2]
    status = my_test_func(file_name, byte_code)
    opcodes_list = list()
    for element in status.split('\n'):
        result = re.search(r"\b[A-Z].+", element)
        if result:
            # eliminate individual 0x05 specification
            simple_opcode = re.sub(r'\s(.*)', '', result.group(0))
            opcodes_list.append(simple_opcode)
    # Count up the values
    cnt = collections.Counter()
    for word in opcodes_list:
        cnt[word] += 1
    print(cnt)
    # THRESHOLD
    threshold = 30
    cnt = collections.Counter(record for record in cnt.elements() if cnt[record] >= threshold)
    # VISUALIZATION
    # Transpose the data to get the x and y values
    labels, values = zip(*cnt.items())
    # generates this representation: [0 1 2 3 4 5 6 7],
    # from the number of the length
    indexes = np.arange(len(labels))
    width = 1
    plt.xlabel("most common opcodes in tx")
    plt.ylabel("number of occurrences")
    plt.bar(indexes, values, width)
    plt.xticks(indexes + width * 0.5, labels)
    plt.show()
How can I iterate over the input records shown above, strip the prefix 0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_, and then present them as a histogram in Python?
You can try the following approach:
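import re
# Split each line on whitespace and skip blank or single-field lines
with open('filename.txt') as f:
    data = [b for b in (re.split(r"\s+", line.strip()) for line in f) if len(b) > 1]
# Remove everything up to and including the underscore, then convert both fields to int
final_data = [[int(re.sub(r"\w+_", '', a)), int(b)] for a, b in data]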
I ran this code on the data you provided and got this output:
[[0, 82557], [1, 128805], [2, 94990], [3, 121020], [4, 58111390], [5, 167079], [6, 130795], [7, 236926], [8, 24754217], [9, 75407], [10, 136461], [11, 136748], [12, 146258], [13, 381091], [14, 129815]]
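As a variation on the snippet above, the prefix can also be handled with a single compiled pattern that captures the index and the value in one pass. This is only a sketch: the names RECORD_RE and parse_records are made up for illustration, and the pattern assumes every record looks like "0x<hex digits>_<decimal index> <decimal value>", which is inferred from the sample data rather than from any specification.

```python
import re

# Assumed record layout: "0x<hex digits>_<decimal index> <decimal value>"
RECORD_RE = re.compile(r"^0x[0-9A-Fa-f]+_(\d+)\s+(\d+)$")

def parse_records(lines):
    """Return (index, value) pairs for the lines that match RECORD_RE."""
    records = []
    for line in lines:
        match = RECORD_RE.match(line.strip())
        if match:  # blank or malformed lines are skipped silently
            records.append((int(match.group(1)), int(match.group(2))))
    return records

sample = [
    "0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000000 82557",
    "",
    "0x4f0DAA112142FFC4BA1B9f3B76bcd238A094D65B_000004 58111390",
]
print(parse_records(sample))  # [(0, 82557), (4, 58111390)]
```

Because the pattern is anchored at both ends, a line with extra fields is skipped instead of raising an error during unpacking. Whether skipping or failing is preferable depends on how clean the input file is.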
Putting it all together:
import re
import numpy as np
import matplotlib.pyplot as plt

# Parse each line into [record index, value], skipping blank or single-field lines
with open('40000_output.txt') as f:
    data = [b for b in (re.split(r"\s+", line.strip()) for line in f) if len(b) > 1]
final_data = [[int(re.sub(r"\w+_", '', a)), int(b)] for a, b in data]

# VISUALIZATION
# Transpose the data to get the x and y values
labels, values = zip(*final_data)
# generates this representation: [0 1 2 3 4 5 6 7],
# from the number of the length
indexes = np.arange(len(labels))
width = 1
plt.xlabel("record index")
plt.ylabel("value")
plt.bar(indexes, values, width)
# Bars are centered on their x positions by default (assuming matplotlib 2.0 or later),
# so the ticks need no half-width offset
plt.xticks(indexes, labels)
plt.show()
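The plot above draws one bar per record, which is a bar chart rather than a histogram in the strict sense. If the goal is instead the distribution of the values, one option is to bin them. The sample values span several orders of magnitude (75407 up to 58111390), so logarithmically spaced bins may be easier to read than equal-width ones. The sketch below is an assumption about what is wanted, not part of the original answer; log_bin_edges and plot_value_histogram are hypothetical helper names, and the code assumes all values are strictly positive and not all identical.

```python
import numpy as np

def log_bin_edges(values, n_bins=20):
    """Return n_bins + 1 logarithmically spaced bin edges spanning the data."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    # geomspace requires strictly positive endpoints
    edges = np.geomspace(lo, hi, n_bins + 1)
    # Pin the outer edges so rounding cannot push the extremes out of range
    edges[0], edges[-1] = lo, hi
    return edges

def plot_value_histogram(values, n_bins=20):
    """Show a histogram of the values on a logarithmic x axis."""
    # Imported here so that log_bin_edges only depends on NumPy
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.hist(values, bins=log_bin_edges(values, n_bins))
    ax.set_xscale("log")
    ax.set_xlabel("value")
    ax.set_ylabel("number of records")
    plt.show()
```

With labels, values = zip(*final_data) from the code above, calling plot_value_histogram(values) would show how many records fall into each value range. The bin count of 20 is an arbitrary default and is worth adjusting to the size of the data set.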