How to split CSV columns?
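I am using the code below to export dictionary/JSON data to CSV. I am trying to split the "process_hash" column into two columns, one for MD5 and one for SHA256, alongside the other existing columns.
The "process_hash" column currently contains list values, and I am not sure how to split them into MD5 and SHA256 columns.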
[{'device_name': 'fk6sdc2',
'device_timestamp': '2020-10-27T00:50:46.176Z',
'event_id': '9b1bvf6e17ee11eb81b',
'process_effective_reputation': 'LIST',
'process_hash': ['bfc7dcf5935830f3a9df8e9b6425c37a',
'ca9f3a24506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
'process_name': 'c:\\program files '
'(x86)\\toh122soft\\thcasdf3\\toho34rce.exe',
'process_username': ['JOHN\\user1']},
{'device_name': 'fk6sdc2',
'device_timestamp': '2020-10-27T00:50:46.176Z',
'event_id': '9b151f6e17ee11eb81b',
'process_effective_reputation': 'LIST',
'process_hash': ['bfc7dcf5935f3a9df8e9b6830425c37a',
'ca9f3a24506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
'process_name': 'c:\\program files (x86)\\oft\\tf3\\tootsice.exe',
'process_username': ['JOHN\\user2']},
{'device_name': '6asdsdc2',
'device_timestamp': '2020-10-27T00:50:46.176Z',
'event_id': '9b151f698e11eb81b',
'process_effective_reputation': 'LIST',
'process_hash': ['9df8ebfc7dcf5935830f3a9b6425c37a',
'ca9f3a24506cc518ff6ddc939a33c100b2d557f96e040f7124641ad1734e2f19'],
'process_name': 'c:\\program files (x86)\\toht\\th3\\tohce.exe',
'process_username': ['JOHN\\user3']}]
Code to export to CSV:
import csv

def toCSV(res):
    with open('EnrichedEvents.csv', 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['process_hash', 'process_name', "process_effective_reputation"]
        dict_writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
        dict_writer.writeheader()
        entries = set()  # Rows already written, used to skip duplicates.
        for data in res:
            val = tuple(','.join(v) if isinstance(v, list) else v for v in data.values())
            if val not in entries:
                dict_writer.writerow(data)
                entries.add(val)
Current CSV data:
process_hash process_name process_effective_reputation
['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2'] c:\windows\system32\delltpad\apmsgfwd.exe ADAPTIVE_WHITE_LIST
['73ca11f2acf1adb7802c2914e1026db899a3c851cd9500378c0045e0'] c:\users\zdr3dds01\documents\sap\sap gui\export.mhtml NOT_LISTED
['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2'] c:\windows\system32\delltpad\apmsgfwd.exe ADAPTIVE_WHITE_LIST
['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2'] c:\windows\system32\delltpad\apmsgfwd.exe ADAPTIVE_WHITE_LIST
['582f018bc7a732d63f624d6f92b3d143', '66505bcb9975d61af14dd09cddd9ac0d11a3e2b5ae41845c65117e7e2b046d37'] c:\users\jij09\appdata\local\kingsoft\power word 2016\2016.3.3.0368\powerword.exe ADAPTIVE_WHITE_LIST
What I am trying to achieve in the CSV file:
md5 sha256 process_name process_effective_reputation
Here is one approach. The apply() function converts a list into multiple columns.
import pandas as pd
data = [{'device_name': 'fk6sdc2',
'device_timestamp': '2020-10-27T00:50:46.176Z',
'event_id': '9b1bvf6e17ee11eb81b',
'process_effective_reputation': 'LIST',
'process_hash': ['bfc7dcf5935830f3a9df8e9b6425c37a',
'ca9f3a24506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
'process_name': 'c:\\program files '
'(x86)\\toh122soft\\thcasdf3\\toho34rce.exe',
'process_username': ['JOHN\\user1']},
{'device_name': 'fk6sdc2',
'device_timestamp': '2020-10-27T00:50:46.176Z',
'event_id': '9b151f6e17ee11eb81b',
'process_effective_reputation': 'LIST',
'process_hash': ['bfc7dcf5935f3a9df8e9b6830425c37a',
'ca9f3a24506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
'process_name': 'c:\\program files (x86)\\oft\\tf3\\tootsice.exe',
'process_username': ['JOHN\\user2']},
{'device_name': '6asdsdc2',
'device_timestamp': '2020-10-27T00:50:46.176Z',
'event_id': '9b151f698e11eb81b',
'process_effective_reputation': 'LIST',
'process_hash': ['9df8ebfc7dcf5935830f3a9b6425c37a',
'ca9f3a24506cc518ff6ddc939a33c100b2d557f96e040f7124641ad1734e2f19'],
'process_name': 'c:\\program files (x86)\\toht\\th3\\tohce.exe',
'process_username': ['JOHN\\user3']}]
Now process the data:
df = pd.DataFrame(data, columns=['process_hash', 'process_name', 'process_effective_reputation'])
df[['md5', 'sha256']] = df['process_hash'].apply(lambda x: pd.Series(x))
df = df.drop(columns='process_hash')
Finally, the result:
print(df)
process_name \
0 c:\program files (x86)\toh122soft\thcasdf3\toh...
1 c:\program files (x86)\oft\tf3\tootsice.exe
2 c:\program files (x86)\toht\th3\tohce.exe
process_effective_reputation md5 \
0 LIST bfc7dcf5935830f3a9df8e9b6425c37a
1 LIST bfc7dcf5935f3a9df8e9b6830425c37a
2 LIST 9df8ebfc7dcf5935830f3a9b6425c37a
sha256
0 ca9f3a24506cc518fc939a33c100b2d557f96e040f712f...
1 ca9f3a24506cc518fc939a33c100b2d557f96e040f712f...
2 ca9f3a24506cc518ff6ddc939a33c100b2d557f96e040f...
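As a variation on the approach above, here is a minimal sketch that builds the two hash columns with `tolist()` instead of `apply(pd.Series)` and then serializes the frame with `to_csv`. It assumes the MD5 always comes first in `process_hash`. The sample rows and the helper name `split_hash_columns` are made up for illustration; a row that carries only one hash ends up with an empty `sha256` cell.

```python
import pandas as pd

# Hypothetical sample rows; the second one carries only a single hash.
rows = [
    {"process_hash": ["f810a809e9cdf70c3189008e07c83619",
                      "58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2"],
     "process_name": "apmsgfwd.exe",
     "process_effective_reputation": "ADAPTIVE_WHITE_LIST"},
    {"process_hash": ["582f018bc7a732d63f624d6f92b3d143"],
     "process_name": "powerword.exe",
     "process_effective_reputation": "NOT_LISTED"},
]


def split_hash_columns(records):
    """Return a frame with md5/sha256 columns instead of process_hash."""
    df = pd.DataFrame(records, columns=["process_hash", "process_name",
                                        "process_effective_reputation"])
    # One column per list position; shorter lists are padded with missing values.
    hashes = pd.DataFrame(df["process_hash"].tolist(), index=df.index)
    # Force exactly two columns, even if no row has a second hash.
    hashes = hashes.reindex(columns=[0, 1])
    hashes.columns = ["md5", "sha256"]  # Assumption: MD5 first, SHA256 second.
    return pd.concat([hashes, df.drop(columns="process_hash")], axis=1)


out = split_hash_columns(rows)
# Without a path, to_csv returns the CSV text; pass a filename to write a file.
csv_text = out.to_csv(index=False)
print(csv_text)
```

Passing a path such as 'EnrichedEvents.csv' to `to_csv` writes the file directly instead of returning a string.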
You can do it by handling the process_hash field's list separately and copying the other two fields, as shown below:
import csv
data = [{'device_name': 'fk6sdc2',
rest of your data ...

def toCSV(res):
    with open('EnrichedEvents.csv', 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = 'md5,sha256,process_name,process_effective_reputation'.split(',')
        dict_writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
        dict_writer.writeheader()
        for obj in res:
            md5, sha256 = obj['process_hash']  # Extract values from process_hash list.
            row = {'md5': md5, 'sha256': sha256}  # Initialize a row with them.
            row.update({field: obj[field]  # Copy the last two fields into it.
                        for field in fieldnames[-2:]})
            dict_writer.writerow(row)

toCSV(data)
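Note that unpacking with `md5, sha256 = obj['process_hash']` raises a ValueError when the list does not hold exactly two items, and the CSV data in the question includes a row with a single hash. The following is a rough sketch of a more tolerant variant, not part of the answer above: it picks the hashes by length (an MD5 hex digest is 32 characters, a SHA256 hex digest is 64) and keeps the duplicate-row check from the question's code. The names `pick_hashes` and `write_rows` and the sample rows are hypothetical, and the sketch writes to an in-memory buffer so it can be run without touching the disk.

```python
import csv
import io

FIELDNAMES = ["md5", "sha256", "process_name", "process_effective_reputation"]


def pick_hashes(hashes):
    """Return (md5, sha256) chosen by hex length; a missing hash becomes ''."""
    by_length = {len(h): h for h in hashes}
    return by_length.get(32, ""), by_length.get(64, "")


def write_rows(res, stream):
    """Write one CSV row per distinct record to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    seen = set()  # Rows already written, used to skip duplicates.
    for obj in res:
        md5, sha256 = pick_hashes(obj.get("process_hash", []))
        row = {"md5": md5, "sha256": sha256,
               "process_name": obj["process_name"],
               "process_effective_reputation": obj["process_effective_reputation"]}
        key = tuple(row[name] for name in FIELDNAMES)
        if key not in seen:
            seen.add(key)
            writer.writerow(row)


# Hypothetical sample: one full row, an exact duplicate, and a single-hash row.
full = {"process_hash": ["f810a809e9cdf70c3189008e07c83619",
                         "58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2"],
        "process_name": "apmsgfwd.exe",
        "process_effective_reputation": "ADAPTIVE_WHITE_LIST"}
single = {"process_hash": ["582f018bc7a732d63f624d6f92b3d143"],
          "process_name": "powerword.exe",
          "process_effective_reputation": "NOT_LISTED"}

buf = io.StringIO(newline="")
write_rows([full, dict(full), single], buf)
lines = buf.getvalue().splitlines()
print("\n".join(lines))
```

To write a real file, pass the object returned by `open('EnrichedEvents.csv', 'w', newline='', encoding='utf-8')` as `stream`. A hash whose length is neither 32 nor 64 is silently dropped in this sketch, which may or may not be what you want.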