
How to split CSV columns?

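I'm using the code below to export dictionary/JSON data to CSV. I'm trying to split the "process_hash" column into two columns, one for MD5 and one for SHA256, alongside the other existing columns.

The "process_hash" column currently contains list values, and I'm not sure how to split them into MD5 and SHA256 columns.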

[{'device_name': 'fk6sdc2',
  'device_timestamp': '2020-10-27T00:50:46.176Z',
  'event_id': '9b1bvf6e17ee11eb81b',
  'process_effective_reputation': 'LIST',
  'process_hash': ['bfc7dcf5935830f3a9df8e9b6425c37a',
                   'ca9f3a24506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
  'process_name': 'c:\\program files '
                  '(x86)\\toh122soft\\thcasdf3\\toho34rce.exe',
  'process_username': ['JOHN\\user1']},
 {'device_name': 'fk6sdc2',
  'device_timestamp': '2020-10-27T00:50:46.176Z',
  'event_id': '9b151f6e17ee11eb81b',
  'process_effective_reputation': 'LIST',
  'process_hash': ['bfc7dcf5935f3a9df8e9b6830425c37a',
                   'ca9f3a24506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
  'process_name': 'c:\\program files (x86)\\oft\\tf3\\tootsice.exe',
  'process_username': ['JOHN\\user2']},
 {'device_name': '6asdsdc2',
  'device_timestamp': '2020-10-27T00:50:46.176Z',
  'event_id': '9b151f698e11eb81b',
  'process_effective_reputation': 'LIST',
  'process_hash': ['9df8ebfc7dcf5935830f3a9b6425c37a',
                   'ca9f3a24506cc518ff6ddc939a33c100b2d557f96e040f7124641ad1734e2f19'],
  'process_name': 'c:\\program files (x86)\\toht\\th3\\tohce.exe',
  'process_username': ['JOHN\\user3']}]

Code that exports to CSV:

import csv

def toCSV(res):
    with open('EnrichedEvents.csv', 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['process_hash', 'process_name', "process_effective_reputation"]
        dict_writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
        dict_writer.writeheader()

        entries = set()
        for data in res:
            val = tuple(','.join(v) if isinstance(v, list) else v for v in data.values())
            if val not in entries:
                dict_writer.writerow(data)
                entries.add(val)

Current CSV data:

 process_hash    process_name    process_effective_reputation
 ['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2']    c:\windows\system32\delltpad\apmsgfwd.exe   ADAPTIVE_WHITE_LIST
 ['73ca11f2acf1adb7802c2914e1026db899a3c851cd9500378c0045e0']    c:\users\zdr3dds01\documents\sap\sap gui\export.mhtml   NOT_LISTED
 ['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2']    c:\windows\system32\delltpad\apmsgfwd.exe   ADAPTIVE_WHITE_LIST
 ['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2']    c:\windows\system32\delltpad\apmsgfwd.exe   ADAPTIVE_WHITE_LIST
 ['582f018bc7a732d63f624d6f92b3d143', '66505bcb9975d61af14dd09cddd9ac0d11a3e2b5ae41845c65117e7e2b046d37']    c:\users\jij09\appdata\local\kingsoft\power word 2016\2016.3.3.0368\powerword.exe   ADAPTIVE_WHITE_LIST

What I'm trying to achieve with the CSV file:

 md5   sha256   process_name  process_effective_reputation

Here is one way to do it with pandas. Passing each list through apply() with pd.Series expands it into multiple columns.

import pandas as pd

data = [{'device_name': 'fk6sdc2',
  'device_timestamp': '2020-10-27T00:50:46.176Z',
  'event_id': '9b1bvf6e17ee11eb81b',
  'process_effective_reputation': 'LIST',
  'process_hash': ['bfc7dcf5935830f3a9df8e9b6425c37a',
                   'ca9f3a24506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
  'process_name': 'c:\\program files '
                  '(x86)\\toh122soft\\thcasdf3\\toho34rce.exe',
  'process_username': ['JOHN\\user1']},
 {'device_name': 'fk6sdc2',
  'device_timestamp': '2020-10-27T00:50:46.176Z',
  'event_id': '9b151f6e17ee11eb81b',
  'process_effective_reputation': 'LIST',
  'process_hash': ['bfc7dcf5935f3a9df8e9b6830425c37a',
                   'ca9f3a24506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
  'process_name': 'c:\\program files (x86)\\oft\\tf3\\tootsice.exe',
  'process_username': ['JOHN\\user2']},
 {'device_name': '6asdsdc2',
  'device_timestamp': '2020-10-27T00:50:46.176Z',
  'event_id': '9b151f698e11eb81b',
  'process_effective_reputation': 'LIST',
  'process_hash': ['9df8ebfc7dcf5935830f3a9b6425c37a',
                   'ca9f3a24506cc518ff6ddc939a33c100b2d557f96e040f7124641ad1734e2f19'],
  'process_name': 'c:\\program files (x86)\\toht\\th3\\tohce.exe',
  'process_username': ['JOHN\\user3']}]

Now process the data:

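# Keep only the three columns of interest.
df = pd.DataFrame(data, columns=['process_hash', 'process_name', 'process_effective_reputation'])
# Expand each two-element list into separate md5 and sha256 columns.
df[['md5', 'sha256']] = df['process_hash'].apply(pd.Series)
# The original list column is no longer needed.
df = df.drop(columns='process_hash')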

And finally the result:

print(df)

                                        process_name  \
0  c:\program files (x86)\toh122soft\thcasdf3\toh...   
1        c:\program files (x86)\oft\tf3\tootsice.exe   
2          c:\program files (x86)\toht\th3\tohce.exe   

  process_effective_reputation                               md5  \
0                         LIST  bfc7dcf5935830f3a9df8e9b6425c37a   
1                         LIST  bfc7dcf5935f3a9df8e9b6830425c37a   
2                         LIST  9df8ebfc7dcf5935830f3a9b6425c37a   

                                              sha256  
0  ca9f3a24506cc518fc939a33c100b2d557f96e040f712f...  
1  ca9f3a24506cc518fc939a33c100b2d557f96e040f712f...  
2  ca9f3a24506cc518ff6ddc939a33c100b2d557f96e040f...  
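
The pandas answer above prints the frame but stops short of producing the file the question asks for. As a rough sketch, and assuming every process_hash list holds exactly two entries with the MD5 first, the remaining steps might look like the following. The sample rows, the `events` name, and the in-memory buffer are illustrative assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Sample rows shaped like the question's data (hash values shortened for illustration).
events = [
    {'process_hash': ['aaaa', 'bbbb'], 'process_name': 'a.exe',
     'process_effective_reputation': 'LIST'},
    {'process_hash': ['aaaa', 'bbbb'], 'process_name': 'a.exe',
     'process_effective_reputation': 'LIST'},  # exact duplicate of the first row
    {'process_hash': ['cccc', 'dddd'], 'process_name': 'b.exe',
     'process_effective_reputation': 'LIST'},
]

df = pd.DataFrame(events, columns=['process_hash', 'process_name',
                                   'process_effective_reputation'])

# Build both hash columns in one step from the lists (assumes two items per list).
hashes = pd.DataFrame(df['process_hash'].tolist(),
                      columns=['md5', 'sha256'], index=df.index)
out = pd.concat([hashes, df.drop(columns='process_hash')], axis=1)

# Drop repeated rows, similar in spirit to the set-based check in the question.
out = out.drop_duplicates()

# Write CSV text without the numeric index column.
buffer = io.StringIO()
out.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a path such as 'EnrichedEvents.csv' to to_csv() instead of the buffer would write the file directly.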

You can do it by handling the process_hash list separately and copying the other two fields, like this:

import csv

data = [{'device_name': 'fk6sdc2',
          rest of your data ...

def toCSV(res):
    with open('EnrichedEvents.csv', 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = 'md5,sha256,process_name,process_effective_reputation'.split(',')

        dict_writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
        dict_writer.writeheader()

        for obj in res:
            md5, sha256 = obj['process_hash']  # Extract values from process_hash list.
            row = {'md5': md5, 'sha256': sha256}  # Initialize a row with them.
            row.update({field: obj[field]  # Copy the last two fields into it.
                            for field in fieldnames[-2:]})
            dict_writer.writerow(row)

toCSV(data)
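
One caveat with the unpacking line `md5, sha256 = obj['process_hash']`: it raises ValueError when a list does not hold exactly two items, and the question's own CSV output includes a row with a single hash. A possible workaround, sketched below, sorts digests by length (32 hex characters for MD5, 64 for SHA-256) and also brings back duplicate filtering. The helper names `split_hashes` and `write_events` and the sample rows are made up for illustration.

```python
import csv
import io

FIELDNAMES = ['md5', 'sha256', 'process_name', 'process_effective_reputation']

def split_hashes(hashes):
    """Pick (md5, sha256) out of a list of hex digests based on their length."""
    md5 = sha256 = ''
    for digest in hashes:
        if len(digest) == 32:      # MD5 digests are 32 hex characters
            md5 = digest
        elif len(digest) == 64:    # SHA-256 digests are 64 hex characters
            sha256 = digest
    return md5, sha256

def write_events(res, csvfile):
    """Write one row per distinct (md5, sha256, name, reputation) combination."""
    writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
    writer.writeheader()
    seen = set()
    for obj in res:
        md5, sha256 = split_hashes(obj.get('process_hash', []))
        row = {
            'md5': md5,
            'sha256': sha256,
            'process_name': obj.get('process_name', ''),
            'process_effective_reputation': obj.get('process_effective_reputation', ''),
        }
        key = tuple(row[name] for name in FIELDNAMES)
        if key not in seen:        # skip rows already written
            seen.add(key)
            writer.writerow(row)

# Illustrative input: a duplicate pair and a row whose only hash is neither length.
sample = [
    {'process_hash': ['a' * 32, 'b' * 64], 'process_name': 'x.exe',
     'process_effective_reputation': 'LIST'},
    {'process_hash': ['a' * 32, 'b' * 64], 'process_name': 'x.exe',
     'process_effective_reputation': 'LIST'},
    {'process_hash': ['c' * 56], 'process_name': 'y.mhtml',
     'process_effective_reputation': 'NOT_LISTED'},
]

buffer = io.StringIO()
write_events(sample, buffer)
print(buffer.getvalue())
```

Note that this sketch filters duplicates on the four written columns only, whereas the question's code compares every value in the record, including fields such as event_id that differ between otherwise identical rows. Digests of any other length are left out rather than guessed at, so such rows end up with empty hash cells.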

