
Splunk: Load a CSV from GCP into a KVStore lookup using the Python SDK

We currently have a 45 MB CSV file that we're going to load into a Splunk KV store. I want to accomplish this via the Python SDK, but I'm running into some trouble loading the records.

The only way I can find to update a KV store is the service.collection.insert() function, which as far as I can tell accepts only one row at a time. Since this file has 250k rows, I can't afford to wait for every line to upload individually each day.

This is what I have so far:

 from splunklib import client, binding
 import json
 import numpy as np
 import pandas as pd
 from copy import deepcopy

 data_file = '/path/to/file.csv'

 username = 'user'
 password = 'splunk_pass'
 connectionHandler = binding.handler(timeout=12400)
 connect_kwargs = {
     'host': 'splunk-host.com',
     'port': 8089,
     'username': username,
     'password': password,
     'scheme': 'https',
     'autologin': True,
     'handler': connectionHandler
 }

 # Retry until the connection stops failing with a 504.
 flag = True
 while flag:
     try:
         service = client.connect(**connect_kwargs)
         service.namespace['owner'] = 'nobody'
         flag = False
     except binding.HTTPError:
         print('Splunk 504 Error')

 # Drop the collection left over from the previous run.
 kv = service.kvstore
 kv['learning_center'].delete()

 df = pd.read_csv(data_file)
 df.replace(np.nan, '', regex=True)  # attempt to blank out NaN values
 df['_key'] = df['key_field']
 result = df.to_dict(orient='records')

 # Build a field -> type-name mapping from the first record.
 fields = deepcopy(result[0])
 for field in fields.keys():
     fields[field] = type(fields[field]).__name__
 df = df.astype(fields)

 kv.create(name='learning_center', fields=fields, owner='nobody', sharing='system')
 # insert() takes a single record per call, so this loop makes one
 # round trip per row.
 for row in result:
     row = json.dumps(row)
     row.replace("nan", "'nan'")  # attempt to quote stray nan values
     kv['learning_center'].data.insert(row)

 transforms = service.confs['transforms']
 transforms.create(name='learning_center_lookup',
                   **{'external_type': 'kvstore',
                      'collection': 'learning_center',
                      'fields_list': '_key, userGuid',
                      'owner': 'nobody'})
 # transforms['learning_center_lookup'].delete()
 collection = service.kvstore['learning_center']
 print(collection.data.query())

In addition to taking forever to load a quarter of a million records, the upload keeps failing on rows with nan values, and no matter what I put in there to deal with the nan, it persists in the dictionary values.
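
A note on the nan behavior: both pandas.DataFrame.replace() and str.replace() return new objects rather than modifying in place, so in the code above their results are discarded, and json.dumps() then serializes float('nan') as bare NaN, which is not valid JSON. A minimal sketch of assigning the result back (the column names here are illustrative, not from the original data):

 import numpy as np
 import pandas as pd

 df = pd.DataFrame({'key_field': ['a', 'b'], 'score': [1.5, np.nan]})

 # replace() returns a new DataFrame; without the assignment the NaN survives.
 df = df.replace(np.nan, '', regex=True)

 records = df.to_dict(orient='records')
 print(records)  # [{'key_field': 'a', 'score': 1.5}, {'key_field': 'b', 'score': ''}]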

You could interface with the REST endpoint directly, then use storage/collections/data/{collection}/batch_save to save multiple items as required.

Refer to https://docs.splunk.com/Documentation/Splunk/8.0.1/RESTREF/RESTkvstore#storage.2Fcollections.2Fdata.2F.7Bcollection.7D.2Fbatch_save
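
As a sketch of that approach, assuming a reasonably recent splunklib (the service object, collection name, and result list are carried over from the question; the 1000-document chunk size matches the default max_documents_per_batch_save in limits.conf, which is worth verifying on your deployment):

 import json

 def batch_save(service, collection, records, chunk_size=1000):
     """POST records to the KV store batch_save endpoint in chunks."""
     endpoint = 'storage/collections/data/%s/batch_save' % collection
     for start in range(0, len(records), chunk_size):
         chunk = records[start:start + chunk_size]
         service.request(endpoint,
                         method='POST',
                         headers=[('Content-Type', 'application/json')],
                         body=json.dumps(chunk))

 # Reusing the objects from the question:
 # batch_save(service, 'learning_center', result)

That turns 250k single-document inserts into roughly 250 requests.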
