
Splunk: Load a CSV from GCP into a KVStore lookup using the Python SDK

We currently have a 45 MB CSV file that we're going to load into a Splunk KV store. I want to accomplish this via the Python SDK, but I'm running into some trouble loading the records.

The only way I can find to update a KV store is the service.collection.insert() function, which as far as I can tell accepts only one row at a time. Since this file has 250k rows, I can't afford to wait for every line to upload individually each day.

This is what I have so far:

 from splunklib import client, binding
 import json
 import numpy as np
 import pandas as pd
 from copy import deepcopy

 data_file = '/path/to/file.csv'

 username = 'user'
 password = 'splunk_pass'
 connectionHandler = binding.handler(timeout=12400)
 connect_kwargs = {
     'host': 'splunk-host.com',
     'port': 8089,
     'username': username,
     'password': password,
     'scheme': 'https',
     'autologin': True,
     'handler': connectionHandler
 }

 # Retry until the connection stops failing with a 504.
 flag = True
 while flag:
     try:
         service = client.connect(**connect_kwargs)
         service.namespace['owner'] = 'nobody'
         flag = False
     except binding.HTTPError:
         print('Splunk 504 Error')

 # Drop the collection left over from the previous run.
 kv = service.kvstore
 kv['learning_center'].delete()

 df = pd.read_csv(data_file)
 df.replace(np.nan, '', regex=True)  # attempt to blank out NaN values
 df['_key'] = df['key_field']
 result = df.to_dict(orient='records')

 # Build a field -> type-name mapping from the first record.
 fields = deepcopy(result[0])
 for field in fields.keys():
     fields[field] = type(fields[field]).__name__
 df = df.astype(fields)

 kv.create(name='learning_center', fields=fields, owner='nobody', sharing='system')
 # insert() takes a single record per call, so this loop makes one
 # round trip per row.
 for row in result:
     row = json.dumps(row)
     row.replace("nan", "'nan'")  # attempt to quote stray nan values
     kv['learning_center'].data.insert(row)

 transforms = service.confs['transforms']
 transforms.create(name='learning_center_lookup',
                   **{'external_type': 'kvstore',
                      'collection': 'learning_center',
                      'fields_list': '_key, userGuid',
                      'owner': 'nobody'})
 # transforms['learning_center_lookup'].delete()
 collection = service.kvstore['learning_center']
 print(collection.data.query())

In addition to taking forever to load a quarter of a million records, the upload keeps failing on rows with nan values, and no matter what I put in there to deal with the nan, it persists in the dictionary values.
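
A note on the nan behavior: both pandas.DataFrame.replace() and str.replace() return new objects rather than modifying in place, so in the code above their results are discarded, and json.dumps() then serializes float('nan') as bare NaN, which is not valid JSON. A minimal sketch of assigning the result back (the column names here are illustrative, not from the original data):

 import numpy as np
 import pandas as pd

 df = pd.DataFrame({'key_field': ['a', 'b'], 'score': [1.5, np.nan]})

 # replace() returns a new DataFrame; without the assignment the NaN survives.
 df = df.replace(np.nan, '', regex=True)

 records = df.to_dict(orient='records')
 print(records)  # [{'key_field': 'a', 'score': 1.5}, {'key_field': 'b', 'score': ''}]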

You could interface with the REST endpoint directly, then use storage/collections/data/{collection}/batch_save to save multiple items as required.

Refer to https://docs.splunk.com/Documentation/Splunk/8.0.1/RESTREF/RESTkvstore#storage.2Fcollections.2Fdata.2F.7Bcollection.7D.2Fbatch_save
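
As a sketch of that approach, assuming a reasonably recent splunklib (the service object, collection name, and result list are carried over from the question; the 1000-document chunk size matches the default max_documents_per_batch_save in limits.conf, which is worth verifying on your deployment):

 import json

 def batch_save(service, collection, records, chunk_size=1000):
     """POST records to the KV store batch_save endpoint in chunks."""
     endpoint = 'storage/collections/data/%s/batch_save' % collection
     for start in range(0, len(records), chunk_size):
         chunk = records[start:start + chunk_size]
         service.request(endpoint,
                         method='POST',
                         headers=[('Content-Type', 'application/json')],
                         body=json.dumps(chunk))

 # Reusing the objects from the question:
 # batch_save(service, 'learning_center', result)

That turns 250k single-document inserts into roughly 250 requests.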
