
How can I read data from a list and index specific values into Elasticsearch, using Python?

I have used "paramiko" to connect from my PC to a devboard and execute a script. I then save the results of this script in a list (output). I want to extract some values from the list and insert them into Elasticsearch. I have done it manually with the first result of the list, but how can I automate this for the rest of the values? Do I need "regex"? Please give me some clues.

Thank you

THIS IS PART OF THE CODE THAT CONNECTS TO THE DEVBOARD, EXECUTES A SCRIPT AND RETRIEVES A LIST (output)

def main():
    ssh = initialize_ssh()
    stdin, stdout, stderr = ssh.exec_command('cd coral/tflite/python/examples/classification/Auto_benchmark\n python3 auto_benchmark.py')
    output = stdout.readlines()  # list of str, one entry per line of script output
    print('\n'.join(output))
    ssh.close()

THE LIST LOOKS LIKE THIS:

labels: imagenet_labels.txt 

Model: efficientnet-edgetpu-S_quant_edgetpu.tflite 

Image: img0000.jpg 


----INFERENCE TIME----

Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

Time: 6.2ms

Results: wall clock

Score: 0.25781

##################################### 

labels: imagenet_labels.txt 

Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite 

Image: img0000.jpg 


----INFERENCE TIME----

Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

Time: 2.8ms

Results: umbrella

Score: 0.22266

##################################### 
Temperature: 35C

THIS IS THE MAPPING THAT IS NEEDED TO INDEX DATA INTO ELASTICSEARCH

def initialize_mapping_classification(es):
    """
    Initialize the mappings
    """
    mapping_classification = {
        'properties': {
            '@timestamp': {'type': 'date'},
            'type': 'coralito',
            'Model': {'type': 'string'},
            'Time': {'type': 'float'},
            'Results': {'type': 'string'},
            'Score': {'type': 'float'},
            'Temperature': {'type': 'float'}
        }
    }

    if not es.indices.exists(CORAL):
        es.indices.create(CORAL)
        es.indices.put_mapping(body=mapping_classification, doc_type=DOC_TYPE, index=CORAL)
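
For comparison, here is a minimal sketch of how the same mapping might look on Elasticsearch 5.x or later, where the `string` type was replaced by `text` and `keyword`. It assumes that `'type': 'coralito'` was meant as document data rather than a field definition (every entry under `properties` has to be an object), so `type` is declared as a `keyword` field. The `Model.keyword` sub-field is one way to allow exact matches and aggregations on the model name. This is an illustration only, not a tested drop-in replacement.

```python
# Hypothetical revised mapping; field names are taken from the question.
mapping_classification = {
    "properties": {
        "@timestamp": {"type": "date"},
        "type": {"type": "keyword"},
        # Full-text field plus an exact-match sub-field (Model.keyword)
        "Model": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},
        },
        "Time": {"type": "float"},
        "Results": {"type": "text"},
        "Score": {"type": "float"},
        "Temperature": {"type": "float"},
    }
}
```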

THIS IS MY ATTEMPT. I HAVE DONE IT MANUALLY WITH THE FIRST RESULT OF THE LIST. I WANT TO AUTOMATE IT

if CLASSIFY == 1:
                
        doc = {
            '@timestamp':  str(datetime.datetime.utcnow().strftime("%Y-%m-%d"'T'"%H:%M:%S")),
            'type': 'coralito',
            'Model': "efficientnet-edgetpu-S_quant_edgetpu.tflite",
            'Time': "6.2 ms",
            'Results': "wall clock",
            'Score': "0.25781",
            'Temperature': "35 C"
        }

        response = send_data_elasticsearch(CORAL, DOC_TYPE, doc, es)

        print(doc)
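
Note that the mapping above declares `Time`, `Score` and `Temperature` as `float`, while this document sends strings with units such as "6.2 ms". Below is a small sketch of stripping the unit before indexing. `to_float` is a hypothetical helper, not part of any library.

```python
import re

def to_float(text):
    """Return the first number found in text as a float, or None if there is none."""
    found = re.search(r"[-+]?\d+(?:\.\d+)?", text)
    return float(found.group()) if found else None

print(to_float("6.2 ms"))   # 6.2
print(to_float("35 C"))     # 35.0
print(to_float("0.25781"))  # 0.25781
```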

------------------------------EDIT 2---------------------------------------

So this is what my data looks like after using regex to extract the values of interest:

[image]

This is what gets indexed:

[image]

This is my code:

import elasticsearch  
from elasticsearch import Elasticsearch, helpers
import datetime
import re

data = [
    'labels: imagenet_labels.txt \n', '\n', 'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n', '\n', 'Inference: corkscrew, bottle screw\n', 'Score: 0.03125 \n', '\n', 'TPU_temp(°C): 57.05\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n', 'Model: efficientnet-edgetpu-M_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 29.3\n', 'Time(ms): 10.8\n', '\n', '\n', "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk\n", 'Score: 0.09375 \n', '\n', 'TPU_temp(°C): 56.8\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n', 'Model: efficientnet-edgetpu-L_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 45.6\n', 'Time(ms): 31.0\n', '\n', '\n', 'Inference: pick, plectrum, plectron\n', 'Score: 0.09766 \n', '\n', 'TPU_temp(°C): 57.55\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v3_299_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 68.8\n', 'Time(ms): 51.3\n', '\n', '\n', 'Inference: ringlet, ringlet butterfly\n', 'Score: 0.48047 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v4_299_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 121.8\n', 'Time(ms): 101.2\n', '\n', '\n', 'Inference: admiral\n', 'Score: 0.59375 \n', '\n', 'TPU_temp(°C): 57.05\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v2_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 34.3\n', 'Time(ms): 16.6\n', '\n', '\n', 'Inference: lycaenid, lycaenid butterfly\n', 'Score: 0.41406 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n', 'Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 14.4\n', 'Time(ms): 3.3\n', '\n', '\n', 'Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea\n', 'Score: 0.36328 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n', 'Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 14.5\n', 'Time(ms): 3.0\n', '\n', '\n', 'Inference: bow tie, bow-tie, bowtie\n', 'Score: 0.33984 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v1_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 21.2\n', 'Time(ms): 3.6\n', '\n', '\n', 'Inference: pick, plectrum, plectron\n', 'Score: 0.17578 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
]


# declare a client instance of the Python Elasticsearch library
client = Elasticsearch("http://localhost:9200")

#using regex 
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, data))
match = [line.rstrip('\n') for line in match_regex]


#using "bulk"
def yield_docs():
    """
    Yield a single bulk action whose source holds all matched lines
    """
    
    doc_source = {
        "data": match
        
        }

    # use a yield generator so that the doc data isn't loaded into memory
    yield {
        "_index": "coralito",
        "_type": "coralote",
        "_source": doc_source
        }

try:
    # make the bulk call using 'actions' and get a response
    resp = helpers.bulk(
        client,
        yield_docs()
    )
    print ("\nhelpers.bulk() RESPONSE:", resp)
    print ("RESPONSE TYPE:", type(resp))
except Exception as err:
    print("\nhelpers.bulk() ERROR:", err)
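
The generator above yields a single action whose `data` field holds every matched line, so everything ends up in one document. Below is a sketch of yielding one action per result instead. `records` is hypothetical sample data standing in for parsed output, and the index name is taken from the code above. `_type` is left out here on the assumption of a recent Elasticsearch version, where mapping types are deprecated.

```python
def yield_docs(records, index="coralito"):
    """Yield one bulk action per record."""
    for record in records:
        yield {"_index": index, "_source": record}

# Hypothetical sample records, shaped like the values in the output above.
records = [
    {"Model": "efficientnet-edgetpu-S_quant_edgetpu.tflite", "Score": 0.03125},
    {"Model": "efficientnet-edgetpu-M_quant_edgetpu.tflite", "Score": 0.09375},
]

actions = list(yield_docs(records))
print(len(actions))  # 2
# With a client available: helpers.bulk(client, yield_docs(records))
```

`helpers.bulk` accepts any iterable of such action dicts, so the generator can be passed to it directly.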

-----------------------------EDIT 3-----------------------------

[image] [image] [image]

  1. Remove the line breaks
  2. Split the text by a common delimiter ( ----INFERENCE TIME---- would be a good start I think)
  3. Extract the keys & values using for example r'(\w+:)\s(.*)' or a positive lookbehind such as r'(?<=Note: ).*' etc
  4. Parse the numeric values (time, score, temperature, ...) -- you'll thank me later;)
  5. Extend the Model mapping w/ a keyword datatype -- otherwise the dot will be tokenized away and you'll wonder why you can't search for exact matches nor aggregate on it
  6. Prepare the objects that you'll want to sync
  7. Bulk upload to Elasticsearch
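
Steps 1 to 4 and 6 above can be sketched roughly as follows. This is one possible reading, not the answerer's code. It splits on the `#####` separator lines rather than `----INFERENCE TIME----`, an assumption based on the sample output in EDIT 2, and `parse_records` is a hypothetical name. When a key repeats inside a block (the two `Time(ms)` lines), the later value overwrites the earlier one, so the slow first inference is dropped.

```python
import re

# Keys may contain parentheses and a degree sign, e.g. "TPU_temp(°C)"
KEY_VALUE = re.compile(r"^([\w()°]+):\s*(.*)$")
NUMBER = re.compile(r"^[-+]?\d+(?:\.\d+)?$")

def parse_records(lines):
    """Group 'key: value' lines into one dict per '#####'-terminated block."""
    records, current = [], {}
    for raw in lines:
        line = raw.strip()  # step 1: drop line breaks and padding
        if line.startswith("#####"):  # step 2: block delimiter
            if current:
                records.append(current)
            current = {}
            continue
        found = KEY_VALUE.match(line)  # step 3: extract key and value
        if not found:
            continue  # blank lines, notes and banners
        key, value = found.group(1), found.group(2).strip()
        # Step 4: parse plain numbers so they are indexed as floats
        current[key] = float(value) if NUMBER.match(value) else value
    if current:
        records.append(current)
    return records

sample = [
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n',
    'Time(ms): 23.1\n', 'Time(ms): 5.7\n',
    'Inference: corkscrew, bottle screw\n', 'Score: 0.03125 \n',
    'TPU_temp(°C): 57.05\n',
    '##################################### \n', '\n',
]

records = parse_records(sample)
print(records[0]["Model"])     # efficientnet-edgetpu-S_quant_edgetpu.tflite
print(records[0]["Time(ms)"])  # 5.7
```

Each resulting dict can then serve as the `_source` of one bulk action (step 7).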
