
Accessing Elasticsearch using python

I'm currently trying to write a script to enrich some data. I've already written some code that works fine with a demo data TXT file, but now I'd like to request the latest data directly from the server in the script.

The data I'm working with is stored on Elasticsearch. I've received a URL, including the port number. I also have a cluster ID, a username, and a password.

I can access the data directly using Kibana, where I enter the following into the console (under Dev Tools):

GET /*projectname*/appevents/_search?pretty=true&size=10000

I can copy the output into a TXT file (well, it's actually JSON data), which currently gets parsed by my script. I'd prefer to just collect the data directly without this intermediate step. Also, I'm currently limited to 10000 records/events, but I'd like to get all of them.
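Because the Kibana console output is just the raw `_search` response, the existing parsing step can work on the parsed JSON whether it comes from a saved file or from a live HTTP call. A minimal sketch, assuming the documents are wanted as plain dicts; `extract_documents` and `total_hits` are hypothetical helper names, not part of any library, and the sample response is made up. The 10000 cap comes from the index setting `index.max_result_window` (default 10000), which limits `from + size` for ordinary searches; the scroll API is one way around it.

```python
import json


def extract_documents(response):
    """Return the stored document (_source) of every hit in a _search response."""
    # Each hit wraps the document in "_source"; metadata such as "_id"
    # and "_index" sits next to it in the same hit object.
    return [hit["_source"] for hit in response["hits"]["hits"]]


def total_hits(response):
    """Return hits.total as an int.

    Older Elasticsearch versions report a plain number; 7.0 and later
    report an object such as {"value": 244532, "relation": "eq"}.
    When "relation" is "gte", the value is only a lower bound.
    """
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


# Usage with a tiny hand-written response (hypothetical data, not from the cluster).
saved = '''{"hits": {"total": 2, "hits": [
  {"_id": "a", "_source": {"event": "login"}},
  {"_id": "b", "_source": {"event": "logout"}}]}}'''
response = json.loads(saved)
docs = extract_documents(response)
print(total_hits(response), docs)
```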

This works:

import requests
from requests.auth import HTTPBasicAuth

res = requests.get('*url*:*port*',
                   auth=HTTPBasicAuth('*username*', '*password*'))
print(res.content)
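Since a plain GET with basic auth already reaches the cluster, the Kibana console command can be reproduced by appending its path to the same base URL. A minimal sketch using only the standard library (`urllib` instead of `requests`, so it runs without extra packages); `build_search_request` and `send` are hypothetical names, and the URL, index, type, and credentials in the usage line are placeholders. This is still an ordinary search, so it is subject to the same 10000-hit window.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, index, doc_type, username, password, size=1000):
    """Build (but do not send) the HTTP equivalent of the Kibana console
    command GET /<index>/<doc_type>/_search?size=<size>."""
    path = "/{}/{}/_search".format(urllib.parse.quote(index), urllib.parse.quote(doc_type))
    query = urllib.parse.urlencode({"size": size})
    url = base_url.rstrip("/") + path + "?" + query
    # HTTP Basic auth: base64 of "username:password", the same header HTTPBasicAuth sends.
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token}, method="GET")


def send(request):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical values; substitute the real URL, port, index, type, and credentials.
req = build_search_request("https://example.com:9243", "projectname", "appevents", "user", "secret")
print(req.full_url)
```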

I'm struggling with the elasticsearch package. How do I mimic the 'get' command listed above in my script, collecting everything in a JSON format?

Fixed: I got some help from a programmer. The results are stored in a list, so I can work with them from there. The code is below, with identifying info removed.

from elasticsearch import Elasticsearch

es = Elasticsearch(
    hosts=[{'host': '***', 'port': ***}],
    http_auth=('***', '***'),
    use_ssl=True
)

count = es.count(index="***", doc_type="***")
print(count)  # {u'count': 244532, u'_shards': {u'successful': 5, u'failed': 0, u'total': 5}}

# Use scroll to ease strain on cluster (don't pull in all results at once)
results = es.search(index="***", doc_type="***", size=1000,
                    scroll="30s")  
scroll_id = results['_scroll_id']
total_size = results['hits']['total']  # An int on ES 6.x and older; a dict {'value': ..., 'relation': ...} on 7.x+
print(total_size)

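# Save all results in a list, starting with the first batch returned by search()
# (the loop below begins with scroll(), so without this the first 1000 hits are lost)
dump = list(results['hits']['hits'])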

ct = 1
while total_size > 0:

    results = es.scroll(scroll_id=scroll_id, scroll='30s')

    dump += results['hits']['hits']
    scroll_id = results['_scroll_id']
    total_size = len(results['hits']['hits'])  # As long as there are results, keep going ...
    print("Chunk #", ct, ": ", total_size, "\tList size: ", len(dump))
    ct += 1

es.clear_scroll(body={'scroll_id': [scroll_id]})  # Cleanup (otherwise Scroll id remains in ES memory)
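The same scroll pattern can be wrapped in a generator so hits can be processed as they arrive instead of all being held in one list. A minimal sketch, assuming a client object with the same `search`, `scroll`, and `clear_scroll` calls and keyword arguments as the older elasticsearch-py client used here; `scroll_hits` and `FakeClient` are hypothetical names, and `FakeClient` only exists so the example runs without a cluster. The elasticsearch-py package also ships `elasticsearch.helpers.scan`, which implements this scroll-and-clear pattern and may be preferable in real code.

```python
def scroll_hits(client, index, doc_type, size=1000, scroll="30s"):
    """Yield every hit of a scroll search, batch by batch.

    The first batch returned by search() is yielded too, and the scroll
    context is cleared even if the caller stops iterating early.
    """
    results = client.search(index=index, doc_type=doc_type, size=size, scroll=scroll)
    scroll_id = results["_scroll_id"]
    try:
        hits = results["hits"]["hits"]
        while hits:  # An empty batch means the scroll is exhausted
            for hit in hits:
                yield hit
            results = client.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = results["_scroll_id"]
            hits = results["hits"]["hits"]
    finally:
        # Free the scroll context on the cluster instead of waiting for it to expire.
        client.clear_scroll(body={"scroll_id": [scroll_id]})


class FakeClient:
    """Hypothetical stand-in that serves pre-built batches, for offline testing."""

    def __init__(self, batches):
        self.batches = list(batches)
        self.cleared = []

    def _next(self):
        batch = self.batches.pop(0) if self.batches else []
        return {"_scroll_id": "s1", "hits": {"hits": batch}}

    def search(self, **kwargs):
        return self._next()

    def scroll(self, **kwargs):
        return self._next()

    def clear_scroll(self, body):
        self.cleared.extend(body["scroll_id"])


client = FakeClient([[{"_id": 1}, {"_id": 2}], [{"_id": 3}]])
dump = list(scroll_hits(client, index="idx", doc_type="doc"))
print(len(dump), client.cleared)
```

With a real client, `list(scroll_hits(es, index="***", doc_type="***"))` would produce the same kind of list as `dump` in the solution, while a plain `for hit in scroll_hits(...)` loop avoids keeping every hit in memory.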
