简体   繁体   中英

How to store aws athena output from python script in excel?

I am querying from aws athena using python script and pyathena library and I'm getting the correct output in the form of table.

Output

Now the problem is I want to store the output in excel.

Can anyone suggest me, using python script how i can store output in Excel?

Here is the code that I am using for querying in aws athena:

from pyathena import connect
import os
import pandas as pd
%matplotlib inline

conn = connect(aws_access_key_id='*****',
                 aws_secret_access_key='*****',
                 s3_staging_dir='s3://****/',
                 region_name='us-west-1')

cursor = conn.cursor()
%time cursor.execute("SELECT * from my_table;")

Thanks in advance...

The output to Excel is not limited to creating the xlsx file, which you can also write as csv and let Excel load the csv file.

You can create multiple sheets using something like:

from pandas import ExcelWriter
def save_xls(list_dfs, dfs_names, xls_path):
    with ExcelWriter(xls_path) as writer:
        for df,name in zip(list_dfs, dfs_names):
            df.to_excel(writer,name)
        writer.save()

and then you can call the function with some transformation of your data, such as pivot tables and even colors:

save_xls(
    (raw.style.format("{:,.0f}"), 
     actual_table.style.format("{:,.0f}"), 
     diff_table.style.applymap(_color_red_or_green).format("{:,.0f}"), 
     ratio_table.style.applymap(_color_red_yellow_or_green).format("{:.3f}")),
    ('Raw',
    'Actuals',
    'Diff',
    'Ratio'),
    results_with_format.xlsx')

for example, formatting with three colors based on the value of the cells:

def _color_red_yellow_or_green(val):
    color = 'red' if val > 0.1 else 'yellow' if val > 0.05 else 'green'
    return 'background-color: %s' % color

You can use pandas to query and save data in excel:

data = pd.read_sql("SELECT * from my_table;",conn) 
data.to_excel('data.xlsx') 

Depending on how precise you need to insert the data into the sheet, you might checkout OpenPyXl - https://openpyxl.readthedocs.io

I use that when I need to insert the Athena results into specific cells and/or worksheets in the workbook. For when I need to be more precise than displaying a table of results. You can reference individual cells like worksheet['A53'] = 12345

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM