Issue printing PySpark dataframe as formatted table in Jupyter

I have a PySpark dataframe (df) that I'd like to print as a nicely formatted table in my Jupyter notebook.

As per this post, I thought the following code would work:

import pandas as pd
from IPython.display import display, HTML

pandas_df = df.toPandas()

display(HTML(pandas_df.to_html()))

Unfortunately, this does not work. I get the following error:

ERROR - failed to write data to stream: <__main__.UnicodeDecodingStringIO object at 0x7f75c7a8e750>

Does anyone know how to resolve this issue?

Thanks!

Try the following:

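def printDF(inputDF):
    # Collect the Spark dataframe to the driver as a pandas dataframe
    newDF = inputDF.toPandas()
    from IPython.display import HTML
    # Jupyter renders the returned HTML object as a table when the
    # call is the last expression in a cell
    return HTML(newDF.to_html())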

You can also move the import statement to the top of the notebook so that it runs once, instead of each time the function is called. Hope this helps.
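Since toPandas() collects the whole dataframe onto the driver, it may be worth capping the number of rows before converting. Here is a minimal sketch along those lines; the names spark_df_to_html, show_df and max_rows are made up for illustration, and this is not guaranteed to resolve the stream error from the question:

```python
def spark_df_to_html(spark_df, max_rows=20):
    """Return an HTML table string for the first `max_rows` rows.

    `spark_df` is assumed to be a PySpark DataFrame; `max_rows` is an
    illustrative parameter, not part of any library API.
    """
    # limit() is applied by Spark, so at most `max_rows` rows are collected
    pandas_df = spark_df.limit(max_rows).toPandas()
    # index=False leaves out the pandas row index column
    return pandas_df.to_html(index=False)


def show_df(spark_df, max_rows=20):
    """Render the table in a Jupyter cell (hypothetical helper)."""
    # Imported lazily so spark_df_to_html also works outside a notebook
    from IPython.display import HTML, display
    display(HTML(spark_df_to_html(spark_df, max_rows)))
```

In a notebook you would call show_df(df), or show_df(df, max_rows=50) to see more rows. Unlike returning the HTML object, calling display() explicitly renders the table even when the call is not the last expression in the cell.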
