Iterate values in dataframe to print to Word using python-docx

Question

I am working with a pandas dataframe that contains titles, sources, and links for news articles pulled from the GoogleNews API. I have categorized the data by the keywords I used to find the articles. I am trying to iterate over the 'keyword' column to print the data neatly, and then export the result to Word using python-docx.

To pull the GoogleNews data, I am using a for loop with various keywords set up in a list. It looks like:

for keyword in keywords:  # keywords is the list of search terms
    googlenews = GoogleNews()
    googlenews.get_news(keyword)
    googlenews.set_lang('en')
    googlenews.set_period('1d')
    result = googlenews.result()
    df_ivar = pd.DataFrame(result)
    df_ivar = df_ivar[df_ivar['date'].notna()]
    df_ivar = df_ivar[df_ivar['date'].str.contains('hours ago')]  # to only pull articles from within the last 24 hours
    df_ivar = df_ivar[['site', 'title', 'desc', 'link']]
    df_ivar['keyword'] = keyword
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df_ivar, df], ignore_index=True)

So far, I have found a way to print the data correctly, but I cannot find a way to only show each keyword once, and then print all the article titles, descriptions, and links below their appropriate keywords.

My data currently looks like this:

article 1    link 1    description 1    keyword 1
article 2    link 2    description 2    keyword 1
article 3    link 3    description 3    keyword 2
article 4    link 4    description 4    keyword 3

Upon export, I would like the python-docx document to display the data categorically, such as:

keyword 1
article 1
article 2

keyword 2
article 3

keyword 3
article 4

I have the python-docx script in working order, but every time I print the document, I am stuck with the keyword being presented ahead of every article name, when I would simply like the keyword displayed once, and any relevant articles posted below it. Currently, my for loop looks like:

for i in df.index:
    document.add_heading(df['keyword'][i], level=1)
    document.add_paragraph().add_run(df['title'][i]).underline = True
    document.add_paragraph(df['desc'][i], style='List Bullet')
    document.add_paragraph(df['link'][i], style='List Bullet')
    document.add_paragraph('Source: ' + df['site'][i], style='List Bullet')

Any help or guidance would be greatly appreciated! Thank you in advance!

Answer 1

You could use pandas groupby with the keyword column as the key. Iterating over the result yields pairs: the name of the group (the keyword, in this case) and a dataframe containing only the rows for that keyword. You can then pass the name to add_heading once per group and reuse the logic you already built, iterating over the group's index instead ( for i in g.index ).

for name, g in df.groupby('keyword'):
    # one heading per keyword
    document.add_heading(name, level=1)
    # then every article that belongs to that keyword
    for i in g.index:
        document.add_paragraph().add_run(g['title'][i]).underline = True
        document.add_paragraph(g['desc'][i], style='List Bullet')
        document.add_paragraph(g['link'][i], style='List Bullet')
        document.add_paragraph('Source: ' + g['site'][i], style='List Bullet')
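
As a fuller sketch of the same idea, the snippet below separates the grouping from the writing. The function names articles_by_keyword and write_articles and the sample rows are made up for illustration, and document is assumed to be a python-docx Document. Note that groupby sorts the group keys by default, so sort=False is passed here on the assumption that the keywords should appear in the order they first occur in the dataframe.

```python
import pandas as pd


def articles_by_keyword(df):
    """Return [(keyword, [row dicts])], keywords in first-appearance order."""
    # groupby sorts group keys by default; sort=False keeps the original order.
    return [
        (keyword, group.to_dict('records'))
        for keyword, group in df.groupby('keyword', sort=False)
    ]


def write_articles(document, df):
    """Add one heading per keyword, followed by that keyword's articles."""
    for keyword, rows in articles_by_keyword(df):
        document.add_heading(keyword, level=1)
        for row in rows:
            document.add_paragraph().add_run(row['title']).underline = True
            document.add_paragraph(row['desc'], style='List Bullet')
            document.add_paragraph(row['link'], style='List Bullet')
            document.add_paragraph('Source: ' + row['site'], style='List Bullet')


# Made-up rows standing in for the GoogleNews results.
sample = pd.DataFrame({
    'site': ['site 1', 'site 2', 'site 3', 'site 4'],
    'title': ['article 1', 'article 2', 'article 3', 'article 4'],
    'desc': ['description 1', 'description 2', 'description 3', 'description 4'],
    'link': ['link 1', 'link 2', 'link 3', 'link 4'],
    'keyword': ['markets', 'markets', 'energy', 'tech'],
})
```

With python-docx installed, this could be used by creating document = Document() (imported from docx), calling write_articles(document, sample), and then document.save('news.docx'); the file name is arbitrary.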

solution1
1 ACCEPTED 2021-05-02 14:37:53