
Python - Exporting PyGoogleNews output to CSV

I am new to Python and am trying to extract scraped headlines from the Google News feed using the PyGoogleNews library for a project, running in Google Colab. The PyGoogleNews code itself runs perfectly and I am happy with that. The code creates the CSV, but I have been unable to populate it with the scraped headline results. I want to export the scraped headlines to a CSV file so I can download it and perform a sentiment analysis and further analysis on it. I would be really grateful for any help, as this has been bugging me for days; I am sure it is something very obvious. Thank you in advance.

!pip install pygooglenews --upgrade 
import pandas as pd
import csv
from pygooglenews import GoogleNews

gn = GoogleNews(lang='en', country='UK')

Xmassearch = gn.search('intitle:Christmas', helper=True, from_='2019-12-01', to_='2019-12-31')

print(Xmassearch['feed'].title)

for item in Xmassearch['entries']:
    print(item['title'])

file = open("Christmassearch.csv", "w")
writer = csv.writer(file)

writer.writerow(["Xmassearch"])

file.close()

I don't have GoogleNews installed, and I cannot find any documentation that uses the entries key, so I cannot say exactly how this should work, but:

If iterating Xmassearch['entries'] works for you, then move that iteration down to where you have opened the CSV and started writing.

I also restructured your file handling to use Python's with statement so you don't have to manage closing the file:

with open('Christmassearch.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Xmassearch'])  # I presume you meant this to be your header
    
    # Use your loop from before...
    for item in Xmassearch['entries']:
        # And write each item
        writer.writerow([item['title']])
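If you want more than the headline in each row, the same pattern extends to csv.DictWriter. Note that pygooglenews returns feedparser-style entries that behave like dicts; the field names and sample data below are illustrative assumptions, so check the keys your own entries actually contain.

```python
import csv

# Sample entries standing in for Xmassearch['entries'] (pygooglenews
# entries behave like dicts; title/link/published are typical keys).
entries = [
    {"title": "Christmas markets open", "link": "https://example.com/a",
     "published": "Sun, 01 Dec 2019 09:00:00 GMT"},
    {"title": "Christmas travel tips", "link": "https://example.com/b",
     "published": "Mon, 02 Dec 2019 12:30:00 GMT"},
]

with open("Christmassearch.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "link", "published"])
    writer.writeheader()  # header row once, then one row per entry
    for item in entries:
        writer.writerow({k: item.get(k) for k in ("title", "link", "published")})
```

Using item.get(k) rather than item[k] means a missing key yields an empty cell instead of raising an exception.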

You can use the pandas library to create a DataFrame and save it as a CSV with its to_csv method.

rows = []
for item in Xmassearch['entries']:
    text = item.title
    rows.append({'Date': item.published,
                 'text': text if text else None})
Newsdata = pd.DataFrame(rows)

To save it to Google Drive, you can use the following:

from google.colab import drive
drive.mount('/content/drive', force_remount=True)
Newsdata.to_csv('data.csv')
!cp data.csv "/content/drive/MyDrive/Colab Notebooks/Data/data.csv"
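An alternative to the !cp step is to write the CSV straight to the mounted Drive folder with to_csv. The sketch below uses a local path so it runs outside Colab; on Colab you would substitute a path under /content/drive after drive.mount. The column names mirror the DataFrame built above and the sample row is an assumption.

```python
import pandas as pd

# Stand-in for the Newsdata DataFrame built above.
Newsdata = pd.DataFrame({
    "Date": ["Sun, 01 Dec 2019 09:00:00 GMT"],
    "text": ["Christmas markets open"],
})

# On Colab, after drive.mount('/content/drive'), point this at e.g.
# '/content/drive/MyDrive/Colab Notebooks/Data/Christmassearch.csv'.
out_path = "Christmassearch_drive.csv"
Newsdata.to_csv(out_path, index=False)  # index=False drops the row-number column
```

Writing directly to the mounted path saves the extra copy step and keeps the filename consistent.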

I instead have a problem with from_ and to_; can you please help me?

import datetime

StartDate = datetime.date(2021, 1, 28)
EndDate = datetime.date(2022, 1, 1)
for ticker in tickers:
    search = gn.search(ticker, from_=StartDate.strftime('%Y-%m-%d'), to_=EndDate.strftime('%Y-%m-%d'))

That gives me the error bad escape \d at position 7, and raises the exception Could not parse your date.
