I am new to Python and am trying to extract scraped headlines from the Google News feed using the PyGoogleNews library, working in Google Colab. The PyGoogleNews code runs perfectly for me and I am happy with that. The code creates the CSV file, but I have been unable to populate it with the scraped headline results. I want to export the scraped headlines to a CSV so I can download it and perform a sentiment analysis on it. I would be really grateful for any help, as this has been bugging me for days; I am sure it is something very obvious. Thank you in advance.
!pip install pygooglenews --upgrade
import pandas as pd
import csv
from pygooglenews import GoogleNews
gn = GoogleNews(lang='en', country='UK')
Xmassearch = gn.search('intitle:Christmas', helper=True, from_='2019-12-01', to_='2019-12-31')
print(Xmassearch['feed'].title)
for item in Xmassearch['entries']:
    print(item['title'])
file = open("Christmassearch.csv", "w")
writer = csv.writer(file)
writer.writerow(["Xmassearch"])
file.close()
I don't have GoogleNews, and I cannot find any documentation that uses the key entries, so I cannot say exactly how this should work, but: if iterating over Xmassearch['entries'] works for you, then move that iteration down to where you've opened the CSV and started writing. I also restructured your file handling to use Python's with statement so you don't have to manage closing the file:
with open('Christmassearch.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Xmassearch'])  # I presume you meant this to be your header
    # Use your loop from before...
    for item in Xmassearch['entries']:
        # And write each item
        writer.writerow([item['title']])
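If you want to check the CSV logic on its own, here is a minimal, self-contained sketch of the same pattern; the entries list is a made-up stand-in for Xmassearch['entries'], so no PyGoogleNews call is needed:

```python
import csv

# Hypothetical stand-ins for Xmassearch['entries'], so the CSV
# writing can be tested without running a search
entries = [{'title': 'Christmas markets open'},
           {'title': 'Christmas travel chaos'}]

with open('Christmassearch.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Title'])            # header row
    for item in entries:
        writer.writerow([item['title']])  # one headline per row

# Read the file back to confirm the rows were actually written
with open('Christmassearch.csv', newline='') as f:
    rows = list(csv.reader(f))
print(rows)
```

If this prints the header plus one row per dummy headline, the writing logic is fine and any remaining problem is in how the entries are fetched.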
You can use the pandas library to build a DataFrame and save it as a CSV with the to_csv method.
rows = []
for item in Xmassearch['entries']:
    text = item.title
    rows.append({'Date': item.published,
                 'text': text if text else None})
Newsdata = pd.DataFrame(rows)
To save it into Google Drive you can use the following:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
Newsdata.to_csv('data.csv')
!cp data.csv "/content/drive/MyDrive/Colab Notebooks/Data/Newsdata.csv"
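As a quick, self-contained check of the DataFrame-to-CSV step (the rows here are made-up stand-ins for the scraped entries, and no Drive mount is needed):

```python
import pandas as pd

# Made-up stand-ins for the scraped entries
rows = [{'Date': 'Tue, 24 Dec 2019 10:00:00 GMT', 'text': 'Christmas markets open'},
        {'Date': 'Wed, 25 Dec 2019 09:00:00 GMT', 'text': 'Christmas travel chaos'}]
Newsdata = pd.DataFrame(rows)

# index=False keeps the pandas row index out of the CSV
Newsdata.to_csv('data.csv', index=False)

# Read it back to confirm the file is populated
check = pd.read_csv('data.csv')
print(check.shape)
```

A non-empty shape here confirms the CSV actually contains the data, which was the original problem.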
I instead have a problem with from_ and to_; can you please help me?
import datetime

StartDate = datetime.date(2021, 1, 28)
EndDate = datetime.date(2022, 1, 1)
for ticker in tickers:
    search = gn.search(ticker, from_=StartDate.strftime('%Y-%m-%d'), to_=EndDate.strftime('%Y-%m-%d'))
That gives me the error: bad escape \d at position 7, and raises the exception: Could not parse your date.