
How can I sort saved .txt files into folders? Python or shell script

I have a python program that downloads article text and then turns it into a txt file. The program currently spits out the txt files in the directory the program is located in. I would like to arrange this text in folders specific to the news source they came from. Could I save the data in the folder in the python program itself and change the directory as the news source file changes? Or should I create a shell script that runs the python program inside the folder it needs to be in? Or is there a better way to sort these files that I am missing?

Here is the code of the Python program:

import feedparser
from goose import Goose
import urllib2
import codecs

url = "http://rss.cnn.com/rss/cnn_tech.rss"

feed = feedparser.parse(url)
g = Goose()

entryLength = len(feed['entries'])
count = 0

while True:
    article = g.extract(feed.entries[count]['link'])
    title = article.title
    text = article.cleaned_text

    file = codecs.open(feed['entries'][count]['title'] + ".txt", 'w', encoding = 'utf-8')
    file.write(text)
    file.close()

    count = count + 1
    if count == entryLength:
        break

If you pass only a filename to a function that opens a file for writing (open, codecs.open), the file is saved to the current working directory. If you pass a path instead, the file ends up at that path, provided the folder already exists.

import os

folder = 'whatever'  # the folder you wish to save the files in
name = 'somefilename.txt'
filename = os.path.join(folder, name)  # join the folder and the file name

Using that filename will make the file end up in the folder 'whatever/'
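Opening a file for writing fails if the target folder does not exist yet, so it can help to create the folder first. Here is a minimal sketch of that idea. The helper name save_article, the folder name "cnn_tech", and the replacement of "/" in titles are all assumptions made for illustration; none of them come from the question's code.

```python
import codecs
import os
import tempfile


def save_article(folder, title, text):
    """Write text to <folder>/<title>.txt, creating the folder if needed."""
    # Create the target folder on first use (works on Python 2 and 3).
    if not os.path.isdir(folder):
        os.makedirs(folder)
    # A "/" in a headline would otherwise be read as a directory separator.
    safe_title = title.replace("/", "-")
    path = os.path.join(folder, safe_title + ".txt")
    with codecs.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


# Demo in a throwaway directory; "cnn_tech" is a made-up folder name.
base = tempfile.mkdtemp()
saved = save_article(os.path.join(base, "cnn_tech"), u"Sample headline", u"Body text")
print(saved)
```

In the question's loop, a call like this could take the place of the codecs.open / write / close lines, with one folder name per news source.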

Edit: I see you've posted your code now. As br1ckb0t mentioned in his comment below, in your code you could write something like codecs.open(folder + feed['entries']... . If you do that, make sure folder ends with a slash; otherwise the folder name just becomes the start of the filename.
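To make the slash pitfall concrete, here is a small sketch comparing plain string concatenation with os.path.join. The folder and file names are made up for illustration.

```python
import os

folder = "cnn_tech"   # hypothetical folder name
name = "article.txt"  # hypothetical file name

glued = folder + name                # no separator: one long file name
manual = folder + "/" + name         # separator added by hand
joined = os.path.join(folder, name)  # separator chosen for the current OS

print(glued)   # cnn_techarticle.txt
print(manual)  # cnn_tech/article.txt
print(joined)
```

os.path.join inserts the separator for you, which is why it is usually the safer choice.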
