
How to write multiple txt files in Python?

I am preprocessing tweets in Python. My unprocessed tweets are in a folder, one tweet per file, named 1.txt, 2.txt, ..., 10000.txt. I want to preprocess them and write the results into new files that are also named 1.txt, 2.txt, ..., 10000.txt. My code is as follows:

for filename in glob.glob(os.path.join(path, '*.txt')):
    with open(filename) as file:
        tweet=file.read()
    def processTweet(tweet):
        tweet = tweet.lower()
        tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))','URL',tweet)
        tweet = re.sub('@[^\s]+','USER',tweet)
        tweet = re.sub('[\s]+', ' ', tweet)
        tweet = re.sub(r'#([^\s]+)', r'\1', tweet)            
        tweet = tweet.translate(None, string.punctuation)
        tweet = tweet.strip('\'"')
        return tweet

    fp = open(filename)
    line = fp.readline()

    count = 0
    processedTweet = processTweet(line)
    line = fp.readline()
    count += 1
    name = str(count) + ".txt"
    file = open(name, "w")
    file.write(processedTweet)
    file.close()

But that code only gives me a single new file, 1.txt, containing a preprocessed tweet. How can I write the other 9999 files? Is there a mistake in my code?

Your count is reset to 0 by the assignment count = 0 inside the loop, so on every iteration it becomes 1 again and the code writes "1.txt". Why reconstruct the filename at all, instead of just reusing the existing filename of the tweet you are preprocessing? Also, you should move your function definition outside the loop:

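import glob
import os
import re
import string

def processTweet(tweet):
    tweet = tweet.lower()
    # Replace links with the token URL
    tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
    # Replace @mentions with the token USER
    tweet = re.sub(r'@[^\s]+', 'USER', tweet)
    # Collapse runs of whitespace into a single space
    tweet = re.sub(r'[\s]+', ' ', tweet)
    # Turn #hashtag into hashtag
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    # Remove punctuation (Python 3 form; translate(None, string.punctuation) only works on Python 2)
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))
    tweet = tweet.strip('\'"')
    return tweet

# path is the folder that holds the tweets, as in the question
for filename in glob.glob(os.path.join(path, '*.txt')):
    with open(filename) as file:
        tweet = file.read()

    processedTweet = processTweet(tweet)

    # Reuse the same filename, so each tweet is written back to its own file
    with open(filename, "w") as file:
        file.write(processedTweet)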
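
Note that the loop above overwrites each tweet in place, while the question asked for new files with the same names. A minimal sketch of that variant follows, assuming Python 3 and a separate output folder. The names process_tweet, preprocess_folder, src_dir and dst_dir are made up for this example and do not appear in the original code.

```python
import glob
import os
import re
import string


def process_tweet(tweet):
    """Apply the same cleaning steps as processTweet, in Python 3 form."""
    tweet = tweet.lower()
    tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
    tweet = re.sub(r'@[^\s]+', 'USER', tweet)
    tweet = re.sub(r'[\s]+', ' ', tweet)
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))
    return tweet.strip('\'"')


def preprocess_folder(src_dir, dst_dir):
    """Write a cleaned copy of every .txt file in src_dir into dst_dir.

    Both folder arguments are assumptions of this sketch; the originals
    in src_dir are left untouched.
    """
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for src_path in glob.glob(os.path.join(src_dir, '*.txt')):
        with open(src_path) as src:
            tweet = src.read()
        # Reuse the original file name, so 1.txt stays 1.txt in dst_dir
        dst_path = os.path.join(dst_dir, os.path.basename(src_path))
        with open(dst_path, 'w') as dst:
            dst.write(process_tweet(tweet))
        written.append(dst_path)
    return written
```

It could be called as preprocess_folder(path, os.path.join(path, 'preprocessed')), where 'preprocessed' is just a placeholder folder name. Because the output name comes from os.path.basename, no counter is needed at all.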


 