
How to write multiple txt files in Python?

I am preprocessing tweets in Python. My unprocessed tweets are in a folder. Each file holds an unprocessed tweet and is named 1.txt, 2.txt, ... 10000.txt. I want to preprocess them and write the results into new files that are also named 1.txt, 2.txt, ... 10000.txt. My code is as follows:

for filename in glob.glob(os.path.join(path, '*.txt')):
    with open(filename) as file:
        tweet=file.read()
    def processTweet(tweet):
        tweet = tweet.lower()
        tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))','URL',tweet)
        tweet = re.sub('@[^\s]+','USER',tweet)
        tweet = re.sub('[\s]+', ' ', tweet)
        tweet = re.sub(r'#([^\s]+)', r'\1', tweet)            
        tweet = tweet.translate(None, string.punctuation)
        tweet = tweet.strip('\'"')
        return tweet

    fp = open(filename)
    line = fp.readline()

    count = 0
    processedTweet = processTweet(line)
    line = fp.readline()
    count += 1
    name = str(count) + ".txt"
    file = open(name, "w")
    file.write(processedTweet)
    file.close()

But that code gives me only one new, preprocessed file named 1.txt. How can I write the other 9999 files? Is there a mistake in my code?

Your count is reset to 0 on every pass through the loop, because the count = 0 assignment sits inside it. So every time the code is about to write a file, it writes "1.txt" and overwrites the previous one. Why reconstruct the filename at all, instead of reusing the existing filename of the tweet you are preprocessing? Also, you should move your function definition outside the loop:

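import glob
import os
import re
import string

path = 'tweets'  # assumed folder name; point this at the folder holding 1.txt ... 10000.txt

def processTweet(tweet):
    tweet = tweet.lower()
    # Replace links with the token URL
    tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
    # Replace @mentions with the token USER
    tweet = re.sub(r'@[^\s]+', 'USER', tweet)
    # Collapse runs of whitespace into a single space
    tweet = re.sub(r'[\s]+', ' ', tweet)
    # Turn #hashtag into hashtag
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    # Remove punctuation (Python 3; in Python 2 use tweet.translate(None, string.punctuation))
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))
    tweet = tweet.strip('\'"')
    return tweet

for filename in glob.glob(os.path.join(path, '*.txt')):
    # Read the whole file, not just its first line
    with open(filename) as file:
        tweet = file.read()

    processedTweet = processTweet(tweet)

    # Reuse the existing filename; this overwrites the original file
    with open(filename, "w") as file:
        file.write(processedTweet)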
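
The question asks for new files rather than overwritten ones. Here is a minimal sketch of that variant, assuming Python 3 and UTF-8 text: it keeps each input file's name but writes the result into a separate output folder. The names process_tweet, preprocess_folder, src_dir and dst_dir are made up for illustration and do not come from the original post.

```python
import glob
import os
import re
import string


def process_tweet(tweet):
    """Apply the same cleaning steps as processTweet, written for Python 3."""
    tweet = tweet.lower()
    tweet = re.sub(r'(www\.\S+)|(https?://\S+)', 'URL', tweet)  # links -> URL
    tweet = re.sub(r'@\S+', 'USER', tweet)                      # mentions -> USER
    tweet = re.sub(r'\s+', ' ', tweet)                          # collapse whitespace
    tweet = re.sub(r'#(\S+)', r'\1', tweet)                     # #tag -> tag
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))
    return tweet.strip('\'"')


def preprocess_folder(src_dir, dst_dir):
    """Write a processed copy of every .txt file in src_dir into dst_dir.

    Each output file keeps the name of its input file, so no counter is needed.
    Returns the list of paths that were written.
    """
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for src_path in sorted(glob.glob(os.path.join(src_dir, '*.txt'))):
        # Assumption: the tweet files are UTF-8 encoded
        with open(src_path, encoding='utf-8') as src:
            processed = process_tweet(src.read())
        dst_path = os.path.join(dst_dir, os.path.basename(src_path))
        with open(dst_path, 'w', encoding='utf-8') as dst:
            dst.write(processed)
        written.append(dst_path)
    return written
```

A call such as preprocess_folder('tweets', 'tweets_clean') would leave the originals untouched and put the cleaned copies next to them; both folder names are placeholders.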
