
How to save tokenization result for each file in a new separate text file?

I have 349 text files. I use the following code to read and tokenize all of them.

import glob

path = "C:\\texts\\*.txt"
files = glob.glob(path)
for file in files:
    with open(file) as in_file, open("C:\\texts\\file_tokens.txt", 'w') as out_file:
        for line in in_file:
            words = line.split()
            for word in words:
                out_file.write(word)
                out_file.write("\n")

This code saves the result (all tokens) in one file (file_tokens.txt). How can I save the tokens of each file in a new .txt file? That is, I want 349 output files, each containing the tokens of one input file.

import glob
from os import path

base_path = "C:\\texts"  # directory only; renamed so it does not shadow os.path
files = glob.glob(path.join(base_path, "*.txt"))
for file in files:
    # Name the output after the input file, e.g. foo.txt -> foo_tokenized.txt
    out_name = path.join(base_path, "%s_tokenized.txt" % path.splitext(path.basename(file))[0])
    with open(file) as in_file, open(out_name, 'w') as out_file:
        for line in in_file:
            words = line.split()
            for word in words:
                out_file.write(word)
                out_file.write("\n")

You create a new file with a name specific to the current file you're processing. In this example it's ($file_name)_tokenized.txt.

path.join is used to place the output file in the correct directory, e.g.

>>> path.join("~/Documents","out.txt")
'~/Documents/out.txt'
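
With the Windows-style paths used in the question, one way to build the per-file output name is to take the base name of the input, drop its extension, and append a suffix. A minimal sketch (the file name below is a made-up example, and it assumes a Windows interpreter so os.path treats backslashes as separators):

from os import path

in_file = "C:\\texts\\chapter1.txt"                 # hypothetical input path
stem = path.splitext(path.basename(in_file))[0]     # 'chapter1'
out_name = path.join("C:\\texts", "%s_tokenized.txt" % stem)
# out_name -> 'C:\\texts\\chapter1_tokenized.txt'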

Give each output file a different name.
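
One way to do that, sketched here with pathlib instead of the os.path calls used above (the directory and the "_tokenized" suffix are the same assumptions as in the other answer):

from pathlib import Path

for in_path in Path("C:\\texts").glob("*.txt"):
    if in_path.stem.endswith("_tokenized"):
        continue  # skip output files from a previous run
    # Derive a distinct output name from each input name
    out_path = in_path.with_name(in_path.stem + "_tokenized.txt")
    with in_path.open() as in_file, out_path.open("w") as out_file:
        for line in in_file:
            for word in line.split():
                out_file.write(word + "\n")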
