I have a CSV file in which each row contains a text ID and the text itself.
I want to write code that iterates over every row of this CSV table, counts the number of tokens in each row (i.e. in each text), and then writes a new CSV table as output that contains only the text ID and the number of tokens in that text.
The output CSV file should therefore have the header Text-ID, Tokens and one row per text.
So far I have this code:
import csv
from textblob_de import TextBlobDE as TextBlob

data = open('myInputFile.csv', encoding="utf-8").readlines()
blob = TextBlob(str(data))

csv_file = open('myOutputFile.csv', 'w', encoding="utf-8")
csv_writer = csv.writer(csv_file)

# Define the Headers of the CSV
csv_writer.writerow(['Text-ID', 'Tokens'])

def numOfWordTokens(document):
    myList = []
    for eachRow in document:
        myList.append(eachRow)
    return "\n".join(myList)
    #return eachRow
    #print(eachRow)
    # Count Tokens
    #countTokens = len(wordTokens2.split())  # Output: integer
    #return countTokens
    #myList.append(str(countTokens))

wordTokens = numOfWordTokens(data)

# Write Content in the CSV-Table Rows
csv_writer.writerow([wordTokens])
csv_file.close()
So, first of all, I have the following question:
When I use return eachRow, I get no output in the shell and only the first row in the newly created CSV file. When I use print(eachRow), every row is printed in the shell, but the new CSV file is just empty!
That is the first part I'm stuck on, so I can't move on to the part where I actually count the tokens in each row and write those counts into the new CSV file.
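Both symptoms come from how return and print behave inside a function: return exits the function immediately, so a return inside the for loop ends it on the first row, and print only writes to the shell, so a function without a return statement gives back None and nothing useful reaches the CSV writer. A minimal sketch of the fix, assuming each row is a list [text_id, text] as produced by csv.reader; the function name count_tokens_per_row and the sample data are hypothetical, and tokens are counted by splitting on whitespace:

```python
import csv
import io


def count_tokens_per_row(rows):
    """Return a list of (text_id, token_count) pairs, one per row.

    Assumes (hypothetically) that each row is [text_id, text].
    """
    results = []
    for text_id, text in rows:
        # Whitespace split: punctuation stays attached to the preceding word.
        results.append((text_id, len(text.split())))
    # Return only after the loop has processed every row; a return inside
    # the loop would stop after the first row.
    return results


# Usage with an in-memory CSV instead of a real file (illustrative data only).
sample = io.StringIO("Text-ID,Content\n1,Das ist ein Test.\n2,Noch ein Text\n")
reader = csv.reader(sample)
next(reader)  # skip the header row
print(count_tokens_per_row(reader))  # [('1', 4), ('2', 3)]
```

Each (text_id, count) pair can then be passed to csv_writer.writerow, one call per row, instead of writing a single joined string.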
It's super easy with pandas, but if you prefer not to use third-party modules, that's fine as well :) I've added code both for pandas and for iterating over the data manually with the csv module:
import pandas as pd
import csv


def main_pandas(path_to_csv: str, target_path: str):
    # Assumes the input CSV has the columns 'ID' and 'Content'
    df = pd.read_csv(path_to_csv, encoding='utf-8')
    # Count whitespace-separated tokens in each text
    df['tokens'] = df['Content'].apply(lambda x: len(x.split()))
    sub_df = df[['ID', 'tokens']]
    sub_df.to_csv(target_path, index=False)


def main_manual(path_to_csv: str, target_path: str):
    # The csv module docs recommend newline='' for files it reads and writes;
    # without it, blank lines can appear between output rows on Windows.
    with open(path_to_csv, 'r', encoding='utf-8', newline='') as r_fp:
        csv_reader = csv.reader(r_fp)
        next(csv_reader)  # Skip headers
        with open(target_path, 'w', encoding='utf-8', newline='') as w_fp:
            csv_writer = csv.writer(w_fp)
            csv_writer.writerow(['Text ID', 'tokens'])  # Write headers
            for line in csv_reader:
                text_id, text_content = line
                csv_writer.writerow([text_id, len(text_content.split())])


if __name__ == '__main__':
    main_manual('text.csv', 'tokens.csv')
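Note that len(text_content.split()) counts whitespace-separated chunks, so punctuation stays attached to the neighbouring word ("Test," counts as one token). If punctuation marks should count as separate tokens, which is closer to what a tokenizer such as TextBlob typically produces, a rough regex-based stand-in is sketched here. The pattern and function names are hypothetical, and this is an approximation, not TextBlob's actual tokenizer:

```python
import re

# Hypothetical approximation of a tokenizer: each run of word characters
# (\w+) and each single punctuation character ([^\w\s]) is one token.
# In Python 3, \w is Unicode-aware, so umlauts and ß count as letters.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def count_tokens_regex(text):
    """Count tokens, treating each punctuation mark as its own token."""
    return len(TOKEN_PATTERN.findall(text))


def count_tokens_whitespace(text):
    """Count whitespace-separated chunks, as the code above does."""
    return len(text.split())


text = "Das ist ein Test, oder?"
print(count_tokens_whitespace(text))  # 5
print(count_tokens_regex(text))       # 7
```

To use it, replace len(text_content.split()) in main_manual (or len(x.split()) in main_pandas) with count_tokens_regex(...).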