I have an Excel file with multiple columns. One of the columns contains comments. I want to create a column right beside it that holds the number of words in each comment, using Python. Is this possible?
Try this:
import os
from collections import Counter
from string import punctuation

import xlrd  # note: xlrd >= 2.0 reads only .xls; .xlsx needs xlrd < 2.0 or another reader

sheet_no = 0  # index 0 is the first sheet of the workbook
path = r'C:\Users\myUsername\Directory for Excel files'  # raw string keeps the backslashes literal
# Map every punctuation character to a space so it does not glue words together
punctuation_map = dict((ord(c), u' ') for c in punctuation)

for filename in os.listdir(path):
    if filename.endswith('.xlsx'):
        print(filename)
        # os.listdir returns bare names, so join them with the directory
        workbook = xlrd.open_workbook(os.path.join(path, filename))
        sheet = workbook.sheet_by_index(sheet_no)
        values = []
        for col in range(sheet.ncols):
            column_words = []
            for row in range(sheet.nrows):
                c = sheet.cell(row, col)
                if c.ctype == xlrd.XL_CELL_TEXT:
                    column_words.extend(c.value.translate(punctuation_map).split())
            values.extend(column_words)
            print(len(column_words), 'words in column', col)
        count = Counter(values)
        print(sum(count.values()), 'total words counted (from all columns)')
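The counting step in the loop above can be pulled out into a small function, which makes it easy to check on a few strings before pointing it at a workbook. This is only a sketch: the name count_words and the sample comments are made up for illustration, and treating empty or non-text cells as zero words is an assumption.

```python
from string import punctuation

# Same idea as above: turn each punctuation character into a space
PUNCTUATION_MAP = {ord(c): ' ' for c in punctuation}


def count_words(text):
    """Return the number of whitespace-separated words in text (hypothetical helper)."""
    if not isinstance(text, str):
        return 0  # assumption: empty or non-text cells count as zero words
    return len(text.translate(PUNCTUATION_MAP).split())


# Made-up sample comments, standing in for the cells of one column
comments = ['Great product, works well!', 'ok', '', None]
counts = [count_words(c) for c in comments]
print(counts)  # [4, 1, 0, 0]
```

Because punctuation becomes a space, a word like "don't" is counted as two words; drop the translate call if that is not what you want.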
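import pandas as pd
# df is your DataFrame, e.g. loaded with pd.read_excel('test.xlsx')
counter = []  # values for the new column
for string in df.Comments.values:  # for each string in your "Comments"
    # count whitespace-separated words; non-text cells (e.g. empty ones) count as 0
    counter.append(len(string.split()) if isinstance(string, str) else 0)
df['num_words'] = counter  # add the column
df = df[['num_words', 'Comments']]  # reorder the columns (this keeps only these two)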
I started with a df that had a Comments column, and I ended up with a df that has num_words alongside Comments.
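If the new column has to sit directly beside the comments while the other columns stay in place, one possible variation is to compute the counts with the pandas string methods and place them with DataFrame.insert. This is a sketch under assumptions: the sample data and the id and score columns are invented, and missing comments are treated as empty.

```python
import pandas as pd

# Hypothetical sample data standing in for the sheet read from Excel
df = pd.DataFrame({
    'id': [1, 2, 3],
    'Comments': ['very good service', 'bad', None],
    'score': [5, 1, 3],
})

# Words per comment; missing comments are treated as empty strings (an assumption)
num_words = df['Comments'].fillna('').astype(str).str.split().str.len()

# Insert the new column immediately to the right of 'Comments'
position = df.columns.get_loc('Comments') + 1
df.insert(position, 'num_words', num_words)

print(df.columns.tolist())    # ['id', 'Comments', 'num_words', 'score']
print(df['num_words'].tolist())  # [3, 1, 0]
```

With a real file you would load df with pd.read_excel and could write the result back with df.to_excel, which needs an Excel engine such as openpyxl installed.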