Something I discovered while working on this problem was that concordance does not like to display the context at the start of a Text:
>>> from nltk.book import *
>>> text3.concordance("beginning",lines=1)
Displaying 1 of 5 matches:
beginning God created the heaven and the ear
Note that "In the" is missing from the output above. However, concordance has no problem with the end of the Text:
>>> text3.concordance("coffin",lines=1)
Displaying 1 of 1 matches:
embalmed him , and he was put in a coffin in Egypt .
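The end of the Text is probably fine because Python slices clamp a stop index that runs past the end of a list instead of raising an error, so the right-hand context near the last token is simply shorter. A minimal sketch, assuming the older NLTK logic quoted further down (right context taken from tokens[i+1:i+context], with context = width // 4) and using a made-up token list rather than NLTK's data:

```python
# Toy token list standing in for the end of a Text (illustrative only)
tokens = ["he", "was", "put", "in", "a", "coffin", "in", "Egypt", "."]

i = tokens.index("coffin")  # position of the matched word (5)
context = 79 // 4           # 19, the context for the default width=79

# The stop index i + context (24) is past the end of the 9-item list,
# but slicing clamps it instead of raising IndexError.
right = tokens[i + 1:i + context]
print(" ".join(right))  # prints: in Egypt .
```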
Interestingly, if you specify a smaller width, things work out better (the default is width=79, I believe):
>>> text3.concordance("beginning",width=11, lines=1)
Displaying 1 of 5 matches:
In the beginning
Does anyone have an explanation for this? The documentation at nltk.org says:
Print a concordance for word with the specified context window. Word matching is not case-sensitive.
Consider this concordance function, which I have modified from the original source code of the class ConcordanceIndex (source code HERE).
def print_concordance(self, word, width=35, lines=25):
    """
    Print a concordance for ``word`` with the specified context window.

    :param word: The target word
    :type word: str
    :param width: The width of each line, in characters (default=35)
    :type width: int
    :param lines: The number of lines to display (default=25)
    :type lines: int
    """
    #print ("inside:")
    #print (width)
    half_width = (width - len(word) - 2) // 2
    #print (half_width)
    context = width // 4  # approx number of words of context
    #print ("Context:"+str(context))
    offsets = self.offsets(word)
    if offsets:
        lines = min(lines, len(offsets))
        print("Displaying %s of %s matches:" % (lines, len(offsets)))
        for i in offsets:
            #print(i)
            if lines <= 0:
                break
            left = (' ' * half_width +
                    ' '.join(self._tokens[i-context:i]))  # This is where you have to concentrate
            #print(i-context)
            #print(self._tokens[i-context:i])
            right = ' '.join(self._tokens[i+1:i+context])
            left = left[-half_width:]
            right = right[:half_width]
            print(left, self._tokens[i], right)
            lines -= 1
    else:
        print("No matches")
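A minimal sketch of the same logic with the left slice clamped at zero, written as a standalone function over a plain list of tokens so it runs without NLTK. The name concordance_lines and the simple case-insensitive matching (standing in for self.offsets(word)) are assumptions for illustration, not NLTK's API. It returns the lines instead of printing them, and trims with left[len(left) - half_width:] because left[-0:] would return the whole string when half_width is 0:

```python
def concordance_lines(tokens, word, width=79, lines=25):
    """Return up to `lines` concordance lines for `word` in `tokens`.

    Hypothetical standalone variant of print_concordance; not NLTK's API.
    """
    half_width = (width - len(word) - 2) // 2
    context = width // 4  # approx number of words of context
    # Case-insensitive matching, standing in for self.offsets(word)
    offsets = [n for n, tok in enumerate(tokens) if tok.lower() == word.lower()]
    result = []
    for i in offsets[:lines]:
        # Clamp the start at 0: a negative start would be counted from the
        # end of the list and usually produce an empty left context.
        left = ' ' * half_width + ' '.join(tokens[max(0, i - context):i])
        right = ' '.join(tokens[i + 1:i + context])
        # Keep only the last half_width characters of the left context
        left = left[len(left) - half_width:]
        right = right[:half_width]
        result.append(left + ' ' + tokens[i] + ' ' + right)
    return result


tokens = "In the beginning God created the heaven and the earth .".split()
print(concordance_lines(tokens, "beginning")[0])
```

With the clamp, the first line now includes "In the" before "beginning", padded on the left to the same 79-character layout.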
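From the commented-out prints you can observe that whenever i - context becomes negative, nothing is printed for the left context. A slice with a positive start and a negative stop works as expected, but a negative start combined with a small positive stop, as in self._tokens[-17:2], is empty on a long token list: the negative start is counted from the end of the list, so it lands after the stop index. The join then produces an empty string, and only the padding is printed.
In self._tokens[i-context:i], the start index is initially positive, but as width (and therefore context = width // 4) increases, it becomes negative for words near the start of the text, and hence no left context is output.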
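The numbers for text3 can be checked with plain lists. "beginning" is the third token of Genesis ("In", "the", "beginning"), so i = 2; with width=79 the code above uses context = 79 // 4 = 19, giving a start index of -17, while width=11 gives context = 2 and a start index of 0. The sketch below uses a toy list of 100 tokens (an assumption standing in for the much longer real Text) and also shows the usual remedy of clamping the start with max(0, i - context):

```python
# Placeholder tokens: the opening of Genesis followed by filler (illustrative only)
tokens = ["In", "the", "beginning"] + ["tok%d" % n for n in range(97)]
i = 2  # index of "beginning"

for width in (79, 11):
    context = width // 4
    start = i - context
    # A negative start is counted from the end of the list: -17 means
    # len(tokens) - 17 == 83, which is already past the stop index 2,
    # so the slice is empty.
    left = tokens[start:i]
    print(width, context, start, left)

# Clamping the start at 0 keeps the left context at the beginning of the text
print(tokens[max(0, 2 - 79 // 4):2])  # prints: ['In', 'the']
```

One more detail explains why width=11 shows the full "In the": half_width = (11 - 9 - 2) // 2 = 0, and left[-0:] returns the whole string because -0 == 0, so the left context is not trimmed at all in that case.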