How to tokenize sentences using NLTK

Question

I'm new to NLP. I'm trying to split text into sentences with NLTK on Python 3.7, so I used the following code:

import nltk
text4="This is the first sentence.A gallon of milk in the U.S. cost $2.99.Is this the third sentence?Yes,it is!"
x=nltk.sent_tokenize(text4)
x[0]

I expected x[0] to return the first sentence, but I got:

Out[4]: 'This is the first sentence.A gallon of milk in the U.S. cost $2.99.Is this the third sentence?Yes,it is!'

Am I doing anything wrong?

Answer 1

You need valid spacing and punctuation in your sentences for the tokenizer to behave properly:

import nltk

text4 = "This is a sentence. This is another sentence."
nltk.sent_tokenize(text4)

# ['This is a sentence.', 'This is another sentence.']

# Versus what you had before

nltk.sent_tokenize("This is a sentence.This is another sentence.")

# ['This is a sentence.This is another sentence.']
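If you can't fix the input by hand, one option is to patch the missing spaces with a regular expression before tokenizing. This is only a rough sketch: `add_missing_spaces` is a hypothetical helper, not part of NLTK, and it assumes a sentence boundary looks like a lowercase letter or digit, then `.`, `?` or `!`, then an uppercase letter. That rule leaves "U.S." and "$2.99" alone, but it will misfire on text such as "file.Name".

```python
import re

# Hypothetical helper (not an NLTK function): insert a space after '.', '?' or '!'
# when it follows a lowercase letter or digit and is directly followed by an
# uppercase letter. Abbreviations like "U.S." and decimals like "2.99" don't match.
_MISSING_SPACE = re.compile(r'(?<=[a-z0-9])([.?!])(?=[A-Z])')

def add_missing_spaces(text):
    return _MISSING_SPACE.sub(r'\1 ', text)

text4 = "This is the first sentence.A gallon of milk in the U.S. cost $2.99.Is this the third sentence?Yes,it is!"
fixed = add_missing_spaces(text4)
print(fixed)
# This is the first sentence. A gallon of milk in the U.S. cost $2.99. Is this the third sentence? Yes,it is!
```

The repaired string can then be passed to nltk.sent_tokenize as usual.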

Answer 2

NLTK's sent_tokenize does not handle ill-formatted text well. If you provide proper spacing, it works:

import nltk
nltk.download('punkt')  # fetch the Punkt sentence tokenizer models (needed once)
text4="This is the first sentence. A gallon of milk in the U.S. cost $2.99. Is this the third sentence? Yes, it is"
x=nltk.sent_tokenize(text4)
x[0]

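Or you could split with a regular expression:

import re
text4 = "This is the first sentence. A gallon of milk in the U.S. cost 2.99. Is this the third sentence? Yes it is"
# Note: this splits at every '.', '?' or '!', so "U.S." and "2.99" are also broken apart
sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text4)
sentences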
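The regular expression above splits at every period, so "U.S." and "2.99" get cut into pieces. A slightly more careful variant, offered as a sketch rather than a replacement for a real tokenizer, splits only on whitespace that follows `.`, `?` or `!` and precedes an uppercase letter. The name `split_sentences` is made up for this example, and the rule will still mis-split text like "the U.S. Government".

```python
import re

# Split on whitespace that comes right after '.', '?' or '!' and right before
# an uppercase letter. The punctuation stays attached to its sentence.
_BOUNDARY = re.compile(r'(?<=[.?!])\s+(?=[A-Z])')

def split_sentences(text):
    return _BOUNDARY.split(text)

text4 = "This is the first sentence. A gallon of milk in the U.S. cost $2.99. Is this the third sentence? Yes, it is"
sentences = split_sentences(text4)
print(sentences)
# ['This is the first sentence.', 'A gallon of milk in the U.S. cost $2.99.', 'Is this the third sentence?', 'Yes, it is']
```

Because the pattern uses only a lookbehind and a lookahead around the whitespace, nothing but the whitespace itself is consumed by the split.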
