
Issue combining split strings

I have extracted the text of the Wikipedia "Web 2.0" article and split it into sentences. Next, I want to build strings that each contain 5 sentences.

Once extracted, the text looks like this in the EditText:

[image: enter image description here]

Below is my code:

finalText = textField.getText().toString();

String[] textArrayWithFullStop = finalText.split("\\. ");
String colelctionOfFiveSentences = "";

List<String> textCollection = new ArrayList<String>();
for (int i = 0; i < textArrayWithFullStop.length; i++)
{
    colelctionOfFiveSentences = colelctionOfFiveSentences + textArrayWithFullStop[i];
    if (i % 5 == 0)
    {
        textCollection.add(colelctionOfFiveSentences);
        colelctionOfFiveSentences = "";
    }
}

But when I use a Toast to display the text, here is what it gives:

Toast.makeText(Talk.this, textCollection.get(0), Toast.LENGTH_LONG).show();

[image: enter image description here]

As you can see, this is only one sentence! But I expected it to have 5 sentences!

The other problem is that the second group starts from somewhere else in the text. Here is how I displayed it in a Toast:

Toast.makeText(Talk.this, textCollection.get(1), Toast.LENGTH_LONG).show();

[image: enter image description here]

This makes no sense to me! How can I properly split the text into sentences and create Strings containing 5 sentences each?

Append ". " to textArrayWithFullStop[i]:

colelctionOfFiveSentences = colelctionOfFiveSentences + textArrayWithFullStop[i]+". ";

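The problem is that for the first sentence i is 0, and 0 % 5 == 0, so that sentence is added to the array list on its own straight away. Every later group then runs from index 1 to 5, 6 to 10, and so on. You should use a separate counter instead of the modulo check.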

finalText = textField.getText().toString();

String[] textArrayWithFullStop = finalText.split("\\. ");
String colelctionOfFiveSentences = "";
int sentenceAdded = 0;

List<String> textCollection = new ArrayList<String>();
for (int i = 0; i < textArrayWithFullStop.length; i++)
{
    colelctionOfFiveSentences += textArrayWithFullStop[i] + ". ";
    sentenceAdded++;
    if (sentenceAdded == 5)
    {
        textCollection.add(colelctionOfFiveSentences);
        colelctionOfFiveSentences = "";
        sentenceAdded = 0;
    }
}
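The counter-based loop above leaves two loose ends. A final group of fewer than five sentences is never added to the list, and ". " is appended even after the last sentence, which already kept its period. The following is a minimal sketch of one way to handle both, in plain Java without the Android parts. The names `SentenceGrouper` and `groupSentences` are hypothetical, and the sketch assumes the text does not itself end with ". ".

```java
import java.util.ArrayList;
import java.util.List;

class SentenceGrouper {
    // Groups the sentences of text into strings of at most groupSize sentences.
    static List<String> groupSentences(String text, int groupSize) {
        String[] sentences = text.split("\\. ");
        List<String> groups = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int added = 0;

        for (int i = 0; i < sentences.length; i++) {
            if (added > 0) {
                current.append(' ');            // separator inside a group
            }
            current.append(sentences[i]);
            if (i < sentences.length - 1) {
                current.append('.');            // restore the period removed by split()
            }
            added++;
            if (added == groupSize) {
                groups.add(current.toString());
                current.setLength(0);
                added = 0;
            }
        }
        if (added > 0) {
            groups.add(current.toString());     // leftover group with fewer sentences
        }
        return groups;
    }

    public static void main(String[] args) {
        String text = "One. Two. Three. Four. Five. Six. Seven.";
        for (String group : groupSentences(text, 5)) {
            System.out.println(group);
        }
        // prints:
        // One. Two. Three. Four. Five.
        // Six. Seven.
    }
}
```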

I believe that if you change the modulo check to this:

if(i%5==4)

you will have what you need.
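To see why the remainder matters, it may help to list the loop indices at which each check fires. The sketch below is only an illustration; `ModuloCheck` and `triggerIndices` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ModuloCheck {
    // Returns the indices in [0, count) for which i % 5 == remainder.
    static List<Integer> triggerIndices(int count, int remainder) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            if (i % 5 == remainder) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // i % 5 == 0 fires on the 1st, 6th and 11th sentence
        System.out.println(triggerIndices(12, 0)); // prints [0, 5, 10]
        // i % 5 == 4 fires on the 5th and 10th sentence
        System.out.println(triggerIndices(12, 4)); // prints [4, 9]
    }
}
```

With `i % 5 == 4` each group closes after its fifth sentence. A trailing group of fewer than five sentences still has to be added after the loop.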

You probably realize this, but there are other places where a ". " can appear without actually ending a sentence, for instance:

I spoke to John and he said... "I went to the store. 
Then I went to the Tennis courts.", 
and I don't believe he was telling the truth because 
1. Why would someone go to play tennis after going to the store and 
2. John has no legs!  
I had to ask, am I going to let him get away with these lies?

That's two sentences, neither of which ends with a period, yet splitting on ". " would mislead your code into thinking there are 5 sentences, broken up at entirely the wrong places. So this approach is really fraught with problems. However, as an exercise in splitting strings, I guess it's as good as any other.
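If more robust boundaries are wanted, one option that none of the answers here use is `java.text.BreakIterator`, which is part of the standard library and also recognizes sentences ending in "?" or "!". The sketch below is an illustration under that assumption, not a complete answer to the pitfalls above: it can still break in the wrong place on abbreviations, list numbers and quotations. `SentenceSplitter` and `splitSentences` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {
    // Splits text at the sentence boundaries reported by BreakIterator.
    static List<String> splitSentences(String text) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.US);
        boundaries.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next();
             end != BreakIterator.DONE;
             start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : splitSentences("Hello world. How are you? I am fine!")) {
            System.out.println(s);
        }
    }
}
```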

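For the side problem (splitting into sentences), I would suggest starting with this regexp, which also swallows Wikipedia citation markers such as [1] that directly follow the period. Note that the leading dot has to be escaped, otherwise it matches any character:

string.split("\\.(\\[[0-9\\[\\]]+\\])? ")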
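A short sketch of what such a pattern does to Wikipedia-style text. It escapes the leading dot so that it matches a literal period, on the assumption that this is what was intended; `CitationAwareSplit` is a hypothetical name and the sample sentence is made up for illustration.

```java
import java.util.Arrays;

class CitationAwareSplit {
    // A literal period, optionally followed by citation markers like [1] or [2][3],
    // followed by a single space.
    static final String SENTENCE_END = "\\.(\\[[0-9\\[\\]]+\\])? ";

    static String[] split(String text) {
        return text.split(SENTENCE_END);
    }

    public static void main(String[] args) {
        String text = "Web 2.0 is a term.[1] It was coined in 1999.[2][3] Third sentence.";
        System.out.println(Arrays.toString(split(text)));
        // prints [Web 2.0 is a term, It was coined in 1999, Third sentence.]
    }
}
```

The period inside "2.0" is left alone because it is not followed by a space. The separators, including the citation markers, are dropped from the result, so the period has to be added back when the sentences are joined again.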

And for the main problem, maybe you could use Arrays.copyOfRange().
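One way this suggestion could look, as a sketch: walk the sentence array in steps of five, copy each slice with `Arrays.copyOfRange`, and join it. `RangeGrouper` and `group` are hypothetical names. `String.join` needs Java 8 or newer, so on older Android API levels something like `TextUtils.join` would have to stand in for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RangeGrouper {
    // Joins consecutive slices of at most groupSize sentences with ". ".
    static List<String> group(String[] sentences, int groupSize) {
        List<String> groups = new ArrayList<>();
        for (int start = 0; start < sentences.length; start += groupSize) {
            // Clamp the end index: copyOfRange pads with null past the array length.
            int end = Math.min(start + groupSize, sentences.length);
            String[] chunk = Arrays.copyOfRange(sentences, start, end);
            groups.add(String.join(". ", chunk));
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] sentences = {"A", "B", "C", "D", "E", "F", "G"};
        System.out.println(group(sentences, 5)); // prints [A. B. C. D. E, F. G]
    }
}
```

The index arithmetic replaces both the modulo check and the separate counter, and the last, shorter group falls out of the same loop.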

