Custom parsing of a string into an array in Java

Question

I have a string like this (from Twitter):

String str = "The Green New Deal is viable. It is the same vision that FDR had for his New Deal programs: nationwide mobilization http://94739 #thegreendeal #nationwide";

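What I want is to 1) convert this string into an array, 2) remove stop words and apply stemming, and 3) remove all characters except "#" (which marks a term as a hashtag).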

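So I tried this cool library, https://github.com/uttesh/exude, which does stem the words, remove stop words, lowercase the text, and strip special characters. The problem is that it also removes the hashtags. The code is: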

String tweetString = ExudeData.getInstance().filterStoppingsKeepDuplicates(str);

I also tried this:

String[] wordArray = str.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");

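But this also removes the hashtags. Is there a workaround that keeps the hashtags with either of these two approaches? (I would prefer to keep using the exude library for this.)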

Answer 1

With the regex approach, you can add # to the list of characters that should not be removed, like this:

        String[] wordArray = str.replaceAll("[^a-zA-Z #]", "").toLowerCase().split("\\s+");
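One thing to watch with the line above: the URL http://94739 is reduced to the token "http" rather than removed. Below is a minimal sketch, not a definitive implementation, that strips URLs before applying the same character class, and that separates hashtags so any stop-word or stemming filter can run on the remaining words only. The class name HashtagTokenizer, both method names, and the URL pattern are assumptions made for illustration; the filter is passed in as a plain function so the sketch stays independent of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper class; names are illustrative, not from the original thread.
class HashtagTokenizer {

    // Lowercase the text, drop URLs, keep only letters, spaces and '#', then split on whitespace.
    static String[] tokenize(String text) {
        String cleaned = text
                .replaceAll("https?://\\S+", " ") // assumption: URLs should be dropped entirely
                .replaceAll("[^a-zA-Z #]", "")    // same character class as the answer above
                .toLowerCase()
                .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    // Run wordFilter on the non-hashtag words only, then append the hashtags untouched.
    // Note: hashtags end up after the filtered words, so the original order is not preserved.
    static List<String> filterKeepingHashtags(String text, UnaryOperator<String> wordFilter) {
        List<String> hashtags = new ArrayList<>();
        StringBuilder plain = new StringBuilder();
        for (String token : tokenize(text)) {
            if (token.startsWith("#")) {
                hashtags.add(token);
            } else {
                plain.append(token).append(' ');
            }
        }
        List<String> result = new ArrayList<>();
        String filtered = wordFilter.apply(plain.toString().trim()).trim();
        if (!filtered.isEmpty()) {
            result.addAll(Arrays.asList(filtered.split("\\s+")));
        }
        result.addAll(hashtags);
        return result;
    }

    public static void main(String[] args) {
        String str = "The Green New Deal is viable. It is the same vision that FDR had "
                + "for his New Deal programs: nationwide mobilization http://94739 "
                + "#thegreendeal #nationwide";
        System.out.println(Arrays.toString(tokenize(str)));
        // Identity filter used here as a stand-in for a real stop-word/stemming filter.
        System.out.println(filterKeepingHashtags(str, s -> s));
    }
}
```

To keep the exude library, the identity filter could presumably be replaced with a lambda that calls ExudeData.getInstance().filterStoppingsKeepDuplicates(s), the call shown in the question. If that method declares a checked exception, the lambda would need to wrap it in a try/catch.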

Answered 2019-03-04 17:42:51
