R spell checker/tokenizer

Question

I'm not sure whether R is the right place to try this, but here's my situation. I have a character vector full of strings.

id    Words
 1    'The'
 2    'victory'
 3    'wasgreat'
...   ...

The original data had some encoding problems, and some of the strings are concatenations of several words:

 (i.e. 'My name is' -> 'Mynameis').

I need to leave the correct words alone and split the misspelled concatenations into their correct substrings.

I'm curious whether there's any setup in R to handle this type of problem. I think there are several programs in Python that would handle this much better, but my Python skills are substantially weaker (bordering on non-existent). However, I'd be willing to consider it as an alternative.

Any suggestions?

Answer 1

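The latest issue of the R Journal has an article by Hornik and Murdoch on spell checking in R, which they apply to the R sources themselves. Recursion to the rescue.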
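
For the splitting part of the question, the recursive idea can be sketched in base R. This is only a minimal sketch under stated assumptions, not the method from the article: `dict` is a hypothetical toy word list (a real run would need a proper dictionary), and `segment()` and `fix_words()` are illustrative names, not functions from any package. A string that is already a known word is left alone; otherwise the longest dictionary prefix is tried first and the remainder is split recursively.

```r
# Hypothetical toy dictionary (an assumption); replace with a real word list.
dict <- c("the", "victory", "was", "great", "my", "name", "is")

# Split `s` into dictionary words. Returns NULL when no complete split exists.
segment <- function(s, dict) {
  memo <- new.env()  # caches results for substrings already tried
  rec <- function(x) {
    if (nchar(x) == 0) return(character(0))
    if (exists(x, envir = memo, inherits = FALSE)) {
      return(get(x, envir = memo))
    }
    result <- NULL
    # Try the longest prefix first so whole words win over fragments.
    for (i in rev(seq_len(nchar(x)))) {
      prefix <- substr(x, 1, i)
      if (prefix %in% dict) {
        rest <- rec(substr(x, i + 1, nchar(x)))
        if (!is.null(rest)) {
          result <- c(prefix, rest)
          break
        }
      }
    }
    assign(x, result, envir = memo)
    result
  }
  rec(tolower(s))
}

# Leave known words untouched; split the rest when a full split is found.
fix_words <- function(words, dict) {
  unlist(lapply(words, function(w) {
    if (tolower(w) %in% dict) return(w)
    parts <- segment(w, dict)
    if (is.null(parts)) w else parts
  }))
}

words <- c("The", "victory", "wasgreat", "Mynameis")
fix_words(words, dict)
```

Note that the split pieces come back lower-cased, and strings that cannot be fully covered by dictionary words are returned unchanged. With a large dictionary, ambiguous splits become likely, so a frequency-based scoring of candidate splits would probably be needed on real data.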

Answer 1
6 accepted 2012-03-20 15:58:21