I notice that common applications on a given machine (Mac, Linux, or Windows) have their own spell checkers: everything from various IDEs, to MS Word/Office, to note-taking software.
I am trying to use the built-in utility of our respective machines to check strings for correct spelling. It seems that I can't just use what is on the machine and would likely have to download a dictionary to compare against.
I was not sure if there was a better way to accomplish this. I would prefer to do things locally, but I am not opposed to making API or curl requests to determine whether the words in a string are spelled correctly.
I was looking at:
I was looking at Node packages and noticed spell checker modules which encapsulate wordlists as well.
Is there a way to utilize the built in machine dictionaries at all, or would it be ideal if I download a dictionary / wordlist to compare against?
I am thinking a wordlist might be the best bet, but I didn't want to reinvent the wheel. What have others done to accomplish something similar?
Credit goes to Lukas Knuth. I want to give an explicit how-to for using dictionary-en-us and nspell.
Install the following two dependencies:
npm install nspell dictionary-en-us
Here is a sample file I wrote to solve the problem.
// Node file
// Usage: node spellcheck.js [path]
// path: [optional] either an absolute path or a path relative to the pwd/cwd
// If you run the file from within Seg.Ui.Frontend/ it works as well:
//   node utility/spellcheck.js
// OR from the utility directory using a path:
//   node spellcheck.js ../src/assets/i18n/en.json
var fs = require("fs");
var dictionary = require("dictionary-en-us");
var nspell = require("nspell");
var process = require("process");

// Path to use if none is passed on the command line.
var path = "src/assets/i18n/en.json";
let strings = [];

// Recursively collect every string value in the parsed JSON.
function getStrings(json) {
  let keys = Object.keys(json);
  for (let idx of keys) {
    let val = json[idx];
    if (isObject(val)) getStrings(val);
    if (isString(val)) strings.push(val);
  }
}

// Split sentences into unique, lowercased words, dropping trailing
// punctuation, ignored strings, empty strings, and numbers.
function sanitizeStrings(strArr) {
  let set = new Set();
  for (let sentence of strArr) {
    sentence.split(" ").forEach(word => {
      word = word.trim().toLowerCase();
      if (word.endsWith(".") || word.endsWith(":") || word.endsWith(",")) word = word.slice(0, -1);
      if (ignoreThisString(word)) return;
      if (word === "") return;
      if (isNumber(word)) return;
      set.add(word);
    });
  }
  return [...set];
}

function ignoreThisString(word) {
  // We need to ignore special-cased strings, such as items with
  // brackets, mustaches, question marks, single quotes, double quotes.
  let regex = /[{}\[\]'"?]/;
  return regex.test(word);
}

// Callback for dictionary(): builds the checker and reports unknown words.
function spellcheck(err, dict) {
  if (err) throw err;
  var spell = nspell(dict);
  let misspelled_words = strings.filter(word => !spell.correct(word));
  misspelled_words.forEach(word => console.log(`Plausible Misspelled Word: ${word}`));
  return misspelled_words;
}

function isObject(obj) { return obj instanceof Object; }
function isString(obj) { return typeof obj === "string"; }
// True when the word parses as an integer (including "0").
function isNumber(obj) { return !Number.isNaN(parseInt(obj, 10)); }

function main(args) {
  // node file.js path
  if (args.length >= 3) path = args[2];
  if (!fs.existsSync(path)) {
    console.log(`The path does not exist: ${process.cwd()}/${path}`);
    return;
  }
  var content = fs.readFileSync(path, "utf8");
  var json = JSON.parse(content);
  getStrings(json);
  // console.log(`String Array (length: ${strings.length}): ${strings}`)
  strings = sanitizeStrings(strings);
  console.log(`String Array (length: ${strings.length}): ${strings}\n\n`);
  dictionary(spellcheck);
}

main(process.argv);
This will return a subset of strings to look at; they may be misspelled or false positives.
A false positive will be denoted as:
Obviously, this doesn't cover all cases, but I added an ignoreThisString function you can leverage if, say, a string contains a special word or phrase the developers want ignored.
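As a rough sketch of how that ignore function could be extended, the version below combines the character check from the script with a project-specific word list. The names IGNORED_WORDS, IGNORED_PATTERNS, and shouldIgnore, as well as the sample words and the URL pattern, are hypothetical and only meant to illustrate the idea.

```javascript
// Hypothetical project-specific words to skip (stored lowercased).
const IGNORED_WORDS = new Set(["oauth", "i18n", "frontend"]);

// Patterns to skip: the bracket/mustache/quote check from the script,
// plus an assumed extra rule that skips URLs.
const IGNORED_PATTERNS = [/[{}\[\]'"?]/, /^https?:\/\//];

// Returns true when the word should not be sent to the spell checker.
function shouldIgnore(word) {
  const w = word.toLowerCase();
  return IGNORED_WORDS.has(w) || IGNORED_PATTERNS.some((re) => re.test(w));
}

console.log(shouldIgnore("OAuth"), shouldIgnore("{{name}}"), shouldIgnore("hello"));
```

Something like shouldIgnore could stand in for ignoreThisString in the script. nspell also documents an add method for teaching the checker extra words, which may be a better fit when the words are valid vocabulary rather than placeholders.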
This is meant to be run as a node script.
Your question is tagged as both Node.js and Python. This is the Node.js-specific part, but I imagine it's very similar in Python.
Windows (from Windows 8 onward) and Mac OS X do have built-in spellchecking engines.
Fortunately, there is already a module called spellchecker which has bindings for all of the above. This will use the built-in system for the platform it's installed on, but there are multiple drawbacks:
1) Native extensions must be built. This one has prebuilt binaries via node-pre-gyp, but these need to be installed for specific platforms. If you develop on Mac OS X, run npm install to get the package, and then deploy your application on Linux (with the node_modules directory), it won't work.
2) Using built-in spellchecking will use defaults dictated by the OS, which might not be what you want. For example, the language used might be dictated by the selected OS language. For a UI application (for example, one built with Electron) this might be fine, but if you want to do server-side spellchecking in languages other than the OS language, it might prove difficult.
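Given the first drawback, one defensive option is to guard the require so that a missing or mismatched native binary degrades gracefully instead of crashing. The sketch below assumes the spellchecker module exposes isMisspelled(word); loadNativeChecker and findMisspelled are hypothetical helper names.

```javascript
// Try to load the native module; return null if it is not installed
// or its binary does not match the current platform.
function loadNativeChecker() {
  try {
    return require("spellchecker");
  } catch (err) {
    return null;
  }
}

// Returns the misspelled words, or null when no native checker is available.
function findMisspelled(words, checker = loadNativeChecker()) {
  if (!checker) return null;
  return words.filter((word) => checker.isMisspelled(word));
}

const result = findMisspelled(["hello", "wrold"]);
console.log(result === null ? "Native spellchecker unavailable" : result);
```

A caller that gets null back could then fall back to a dictionary-based checker.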
At the basic level, spellchecking some text boils down to:
You can write part 1 yourself. Parts 2 and 3 require a "list of known correct words", or a dictionary. Fortunately, there is already a format, and tools to work with it: .dic files. With this, you get to choose the language, you don't need to build or download any native code, and your application will work the same on every platform. If you're spellchecking on the server, this might be your most flexible option.
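To make the dictionary-based approach concrete, here is a minimal sketch that assumes part 1 is splitting text into words, and parts 2 and 3 are checking each word against the known-word list and gathering suggestions. The tiny KNOWN_WORDS set and all function names are hypothetical; a real checker would load a full dictionary file, and libraries such as nspell handle affix rules that this sketch ignores.

```javascript
// Hypothetical stand-in for a real dictionary.
const KNOWN_WORDS = new Set(["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]);

// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Levenshtein edit distance, computed with a single rolling row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // value from the previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value from the previous row, same column
      row[j] = Math.min(
        above + 1,                              // deletion
        row[j - 1] + 1,                         // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Report unknown tokens together with known words within maxDistance edits.
function check(text, words = KNOWN_WORDS, maxDistance = 2) {
  return tokenize(text)
    .filter((token) => !words.has(token))
    .map((token) => ({
      word: token,
      suggestions: [...words].filter((w) => editDistance(token, w) <= maxDistance),
    }));
}

console.log(check("The quikc brown fx"));
```

Scanning the whole word list for suggestions is fine for a sketch, but it is slow on a full dictionary, which is one reason to reach for an existing library.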