
Building your own text corpus

Original 2012-07-04 11:25:17 6 1 text/ corpus

It may sound stupid, but do you know how to build a text corpus? I have searched everywhere and found plenty of existing corpora, but I wonder how they were built. For example, if I want to build a corpus of positive and negative tweets, do I just have to make two files? And what goes inside those files? I don't get it. In this example he stores positive and negative tweets in a Redis database.

1 answer

But what goes inside those files?

This depends mostly on what library you're using. XML (with a variety of tags) is common, as is one sentence per line. The tricky part is getting the data in the first place.
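As a rough sketch of the one-item-per-line layout, assuming two plain-text files named after their labels (the file names pos.txt and neg.txt, the helper names, and the sample tweets are all made up for illustration), a corpus could be written and read back with the Python standard library alone:

```python
import tempfile
from pathlib import Path

# Hypothetical sample data; a real corpus would come from collected tweets.
SAMPLES = {
    "pos": ["I love this phone", "What a great day"],
    "neg": ["I hate waiting in line", "This movie was awful"],
}

def _one_line(text):
    # Collapse all whitespace (including newlines) so each tweet stays on one line.
    return " ".join(text.split())

def write_corpus(root, samples):
    """Write one file per label (e.g. pos.txt), one tweet per line."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for label, tweets in samples.items():
        lines = [_one_line(t) for t in tweets]
        (root / f"{label}.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")

def read_corpus(root):
    """Return (tweet, label) pairs, taking the label from the file name."""
    pairs = []
    for path in sorted(Path(root).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():  # skip blank lines
                pairs.append((line, path.stem))
    return pairs

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_corpus(tmp, SAMPLES)
        for tweet, label in read_corpus(tmp):
            print(label, tweet)
```

Whether a given library can read this layout directly depends on the library, so treat the format as a starting point rather than a standard.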

For example, if I want to build a corpus of positive and negative tweets

Does this mean that you want to know how to mark the tweets as positive and negative? If so, what you're looking for is called text classification or, for positive/negative labels specifically, sentiment analysis.
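To give a feel for what text classification involves, here is a toy sketch of a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. Naive Bayes is just one common choice, not something the question or answer prescribes; the function names and training sentences are invented for illustration, and libraries such as NLTK or scikit-learn provide tested implementations that would be preferable for real work.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Very naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(pairs):
    """pairs: iterable of (text, label). Returns the counts the classifier needs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training texts
    vocab = set()
    for text, label in pairs:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

if __name__ == "__main__":
    # Hypothetical hand-labeled training data.
    training = [
        ("I love this phone", "pos"),
        ("What a great day", "pos"),
        ("I hate waiting in line", "neg"),
        ("This movie was awful", "neg"),
    ]
    model = train(training)
    print(classify("love great day", model))
    print(classify("awful movie", model))
```

A classifier like this still needs hand-labeled examples to learn from, which is why collecting and labeling the tweets is the first step.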

If you want to find a bunch of tweets, I'd check one of these pages (just from a quick search of my own).

Clickonf5: http://clickonf5.org/5438/download-tweets-pdf-xml-format-local-machine-server/

Quora: http://quora.com/What-is-the-best-tool-to-download-and-archive-Twitter-data-of-certain-hashtags-and-mentions-for-academic-research

Google Groups: http://groups.google.com/forum/?fromgroups#!topic/twitter-development-talk/kfislDfxunI

For general learning about how to create a corpus, I would check out the Handbook of Natural Language Processing Wiki by Richard Xiao.



solution1 4 2012-07-18 00:26:08
