Where can I find a BIG dataset?

I'm looking for huge text classification datasets so I can apply what I learned in a machine learning course. I'm looking for both wide data and tall data. What I have found so far are datasets between 200 MB and 500 MB. Is there any repo/URL where I can find datasets of 2 GB or more?

You can find a good list of publicly available datasets here: https://github.com/awesomedata/awesome-public-datasets

For example, have a look at the CommonCrawl dataset (https://commoncrawl.org/), which has been crawled from 25 billion web pages. An index with the list of archives can be found here: http://index.commoncrawl.org/
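
If it helps, here is a rough sketch of how you might query that index from Python before downloading anything. It only builds the query URL and parses a response body, so it runs offline. The `<crawl-id>-index` path, the `url` and `output=json` parameters, and the `length` field are my assumptions about the index server's CDX-style interface, and `CC-MAIN-2023-50` is just a placeholder crawl id; check the index page for the ids that are currently listed.

```python
import json
from urllib.parse import urlencode

# Base URL taken from the answer; everything appended to it is an assumption
# about the CDX-style query interface, not something stated in the answer.
INDEX_BASE = "http://index.commoncrawl.org"


def build_index_query(crawl_id, url_pattern):
    """Return a query URL asking one crawl's index for captures of url_pattern."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_BASE}/{crawl_id}-index?{params}"


def parse_index_lines(body):
    """Parse a response body that holds one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def total_archive_bytes(records):
    """Sum the 'length' field (assumed: compressed record size) over records."""
    return sum(int(r["length"]) for r in records if "length" in r)
```

You could then fetch `build_index_query(...)` with `urllib.request.urlopen`, pass the decoded text to `parse_index_lines`, and use `total_archive_bytes` to get a rough idea of how much data a URL pattern covers before committing to a multi-gigabyte download.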


 