
Freebase: how to extract detailed information about all companies?

I want to extract detailed information about all companies from Freebase. I tried to do that using MQL queries, but they never return more than 4100 records. I have also tried using cursors, but with cursors I get the same number of records.

I have googled it, and some people suggest downloading the dump and then extracting the information. Is that the only way? If so, how can I get the following information from the dump? Any help is highly appreciated.

[
  {
    "type": "/business/company",
    "name": null,
    "parent_company": [{}],
    "products": [],
    "industry": [],
    "founded": null,
    "net_income": [
      {
        "amount": null,
        "valid_date": null,
        "currency": null
      }
    ],
    "company_type": [],
    "headquarters": [{}],
    "number_of_employees": [{}],

    "/base/schemastaging/organization_extra/phone_number": [{}]
  }
]

First, the obligatory warning: Freebase has been read-only for many months and will soon be shut down. The data there is stale.

I get a count of 4189 for that query, so it sounds like you're pretty close to the expected results. On the other hand, there are over 400K businesses in Freebase, so perhaps you don't really intend to limit your query to only those that have net income information. If that's the case, you can modify your query by adding "optional": true to that clause of the query, i.e.:

  "net_income": [{
    "amount": null,
    "valid_date": null,
    "currency": null,
    "optional": true
  }],

Having said that, 400K is an awful lot to fetch through the API. To get the same information from the Freebase data dump, just filter for the same properties you've included in your query.
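A rough sketch of that filtering step in Python, assuming the fields on each dump line (subject, predicate, object, then a final ".") are separated by tabs. The file is streamed line by line and only triples whose predicate is in a chosen set are kept. The function names `iter_triples`, `filter_triples` and `filter_dump` are illustrative, not part of any Freebase tool.

```python
import gzip
from typing import AbstractSet, Iterable, Iterator, Tuple

# (subject, predicate, object), each kept exactly as written in the dump,
# e.g. "<http://rdf.freebase.com/ns/m.010b2kgs>".
Triple = Tuple[str, str, str]


def iter_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Split dump lines into triples, ASSUMING tab-separated fields."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        yield parts[0], parts[1], parts[2]


def filter_triples(lines: Iterable[str], predicates: AbstractSet[str]) -> Iterator[Triple]:
    """Keep only triples whose full predicate token (with <...>) is in `predicates`."""
    for s, p, o in iter_triples(lines):
        if p in predicates:
            yield s, p, o


def filter_dump(path: str, predicates: AbstractSet[str]) -> Iterator[Triple]:
    """Stream a gzipped dump without decompressing it to disk first."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from filter_triples(f, predicates)
```

Usage would look like `for s, p, o in filter_dump("freebase-rdf-2015-04-19-00-00.gz", {"<http://rdf.freebase.com/ns/business.business_operation.industry>"}): ...`. Checking all the wanted predicates in one pass means the large file is read once, instead of once per zgrep command.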

Note that this schema has been significantly refactored over the years, so some of the names in your query aren't the currently preferred property names but older aliases. For example, the current name for /business/company is /business/business_operation, and /business/company/founded is really just an alias for /organization/organization/date_founded, so that's what you'd want to look for in the dump.
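If the list of properties is built programmatically, the old ids can be normalised before looking them up in the dump. This is a hypothetical sketch that only knows the two aliases mentioned above; any other alias would have to be checked against the schema and added to `MQL_ALIASES` by hand.

```python
from typing import Dict, List

# Old MQL id -> currently preferred id. Only the two aliases named in the
# answer are listed; this is NOT a complete Freebase alias table.
MQL_ALIASES: Dict[str, str] = {
    "/business/company": "/business/business_operation",
    "/business/company/founded": "/organization/organization/date_founded",
}


def resolve_alias(mql_id: str) -> str:
    """Return the preferred id if `mql_id` is a known alias, else `mql_id` unchanged."""
    return MQL_ALIASES.get(mql_id, mql_id)


def resolve_all(mql_ids: List[str]) -> List[str]:
    """Resolve a list of ids, preserving order."""
    return [resolve_alias(i) for i in mql_ids]
```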

In the dump, the slashes (/) in ids are replaced with dots (.) and the leading slash is dropped, so you can filter using zgrep commands like these:

$ zgrep "organization\.organization\.parent" freebase-rdf-2015-04-19-00-00.gz
<http://rdf.freebase.com/ns/m.010b0njl> <http://rdf.freebase.com/ns/organization.organization.parent>   <http://rdf.freebase.com/ns/m.010d_x4z> .
<http://rdf.freebase.com/ns/m.010qw9c3> <http://rdf.freebase.com/ns/organization.organization.parent>   <http://rdf.freebase.com/ns/m.0110pjfc> .

$ zgrep "business\.business_operation\.industry" freebase-rdf-2015-04-19-00-00.gz
<http://rdf.freebase.com/ns/m.010b2kgs> <http://rdf.freebase.com/ns/business.business_operation.industry>   <http://rdf.freebase.com/ns/m.0c5mq>    .
<http://rdf.freebase.com/ns/m.010h6tq9> <http://rdf.freebase.com/ns/business.business_operation.industry>   <http://rdf.freebase.com/ns/m.02y_9m3>  .
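Because the slash-to-dot mapping is mechanical, the predicate tokens and zgrep patterns can be generated from the MQL property ids. A minimal sketch, assuming every predicate lives under the `http://rdf.freebase.com/ns/` namespace seen in the lines above. The helper names are made up for illustration.

```python
import re

FREEBASE_NS = "http://rdf.freebase.com/ns/"


def dotted(mql_id: str) -> str:
    """/business/business_operation/industry -> business.business_operation.industry"""
    return mql_id.strip("/").replace("/", ".")


def dump_predicate(mql_id: str) -> str:
    """Full predicate token as it appears in the dump, angle brackets included."""
    return f"<{FREEBASE_NS}{dotted(mql_id)}>"


def zgrep_pattern(mql_id: str) -> str:
    """Regex with the dots escaped so they match literally, as in the zgrep
    commands. For ids made only of letters, digits, underscores and dots,
    re.escape's output is also a valid grep pattern."""
    return re.escape(dotted(mql_id))
```

For instance, `zgrep_pattern("/organization/organization/parent")` gives the pattern `organization\.organization\.parent` used in the first command above.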

For mediators (CVTs), there is a separate line for each property of the mediator node. For example, a company name change might look like this:

<http://rdf.freebase.com/ns/m.0q2g4kt>  <http://rdf.freebase.com/ns/business.company_name_change.end_date>  "2004"^^<http://www.w3.org/2001/XMLSchema#gYear>    .
<http://rdf.freebase.com/ns/m.0q2g4kt>  <http://rdf.freebase.com/ns/business.company_name_change.company>   <http://rdf.freebase.com/ns/m.06_dbm>   .
<http://rdf.freebase.com/ns/m.0q2g4kt>  <http://rdf.freebase.com/ns/business.company_name_change.start_date>    "1974"^^<http://www.w3.org/2001/XMLSchema#gYear>    .
<http://rdf.freebase.com/ns/m.0q2g4kt>  <http://rdf.freebase.com/ns/business.company_name_change.new_name>  "Cinar"@en  .
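To reassemble each mediator into a single record, the lines can be grouped by their subject MID. This is a sketch, assuming the triples are already available as (subject, predicate, object) string tuples exactly as written in the dump. `short` and `group_by_subject` are illustrative names, and object values are left as raw N-Triples strings rather than parsed into dates or plain text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
NS = "<http://rdf.freebase.com/ns/"


def short(uri: str) -> str:
    """Strip the Freebase namespace and angle brackets, e.g. -> m.06_dbm."""
    if uri.startswith(NS) and uri.endswith(">"):
        return uri[len(NS):-1]
    return uri  # literals such as "Cinar"@en are returned unchanged


def group_by_subject(triples: Iterable[Triple], prefix: str) -> Dict[str, Dict[str, List[str]]]:
    """Collect all triples of one mediator (CVT) node into a single record.
    Only predicates whose short name starts with `prefix` are kept; each field
    maps to a list because a property can have several values."""
    records: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
    for s, p, o in triples:
        pred = short(p)
        if pred.startswith(prefix):
            field = pred[len(prefix):]
            records[short(s)].setdefault(field, []).append(short(o))
    return dict(records)
```

Called with `prefix="business.company_name_change."` on the four lines above, this yields one record for `m.0q2g4kt` whose `company` field (`m.06_dbm`) points back at the company topic. Grouping is best done after filtering to the relevant predicates, since keeping every subject of the full dump in a dictionary would likely need far more memory than is available.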
