How can I estimate a table size in HIVE without query?

I want to calculate the table size in Hive without running a query.

How can I do this in Hive? (I don't have the necessary permissions on the database, so I can't use SHOW TBLPROPERTIES, etc.)

(For example)

  • dataRows : 100

  • columnName(Type) : userName(string), userNumber(int), userCode(bigint), userAge(int)

    • maximum length of userName : 36

I calculated the table size like this (see the sketch after this list).

  • I assumed that each string character is 8 bytes, an int is 4 bytes, and a bigint is 8 bytes (I didn't consider record header or column header sizes)

    • 100 * ((8*36) + 4 + 8 + 4)
    • totalSize : 30,400 bytes
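
In shell terms, the same rough estimate looks like this (a minimal sketch; the per-type byte sizes are my own assumptions, not anything Hive reports):

ROWS=100
ROW_BYTES=$(( 8*36 + 4 + 8 + 4 ))   # userName (assumed 8 bytes/char * 36 chars) + userNumber + userCode + userAge
echo $(( ROWS * ROW_BYTES ))        # prints 30400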

Would you give me some advice?

hdfs dfs -du -s {table location}

(optionally add -h for human-readable sizes)

E.g.

hdfs dfs -du -s /user/hive/warehouse/mytable
110265307244  /user/hive/warehouse/mytable

hdfs dfs -du -s -h /user/hive/warehouse/mytable
102.7 G  /user/hive/warehouse/mytable
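
If you need the byte count programmatically, the first field of the -du -s summary line is the size in bytes, so a sketch like this (using the example path above) should be enough:

# print only the raw byte count of the table directory (first column of the summary line)
hdfs dfs -du -s /user/hive/warehouse/mytable | awk '{print $1}'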

This is not really possible if you have no access to Hive or HDFS.

Hive could be using different compression mechanisms, and that could affect the size of the raw data on HDFS as well. If the table is stored as plain text you could potentially use this approach, but I wouldn't say it's the best way to do it.
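
For example, if you know (or are willing to guess) the compression ratio of the table's storage format, you could scale the on-HDFS size to approximate the raw data size. This is only a sketch, and ratio=3 below is an assumed figure, not something reported by Hive or HDFS:

# estimated raw size = on-HDFS size (bytes) * assumed compression ratio
hdfs dfs -du -s /user/hive/warehouse/mytable | awk -v ratio=3 '{printf "%.0f\n", $1 * ratio}'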
