
Generate unique customer id / insert unique rows in Hive

I need to insert unique rows into a Hive table based on customer name and address.

Is there any way to generate a unique value from the customer name and address? I am looking to generate a unique_value column like the one below and then select rows with distinct unique_value.

For example, I want to generate the unique_value column like this:

customer_name   address   unique_value
omar            street1   111
ryan            stree2    222
omar            street1   111

Any other approaches are also appreciated!

You can try two things. One option is a UUID, but that generates a new random id for each row, so two rows with the same name and address would still get different ids. Something like this would do:

select reflect("java.util.UUID", "randomUUID") as unique_value, customer_name, address from table_name
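If you still want UUIDs but only one per customer, one possible variation is to reduce the table to distinct name/address pairs first and generate the UUID on that smaller set. This is only a sketch: the subquery alias t is a placeholder, and the ids will differ on every run because they are random, so you would need to store the result rather than recompute it.

```sql
-- One random UUID per distinct (customer_name, address) pair.
-- "t" is just a placeholder alias for the subquery.
select
  reflect("java.util.UUID", "randomUUID") as unique_value,
  t.customer_name,
  t.address
from (
  select distinct customer_name, address
  from table_name
) t;
```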

However, if you are planning to have a unique key based on the name and address, you can concatenate both fields and take a hash of the resulting string (see the details of the hash function here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF ). That ensures that the same name and address always get the same key. This query should be sufficient:

select customer_name, address, hash(concat(customer_name, address)) from table_name
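To actually insert only the unique rows, the hash query can be combined with select distinct. The sketch below assumes a hypothetical target table customers_unique with the three columns from the question. It also uses concat_ws with a '|' separator instead of plain concat, which is my own tweak: without a separator, ("ab", "c") and ("a", "bc") would concatenate to the same string and get the same key.

```sql
-- customers_unique is a hypothetical target table with columns
-- (customer_name, address, unique_value); adjust to your schema.
insert into table customers_unique
select distinct
  customer_name,
  address,
  -- the separator keeps ("ab", "c") and ("a", "bc") from colliding
  hash(concat_ws('|', customer_name, address)) as unique_value
from table_name;
```

Keep in mind that hash returns an int, so two different customers can occasionally end up with the same value. If that matters, swapping hash for md5 or sha2 (available in newer Hive releases, per the UDF page linked above) should make collisions far less likely, at the cost of a string key instead of an integer.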
