SQL/Hive Query to Count number of rows for each day for a certain value

I am currently working on a Python script that uses a query to extract data from our Hive server. I expect the output to keep only the card numbers that have x or more transactions per day, where x is the value passed in as "TxnCount".

The inputs are: DateTime1, DateTime2, MerchantID, CardNum, terminalID and TxnCount.

My code (not working):

Query = "SELECT TRIM(i002_number) as CardNum, i004_amt_trxn, TRIM(i042_merch_id) as MerchantID, i043a_merch_name, TRIM(i041_pos_id) as TerminalID, \
i049_cur_trxn, i062v2_trans_id, i003_proc_code, i006_amt_bill, i051_cur_bill, amt_card, cardcurrency, ltimestamp, \
i039_rsp_cd, i018_merch_type, i043b_merch_city, i043c_merch_cnt, i022_pos_entry, i032_acquirer_id, trxntype, reasoncode, \
SUBSTRING(i002_number, 1, 6) AS issuer_bin, COUNT(i002_number) as txncount\
CASE \
    WHEN SUBSTRING(i002_number,1,1) = 5 THEN 'MasterCard' \
    WHEN SUBSTRING(i002_number,1,1) = 4 THEN 'VISA' \
END AS source \
FROM tsys.ods_authorizations \
WHERE ltimestamp >= '"+DateTime1+"' AND ltimestamp <= '"+DateTime2+"' AND i042_merch_id = "+MerchantID+" \
AND i002_number = "+CardNum+" AND i041_pos_id = "+terminalID+""
HAVING txncount >= '"+TxnCount+"'

Sample expected data (truncated):

CardNum  TimeStamp               TxnCount
123      2019-06-01 00:00:30.00  2
123      2019-06-01 05:00:20.00  2
123      2019-06-03 20:00:00.00  1
456      2019-06-04 06:00:00.00  2
456      2019-06-04 00:00:10.91  2
789      2019-06-01 12:00:40.51  1
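To pin down what the TxnCount column means, here is a minimal plain-Python sketch (not part of the original question) that recomputes it from the sample rows: each row gets the number of transactions the same card made on the same calendar day. The helper names `to_day`, `daily_counts` and `filter_by_txn_count` are hypothetical, and only the card number and timestamp columns are modelled.

```python
from collections import Counter
from datetime import datetime

# Sample rows from the expected output above: (card number, timestamp string)
ROWS = [
    ("123", "2019-06-01 00:00:30.00"),
    ("123", "2019-06-01 05:00:20.00"),
    ("123", "2019-06-03 20:00:00.00"),
    ("456", "2019-06-04 06:00:00.00"),
    ("456", "2019-06-04 00:00:10.91"),
    ("789", "2019-06-01 12:00:40.51"),
]


def to_day(ts):
    """Split the calendar date off a 'YYYY-MM-DD HH:MM:SS.ff' timestamp."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f").date()


def daily_counts(rows):
    """Attach to each row the number of rows sharing its (card, day) pair."""
    per_day = Counter((card, to_day(ts)) for card, ts in rows)
    return [(card, ts, per_day[(card, to_day(ts))]) for card, ts in rows]


def filter_by_txn_count(rows, txn_count):
    """Keep only rows whose card made at least txn_count transactions that day."""
    return [row for row in daily_counts(rows) if row[2] >= txn_count]
```

Grouping on the date alone, rather than on the full timestamp, is what puts the two 2019-06-01 rows for card 123 into the same bucket; `filter_by_txn_count(ROWS, 2)` keeps the four rows for cards 123 and 456 that have a count of 2.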

I think my problem is that the query cannot count per card number, because I am having trouble with the GROUP BY clause. Also, I have not yet split the date from the time, so the query cannot yet tell one day from another.

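Your query is malformed. You have a COUNT() alongside a bunch of other, non-aggregated columns, and you have no GROUP BY. This is not allowed in SQL. (There is also no comma between `COUNT(i002_number) as txncount` and the CASE expression, and the HAVING clause sits outside the Python query string.)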
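For comparison, a syntactically valid aggregate version groups by card number and calendar day, so every non-aggregated column in the SELECT list also appears in GROUP BY. The sketch below runs it against an in-memory SQLite database purely as a stand-in for Hive: SQLite's `date()` plays the role that Hive's `to_date()` would, and the two-column table `auths` is a hypothetical reduction of `tsys.ods_authorizations`. It returns one summary row per card per day, not the per-transaction detail shown in the expected output.

```python
import sqlite3

# Hypothetical two-column stand-in for tsys.ods_authorizations
SAMPLE = [
    ("123", "2019-06-01 00:00:30.00"),
    ("123", "2019-06-01 05:00:20.00"),
    ("123", "2019-06-03 20:00:00.00"),
    ("456", "2019-06-04 06:00:00.00"),
    ("456", "2019-06-04 00:00:10.91"),
    ("789", "2019-06-01 12:00:40.51"),
]

SUMMARY_SQL = """
SELECT i002_number      AS CardNum,
       date(ltimestamp) AS TxnDate,   -- Hive: to_date(ltimestamp)
       COUNT(*)         AS TxnCount
FROM auths
GROUP BY i002_number, date(ltimestamp)  -- every non-aggregated column is grouped
HAVING COUNT(*) >= :txn_count           -- compare as a number, not a quoted string
ORDER BY CardNum, TxnDate
"""


def daily_summary(txn_count):
    """Return (card, day, count) for card/day pairs with at least txn_count rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE auths (i002_number TEXT, ltimestamp TEXT)")
        conn.executemany("INSERT INTO auths VALUES (?, ?)", SAMPLE)
        return conn.execute(SUMMARY_SQL, {"txn_count": txn_count}).fetchall()
    finally:
        conn.close()
```

Because GROUP BY collapses the rows, this shape cannot reproduce the one-row-per-transaction listing the question asks for.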

I would also advise you to use query parameters rather than building the query string by concatenation.
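A minimal sketch of parameter binding, using Python's built-in `sqlite3` module as a stand-in because its named placeholder style (`:name`) matches the placeholders in the query that follows. The question does not say which Hive client library is used, and Hive drivers differ in the placeholder style they accept (PyHive, for example, declares the `pyformat` style, `%(name)s`), so check the exact syntax against your driver's documentation. The table `auths`, the function `find_transactions` and the merchant ID `"M1"` are hypothetical.

```python
import sqlite3


def find_transactions(conn, card_num, merchant_id, ts_start, ts_end):
    """Fetch matching rows, letting the driver bind the values."""
    # The values travel separately from the SQL text, so there is no manual
    # quoting or string concatenation, and a value cannot alter the query.
    sql = """
        SELECT i002_number, ltimestamp
        FROM auths
        WHERE ltimestamp >= :ts_start AND ltimestamp <= :ts_end
          AND i042_merch_id = :merchant_id
          AND i002_number = :card_num
        ORDER BY ltimestamp
    """
    params = {"ts_start": ts_start, "ts_end": ts_end,
              "merchant_id": merchant_id, "card_num": card_num}
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auths (i002_number TEXT, i042_merch_id TEXT, ltimestamp TEXT)")
conn.executemany(
    "INSERT INTO auths VALUES (?, ?, ?)",
    [("123", "M1", "2019-06-01 00:00:30.00"),
     ("123", "M1", "2019-06-03 20:00:00.00"),
     ("456", "M1", "2019-06-04 06:00:00.00")],
)
rows = find_transactions(conn, "123", "M1", "2019-06-01 00:00:00", "2019-06-02 00:00:00")
```

A card number such as `"123' OR '1'='1"` is then compared as a literal string and simply matches nothing, instead of rewriting the WHERE clause as it would with concatenation.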

Your expected results show the details of every transaction rather than a summary (card 123 has two rows with a count of 2 on 2019-06-01, rather than one). That suggests you really want a window function:

SELECT a.*
FROM (SELECT a.*,
             (CASE WHEN i002_number LIKE '5%' THEN 'MasterCard'
                   WHEN i002_number LIKE '4%' THEN 'VISA'
              END) AS source,
             -- transactions by the same card on the same calendar day
             COUNT(*) OVER (PARTITION BY i002_number, to_date(ltimestamp)) AS txncount
      FROM tsys.ods_authorizations a
      WHERE ltimestamp >= :timestamp1 AND
            ltimestamp <= :timestamp2 AND
            i042_merch_id = :MerchantID AND
            i002_number = :CardNum AND
            i041_pos_id = :terminalID
     ) a
-- window results cannot be referenced in the WHERE of the same SELECT, hence the subquery
WHERE txncount >= :TxnCount
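To check the window-function approach without a Hive cluster, the sketch below runs the same shape of query against the sample data in an in-memory SQLite database. This is a stand-in, not Hive: it assumes SQLite 3.25 or newer (the first release with window functions), uses `date(ltimestamp)` where Hive would use `to_date(ltimestamp)`, and drops the card, merchant and terminal filters so that several cards appear. The table `auths` and the function `transactions_with_daily_count` are hypothetical.

```python
import sqlite3

# Hypothetical two-column stand-in for tsys.ods_authorizations
SAMPLE = [
    ("123", "2019-06-01 00:00:30.00"),
    ("123", "2019-06-01 05:00:20.00"),
    ("123", "2019-06-03 20:00:00.00"),
    ("456", "2019-06-04 06:00:00.00"),
    ("456", "2019-06-04 00:00:10.91"),
    ("789", "2019-06-01 12:00:40.51"),
]

WINDOW_SQL = """
SELECT t.i002_number, t.ltimestamp, t.txncount
FROM (SELECT a.*,
             -- per-card, per-day count attached to every transaction row
             COUNT(*) OVER (PARTITION BY i002_number, date(ltimestamp)) AS txncount
      FROM auths a) t
WHERE t.txncount >= :txn_count  -- filter in the outer query, after the window runs
ORDER BY t.i002_number, t.ltimestamp
"""


def transactions_with_daily_count(txn_count):
    """Return every transaction whose card had >= txn_count transactions that day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE auths (i002_number TEXT, ltimestamp TEXT)")
        conn.executemany("INSERT INTO auths VALUES (?, ?)", SAMPLE)
        return conn.execute(WINDOW_SQL, {"txn_count": txn_count}).fetchall()
    finally:
        conn.close()
```

Unlike a GROUP BY summary, every qualifying transaction keeps its own row, so `transactions_with_daily_count(2)` returns both 2019-06-01 rows for card 123 and both 2019-06-04 rows for card 456, each carrying a count of 2.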
