How to get distinct values when year, month, day of date are stored in different columns

I am using AWS Athena to query the count of distinct values in a column over the last 7 days.

The query is invoked by a Lambda function that runs every Sunday and pulls data from the previous Sunday through the Saturday that has just ended.

So, for example, if today is Sunday, 11th September 2022, the Lambda will query the table from Sunday, 4th Sep '22 through Saturday, 10th Sep '22, and the query looks like this:

SELECT
    col1,
    col2,
    COUNT(DISTINCT col3) AS distinctValues
FROM "dbName"."tbl"
WHERE year = '2022'
    AND month = '09'
    AND day IN ('04','05','06','07','08','09','10' )
GROUP BY 
    col1,
    col2;

year, month and day are separate columns, which is why the day column is filtered with an IN clause.
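For reference, the Sunday-to-Saturday window described above can be derived from the run date. This is only a sketch in Python, assuming the Lambda is written in Python; the function name `previous_week` is made up for illustration and is not part of the original setup.

```python
from datetime import date, timedelta


def previous_week(run_date: date) -> tuple[date, date]:
    """Return (previous Sunday, most recent Saturday) for a run that happens on a Sunday."""
    if run_date.weekday() != 6:  # Monday is 0, Sunday is 6
        raise ValueError("expected the run date to be a Sunday")
    return run_date - timedelta(days=7), run_date - timedelta(days=1)


start, end = previous_week(date(2022, 9, 11))
print(start, end)  # 2022-09-04 2022-09-10
```

Running it for 4th September 2022 gives 28th August to 3rd September, which is the case that spans two months.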

Now the issue is that if the query runs on 4th September 2022, two months have to be considered: it has to cover Sunday, 28th Aug '22 through Saturday, 3rd Sep '22.

I cannot use the following query, because it does not return the correct count of distinct values: the month and day filters are applied independently, so it also matches 1st to 3rd August and 28th to 30th September.

SELECT
    col1,
    col2,
    COUNT(DISTINCT col3) AS distinctValues
FROM "dbName"."tbl"
WHERE year = '2022'
    AND month IN ('08','09')
    AND day IN ('28','29','30','31','01','02','03' )
GROUP BY 
    col1,
    col2;
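To see which extra dates that filter lets through, here is a small Python check that applies the same two IN lists to every day of August and September 2022. It only mimics the WHERE clause on calendar dates; it does not touch Athena.

```python
from datetime import date, timedelta

# The same two IN lists as in the query above.
months = {"08", "09"}
days = {"28", "29", "30", "31", "01", "02", "03"}


def matches(d: date) -> bool:
    """Mimic: month IN (...) AND day IN (...), with zero-padded string columns."""
    return f"{d.month:02d}" in months and f"{d.day:02d}" in days


all_days = [date(2022, 8, 1) + timedelta(days=i) for i in range(61)]  # 1 Aug .. 30 Sep
matched = [d for d in all_days if matches(d)]
wanted = [date(2022, 8, 28) + timedelta(days=i) for i in range(7)]  # 28 Aug .. 3 Sep
unwanted = [d.isoformat() for d in matched if d not in wanted]
print(unwanted)
# ['2022-08-01', '2022-08-02', '2022-08-03', '2022-09-28', '2022-09-29', '2022-09-30']
```

So the query scans 13 days instead of 7, and the six extra days can add values to the distinct count.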

I also cannot combine the results of two separate queries for August and September, because the distinct counts cannot simply be added: a value that appears in both months would be counted twice.
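A tiny illustration of the double counting, using made-up col3 values (the sets below are hypothetical, not real data):

```python
# Hypothetical col3 values for one (col1, col2) group.
aug_values = {"a", "b", "c"}  # seen 28-31 Aug
sep_values = {"b", "c", "d"}  # seen 1-3 Sep

sum_of_counts = len(aug_values) + len(sep_values)  # what adding two query results gives
true_count = len(aug_values | sep_values)          # distinct values over the whole week
print(sum_of_counts, true_count)  # 6 4
```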

What can be done to get distinct values for a date range that spans two months, given that the table stores year, month and day in separate columns?

You should compare month and day together, and repeat the condition separately for the other month:

SELECT
    col1,
    col2,
    COUNT(DISTINCT col3) AS distinctValues
FROM "dbName"."tbl"
WHERE year = '2022'
    AND (
        (month = '08' AND day IN ('X', 'Y' /* preferred days */))
        OR (month = '09' AND day IN ('A', 'B'))
    )
GROUP BY 
    col1,
    col2;
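Since the Lambda has to produce those day lists each week, the condition could be generated from the start and end dates. This is a rough Python sketch, assuming zero-padded string columns as in the question; `partition_predicate` is a made-up helper name. It also puts the year inside each branch so a week that crosses New Year still works.

```python
from datetime import date, timedelta
from itertools import groupby


def partition_predicate(start: date, end: date) -> str:
    """Build an OR of per-month conditions covering start..end inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    clauses = []
    # The days are in order, so groupby yields one group per (year, month).
    for (year, month), group in groupby(days, key=lambda d: (d.year, d.month)):
        day_list = ", ".join(f"'{d.day:02d}'" for d in group)
        clauses.append(
            f"(year = '{year:04d}' AND month = '{month:02d}' AND day IN ({day_list}))"
        )
    return "(" + "\n OR ".join(clauses) + ")"


print(partition_predicate(date(2022, 8, 28), date(2022, 9, 3)))
```

For the example week this prints one branch for '08' with days 28 to 31 and one for '09' with days 01 to 03; the result would go after WHERE in place of the hand-written conditions.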

It is better to pass StartDate and EndDate as dates, combine the table's year, month and day columns into a computed date using date_parse, and then compare against the range.

SELECT
    col1,
    col2,
    COUNT(DISTINCT col3) AS distinctValues
FROM "dbName"."tbl"
WHERE 
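    -- year, month and day are zero-padded strings in the question, so concatenate them;
    -- for integer columns, cast year * 10000 + month * 100 + day to varchar instead
    CAST(date_parse(concat(year, month, day), '%Y%m%d') AS date)
        BETWEEN DATE '2022-08-28' AND DATE '2022-09-03'  -- StartDate and EndDate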
GROUP BY 
    col1,
    col2;
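As far as I know, Athena has no @-style variables, so the Lambda would have to put the two dates into the query itself. This is a sketch in Python that formats them as DATE literals, assuming year, month and day are zero-padded strings; `build_query` is a hypothetical name. The values come from `date` objects, not user input, so plain string formatting is used here.

```python
from datetime import date

# Query text with placeholders for the two date literals.
QUERY_TEMPLATE = """\
SELECT col1, col2, COUNT(DISTINCT col3) AS distinctValues
FROM "dbName"."tbl"
WHERE CAST(date_parse(concat(year, month, day), '%Y%m%d') AS date)
      BETWEEN DATE '{start}' AND DATE '{end}'
GROUP BY col1, col2"""


def build_query(start: date, end: date) -> str:
    """Fill the range bounds in as ISO dates (YYYY-MM-DD)."""
    return QUERY_TEMPLATE.format(start=start.isoformat(), end=end.isoformat())


print(build_query(date(2022, 8, 28), date(2022, 9, 3)))
```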

Please use the date_parse function in AWS Athena to build a date from the year, month and day columns. Please refer to "Create date from integers in separate fields in athena aws".

I would recommend converting the data to a date and processing it as a single value (using date_diff and between, for example). Possibly the shortest way (in terms of code) would be to use array_join, provided every part is already in the correct format. Something along these lines:

SELECT col1,
       col2,
       COUNT(DISTINCT col3) AS distinctValues
FROM "dbName"."tbl"
WHERE date_diff('day', date(array_join(array[year, month, day], '-')), now())
    between 1 and 7
GROUP BY col1,
         col2;

Though if those fields are used to partition the data, you will possibly need to go with another approach for better performance.
