
Group column by unique entries in another column SQL

I have the following dataset:

[image: enter image description here]

I would like to query the data to produce a list of unique hostnames per username, with the last login time for that record also included. E.g. produce the following dataset:

[image: enter image description here]

The goal is to detect user account sharing, and also users with an abnormally large number of hostnames.

I know enough SQL to get myself into trouble, but I simply do not write queries often enough to write this one without blowing half a day on it. Can anyone assist?

We are using Azure SQL (SQL Server); however, I can translate answers from another SQL dialect.

Thank you

UPDATE

I have used the following:

select username, hostname, max(logintimeutc)
from loginrecords
group by username, hostname

which returns a good dataset. However, when I try the following, it returns 0 records, despite the query above showing multiple usernames against the same hostname:

select username, hostname, max(logintimeutc)
from loginrecords
group by username, hostname
having count(distinct(hostname)) > 1

I would like to query the data to produce a list of unique hostnames per username with the last login time for that record also included.

I think you just want group by:

select username, hostname, max(logintimeutc)
from t
group by username, hostname;

You can use row_number() for this.

select * from table1 t1
inner join
    (select row_number() over (partition by HostName, UserName order by LoginTimeUTC desc) as rn, UserName
            ,LoginTimeUTC, HostName from table1) as t2
on t2.UserName = t1.UserName and t2.LoginTimeUTC = t1.LoginTimeUTC and t2.HostName = t1.HostName
where t2.rn = 1
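
As a side note, the join back to table1 is probably not needed when only these three columns are wanted. A minimal sketch of the same row_number() idea using a common table expression, assuming the placeholder table name table1 from the query above (the CTE name ranked is made up for illustration):

```sql
-- Sketch: keep only the most recent login row per (UserName, HostName) pair.
-- "ranked" is a hypothetical CTE name; "table1" is the placeholder table used above.
with ranked as (
    select UserName,
           HostName,
           LoginTimeUTC,
           row_number() over (partition by UserName, HostName
                              order by LoginTimeUTC desc) as rn
    from table1
)
select UserName, HostName, LoginTimeUTC
from ranked
where rn = 1;
```

Here rn = 1 marks the latest row in each partition. If two logins for the same pair share the same timestamp, row_number() still returns only one of them.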

If I understand right, 2 results are expected without considering the login time. Please try the query below:

select username,hostname,
count(*) over (partition by hostname) as NUMBER_OF_USERS_FOR_THIS_HOST,
count(*) over (partition by username) as NUMBER_OF_HOSTS_FOR_THIS_USER
from loginrecords
group by username, hostname;

First I created a test environment using the queries below. It would be nice if you provided these (or textual tabular data) yourself in future questions. Screenshots of data are very unfriendly for testing purposes.

CREATE TABLE [LoginRecords] (
    [LoginTimeUTC] SMALLDATETIME,
    [UserName] VARCHAR(5),
    [HostName] VARCHAR(5)
);
GO

INSERT INTO [LoginRecords] VALUES
    ('2019-08-22T09:51:00', 'user1', 'host1'),
    ('2019-08-25T09:31:00', 'user1', 'host2'),
    ('2019-08-30T10:51:00', 'user1', 'host2'),
    ('2019-08-25T09:51:00', 'user2', 'host2'),
    ('2019-08-25T05:51:00', 'user2', 'host3'),
    ('2019-08-30T09:51:00', 'user2', 'host3'),
    ('2019-08-25T09:31:00', 'user3', 'host4'),
    ('2019-08-30T10:51:00', 'user3', 'host4'),
    ('2019-08-25T09:51:00', 'user3', 'host4'),
    ('2019-08-25T05:51:00', 'user3', 'host5'),
    ('2019-08-25T09:51:00', 'user4', 'host6'),
    ('2019-08-25T09:31:00', 'user4', 'host6'),
    ('2019-08-30T10:51:00', 'user4', 'host6'),
    ('2019-08-25T09:51:00', 'user4', 'host7'),
    ('2019-08-30T05:51:00', 'user4', 'host7');
GO

SELECT [LoginTimeUTC], [UserName], [HostName]
FROM [LoginRecords];

Now to your actual issue at hand. I am looking at your last query, which does not return your desired results:

select username, hostname, max(logintimeutc)
from loginrecords
group by username, hostname
having count(distinct(hostname)) > 1
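
One likely reason this returns 0 records: hostname is itself one of the grouping columns, so every group contains exactly one hostname value and the condition can never be true. A small diagnostic sketch (the alias hostnames_in_group is made up for illustration) shows what the HAVING clause is evaluating:

```sql
-- Diagnostic sketch: show the value that the HAVING clause compares against 1.
-- Because hostname is a grouping column, each group holds a single hostname,
-- so hostnames_in_group is 1 on every row (assuming hostname is not NULL).
select username,
       hostname,
       count(distinct hostname) as hostnames_in_group
from loginrecords
group by username, hostname;
```

Since no row can have a value above 1, the filter `> 1` removes everything.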

Instead of the HAVING clause, you could add a WHERE clause that keeps only the usernames that are used with multiple hostnames.

select username, hostname, max(logintimeutc)
from loginrecords
where username in (select username
                   from loginrecords
                   group by username
                   having count(distinct hostname) > 1)
group by username, hostname

This gives the following results:

username      hostname      (No column name)
user1         host1         22/08/2019 9:51
user1         host2         30/08/2019 10:51
user2         host2         25/08/2019 9:51
user2         host3         30/08/2019 9:51
user3         host4         30/08/2019 10:51
user3         host5         25/08/2019 5:51
user4         host6         30/08/2019 10:51
user4         host7         30/08/2019 5:51
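
For the stated goal of spotting users with an abnormally large number of hostnames, a per-user summary may also help. The following is only a sketch: the aliases hostname_count and last_login_utc and the threshold of 1 are assumptions, and the threshold should be raised to whatever counts as abnormal for your data.

```sql
-- Sketch: one row per username with its number of distinct hostnames
-- and its most recent login across all hosts.
-- The threshold (> 1) is an assumed placeholder; tune it for your environment.
select username,
       count(distinct hostname) as hostname_count,
       max(logintimeutc)        as last_login_utc
from loginrecords
group by username
having count(distinct hostname) > 1
order by hostname_count desc, username;
```

Here the HAVING clause works because the grouping is by username only, so each group can contain several hostnames. Against the test data above, all four users come back with a hostname_count of 2.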
