How do you print unique rows in Python

I am pulling data from an Oracle DB and need to print the unique values to standard output.

My data looks like this:

server1.CRITICAL_INCIDENTS 1418223897 0.000000 host=server1 type=oracle_database source=Oracle dc=DC1
server1.ResponseTimepertransaction 1418223577 2.467900 host=server1 type=oracle_database source=Oracle dc=DC1
server1.DataDictionaryHitPercent 1418223577 100.000000 host=server1 type=oracle_database source=Oracle dc=DC1
server1.FullIndexScanspersecond 1418223577 0.000000 host=server1 type=oracle_database source=Oracle dc=DC1
server1.ExecutesPerformedwithoutParsesPercent 1418223577 66.666667 host=server1 type=oracle_database source=Oracle dc=DC1
server1.SortsinMemoryPercent 1418223577 100.000000 host=server1 type=oracle_database source=Oracle dc=DC1
server1.BufferCacheHitPercent 1418223577 100.000000 host=server1 type=oracle_database source=Oracle dc=DC1
server1.DatabaseCPUTimePercent 1418223577 81.048665 host=server1 type=oracle_database source=Oracle dc=DC1
server1.CRITICAL_INCIDENTS 1418223897 0.000000 host=server1 type=oracle_database source=Oracle dc=DC1
server1.CRITICAL_INCIDENTS 1418223897 0.2000000 host=server1 type=oracle_database source=Oracle dc=DC1
server1.ResponseTimepertransaction 1418223577 2.467900 host=server1 type=oracle_database source=Oracle dc=DC1

When I print these values, I only need to print the unique ones. The data that comes from Oracle is in Date format, and when I convert it to epoch time, I may get duplicate values at the same time for the same metric. If the timestamp and the metric are the same, I only need to print one of those lines.
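The conversion step mentioned above can be sketched with the standard library. This assumes the values arrive as naive `datetime` objects that represent UTC, which the question does not state; `to_epoch` is a hypothetical helper name. `calendar.timegm(dt.timetuple())` returns whole seconds, so any fractional seconds are dropped (Oracle `TIMESTAMP` columns can carry them, while `DATE` stores whole seconds), and values that differ only below one second end up with the same epoch key:

```python
import calendar
from datetime import datetime


def to_epoch(dt):
    """Convert a naive datetime, assumed to be UTC, to whole epoch seconds."""
    # timetuple() has no microsecond field, so sub-second parts are discarded.
    return calendar.timegm(dt.timetuple())


# Two illustrative values that differ only in their fractional seconds.
a = datetime(2014, 12, 10, 15, 4, 57, 100000)
b = datetime(2014, 12, 10, 15, 4, 57, 900000)

print(to_epoch(a), to_epoch(b))  # prints: 1418223897 1418223897
```

Both values map to 1418223897, the timestamp of the duplicated `CRITICAL_INCIDENTS` lines in the sample data.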

For example, I only need to print one of these lines. They have the same time (1418223897) and the same metric (server1.CRITICAL_INCIDENTS). The values are different (0.000000 and 0.2000000), but it is OK for the values to differ.

server1.CRITICAL_INCIDENTS 1418223897 0.000000 host=server1 type=oracle_database source=Oracle dc=DC1
server1.CRITICAL_INCIDENTS 1418223897 0.2000000 host=server1 type=oracle_database source=Oracle dc=DC1
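The rule described above (keep one line per metric and timestamp, whatever the value) can be sketched directly over the text lines. This is a minimal illustration, not the asker's code: the function name `unique_by_metric_and_time` is hypothetical, it assumes each line starts with `<metric> <epoch> <value>` separated by whitespace as in the sample, and it keeps the first line it sees for each key:

```python
def unique_by_metric_and_time(lines):
    """Yield each line whose (metric, timestamp) pair has not been seen yet.

    Assumes lines look like '<metric> <epoch> <value> key=value ...'.
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1])  # (metric, timestamp); the value is ignored
        if key not in seen:
            seen.add(key)
            yield line  # first occurrence wins


sample = [
    "server1.CRITICAL_INCIDENTS 1418223897 0.000000 host=server1 type=oracle_database source=Oracle dc=DC1",
    "server1.CRITICAL_INCIDENTS 1418223897 0.2000000 host=server1 type=oracle_database source=Oracle dc=DC1",
    "server1.ResponseTimepertransaction 1418223577 2.467900 host=server1 type=oracle_database source=Oracle dc=DC1",
]

for line in unique_by_metric_and_time(sample):
    print(line)
```

Only the first `CRITICAL_INCIDENTS` line and the `ResponseTimepertransaction` line are printed.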

I tried this:

import pyodbc

# Columns by index: 0=DateTime, 1=Server, 2=Server_Type, 3=Metric, 4=Value
sql = "SELECT DateTime, Server, Server_Type, Metric, Value FROM oracle_table"

cnxn = pyodbc.connect("DSN=dsn1;UID=userid;PWD=passwd123")
cursor = cnxn.cursor()
cursor.execute(sql)
row = cursor.fetchall()  # list of all result rows

seenAlready = set()  # (DateTime, Metric) pairs that were already printed
for line in row:
    if line[4]:
        if float(line[4]) >= 0:
            outputLine = line[0], line[1], line[2], line[3], line[4]
            outputLine1 = line[0], line[3]  # DateTime and Metric

            if outputLine1 in seenAlready:
                continue  # this DateTime and Metric were already printed
            else:
                print(' '.join([str(i) for i in outputLine]))
                seenAlready.add(outputLine1)

This is not quite working, because even though DateTime and Metric are the same, each row may be unique because Value may be different.

How can I fix it so that I print only one line for each DateTime and Metric pair?
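The difference between deduplicating whole rows and deduplicating by `(DateTime, Metric)` can be seen with two illustrative rows that differ only in Value. The tuples below are made-up stand-ins for rows returned by `cursor.fetchall()`, with DateTime written as an epoch integer for brevity:

```python
# Illustrative rows shaped like (DateTime, Server, Server_Type, Metric, Value).
rows = [
    (1418223897, "server1", "oracle_database", "server1.CRITICAL_INCIDENTS", 0.0),
    (1418223897, "server1", "oracle_database", "server1.CRITICAL_INCIDENTS", 0.2),
]

whole_rows = set(rows)                # Value is part of each element
keys = {(r[0], r[3]) for r in rows}   # only (DateTime, Metric)

print(len(whole_rows), len(keys))  # prints: 2 1
```

A set of whole rows keeps both rows, while a set of keys keeps one, so the membership check has to be done on the key rather than on the full row.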

If you collect all the data and put it into a set (or put each column in its own set), you will never have duplicates, because a set can only hold unique items. If a value is an exact duplicate of one already in the set, adding it again is simply ignored. After that, you can loop through the set and print each item.

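seenAlready = set()
for line in row:  # row = cursor.fetchall() from the question
    if line[4]:
        if float(line[4]) >= 0:
            outputLine1 = line[0], line[3]  # DateTime and Metric

            # Adding a pair that is already in the set has no effect
            seenAlready.add(outputLine1)

# Each (DateTime, Metric) pair is printed exactly once. The set holds only
# these two columns, and sets are unordered, so the print order may differ
# from the query order.
for line in seenAlready:
    print(line)  # or whatever formatted value you need to print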

Something like this, or something along the same lines. Printing from the set makes the most sense, because its items are guaranteed to be unique.
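Storing only the `(DateTime, Metric)` pairs means the set no longer holds the server, type, or value to print. A minimal variant of the same idea, sketched here with a hypothetical helper `first_row_per_key`, uses a dict keyed by `(DateTime, Metric)` so that the first full row for each key is kept. It assumes rows are ordered as in the question's query (DateTime, Server, Server_Type, Metric, Value) and relies on dicts preserving insertion order (guaranteed from Python 3.7). It also tests for NULL with `is None` rather than truthiness, so a numeric value of exactly 0 is not skipped:

```python
def first_row_per_key(rows):
    """Return the first row seen for each (DateTime, Metric) pair, in input order."""
    unique = {}
    for row in rows:
        value = row[4]
        # Skip NULL and negative values, like the loops in the question and answer.
        if value is None or float(value) < 0:
            continue
        key = (row[0], row[3])        # (DateTime, Metric)
        unique.setdefault(key, row)   # later rows with the same key are ignored
    return list(unique.values())


# Illustrative rows; with pyodbc you would pass cursor.fetchall() instead.
rows = [
    ("2014-12-10 15:04:57", "server1", "oracle_database", "server1.CRITICAL_INCIDENTS", "0.000000"),
    ("2014-12-10 15:04:57", "server1", "oracle_database", "server1.CRITICAL_INCIDENTS", "0.2000000"),
    ("2014-12-10 14:59:37", "server1", "oracle_database", "server1.ResponseTimepertransaction", "2.467900"),
]

for row in first_row_per_key(rows):
    print(" ".join(str(field) for field in row))
```

Unlike printing from a set, this keeps every column of the surviving row and preserves the order in which the query returned the rows.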

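Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.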

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM