
Removing duplicates in Cognos 10 based on a non-duplicated column/field

The report below produces the id, the name, primary (whether the user is the primary user or not, Y or N) and the date the user joined. I don't want any duplicate ids in the report, and I want to base the condition for removing dupes on the primary column. If the row is primary, I want to keep it. If it is not primary, I want to remove it only when there is a primary row for the same id.

ID   Name    Primary   Date
1    Jerry   Y         2/10/12
1    Jack    N         2/10/12
1    Jerry   N         2/10/12
2    Nancy   Y         1/18/17
2    Chris   N         3/4/15
3    Vicky   N         10/2/16
3    Mary    Y         2/2/10
4    Jeff    N         1/1/11
4    John    N         2/2/12

Desired output

ID   Name    Primary   Date
1    Jerry   Y         2/10/12
2    Nancy   Y         1/18/17
3    Mary    Y         2/2/10
4    Jeff    N         2/2/12

Basically, I want to show one row per id, but it has to display the primary row if there is one. If not, it must display a non-primary row. If there are multiple primaries, show only one; it doesn't matter which. If there are multiple non-primaries (and no primaries), display only one; again, it doesn't matter which.

If the condition were based on the date, I could use min or max, but this case is different.

You can create an extra column containing a rank ordered by primary DESC (so that Y ranks higher) and then by date. If an id has two rows with the same primary value (both Y or both N), rank 1 goes to the row with the newer date. Then add a filter to the query that keeps only the rows with rank = 1.

Expression definition for extra column:

rank( [Query2].[PRIMARY] DESC,[Query2].[DATE_ID] for [Query2].[ID]) 
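To make the ranking idea concrete outside Cognos, here is a minimal Python sketch that mimics it on the sample rows from the question. It is only an illustration, not Cognos code: the function names are hypothetical, the dates are assumed to be month/day/two-digit-year, and ties on both primary and date are broken by keeping the first row seen (in Cognos, tied rows would presumably share a rank, so more than one row could pass the rank = 1 filter).

```python
from datetime import datetime

# Sample rows from the question: (id, name, primary, date joined).
rows = [
    (1, "Jerry", "Y", "2/10/12"),
    (1, "Jack", "N", "2/10/12"),
    (1, "Jerry", "N", "2/10/12"),
    (2, "Nancy", "Y", "1/18/17"),
    (2, "Chris", "N", "3/4/15"),
    (3, "Vicky", "N", "10/2/16"),
    (3, "Mary", "Y", "2/2/10"),
    (4, "Jeff", "N", "1/1/11"),
    (4, "John", "N", "2/2/12"),
]


def sort_key(row):
    """Order by primary ("Y" > "N"), then by date (newer is larger)."""
    _, _, primary, date = row
    # Assumption: dates are month/day/two-digit-year.
    return (primary, datetime.strptime(date, "%m/%d/%y"))


def keep_rank_one(rows):
    """Keep, per id, the row that would get rank 1 (hypothetical helper)."""
    best = {}
    for row in rows:
        current = best.get(row[0])
        # Strict comparison: on a full tie the first row seen is kept.
        if current is None or sort_key(row) > sort_key(current):
            best[row[0]] = row
    return [best[key] for key in sorted(best)]


result = keep_rank_one(rows)
for row in result:
    print(row)
```

For id 4 this picks John rather than Jeff, because the date acts as the tiebreaker and John's date is newer; the question states that either row is acceptable.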

Alternatively, add a data item called Row Count with the following expression:

running-count(1 for [ID],[Primary])

Set the Aggregate Property of the data item to Calculated. We'll use this data item as a tiebreaker.

Now add this filter:

[Primary] = maximum([Primary] for [ID])
and 
[Row Count] = minimum([Row Count] for [ID],[Primary])

This will show only a single Y row when there are Y rows and fall back to a single N row when there are no Y rows. When the first condition results in multiple rows, the second condition selects the first one. Since you indicated that you didn't care which row was chosen if there were more than one, we could just as easily select the last row by changing the minimum() function in the second part of the filter to maximum().
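The effect of the Row Count item and the two-part filter can be traced with a small Python sketch on the question's sample rows. This is only a simulation of the logic under the assumption that rows are processed in the order listed; the function name `dedupe` is hypothetical and nothing here is Cognos syntax.

```python
from collections import defaultdict

# Sample rows from the question: (id, name, primary, date joined).
rows = [
    (1, "Jerry", "Y", "2/10/12"),
    (1, "Jack", "N", "2/10/12"),
    (1, "Jerry", "N", "2/10/12"),
    (2, "Nancy", "Y", "1/18/17"),
    (2, "Chris", "N", "3/4/15"),
    (3, "Vicky", "N", "10/2/16"),
    (3, "Mary", "Y", "2/2/10"),
    (4, "Jeff", "N", "1/1/11"),
    (4, "John", "N", "2/2/12"),
]


def dedupe(rows):
    """Simulate the Row Count data item plus the two filter conditions."""
    # Mimics running-count(1 for [ID],[Primary]): number the rows
    # 1, 2, 3, ... within each (id, primary) group.
    counters = defaultdict(int)
    numbered = []
    for row in rows:
        group = (row[0], row[2])
        counters[group] += 1
        numbered.append((row, counters[group]))

    # Mimics maximum([Primary] for [ID]): "Y" beats "N" as a string.
    max_primary = {}
    for row in rows:
        max_primary[row[0]] = max(max_primary.get(row[0], row[2]), row[2])

    # Mimics minimum([Row Count] for [ID],[Primary]).
    min_count = {}
    for row, count in numbered:
        group = (row[0], row[2])
        min_count[group] = min(min_count.get(group, count), count)

    # Keep a row only if it passes both conditions.
    return [
        row
        for row, count in numbered
        if row[2] == max_primary[row[0]]
        and count == min_count[(row[0], row[2])]
    ]


result = dedupe(rows)
for row in result:
    print(row)
```

With this input order, id 4 resolves to Jeff because his row is the first N row for that id; switching `min` to `max` in the tiebreaker would select John instead.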

Note: The aggregate function maximum() when applied to a single character uses the ASCII numeric representation of the character for comparison purposes. The ASCII value for N is 78 while the value for Y is 89. Thus, Y will always be the maximum when both Y and N values exist. Of course, when only N values exist N becomes the maximum.
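The character comparison described in the note can be checked quickly in Python, whose string comparison also follows code point order for these letters. This only illustrates the ordering of N and Y; it does not show how Cognos or the underlying database evaluates maximum().

```python
# Code points for the two flag values: N is 78, Y is 89.
codes = {letter: ord(letter) for letter in "NY"}

# A group that contains a primary row: the maximum is "Y".
with_primary = max(["N", "Y", "N"])

# A group with no primary row: the maximum falls back to "N".
without_primary = max(["N", "N"])

print(codes, with_primary, without_primary)
```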


 