
Compare multiple date columns by row ID and highlight the latest in Python or pandas

Across the entire DataFrame, I need to compare four dates that sit on the same row, find the latest one, and highlight it. The highlighted cell should be the latest date among comp1 through comp4.

The output I need will look like this: [image]

I started by making sure all the comp columns had a datetime dtype, and before writing this I even tried converting them to objects and comparing them, but with no luck.

Here is what I have tried or found online, but none of it works:

checks.style.highlight_max(color= 'yellow', axis=0)

Nothing gets highlighted

I also tried to use a subset, but no matter how I check the dtypes of each comp column, they do not stay datetime or object and instead become float for some odd reason:

checks.style.highlight_max(color= 'yellow', axis=0, subset=['CAC Clearance', 'ASB Results Received','Arch Assessment','Bio Assessment'])

This is the error I get, even though all of the columns are datetimes before I run it:

TypeError: '>=' not supported between instances of 'float' and 'datetime.date'
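One plausible cause of this error, assuming the columns actually hold a mix of `datetime.date` objects and missing values (NaN is a float), is that the comparison runs on an object column rather than a true datetime column. A minimal sketch of normalizing the columns first; the sample frame and its values are hypothetical, and only two of the four column names from the question are used:

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical data: datetime.date objects mixed with NaN give an object dtype.
checks = pd.DataFrame({
    "CAC Clearance": [dt.date(2021, 10, 13), np.nan, dt.date(2021, 10, 14)],
    "Bio Assessment": [dt.date(2021, 10, 13), dt.date(2021, 10, 14), np.nan],
})
date_cols = ["CAC Clearance", "Bio Assessment"]

# Convert each column to datetime64; anything unparseable becomes NaT.
for col in date_cols:
    checks[col] = pd.to_datetime(checks[col], errors="coerce")

# Row-wise latest date; NaT values are skipped.
row_latest = checks[date_cols].max(axis=1)
print(row_latest)
```

Note also that `axis=0` in `highlight_max` picks the maximum of each column; highlighting the latest date within each row would need `axis=1`.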

Lastly, I tried a groupby on the ID, and even that way I cannot seem to get it to work.

Example data using print(checks.head().to_records())/print(checks.head().to_dict())

Output (I can only share certain info for now, so timestamps only):

TypeError Traceback (most recent call last) in ----> 1 print(checks.head().to_records())/print(checks.head().to_dict())

TypeError: unsupported operand type(s) for /: 'NoneType' and 'NoneType'
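This particular TypeError comes from the `/` between the two `print` calls: `print` returns `None`, so Python tries to divide `None` by `None` after both outputs are printed. A small sketch that runs the two calls separately; the one-column frame is a hypothetical stand-in built from the 'Bio Assessment' values shown in this post:

```python
import pandas as pd

# Hypothetical frame using the 'Bio Assessment' timestamps from the post.
checks = pd.DataFrame({
    "Bio Assessment": pd.to_datetime(
        ["2021-10-13", "2021-10-14", "2021-10-13", "2021-10-13", "2021-10-13"]
    )
})

# Two separate statements instead of print(...)/print(...).
records = checks.head().to_records()
as_dict = checks.head().to_dict()
print(records)
print(as_dict)
```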

1st print example:

'2021-10-13T00:00:00.000000000', '2021-10-13T00:00:00.000000000')

2nd print example:

Timestamp('2021-10-13 00:00:00'), 4: Timestamp('2021-10-13 00:00:00')}, 'Bio Assessment': {0: Timestamp('2021-10-13 00:00:00'), 1: Timestamp('2021-10-14 00:00:00'), 2: Timestamp('2021-10-13 00:00:00'), 3: Timestamp('2021-10-13 00:00:00'), 4: Timestamp('2021-10-13 00:00:00')}}

I figured it out.

  • First I had to copy my df to stop the copy warning.

  • Then I used this code to convert all my datetimes to strings and fill in the NaT values with "0". This was the only way I could compare without a str/int vs. datetime/Timestamp error:

    checks['comp1'] = checks['comp1'].dt.strftime('%Y-%m-%d').fillna("0")

  • I tried to use the highlight style from the original post above, but only a few dates would highlight, so I wrote this long function instead, and it works.

Side note: it may look like I'm comparing the same comps twice, but if I did it any other way some comparisons would be missed, because some rows had a blank comp1, comp2, etc. The function would then go straight to the second matching part.

  • Not all data is filled out for this contract, but the latest date was needed for over 600,000 records with 1-4 comps each.
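The copy and string-conversion steps above can be sketched for all four columns at once. The sample frame is hypothetical. The comparison works because `%Y-%m-%d` strings sort in the same order as the dates they represent, and "0" sorts before any such date string:

```python
import pandas as pd

comp_cols = ["comp1", "comp2", "comp3", "comp4"]

# Hypothetical sample; None becomes NaT (a missing date).
raw = pd.DataFrame({
    "comp1": pd.to_datetime(["2021-10-13", None]),
    "comp2": pd.to_datetime(["2021-10-14", "2021-10-12"]),
    "comp3": pd.to_datetime([None, "2021-10-12"]),
    "comp4": pd.to_datetime(["2021-10-11", None]),
})

# Work on an explicit copy to avoid the SettingWithCopyWarning.
checks = raw.copy()

# Format each date as an ISO string; strftime leaves NaT as missing, which fillna turns into "0".
for col in comp_cols:
    checks[col] = checks[col].dt.strftime("%Y-%m-%d").fillna("0")

print(checks)
```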

    def find_lastest_date(df, comp1, comp2, comp3, comp4):
        # compares comp1 to all other comps
        if ((df[comp1] > df[comp2]) & (df[comp1] > df[comp3]) & (df[comp1] > df[comp4])):
            return 'comp1 Latest Date'
        # compares comp2 to all other comps
        elif ((df[comp2] > df[comp1]) & (df[comp2] > df[comp3]) & (df[comp2] > df[comp4])):
            return "comp2 Latest Date"
        # compares comp3 to all other comps
        elif ((df[comp3] > df[comp1]) & (df[comp3] > df[comp2]) & (df[comp3] > df[comp4])):
            return 'comp3 Latest Date'
        # compares comp4 to all other comps
        elif ((df[comp4] > df[comp1]) & (df[comp4] > df[comp2]) & (df[comp4] > df[comp3])):
            return 'comp4 Latest Date'
        # Comp matches
        # All comps == "0" leave blank
        elif ((df[comp1] == "0") & (df[comp2] == "0") & (df[comp3] == "0") & (df[comp4] == "0")):
            return ""
        # All comps macth
        elif ((df[comp1] == df[comp2]) & (df[comp1] == df[comp3]) & (df[comp1] == df[comp4])):
            return "Lastest Date has Matches"
        # comparing 3 comp matches
        # comp1 match only comp2 & comp3 | comp1 matches 3 & 4
        elif ((df[comp1] == df[comp2]) & (df[comp1] == df[comp3])) | ((df[comp1] == df[comp3]) & (df[comp1] == df[comp4])):
            return "Lastest Date has Matches"
        # comp 2 match only comp1 & comp3 | comp1 matches 3 & 4
        elif ((df[comp1] == df[comp2]) & (df[comp2] == df[comp3])) | ((df[comp2] == df[comp3]) & (df[comp2] == df[comp4])):
            return "Lastest Date has Matches"
        # comp 3 match only comp1 & comp2 | comp1 matches 2 & 4
        elif ((df[comp3] == df[comp1]) & (df[comp3] == df[comp2])) | ((df[comp3] == df[comp2]) & (df[comp3] == df[comp4])):
            return "Lastest Date has Matches"
        # comp 4 match only comp1 & comp2 | comp4 matches 2 & 3
        elif ((df[comp4] == df[comp1]) & (df[comp4] == df[comp2])) | ((df[comp4] == df[comp2]) & (df[comp3] == df[comp4])):
            return "Lastest Date has Matches"
        # 2 comps match
        # comp1 match to another other comp
        elif ((df[comp1] == df[comp2]) | (df[comp1] == df[comp3]) | (df[comp1] == df[comp4])):
            return "Lastest Date has Matches"
        # comp2 match to another other comp
        elif ((df[comp2] == df[comp1]) | (df[comp2] == df[comp3]) | (df[comp2] == df[comp4])):
            return "Lastest Date has Matches"
        # comp3 match to another comp
        elif ((df[comp3] == df[comp1]) | (df[comp3] == df[comp2]) | (df[comp3] == df[comp4])):
            return "Lastest Date has Matches"
        else:
            return ""
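The function above reads as if it is applied one row at a time (for example through `DataFrame.apply` with `axis=1`), which can be slow on 600,000 records. As an alternative sketch, not the method used in this answer, the same labels might be produced with vectorized operations on real datetime columns. The sample data and the `latest` column name are hypothetical, and the tie label is spelled "Latest Date has Matches" here rather than matching the function's string exactly:

```python
import pandas as pd

comp_cols = ["comp1", "comp2", "comp3", "comp4"]

# Hypothetical sample; None becomes NaT (a missing date).
checks = pd.DataFrame({
    "comp1": pd.to_datetime(["2021-10-13", None, "2021-10-10", None]),
    "comp2": pd.to_datetime(["2021-10-14", "2021-10-12", "2021-10-10", None]),
    "comp3": pd.to_datetime([None, "2021-10-12", "2021-10-09", None]),
    "comp4": pd.to_datetime(["2021-10-11", "2021-10-01", None, None]),
})

dates = checks[comp_cols]
latest = dates.max(axis=1)            # row-wise latest date; NaT is skipped
is_latest = dates.eq(latest, axis=0)  # True where a cell equals its row's latest date
n_latest = is_latest.sum(axis=1)      # number of columns sharing the latest date
winner = is_latest.idxmax(axis=1)     # first column that holds the latest date

# Start from "<column> Latest Date", then overwrite tied and empty rows.
label = winner + " Latest Date"
label = label.mask(n_latest > 1, "Latest Date has Matches")
label = label.mask(n_latest == 0, "")
checks["latest"] = label

print(checks)
```

The boolean frame `is_latest` marks exactly the cells that would be highlighted. With true datetime columns, `checks.style.highlight_max(color='yellow', axis=1, subset=comp_cols)` may highlight the same cells, though that has not been checked here against rows that are mostly NaT.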
