简体   繁体   中英

Windows Aero Rendering Bug

So, I have stumbled upon an interesting bug with the Windows API and I'm wondering if anyone has some insight on how to work around it. It seems that even Google has struggled with it. It should be noted that while I will be fixing this in Qt source itself, the problem is with Windows default message handling , not Qt. All of the files that I will mention can be found online as they are all open source libraries. Below is somewhat of a complex problem and I will try and give as much context as possible. I've put a lot of time and effort into fixing this myself, but being that I've only been an engineer for about 8 months, I'm still quite inexperienced and could very well have missed something obvious.

The context:

I have written a program that uses Qt to skin my windows with custom skins. These skins go over the system default non-client UI skins. In other words, I use custom painted frames (supported by Qt). Since Qt5 I've been having issues with my program when it is run on any pre-Windows Aero OS ( less than XP and greater than Vista with Windows Aero disabled). Unfortunately, Qt devs have all but confirmed that they do not really support XP anymore, so I will not rely on them to fix the bug.

The Bug:

Clicking anywhere in the non-client area while running a machine with composition disabled (Windows Aero disabled or not existing) will cause Windows to repaint its system default non-client UI on top of my custom skin.

My Research

A bit of debugging and investigation led me to qWindowsProc in qwindowscontext.cpp. I was able to determine that the last windows message to be handled before my window's skin was painted over was WM_NCLBUTTONDOWN . This seemed strange, so I took to the internets.

Sure enough, I found a file called hwnd_message_Handler.cc that comes from Google's Chromium Embedded Framework (CEF). In that file are many comments about how various windows messages, for some insane reason, cause repaints of the system default non-client frames over custom frames. The following is one such comment.

// A scoping class that prevents a window from being able to redraw in response
// to invalidations that may occur within it for the lifetime of the object.
//
// Why would we want such a thing? Well, it turns out Windows has some
// "unorthodox" behavior when it comes to painting its non-client areas.
// Occasionally, Windows will paint portions of the default non-client area
// right over the top of the custom frame. This is not simply fixed by handling
// WM_NCPAINT/WM_PAINT, with some investigation it turns out that this
// rendering is being done *inside* the default implementation of some message
// handlers and functions:
//  . **WM_SETTEXT**
//  . **WM_SETICON**
//  . **WM_NCLBUTTONDOWN**
//  . EnableMenuItem, called from our WM_INITMENU handler
// The solution is to handle these messages and **call DefWindowProc ourselves**,
// but prevent the window from being able to update itself for the duration of
// the call. We do this with this class, which automatically calls its
// associated Window's lock and unlock functions as it is created and destroyed.
// See documentation in those methods for the technique used.
//
// The lock only has an effect if the window was visible upon lock creation, as
// it doesn't guard against direct visiblility changes, and multiple locks may
// exist simultaneously to handle certain nested Windows messages.
//
// IMPORTANT: Do not use this scoping object for large scopes or periods of
//            time! IT WILL PREVENT THE WINDOW FROM BEING REDRAWN! (duh).
//
// I would love to hear Raymond Chen's explanation for all this. And maybe a
// list of other messages that this applies to ;-)

Also in that file exists several custom message handlers to prevent this bug from occurring. For example, another message I found that causes this bug is WM_SETCURSOR . Sure enough, they have a handler for that which, when ported to my program, worked wonderfully.

One of the common ways they handle these messages is with a ScopedRedrawLock. Essentially, this just locks redrawing at the beginning of the hostile message's default handling (via DefWindowProc) and remains locked for the duration of the call, unlocking itself when it comes out of scope (hence, Scoped RedrawLock). This will not work for WM_NCLBUTTONDOWN for the following reason:

Stepping through qWindowsWndProc during the default handling of WM_NCLBUTTONDOWN, I saw that WM_SYSCOMMAND is handled in the same call stack directly after WM_NCLBUTTONDOWN. The wParam for this particular WM_SYSCOMMAND is 0xf012 - another officially undocumented value**. Luckily in the remarks section of the MSDN WM_SYSCOMMAND page somebody commented about it. Turns out, it is the SC_DRAGMOVE code.

For reasons that may seem obvious, we cannot simply lock redrawing for the handling of WM_NCLBUTTONDOWN because Windows automatically assumes that the user is trying to drag the window if he clicks on a non-client area (in this case, HTCAPTION). Locking here will cause the window to never redraw for the duration of the drag- until Windows receives a button up message(WM_NCLBUTTONUP or WM_LBUTTONUP).

And sure enough, I find this comment in their code,

  if (!handled && message == WM_NCLBUTTONDOWN && w_param != HTSYSMENU &&
      delegate_->IsUsingCustomFrame()) {
    // TODO(msw): Eliminate undesired painting, or re-evaluate this workaround.
    // DefWindowProc for WM_NCLBUTTONDOWN does weird non-client painting, so we
    // need to call it inside a ScopedRedrawLock. This may cause other negative
    // side-effects (ex/ stifling non-client mouse releases).
    DefWindowProcWithRedrawLock(message, w_param, l_param);
    handled = true;
  }

This makes it seem as though they had the same problem, but didn't quite get around to solving it.

The only other place CEF handles WM_NCLBUTTONDOWN in the same scope as this problem is here:

 else if (message == WM_NCLBUTTONDOWN && delegate_->IsUsingCustomFrame()) {
    switch (w_param) {
      case HTCLOSE:
      case HTMINBUTTON:
      case HTMAXBUTTON: {
        // When the mouse is pressed down in these specific non-client areas,
        // we need to tell the RootView to send the mouse pressed event (which
        // sets capture, allowing subsequent WM_LBUTTONUP (note, _not_
        // WM_NCLBUTTONUP) to fire so that the appropriate WM_SYSCOMMAND can be
        // sent by the applicable button's ButtonListener. We _have_ to do this
        // way rather than letting Windows just send the syscommand itself (as
        // would happen if we never did this dance) because for some insane
        // reason DefWindowProc for WM_NCLBUTTONDOWN also renders the pressed
        // window control button appearance, in the Windows classic style, over
        // our view! Ick! By handling this message we prevent Windows from
        // doing this undesirable thing, but that means we need to roll the
        // sys-command handling ourselves.
        // Combine |w_param| with common key state message flags.
        w_param |= base::win::IsCtrlPressed() ? MK_CONTROL : 0;
        w_param |= base::win::IsShiftPressed() ? MK_SHIFT : 0;
      }
    }

And while that handler addresses a similar problem, its not quite the same.

The Question

So at this point I'm stuck. I'm not quite sure where to look. Maybe I'm reading the code incorrectly? Maybe the answer is there in CEF and I'm just overlooking it? It seems like CEF engineers encountered this problem and have yet to come up with the solution, given the TODO: comment. Does anybody have any idea what else I could do? Where do I go from here? Not solving this bug is not an option. I'm willing to dig deeper but at this point I'm contemplating actually handling Windows drag events myself rather than having the DefWindowProc handle it. Though, that might still cause the bug in the case where the user is actually dragging the window.

Links

I have included a list of links that I have been using in my research. Personally, I downloaded CEF source myself so that I could better navigate the code. If you are truly interested in solving this problem, you might need to do the same.

WM_NCLBUTTONDOWN

WM_NCHITTEST

WM_SYSCOMMAND

DefWindowProc

hwnd_message_handler.cc

hwnd_message_handler.h

qwindowscontext.cpp

Tangent

Just to bring validation to CEF's code, if you look in the header of hwnd_message_handler, you will also notice that there are two undocumented windows messages of value 0xAE and 0xAF. I was seeing 0xAE during the default handling of WM_SETICON that was causing problems, and this code helped confirm that what I was seeing was indeed real.

I found this page which suggests hiding your window by removing WS_VISIBLE immediately before calling DefWindowProc() , then showing it immediately after. I haven't tried it, but it's something to look at.

So, the actual way this fix was achieved was by removing the WS_CAPTION flag during NC_LBUTTONDOWN and adding it back during NC_LBUTTONUP message handling. However, because of the way Windows calculates its size before rendering, it could miscalculate since it removes the caption area from consideration. So, you will need to offset this while handling the WM_NCCALCSIZE message.

Keep in mind that the amount of pixels you will need to offset will vary depending on which windows theme or OS you are in. ie Vista has a different theme than XP. So you will need to decide on a scale factor to keep it clean.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM