June 5th, 2025

Why do some Windows functions fail if I pass an unaligned Unicode string?

A customer found that if they passed Unicode strings (which in Windows means strings encoded as UTF-16LE using the two-byte data type wchar_t as code units) which are not on even addresses, then some—but not all—functions fail to accept those strings. Why isn’t this documented?

This is one of the ground rules for programming: Pointers must be properly aligned unless explicitly permitted otherwise.

In the C and C++ languages, forming an unaligned pointer is explicitly specified to return no useful value.

In C:

(6.3.2.3 Pointers) If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.

In C++:

[expr.static.cast](13) If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified.

Therefore, simply creating a misaligned pointer already takes you outside the world of allowable (in C) or at least meaningful (in C++) operations, so you shouldn’t be surprised that using misaligned pointers results in nonsense.
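One way to stay inside the rules is never to form the misaligned pointer at all: treat the odd-addressed data as raw bytes and copy it into properly aligned storage before handing it to anything that expects a string. The following is a minimal sketch of that idea, not code from Windows. The helper names are hypothetical, and it uses char16_t (a 2-byte code unit, like wchar_t on Windows) so that it builds on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: does p meet the alignment requirement of T?
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical helper: copy `count` UTF-16 code units from a byte address
// of any alignment into a std::u16string, whose storage is suitably aligned.
// The source is only ever treated as raw bytes, so no misaligned
// char16_t pointer is formed.
std::u16string copy_utf16_from_bytes(const void* bytes, std::size_t count) {
    std::u16string result(count, u'\0');
    if (count != 0) {
        std::memcpy(&result[0], bytes, count * sizeof(char16_t));
    }
    return result;
}

// Usage: a string that starts at an odd offset inside a byte buffer.
std::u16string demo_copy_from_odd_offset() {
    alignas(2) unsigned char raw[16] = {};
    const char16_t text[] = u"Hi";
    std::memcpy(raw + 1, text, sizeof(text));   // raw + 1 is an odd address
    assert(!is_aligned_for<char16_t>(raw + 1));
    return copy_utf16_from_bytes(raw + 1, 2);   // aligned copy of "Hi"
}
```

The copy costs an allocation, but the resulting buffer can be passed to any function that expects an aligned string.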

As for why certain functions get more upset than others, it’s all a matter of how those functions use the pointers and who detects the misaligned pointer.

If you are using a processor that is alignment-sensitive, you will probably get a failure when the code tries to read the data from that pointer. If the access is made in user mode, you will get an access violation exception, and the process will probably crash. If the access is made in kernel mode, the kernel mode parameter validator will probably return an invalid parameter error. (Kernel mode must protect itself from user mode.)

If you are using a processor that forgives misaligned data accesses, then you may get away with it for a while, until the code does something with the data that requires alignment. For example, atomic operations typically require aligned data, even on processors that are normally forgiving of misalignment.

And even though x86-64 is generally alignment-forgiving, there are still places where it is alignment-sensitive. For example, some instructions involving SIMD registers require alignment. SIMD registers are often used for copying blocks of memory around, and since wchar_t has 2-byte alignment, the switch statement for performing block copies has only 8 legal starting points out of 16, since all the odd addresses are invalid. If you pass an odd address, you might well fall through the switch statement and perform garbage copies.
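The arithmetic behind "8 legal starting points out of 16" can be sketched as follows. This is not the actual copy routine, just a hypothetical illustration of why code that steps in 2-byte units toward a 16-byte boundary works for every even starting address and never gets there from an odd one. All the function names are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many 2-byte code units a copy loop would handle
// one at a time before the address reaches a 16-byte boundary.
// The formula is only meaningful for even addresses.
constexpr std::size_t units_before_boundary(std::uintptr_t address) {
    return ((16 - (address & 15)) & 15) / 2;
}

// Where the loop actually ends up after taking that many 2-byte steps.
constexpr std::uintptr_t address_after_head(std::uintptr_t address) {
    return address + 2 * units_before_boundary(address);
}

// Did the head of the copy land on a 16-byte boundary?
constexpr bool reaches_boundary(std::uintptr_t address) {
    return (address_after_head(address) & 15) == 0;
}

static_assert(units_before_boundary(0x1002) == 7, "seven units to 0x1010");
static_assert(reaches_boundary(0x1002), "an even start lands on the boundary");
static_assert(!reaches_boundary(0x1001), "an odd start never lands on it");
```

Code that assumes the boundary has been reached would then issue aligned SIMD accesses against an address that is still misaligned.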

The Microsoft C++ compiler has a special nonstandard keyword __unaligned for declaring that a pointer may be unaligned, and this tells the compiler that any accesses to the data behind that pointer must use instructions that are alignment-forgiving. For some processors, this can be quite expensive.

Limit your use of misaligned pointers to places where misaligned pointers are expressly permitted. You can tell where those places are by looking for the Windows SDK macro UNALIGNED. For example:

LWSTDAPI_(int)
    SHFormatDateTimeA(
        _In_ const FILETIME UNALIGNED * pft,
        _Inout_opt_ DWORD * pdwFlags,
        _Out_writes_(cchBuf) LPSTR pszBuf,
        UINT cchBuf); 
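
For a function whose parameter lacks the UNALIGNED annotation, the caller can copy the misaligned data into an aligned local first. Here is a minimal sketch, assuming the value sits at an arbitrary offset inside a byte buffer. FileTimeLike is a hypothetical stand-in for FILETIME so that the example builds without the Windows headers, and load_file_time is likewise made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for FILETIME (two 32-bit halves).
struct FileTimeLike {
    std::uint32_t low;
    std::uint32_t high;
};

// Copy a FileTimeLike out of a byte buffer at any offset.
// memcpy has no alignment requirement on its arguments, and the local
// variable it fills is properly aligned for FileTimeLike.
FileTimeLike load_file_time(const unsigned char* record,
                            std::size_t record_size,
                            std::size_t offset) {
    assert(offset <= record_size &&
           sizeof(FileTimeLike) <= record_size - offset);
    FileTimeLike ft;
    std::memcpy(&ft, record + offset, sizeof(ft));
    return ft;
}

// Usage: a timestamp stored at offset 3 of a packed record.
FileTimeLike demo_load_from_packed_record() {
    unsigned char record[16] = {};
    const FileTimeLike original = {0x12345678u, 0x9ABCDEF0u};
    std::memcpy(record + 3, &original, sizeof(original));
    return load_file_time(record, sizeof(record), 3);
}
```

The address of the returned copy can then be passed to a function that requires an aligned pointer.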

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

11 comments

  • Jan Ringoš · Edited

    Hmm, passing unicode strings to API functions.
    I wish I could pass std::wstring_view (-ish object, or reference to) to them, and save myself copy and/or allocation, when most of them internally convert it again to UNICODE_STRING anyway.

  • Shawn Van Ness 2 days ago

    I request a follow-up, to tell us why SHFormatDateTimeA() needed to accept unaligned FILETIME structs .. sounds like there’s a good story there?

    • Joshua Hudson 2 days ago

      FILETIME is misaligned in some on-disk structures, so it could not be fixed even when we made the jump to 64 bit Windows. Not taking misaligned FILETIMEs is asking for trouble.

      • Shawn Van Ness 2 days ago

        Some binary data files were written with 32-bit windows, but need to be read by 64-bit code? Sounds plausible — I’m just surprised this doesn’t come up more often.

        I’m working with a very old (late 90s) codebase that reads and writes a ton of little binary file formats, like that. I wonder how much we’re getting away with x64 being “alignment-forgiving”. We are porting to arm64, so I guess we’ll find out. :/

  • ketil albertsen · Edited

    Every time I read something about alignment issues, I ask myself: What really did we start building byte addressable machines for?

    If you cannot use byte addresses freely, but must restrict yourself to using word addresses, what are byte addresses for? Even old architectures from the 1950s-60s, such as the Univac 1100 series (and dozens of others), did have facilities for handling character strings. Maybe the original IBM 360 design made it simpler ... until those super-performance architectures arrived, telling: Don't make use of that new-won freedom you got! Do as you did before IBM 360 - stick to word addresses!

    I...

    • Danielix Klimax

      But you can freely address bytes. You get into trouble only when you are pointing at larger elements…

      • Joshua Hudson 22 hours ago

        @ketil albertsen

        Seems to be because most modern frameworks are based on Javascript, Java, or C#. But in fact we did a lot of internal string representation in UTF-8 in prior decades. PHP used an internal string representation of UTF-8.

        I'm looking at an oddball case; if we had a complete string processing library in UTF-8 available to me in .NET; including .cshtml files we would almost certainly go through our innards with a rototill to replace the builtin string with a UTF-8 string everywhere, and push remaining UTF-16 calls down into the database adapter. The performance boost would be worth...

      • ketil albertsen

        "... only when you are pointing at larger elements ..."

        Such as UTF-16. Independent 8-bit entities are really a special case nowadays. The vast majority of data elements are part of larger structures. A class member is part of the structure of instance value members. If an instance has a single local byte variable, it cannot, on an alignment sensitive architecture, be allocated at an arbitrary address, but must honor alignment requirements.

        Cole Tobin mentions byte addressed UTF-8, which is certainly the best choice for external data representation. I refuse to believe that it is "the most common code page" for internal working...

      • Cole Tobin 2 days ago

        Exactly. UTF-8, the most common codepage (and the only one you should ever use), is entirely byte-addressed!

  • Joshua Hudson 3 days ago

    The only one I would have actually expected to work is MultiByteToWideChar:

    MultiByteToWideChar(1200, 0, unaligned_ptr, byte_length, aligned_wide_char_ptr, aligned_length);
    MultiByteToWideChar(1201, 0, unaligned_ptr, byte_length, aligned_wide_char_ptr, aligned_length);

    But it doesn’t because 1200 and 1201 aren’t implemented. So native programmers get to write compile-time checks for alignment.