[RADIATOR] Unicode in nthash passwords

Mon May 11 16:26:59 UTC 2026

Hi Heikki,

thanks for getting back to me!

On Mon, 2026-05-11 at 18:09 +0300, Heikki Vatiainen via radiator wrote:
> A small correction: The current behaviour is to convert octets, a
> binary stream, which includes octets not defined in 8859-1. One option
> to get started could simply be to make it a global option.

Apparently, there are differing opinions in nomenclature as to whether
ISO 8859-1 defines 0x00 through 0x1F as C0 control characters, 0x7F as
DEL, and 0x80 through 0x9F as C1 control characters; or if it leaves
them undefined.

The actual text of ISO/IEC 8859-1:1998 states:

> The shaded positions in the code table [0x00-0x1F, 0x7F-0x9F]
> correspond to bit combinations that do not represent graphic
> characters. Their use is outside the scope of ISO/IEC 8859; it is
> specified in other International Standards, for example ISO/IEC 6429.

For the sake of completeness: ISO/IEC 6429:1992 then defines the C0 and
C1 control characters (sections 5.2 and 5.3), but notes in section
F.8.1 that DELETE (0x7F) has explicitly not been carried over from
ISO/IEC 646; it is, however, reintroduced in ISO/IEC 10646 (Unicode).

I think it's open to debate whether the reference to 6429
within 8859-1 is enough to say "8859-1 defers the definition of control
characters to 6429" or "8859-1 defers the definition of control
characters to other standards, which may or may not include 6429 at the
implementor's discretion".

The ISO-8859-to-Unicode mappings on the Unicode website [0] include
C0+C1+DEL and ReadMe.txt states:

> These tables are considered to be authoritative mappings
> between the Unicode Standard and different parts of
> the ISO/IEC 8859 standard.

Also, most programming environments -- including, apparently, the Perl
installation on my machine -- seem to assume that ISO 8859-1 bytes 0x00
through 0xFF map bijectively to Unicode code points U+0000 to U+00FF.
Thus, I think we can get away with treating ISO 8859-1 as equal to the
first 256 code points of Unicode. Under that reading, the naive
"intersperse NULs after each byte" algorithm effectively converts ISO
8859-1 to UTF-16LE.
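
The equivalence is easy to check with a short sketch. Python is used
here purely for illustration (it is not Radiator code), and the helper
name intersperse_nuls is made up for this example:

```python
def intersperse_nuls(octets: bytes) -> bytes:
    """Naive conversion: append a NUL byte after every input byte."""
    return b"".join(bytes((b, 0)) for b in octets)

# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Python's "latin-1" codec maps byte N to code point U+00NN,
# including the C0, DEL and C1 positions.
as_text = all_bytes.decode("latin-1")
assert [ord(c) for c in as_text] == list(range(256))

# Decoding as ISO 8859-1 and re-encoding as UTF-16LE yields exactly
# the octets produced by the naive algorithm.
assert intersperse_nuls(all_bytes) == as_text.encode("utf-16-le")
```

This only shows what Python's codec does; as far as I can tell, Perl
behaves the same way on my machine.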

(Let us, for a second, ignore that HTML5 defines "iso-8859-1" as
Windows-1252 and refers to "ISO 8859-1 [including C0+C1+DEL]" as
"isomorphic encoding" instead.) 

You wrote:
> Encoding errors would likely need logging. This would help
> troubleshooting the cases when the password is correct but the
> encoding is not UTF-8, or whatever the expected encoding is.

Absolutely. Fortunately, there is a failure reason log field. :-D
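
A minimal sketch of what such a check could look like, assuming the
expected encoding is UTF-8. It is Python for illustration only; the
function name decode_password and the wording of the reason string are
hypothetical, and how the reason would reach the failure reason log
field is left open:

```python
def decode_password(octets: bytes, encoding: str = "utf-8"):
    """Return (text, None) on success or (None, reason) on a decoding error."""
    try:
        return octets.decode(encoding, errors="strict"), None
    except UnicodeDecodeError as exc:
        # Report only the offset, never the password octets themselves.
        reason = (f"password is not valid {encoding} "
                  f"(first bad byte at offset {exc.start})")
        return None, reason

# A password typed as ISO 8859-1 but expected as UTF-8 is rejected
# with a reason that can be logged.
text, reason = decode_password("p\u00e4ss".encode("latin-1"))
assert text is None and reason is not None
```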

You wrote:
> Ondra, can you check if NFD is used with your network password? It
> might be needed to make the normalization method configurable too so
> that it matches the processing that's done when the users set their
> password.

The system that feeds our database with NT hashes certainly does not
normalize to NFD (o_umlaut -> o + combining umlaut), as I have seen
precomposed characters with umlauts. I am not sure if it normalizes to
NFC or NFKC (o + combining umlaut -> o_umlaut), but I doubt it. In
fairness, Windows's password routines don't seem to perform any
normalization either.
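
To illustrate why the normalization form matters: the NT hash is
computed over the UTF-16LE octets of the password, so a precomposed and
a decomposed spelling of the same character hash differently unless
both sides normalize the same way. A small Python sketch (the helper
name to_utf16le is made up for this example, and the hashing step
itself is left out):

```python
import unicodedata
from typing import Optional

def to_utf16le(password: str, form: Optional[str] = None) -> bytes:
    """Optionally normalize, then encode as UTF-16LE (the NT hash input)."""
    if form is not None:
        password = unicodedata.normalize(form, password)
    return password.encode("utf-16-le")

precomposed = "\u00f6"   # o with diaeresis as a single code point
decomposed = "o\u0308"   # o followed by COMBINING DIAERESIS

# Without normalization the two spellings give different octets ...
assert to_utf16le(precomposed) == b"\xf6\x00"
assert to_utf16le(decomposed) == b"o\x00\x08\x03"

# ... while applying the same form to both makes them agree.
assert to_utf16le(decomposed, "NFC") == to_utf16le(precomposed, "NFC")
assert to_utf16le(precomposed, "NFD") == to_utf16le(decomposed, "NFD")
```

Passing form=None corresponds to not normalizing at all.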

I think including normalization support is a good idea in general, but
"don't normalize and hope for the best" should be one of the options.

Thanks for taking the time to look into this matter!

Cheers,
~~ Ondra

[0] https://www.unicode.org/Public/MAPPINGS/ISO8859/