From 6b36602e8b0b455df7e9b134e43bfbb59cbff25d Mon Sep 17 00:00:00 2001
From: david
Date: Mon, 13 May 2013 05:21:22 +0000
Subject: [PATCH] =?UTF-8?q?UTF-8=E2=80=93aware=20escaping=20in=20XML=20tod?=
 =?UTF-8?q?o.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 todo/nmap.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/todo/nmap.txt b/todo/nmap.txt
index bcca92547..2354f7b78 100644
--- a/todo/nmap.txt
+++ b/todo/nmap.txt
@@ -66,6 +66,16 @@ o NSE digest auth should use the more robust parsing from
   http.parse_www_authenticate as described at
   http://seclists.org/nmap-dev/2012/q3/868
 
+o Treat the input to the escape function in xml.cc as UTF-8, not just
+  ASCII. Good UTF-8 should survive into the output; i.e., "\xe2\x98\xbb"
+  should become "\xe2\x98\xbb" in the output, not "&#xe2;&#x98;&#xbb;".
+  If the input happens not to be UTF-8, (like the file name in
+  http://seclists.org/nmap-dev/2013/q1/180), I suppose we can
+  individually encode each byte of each invalid sequence: "\xba\xda\xbf"
+  becomes "&#xba;&#xda;&#xbf;". Can probably do this with simple
+  byte->rune and rune->byte functions as in
+  http://plan9.bell-labs.com/sys/doc/utf.html.
+
 o We should probably redo the Nmap header (e.g. on http://nmap.org) to
   make it more attractive. Or, at a minimum we should update the
   screenshots and think about which links we really need (some of those
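
The todo item above could be approached roughly as follows. This is only a
sketch under stated assumptions, not the code in xml.cc: the names
utf8_seq_len, append_charref, and escape_utf8 are hypothetical, and the ASCII
policy (named entities for & < > ", numeric references for control bytes) is
assumed for illustration. Well-formed multi-byte UTF-8 is copied through
unchanged; any byte that does not start a well-formed sequence is written as
its own "&#x..;" reference.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: length (1..4) of the well-formed UTF-8 sequence that
// starts at s[i], or 0 if the bytes there are not well-formed UTF-8.
static size_t utf8_seq_len(const std::string &s, size_t i) {
  const unsigned char c = static_cast<unsigned char>(s[i]);
  size_t len;
  unsigned long rune, min;

  if (c < 0x80)
    return 1;
  else if ((c & 0xE0) == 0xC0) { len = 2; rune = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0) { len = 3; rune = c & 0x0F; min = 0x800; }
  else if ((c & 0xF8) == 0xF0) { len = 4; rune = c & 0x07; min = 0x10000; }
  else
    return 0; // stray continuation byte or invalid lead byte

  if (i + len > s.size())
    return 0; // truncated sequence
  for (size_t k = 1; k < len; k++) {
    const unsigned char cc = static_cast<unsigned char>(s[i + k]);
    if ((cc & 0xC0) != 0x80)
      return 0; // not a continuation byte
    rune = (rune << 6) | (cc & 0x3F);
  }
  if (rune < min)
    return 0; // overlong encoding
  if (rune > 0x10FFFF || (rune >= 0xD800 && rune <= 0xDFFF))
    return 0; // out of range or surrogate
  return len;
}

// Hypothetical helper: append one byte as a hexadecimal character reference.
static void append_charref(std::string &out, unsigned char c) {
  char buf[16];
  snprintf(buf, sizeof(buf), "&#x%x;", static_cast<unsigned int>(c));
  out += buf;
}

// Sketch of a UTF-8-aware escape. The ASCII rules here are an assumption,
// not necessarily what xml.cc does.
std::string escape_utf8(const std::string &in) {
  std::string out;
  size_t i = 0;

  while (i < in.size()) {
    const unsigned char c = static_cast<unsigned char>(in[i]);
    const size_t len = utf8_seq_len(in, i);

    if (len >= 2) {
      out.append(in, i, len); // good UTF-8 survives into the output
      i += len;
      continue;
    }
    if (len == 0) {
      append_charref(out, c); // invalid byte: encode it on its own
    } else {
      switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default:
          if (c < 0x20 || c == 0x7F)
            append_charref(out, c);
          else
            out += static_cast<char>(c);
      }
    }
    i++;
  }
  return out;
}
```

With this sketch, escape_utf8("\xe2\x98\xbb") returns the same three bytes,
and a lone "\xff" becomes "&#xff;". Note that because validation is done per
sequence, "\xba\xda\xbf" comes out as "&#xba;" followed by the raw bytes
"\xda\xbf": those two bytes happen to form a well-formed sequence (U+06BF).
Getting "&#xba;&#xda;&#xbf;" for that input would need a coarser rule, such
as falling back to per-byte encoding for the whole string once any invalid
byte is seen.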