From 6b36602e8b0b455df7e9b134e43bfbb59cbff25d Mon Sep 17 00:00:00 2001
From: david <david@e0a8ed71-7df4-0310-8962-fdc924857419>
Date: Mon, 13 May 2013 05:21:22 +0000
Subject: [PATCH] =?UTF-8?q?UTF-8=E2=80=93aware=20escaping=20in=20XML=20tod?=
 =?UTF-8?q?o.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 todo/nmap.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)
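
Note on the todo item added by this patch: a minimal sketch of the
UTF-8-aware escaping it describes, written as standalone C++ rather
than against the real xml.cc. The names utf8_seq_len and
escape_utf8_aware are hypothetical, and the strict validation
(rejecting overlong forms, surrogates, and code points above U+10FFFF)
is an assumption, not something the todo text specifies. ASCII handling
is reduced to the five markup characters. Under this strict decoding,
only the leading \xba of the todo's "\xba\xda\xbf" example is invalid;
the trailing \xda\xbf pair happens to be a well-formed two-byte
sequence (U+06BF) and passes through unchanged.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of xml.cc): returns the length of the
// well-formed UTF-8 sequence starting at s[i], or 0 if the bytes at that
// position do not form valid UTF-8.
static std::size_t utf8_seq_len(const std::string &s, std::size_t i) {
  const unsigned char b0 = static_cast<unsigned char>(s[i]);
  std::size_t len;
  unsigned long cp, min_cp;

  if (b0 < 0x80)
    return 1;  // plain ASCII
  else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
  else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
  else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
  else
    return 0;  // stray continuation byte or invalid lead byte

  if (s.size() - i < len)
    return 0;  // sequence is truncated by the end of the input

  for (std::size_t k = 1; k < len; k++) {
    const unsigned char b = static_cast<unsigned char>(s[i + k]);
    if ((b & 0xC0) != 0x80)
      return 0;  // expected a continuation byte
    cp = (cp << 6) | (b & 0x3F);
  }

  if (cp < min_cp) return 0;                    // overlong encoding
  if (cp > 0x10FFFF) return 0;                  // beyond Unicode range
  if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   // UTF-16 surrogate
  return len;
}

// Hypothetical escape function: valid multi-byte UTF-8 is copied through,
// markup characters become entities, and each byte that is not part of a
// valid sequence becomes its own "&#xNN;" reference.
std::string escape_utf8_aware(const std::string &in) {
  std::string out;
  std::size_t i = 0;

  while (i < in.size()) {
    const unsigned char c = static_cast<unsigned char>(in[i]);
    const std::size_t len = utf8_seq_len(in, i);

    if (len == 0) {
      // Invalid byte: encode it individually and resynchronize at the
      // next byte.
      char buf[16];
      std::snprintf(buf, sizeof buf, "&#x%02x;", static_cast<unsigned>(c));
      out += buf;
      i += 1;
    } else if (len == 1) {
      switch (c) {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default:   out += in[i];    break;
      }
      i += 1;
    } else {
      out.append(in, i, len);  // valid multi-byte sequence survives as-is
      i += len;
    }
  }
  return out;
}
```

With this sketch, escape_utf8_aware("\xe2\x98\xbb") returns the same
three bytes, while a lone "\xba" becomes "&#xba;". Advancing by a single
byte after an invalid one keeps the output deterministic for arbitrary
input, at the cost of re-examining bytes that followed a bad lead byte.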

diff --git a/todo/nmap.txt b/todo/nmap.txt
index bcca92547..2354f7b78 100644
--- a/todo/nmap.txt
+++ b/todo/nmap.txt
@@ -66,6 +66,16 @@ o NSE digest auth should use the more robust parsing from
   http.parse_www_authenticate as described at
   http://seclists.org/nmap-dev/2012/q3/868
  
+o Treat the input to the escape function in xml.cc as UTF-8, not just
+  ASCII. Valid UTF-8 should survive into the output; e.g., "\xe2\x98\xbb"
+  should stay "\xe2\x98\xbb" in the output, not become
+  "&#xe2;&#x98;&#xbb;". If the input is not valid UTF-8 (like the file
+  name in http://seclists.org/nmap-dev/2013/q1/180), we can encode each
+  byte of each invalid sequence individually: "\xba\xda\xbf" becomes
+  "&#xba;&#xda;&#xbf;". This can probably be done with simple
+  byte->rune and rune->byte functions as in
+  http://plan9.bell-labs.com/sys/doc/utf.html.
+
 o We should probably redo the Nmap header (e.g. on http://nmap.org) to
   make it more attractive.  Or, at a minimum we should update the
   screenshots and think about which links we really need (some of those