invalid encoding character 2003-09-24 - By Christopher Ebert
Hi,
When Java converts between bytes and characters and encounters something that can't be mapped in the chosen encoding (e.g. a character that is not in the ISO-8859-1 character set), it replaces it with a '?'. By the time Xerces (or Xalan) sees the character, it's too late.

I believe this behaviour is configurable, but I don't know exactly how in your case (you might have to register your own converter). Looking at the source code, it's controlled by a 'substitution mode' flag, and there are methods on CharToByteConverter (and ByteToCharConverter if you're going the other way) to set it. If you set it to 'false', the converter will throw an exception when it encounters an unmappable character (or byte sequence) instead of substituting.
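To illustrate the difference between substituting and throwing, here is a minimal sketch. Note that it does not use the CharToByteConverter classes mentioned above; it assumes the public java.nio.charset API (CharsetEncoder with CodingErrorAction.REPORT) as a stand-in, and the class and method names (StrictEncodingDemo, encodeLenient, encodeStrict) are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: contrasts silent '?' substitution with strict encoding.
public class StrictEncodingDemo {

    // Lenient path: String.getBytes replaces unmappable characters with the
    // charset's replacement byte, which is '?' for ISO-8859-1.
    public static byte[] encodeLenient(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Strict path: the encoder is told to REPORT problems, so encode() throws
    // a CharacterCodingException instead of substituting.
    public static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // The euro sign (U+20AC) is not part of ISO-8859-1.
        String text = "price: 10\u20AC";

        // The euro sign comes back as '?' after the lenient round trip.
        System.out.println(new String(encodeLenient(text), StandardCharsets.ISO_8859_1));

        try {
            encodeStrict(text);
        } catch (CharacterCodingException e) {
            System.out.println("Unmappable character detected: " + e);
        }
    }
}
```

Whether you can get at the encoder that your parser or serializer uses is a separate question; the sketch only shows what the "throw instead of substitute" behaviour looks like when you control the conversion yourself.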
Chris