Ticket #10474 (closed defect: fixed)
AIHTMLDecoder and Unicode newlines
| Reported by: | rgovostes | Owned by: | nobody |
|---|---|---|---|
| Milestone: | | Component: | Adium Core |
| Version: | | Severity: | normal |
| Keywords: | | Cc: | |
| Patch Status: | | | |
Description
I've noticed that pasted text with line breaks, especially when copied from Safari or Word, sometimes shows up in the message view stripped of all breaks. Interestingly, although the problem shows up in the message view, it does not appear in the logs or in Growl notifications.
The following HTML, rendered by Safari and pasted into Adium, triggers the issue:
<p> This is on the first line.<br /> This is on the second line. </p>
However, if the paragraph tags are removed, the resulting paste does not show the issue. Rendering the same HTML in Firefox and pasting from there works fine too.
As it turns out, when we stick the paragraph tags in, it makes WebKit change its mind about how the characters are encoded when copied. Without the paragraph tags, they show up as regular linefeeds (0x0A), but with the paragraph tags they become line separators (U+2028).
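To check which break characters a given paste actually contains, the code points can be dumped directly. The following is a minimal Python sketch for illustration only (it is not Adium code, and the helper name `describe_breaks` is hypothetical); the two sample strings stand in for the pastes without and with paragraph tags:

```python
import unicodedata

# Code points treated as line breaks for this diagnostic.
BREAKS = {0x0A, 0x0D, 0x2028, 0x2029}

def describe_breaks(text):
    """Return (index, 'U+XXXX', name) for each break character in text."""
    found = []
    for i, ch in enumerate(text):
        if ord(ch) in BREAKS:
            # Control characters such as LF have no Unicode name, so fall back.
            name = unicodedata.name(ch, "<control>")
            found.append((i, "U+%04X" % ord(ch), name))
    return found

without_p = "first line.\nsecond line."      # stand-in for the paste without <p>
with_p = "first line.\u2028second line."     # stand-in for the paste with <p>

print(describe_breaks(without_p))
print(describe_breaks(with_p))
```

The first call reports a plain linefeed, the second a LINE SEPARATOR at the same position.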
The encoding causes it to bypass AIHTMLDecoder's substitutions in -encodeLooseHTML:imagesPath:, which are only set up to recognize \n and \r:
- When sending the message to AIM's server, thingsToInclude.nonASCII = false, so it does a very rudimentary find/replace of \r\n, \r, and \n.
- When sending the message to the message view, thingsToInclude.nonASCII = true, so we end up around line 620, where the character is escaped like any other non-ASCII character rather than being replaced with a line break.
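The rudimentary find/replace in the first case can be approximated as follows. This is a Python sketch of the behavior described above, not the actual Objective-C in AIHTMLDecoder; the function name `naive_breaks_to_html` is hypothetical, and it assumes the replacement string is `<br>`:

```python
def naive_breaks_to_html(text):
    """Replace only \\r\\n, \\r and \\n with <br>; nothing else is recognized."""
    # Order matters: handle the two-character sequence first so that
    # \r\n does not turn into two <br> tags.
    for newline in ("\r\n", "\r", "\n"):
        text = text.replace(newline, "<br>")
    return text

# A plain linefeed is converted.
print(repr(naive_breaks_to_html("one\ntwo")))
# U+2028 is not in the list, so it passes through untouched.
print(repr(naive_breaks_to_html("one\u2028two")))
```

Any break character outside those three patterns, including U+2028, survives the substitution unchanged.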

Ideally this code would be updated to replace all Unicode line breaks with <br>. Wikipedia has the exhaustive list, taken from the Unicode Standard 4.0 guidelines:
http://en.wikipedia.org/wiki/Newline#Unicode
Strangely enough, Apple's character sets and Unicode utilities don't seem to include all of those, so the list may need to be hardcoded.
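If the list does need to be hardcoded, a sketch might look like the following. It is written in Python for brevity rather than in Adium's Objective-C, and the function name `unicode_breaks_to_html` is hypothetical. The code points are the ones named in the Unicode newline guidelines (LF, VT, FF, CR, CR+LF, NEL, LS, PS); whether VT and FF should really become <br> in a chat message is a judgment call.

```python
import re

# CR+LF comes first so the pair is replaced by a single <br>.
# Then: LF (U+000A), VT (U+000B), FF (U+000C), CR (U+000D),
# NEL (U+0085), LS (U+2028), PS (U+2029).
UNICODE_NEWLINE = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")

def unicode_breaks_to_html(text):
    """Replace every recognized Unicode line break with <br>."""
    return UNICODE_NEWLINE.sub("<br>", text)

print(unicode_breaks_to_html("This is on the first line.\u2028This is on the second line."))
```

With this approach the line separator produced by the Safari paste is handled the same way as a plain linefeed.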

