Ticket #239 (fixed enhancement)

Unicode hyperlink not clickable and truncated

  • Created: 2011-01-07 16:55:29
  • Reported by: adaur
  • Assigned to: Reines
  • Milestone: 1.4.4
  • Component: parser
  • Priority: normal


When you try to post http://www.rguihù the link generated isn't valid. This is because the domain contains non-ASCII characters, which must be converted to ASCII using punycode.

An example of a valid URL which is affected is http://☃.net, which should be translated to

Really we have 2 issues here:

  1. The parser isn't picking these up as valid domains, and hence isn't wrapping them in URL tags automatically.

  2. The URL itself isn't converted to punycode.

Obviously (1) is an issue, but I'm not sure if we should handle (2), or let the browser handle it. From what I can see all browsers above IE6 seem to natively support this translation, so it's probably safe to let the browser do it.


adaur 2011-01-07 16:59:04

Visman 2011-01-09 14:06:08

The letter ù is to blame.

adaur 2011-01-09 15:04:50

OK Visman, you're right. But it's still a bug.

Visman 2011-01-09 15:29:11

Links containing Russian letters behave the same way.
And the ".рф" domain already exists.

Reines 2011-01-21 22:37:47

Reines 2011-01-23 12:24:17

There is a bug here, but it is slightly more subtle than simply not recognizing the character. As you said, enclosing that string in url tags makes it show up as a link, but it still isn't a valid link. Only ASCII characters are valid in domain names, and that one you posted includes some others.

For domain names with non-ASCII characters, the characters must be converted using punycode, giving a valid ASCII domain.
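
As a rough illustration of that conversion, here is a minimal Python sketch. It is not FluxBB code (FluxBB is written in PHP), and the helper name to_ascii_host is made up for this example. It assumes Python's built-in "idna" codec, which follows the older IDNA 2003 rules, is acceptable for demonstration purposes.

```python
def to_ascii_host(host: str) -> str:
    """Convert a possibly non-ASCII host name to its ASCII (punycode) form.

    Hypothetical helper for illustration only. It relies on Python's
    built-in "idna" codec (IDNA 2003), which encodes each dot-separated
    label and adds the "xn--" prefix to labels containing non-ASCII
    characters. Pure-ASCII labels pass through unchanged.
    """
    return host.encode("idna").decode("ascii")


# "\u2603" is the snowman character used in the ticket's example domain.
print(to_ascii_host("\u2603.net"))   # prints xn--n3h.net
print(to_ascii_host("example.com"))  # prints example.com
```

Only the host part of a URL is converted this way; the scheme and path would need to be split off first.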

Really this is asking for an extra feature: for FluxBB to automatically detect domains with non-ASCII characters and convert them using punycode.

Reines 2011-01-23 12:27:32

Reines 2011-01-23 12:32:35

Reines 2011-01-23 13:32:43

I suggest we leave the punycode translation up to browsers. In which case all we need to do is change the do_clickable regex to include non-ASCII word characters when looking for URLs.
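
To show why an ASCII-only word class truncates such links, here is a small Python sketch. The two patterns are hypothetical and heavily simplified; they are not the actual do_clickable regex, which is written for PHP's PCRE. Python's re module is used as a stand-in: the re.ASCII flag restricts \w to ASCII, while the default mode for str patterns also matches non-ASCII letters.

```python
import re

# Hypothetical, simplified URL patterns for illustration only.
# With re.ASCII, \w means [a-zA-Z0-9_], like a matcher that only
# knows about ASCII word characters.
ASCII_URL = re.compile(r"https?://[\w.-]+", re.ASCII)
# Without re.ASCII, \w on a str pattern also matches non-ASCII letters.
UNICODE_URL = re.compile(r"https?://[\w.-]+")

# "\u00f9" is the letter ù from the URL in the ticket description.
post = "see http://www.rguih\u00f9 for details"

# The ASCII-only pattern stops before the ù, so the link is truncated.
print(ASCII_URL.search(post).group(0))
# The Unicode-aware pattern keeps the ù as part of the host name.
print(UNICODE_URL.search(post).group(0))
```

The first match ends at "rguih" and the second includes the trailing ù, which mirrors the "not clickable and truncated" symptom in the summary.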

Reines 2011-01-23 14:07:42

Changing the regular expressions to use \p{L} instead of \w and \P{L} instead of \W. This fixes numerous Unicode-related issues:
- #166: Censoring still doesn't work fully with utf-8
- #179: Indexer can't recognize Unicode punctuation
- #239: Unicode hyperlink not clickable and truncated

Reines 2011-01-23 14:14:25

  • Owner changed from ridgerunner to Reines.
  • Summary changed from Long hyperlink not clickable and truncated to Unicode hyperlink not clickable and truncated.

I've implemented a fix for this - just waiting for some testing.

Reines 2011-01-26 15:24:18

  • Status changed from open to fixed.