
#1 2010-01-10 22:19:16

MattF
Member
From: South Yorkshire, England
Registered: 2008-05-06
Posts: 1,230

UTF-8 and html characters

One of those possibly stupid questions :D but I started wondering earlier and it's bugging me now. If a client browser is using UTF-8 encoding, are all characters submitted to the server in that encoding, or are the likes of <>& sent in the normal ISO-type format? Hope that makes sense, btw.


Screw the chavs and God save the Queen!


#2 2010-01-12 00:56:54

MattF
Member
From: South Yorkshire, England
Registered: 2008-05-06
Posts: 1,230

Re: UTF-8 and html characters

Would I be correct in assuming that these two expressions:

'/</'
'/\x3c/u'

would both match the < symbol using preg_replace, whichever encoding is used?

Last edited by MattF (2010-01-12 00:57:37)




#3 2010-01-12 08:55:26

Reines
Lead developer
From: Scotland
Registered: 2008-05-11
Posts: 3,165

Re: UTF-8 and html characters

MattF wrote:

One of those possibly stupid questions :D but I started wondering earlier and it's bugging me now. If a client browser is using UTF-8 encoding, are all characters submitted to the server in that encoding, or are the likes of <>& sent in the normal ISO-type format? Hope that makes sense, btw.

UTF-8 is backwards compatible with ASCII (and so is ISO-8859-1), so <>& are stored as exactly the same bytes no matter which of these encodings is used.
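A quick way to check this is to encode the same three characters with each character set and compare the raw bytes. This is a minimal sketch in Python, used here purely for illustration (the thread itself is about PHP); it only looks at the bytes of the characters, not at how a browser packages a form submission.

```python
# The three characters asked about in the first post.
special = "<>&"

# Encode them with each character set and keep the raw bytes.
as_ascii = special.encode("ascii")
as_latin1 = special.encode("iso-8859-1")
as_utf8 = special.encode("utf-8")

print(as_ascii, as_latin1, as_utf8)  # b'<>&' b'<>&' b'<>&'
print(as_utf8.hex(" "))              # 3c 3e 26
```

All three encodings produce the identical bytes 0x3C 0x3E 0x26 (bytes.hex with a separator needs Python 3.8 or later).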


#4 2010-01-12 15:24:54

MattF
Member
From: South Yorkshire, England
Registered: 2008-05-06
Posts: 1,230

Re: UTF-8 and html characters

So they're always submitted as <>& rather than their UTF-8 equivalent, regardless? Trying to get up to speed on this Unicode thing is doing my nut in at the moment. :D Just out of curiosity, would that regex be correct for matching the Unicode equivalent of <?

Cheers, Reines. :)




#5 2010-01-12 15:43:59

Reines
Lead developer
From: Scotland
Registered: 2008-05-11
Posts: 3,165

Re: UTF-8 and html characters

In ISO-8859-1 < is stored as 3C (60 in decimal).
In UTF-8 < is stored as 3C (60 in decimal).

They are not "equivalent", they are the same.
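Because the byte is the same, a literal < and the hex escape \x3c in a pattern describe the same thing. The patterns in the question are for PHP's preg_replace; the sketch below uses Python's re module on raw bytes as a stand-in (an assumption for illustration, not PCRE itself, and PHP's /u modifier has no direct counterpart here). The sample text "café <b>" is made up for the example.

```python
import re

# A literal "<" and the hex escape \x3c both describe the single byte 0x3C.
literal = re.compile(rb"<")
hex_escape = re.compile(rb"\x3c")

# The same text stored in two encodings: "é" is stored differently, "<" is not.
samples = {
    "iso-8859-1": "caf\u00e9 <b>".encode("iso-8859-1"),
    "utf-8": "caf\u00e9 <b>".encode("utf-8"),
}

results = {}
for charset, raw in samples.items():
    # Roughly what preg_replace('/</', '&lt;', $s) does, applied to raw bytes.
    via_literal = literal.sub(b"&lt;", raw)
    via_hex = hex_escape.sub(b"&lt;", raw)
    results[charset] = (via_literal, via_hex)
    print(charset, via_literal, via_hex)
```

Matching at the byte level is safe for UTF-8 because every byte of a multi-byte sequence is 0x80 or above, so a 0x3C byte is always a real <.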

Think of ISO-8859-1 as an extension of ASCII: the first 128 characters (0 to 127) are the same, then it has some extra tacked on.
UTF-8 is the same idea: the first 128 characters are the same, then it has some (quite a lot!) extra tacked on.
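The shared range can be checked exhaustively. This Python sketch (again only an illustration) encodes every code point from 0 to 127 with ASCII, ISO-8859-1 and UTF-8 and confirms the bytes match, then shows one character beyond that range, é (U+00E9), where the two diverge.

```python
# Every code point 0x00-0x7F encodes to the same single byte in all three charsets.
shared = all(
    chr(cp).encode("ascii") == chr(cp).encode("iso-8859-1") == chr(cp).encode("utf-8")
    for cp in range(128)
)
print("0-127 identical:", shared)  # 0-127 identical: True

# Above 127 the "extra tacked on" parts differ: é is one byte in
# ISO-8859-1 but a two-byte sequence in UTF-8.
e_acute = "\u00e9"
print(e_acute.encode("iso-8859-1"))  # b'\xe9'
print(e_acute.encode("utf-8"))       # b'\xc3\xa9'
```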

If you look at http://www.fileformat.info/info/charset … 1/list.htm and http://www.fileformat.info/info/charset/UTF-8/list.htm you will see that up to and including 7F (127) they are identical.


#6 2010-01-12 18:16:09

MattF
Member
From: South Yorkshire, England
Registered: 2008-05-06
Posts: 1,230

Re: UTF-8 and html characters

Cheers for that explanation, Reines. The penny has dropped now. :)

There is some awfully confusing documentation on the web. Been trying to get to grips with this stuff over the last day or two and it has obviously caused me more confusion than help.




#7 2010-01-12 18:41:24

Reines
Lead developer
From: Scotland
Registered: 2008-05-11
Posts: 3,165

Re: UTF-8 and html characters

Character sets can be incredibly confusing :P


#8 2010-01-12 20:10:31

Franz
Lead developer
From: Germany
Registered: 2008-05-13
Posts: 4,071

Re: UTF-8 and html characters

They were invented to do just that :D


fluxbb.de | develoPHP

"As code is more often read than written it's really important to write clean code."


#9 2010-01-13 01:02:37

MattF
Member
From: South Yorkshire, England
Registered: 2008-05-06
Posts: 1,230

Re: UTF-8 and html characters

They definitely had that effect on me. The more I read, the less I understood. :D



