Paul Fredlein wrote:
> Anyway, so far it all seems to work but I'm convinced it's just good
> luck not good management. What should I use throughout to make sure it
> doesn't break:-
this sounds too much like programming by guessing. not a good way to go about
it. you should go back through this from the beginning, and figure out what
each piece is doing. how did the data get into the database, what encoding is
the database using, what does the php documentation say, and so on.
generally speaking, in this day and age, you're better off using unicode for
everything, if you can. alas, there is more than one way to encode a series
of unicode characters. utf8 is popular if you have data that is mostly ascii
with the occasional non-ascii character thrown in. ascii characters are
encoded in a single byte each, with non-ascii characters requiring two to
four. it's not very efficient for languages like chinese, though, where most
characters take three bytes. in that case you'd be better off using utf16,
where every character is encoded in a minimum of two bytes and most chinese
characters fit in exactly two.
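
if you want to see the size difference for yourself, here is a minimal
sketch in python (not what you're using, just the quickest way to show it).
the function name and the sample strings are made up for illustration.

```python
# compare how many bytes the same text needs in utf-8 and utf-16.
# encoded_sizes and the sample strings are illustrative, not from any library.
def encoded_sizes(text):
    # "utf-16-le" is used so the 2-byte byte order mark is not counted
    utf8_len = len(text.encode("utf-8"))
    utf16_len = len(text.encode("utf-16-le"))
    return utf8_len, utf16_len

print(encoded_sizes("hello"))     # (5, 10): ascii is half the size in utf-8
print(encoded_sizes("你好世界"))   # (12, 8): chinese is smaller in utf-16
```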
this topic is way too big to be covered in a single usenet post, so you're
better off reading this:
http://www.joelonsoftware.com/articles/Unicode.html
you mentioned gb2312 at one point, which is definitely *not* unicode. it's an
older national-standard encoding for simplified chinese. if your input source
is unavoidably gb2312 or something like it, it's not the end of the world.
macosx contains CFString and NSString methods to convert data in one encoding
to almost any other encoding.
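
the conversion itself is always the same two steps: decode the bytes using
the encoding they really are in, then encode the resulting text as whatever
you want to store. a rough sketch of that idea in python, which is standing
in here for the CFString/NSString calls and is not the macosx api. the
function name gb2312_to_utf8 is made up, and it assumes the input really is
valid gb2312.

```python
# re-encode gb2312 bytes as utf-8.
# gb2312_to_utf8 is a hypothetical helper name; "gb2312" is one of
# python's standard codecs.
def gb2312_to_utf8(raw):
    # step 1: bytes -> text, using the encoding the data actually has.
    # this raises UnicodeDecodeError if the bytes are not valid gb2312.
    text = raw.decode("gb2312")
    # step 2: text -> bytes, in the encoding you want to keep.
    return text.encode("utf-8")

sample = "中文".encode("gb2312")     # pretend this came from your input source
converted = gb2312_to_utf8(sample)
print(len(sample), len(converted))   # 4 6
```

the important part is knowing what the input encoding actually is. if you
guess wrong in step 1, you get either an error or silently garbled text.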