> I've looked at other websites in Japanese and figure that everything is pretty
> much the same except for the character set.  The copy is encoded somehow
> and when the page is rendered, the Japanese letters and words appear.

Yep. This is just another character encoding. Good old Latin-1 puts letters
with accents on in the top 128 byte values; Japanese encodings use those values
for all those funky kanji instead.

The most popular encoding for Japanese web pages is probably Shift-JIS. This is
a 'double-byte character set': a byte with the top bit set generally means that
this byte and the next are taken together as a single character. (The exception
is a block of single-byte half-width katakana. Two bytes are necessary because
there are rather more than 128 Japanese characters.)
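
To see the double-byte behaviour for yourself, here is a minimal Python sketch.
The sample string (three kanji, written as escapes so the file stays ASCII) is
my own choice for illustration; "shift_jis" is the codec name in Python's
standard library.

```python
# Sketch: inspect how Shift-JIS encodes a short Japanese string.
text = "\u65e5\u672c\u8a9e"  # three kanji, chosen only as sample data
sjis = text.encode("shift_jis")

print(len(text), "characters ->", len(sjis), "bytes")
for i in range(0, len(sjis), 2):
    lead, trail = sjis[i], sjis[i + 1]
    # The lead byte of each kanji has its top bit set (>= 0x80).
    print(f"{lead:#04x} {trail:#04x}", "top bit set:", lead >= 0x80)

# Plain ASCII letters still take one byte per character.
ascii_bytes = "abc".encode("shift_jis")
print(ascii_bytes)
```

Each kanji comes out as a pair of bytes, while the ASCII letters stay as
single bytes.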

However, do *not* use Shift-JIS for anything(*) (or ISO-2022-JP or EUC-JP for
that matter, two other equally horrible encodings). Use Unicode, saved in a
sensible encoding such as UTF-8, and you'll be able to use all possible
characters including Japanese ones, Latin characters with accents, Greek and
so on, all at the same time.
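
A quick way to convince yourself, sketched in Python. The mixed sample string
is made up for the purpose; the point is only that UTF-8 round-trips all of it
while Python's Shift-JIS codec refuses the accented Latin letter.

```python
# Sketch: one string mixing accented Latin, Greek and Japanese.
mixed = "caf\u00e9 \u03b1\u03b2\u03b3 \u65e5\u672c\u8a9e"

# UTF-8 can hold every character and gives the same string back.
utf8 = mixed.encode("utf-8")
round_trip_ok = utf8.decode("utf-8") == mixed

# Shift-JIS has no slot for the accented e, so encoding fails.
try:
    mixed.encode("shift_jis")
    sjis_ok = True
except UnicodeEncodeError:
    sjis_ok = False

print("UTF-8 round trip:", round_trip_ok)
print("Shift-JIS can encode it:", sjis_ok)
```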

(* - on the web, anyway. There is still reason to use Shift-JIS in e-mail,
unfortunately, due to some incredibly crap webmail providers.)

> Forgive me if this sounds really stupid, but what will I need to get the
> encoded copy into my pages?

You will need at least one Japanese font installed. It is a good idea to
install Japanese encodings too, so that your browser doesn't get stuck when
it hits a page with a peculiarly Japanese encoding like Shift-JIS. For WinXP
you can do both from Control Panel -> Regional and Language Options ->
Languages -> Supplemental Language Support.

Then cut and paste into a text editor with full Unicode support. Notepad on
Windows NT/2000/XP/2003 supports Unicode fine, but it is still Notepad, ugh.
My favorite Unicode-capable editor for Windows is from www.emeditor.com,
but there are surely lots of others to choose from. On Linux, the KDE text
editors are fine also. Don't know about Macs.

Take note of the encoding you save under (normally UTF-8) and make sure you
specify it in the charset parameter of the HTTP Content-Type header, so that
the browser can tell it's UTF-8 without having to guess and maybe get it wrong.
If you don't have access to the server config to set the default charset, use
a meta element:

  <head>
    ...
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
  </head>
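
If you want to sanity-check a saved page before uploading it, something like
the following Python sketch may help. check_page is a hypothetical helper, not
part of any library, and the regular expression is a rough match for the
charset parameter rather than a real HTML parser.

```python
import re

def check_page(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 and declares charset=utf-8."""
    try:
        html = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Rough match for charset=... inside the meta element (an assumption:
    # good enough for a simple page, not a full HTML parse).
    match = re.search(r'''charset\s*=\s*["']?([\w-]+)''', html, re.IGNORECASE)
    return match is not None and match.group(1).lower() == "utf-8"

# Sample page, made up for illustration.
page = (
    '<head><meta http-equiv="Content-Type" '
    'content="text/html;charset=utf-8" /></head>'
    "<body>\u65e5\u672c\u8a9e</body>"
)
good = check_page(page.encode("utf-8"))     # saved as UTF-8, as declared
bad = check_page(page.encode("shift_jis"))  # declared UTF-8, saved otherwise
print(good, bad)
```

The second call shows the mismatch to avoid: a page that claims UTF-8 but was
actually saved in another encoding.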

> Any gotchas or tips would be appreciated.

Get them to indicate what is actually *supposed* to be a line break in the
copy they send you, and what's just text wrapping. When you don't grok the
language and there aren't spaces between words it can be hard to tell what
the structure is supposed to be if you've only been sent plain text.

Avoid relying on the generic CSS fonts 'serif' and 'sans-serif' alone; on some
machines they will, for no apparent reason, choose a font without any Japanese
characters in, resulting in a page where all the characters are rendered as
empty squares. Put one of the common Japanese font names before any generics.
But beware! Fonts can change names depending on the native character set: what
you see as 'MS PMincho' will be available to a Japanese IE user only under the
name 'MS P[1][2]', where [1] is the kanji at Unicode code point 26126 (U+660E)
and [2] is the kanji at Unicode code point 26397 (U+671D). So in any
font-family CSS declarations, include both names.
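
As a rough illustration, this Python sketch builds such a declaration from the
two code points. Code points 26126 and 26397 are the two kanji of 'Mincho', so
the Latin name paired with them here is 'MS PMincho'. The 'MS P' prefix follows
the form given above; on a real Japanese system the Latin letters in the native
name may be full-width forms, so treat the exact spelling as an assumption and
check it against an actual machine.

```python
# Sketch: build a font-family declaration listing both names of the font.
mincho = chr(26126) + chr(26397)   # U+660E, U+671D
native_name = "MS P" + mincho      # prefix as given in the text (assumption)

families = ["MS PMincho", native_name, "serif"]

def quote(name: str) -> str:
    # Quote any family name containing a space; generics stay bare.
    return f"'{name}'" if " " in name else name

css = "font-family: " + ", ".join(quote(n) for n in families) + ";"
print(css)
```

The generic 'serif' goes last so it is only a fallback.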

Oh, and don't include backslashes in your page text. For reasons too arcane
and tedious to go into, they'll come out as yen symbols instead on native
Japanese machines.
and@doxdesk.com