#1 Japanese/Hiragana

Hm, the problem with syllabic scripts is that you never have enough syllables on your rack; you'd need something massive. It would probably be a lot of extra work, but if you used only the Hangul components, it might work. Unicode has a massive Hangul syllable block, but the script is actually composed of 24 letters (it is an alphabetic script, but because the letters are grouped in blocks of 1 to 4 symbols it comes across as syllabic).

So 을 (ŭl), for example, is composed of ㅇ, ㅡ and ㄹ. But how you would convert them is tricky. If there is a reliable converter, maybe.
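For what it's worth, the conversion can be done arithmetically, because the Unicode Hangul syllable block (starting at U+AC00) is laid out as 19 initials × 21 vowels × 28 finals. The following is only a sketch; the function names `decompose` and `compose` are made up for illustration, and the tables treat doubled or compound letters such as ㄲ or ㄳ as single units rather than splitting them into the 24 basic letters.

```python
# Compatibility jamo, listed in the order used by the Unicode Hangul syllable block.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels
TAILS = ["", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"]  # "no final" + 27 finals

HANGUL_BASE = 0xAC00  # code point of 가, the first precomposed syllable


def decompose(syllable: str) -> list[str]:
    """Split one precomposed Hangul syllable into its component letters."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < len(LEADS) * len(VOWELS) * len(TAILS):
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    lead, rest = divmod(index, len(VOWELS) * len(TAILS))
    vowel, tail = divmod(rest, len(TAILS))
    # The empty string stands for "no final consonant" and is filtered out.
    return [j for j in (LEADS[lead], VOWELS[vowel], TAILS[tail]) if j]


def compose(lead: str, vowel: str, tail: str = "") -> str:
    """Inverse of decompose: build the precomposed syllable from its letters."""
    index = (LEADS.index(lead) * len(VOWELS) + VOWELS.index(vowel)) * len(TAILS)
    return chr(HANGUL_BASE + index + TAILS.index(tail))


print(decompose("을"))
print(compose("ㅇ", "ㅡ", "ㄹ"))
```

A letter composer between rack and board would presumably need something like `compose`, while building a letter set from a word list would need `decompose`.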

I think of all the Asian scripts, the only realistic one would be Japanese, if you did a Hiragana-based Scrabble.

*******************************
Do, or do not. There is no try.

Scotty, Administrator, Posts: 3,790, 21.02.2014 10:06

#2 RE: Korean Scrabble

Some kind of letter composer between the rack and the actual tile that is placed... sounds feasible. But no, we need more flexibility on the board. Tricky. Some day I'll go ahead with Asian languages.

Download: Sourceforge.net | Help: Wiki | Discussion: Forum | News: Twitter | Fanship: Facebook

#3 RE: Korean Scrabble

So why don't we do Japanese as a proof of concept for Asian languages? It should be fairly easy. There are three scripts: one is like Chinese (Kanji), the other two are syllabic (Hiragana and Katakana) but much more limited in scope than Korean. To begin with, we can dispense with Katakana; the main difference is purpose, i.e. foreign terms are usually written in Katakana, native terms in Hiragana. You need around 100 or so symbols, but I don't think it needs much in the way of modifying the game itself. So for example sushi is written as すし (su + shi), manga is まんが (ma + n + ga) (n is the only consonant-only symbol, before you ask).

The main trick would be to find a Hiragana wordlist.

#4 RE: Korean Scrabble

Ah, I found one here. If we only take Hiragana, we would need lines 6 to 18230, although some lines that contain Kanji (i.e. the complex Chinese loan characters) would still have to be thrown out, and that is beyond my technical skills. One could perhaps build a table of the desired Hiragana characters and then throw out everything that contains characters not in the list, or perhaps do it via the Unicode ranges.

All lines that contain only a single character would have to be thrown out as well.

Otherwise we would then have a workable concept for Scrabble3D in Japanese (Hiragana). But these are more thought experiments; I think we'd need at least one speaker of Japanese.
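The filtering idea from this post (keep only entries made up entirely of Hiragana, drop anything with Kanji, drop single-character lines) might look roughly like this when done via the Unicode range. It is a sketch only: the range ぁ (U+3041) to ん (U+3093) and the function names are assumptions, and the sample words are made up rather than taken from the real list.

```python
HIRAGANA_FIRST = "\u3041"  # ぁ
HIRAGANA_LAST = "\u3093"   # ん


def is_hiragana_word(word: str) -> bool:
    """True if the word has at least two characters, all within ぁ..ん."""
    return len(word) > 1 and all(
        HIRAGANA_FIRST <= ch <= HIRAGANA_LAST for ch in word
    )


def filter_wordlist(lines):
    """Keep only the pure-Hiragana entries from an iterable of lines."""
    words = (line.strip() for line in lines)
    return [word for word in words if is_hiragana_word(word)]


# Hypothetical sample: Kanji, Katakana and single-character entries are dropped.
sample = ["すし", "虫歯", "む", "まんが\n", "カタカナ"]
print(filter_wordlist(sample))
```

Note that this also rejects any entry containing the long-vowel mark ー (U+30FC), because that character lies outside the chosen range.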

It would appear, however, that Hiragana is the tentative script of choice for Japanese Scrabble. I did some searching and found this image of someone who did a paper version, and Scrabble-ish games also use Hiragana. Maybe one could do a beta, put it out there and wait for a response?

Scotty, Administrator, Posts: 3,790, 27.02.2014 14:45

#5 Japanese Scrabble

I collected all words that consist only of characters between ぁ and ん and are longer than one character. This results in a list of 16277 words. The letter distribution, i.e. how many times a particular character is found in the list, is as follows:

The attached dictionary contains the letter set. That means if you load the dictionary you will be asked whether the set should be applied.
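Counting how often each character occurs in the word list could be sketched as below. This is not the code actually used for the attached dictionary; `letter_distribution` is a hypothetical name and the two sample words are placeholders, not entries from the 16277-word list.

```python
from collections import Counter


def letter_distribution(words):
    """Count how many times each character occurs across all words."""
    counts = Counter()
    for word in words:
        counts.update(word)  # a string is iterated character by character
    return counts


# Placeholder input; the real input would be the filtered Hiragana list.
distribution = letter_distribution(["すし", "すもう"])
for char, count in distribution.most_common():
    print(char, count)
```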

Finally, the game looks similar to Korean. Very short words, many stairways.

Attachment:
Hiragana.zip

#6 RE: Japanese Scrabble

From http://www002.upp.so-net.ne.jp/kei-k/dic.htm you can download file edict-ejje-All-hiraindex.zip

It is from 1998, in EUC-JP encoding. I opened it as Shift-JIS, which seemed to work.
/// EDICT 26JUN98 V98-002, Main Japanese-English Electronic Dictionary File, Copyright J.W. Breen - 1998

It has hiragana index with 53158 entries:
katakana-english 13567
hiragana-(kanji)-en 1493+51665=53158

For each hiragana entry there are the kanji and the English meaning.

http://www.edrdg.org/jmdict/edict.html
...
Is it Public Domain?

EDICT can be freely used provided satisfactory acknowledgement is made in any software product, server, etc. that uses it. There are a few other conditions relating to distributing copies of EDICT with or without modification. Copyright is vested in the EDRG (Electronic Dictionary Research Group). You can see the specific licence statement at the Group's site.
...

Current version can be downloaded at http://www.edrdg.org/jmdict/edict_doc.html but I couldn't access it. If it hasn't got a hiragana index, they can probably produce it easily. You could ask Jim Breen http://www.csse.monash.edu.au/~jwb/

#7 RE: Japanese Scrabble

The related JMdict project http://www.edrdg.org/jmdict/j_jmdict.html has a UTF-8 encoded file.

#8 RE: Japanese Scrabble

If you drop the rarest hiragana from the letter set, for example all those with fewer than 150 occurrences in the dictionary, the game will be more playable. Because all kanji entries were dropped from the original file, the hiragana.zip list of 16277 words contains proportionally more particles, inflections and suffixes.
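Dropping the rarest characters could be sketched like this. The threshold of 150 is the one suggested above; everything else is an assumption, in particular the decision to also remove words that contain a dropped character (they could no longer be laid) and the name `prune_rare_letters`. It is a single pass, so counts are not recomputed after words are removed.

```python
from collections import Counter


def prune_rare_letters(words, min_count=150):
    """Return (kept_words, rare_letters) for a single pruning pass."""
    counts = Counter(ch for word in words for ch in word)
    rare = {ch for ch, n in counts.items() if n < min_count}
    # A word is kept only if none of its characters were dropped.
    kept = [word for word in words if not rare.intersection(word)]
    return kept, rare


# Tiny made-up example with a threshold of 2 instead of 150.
kept, rare = prune_rare_letters(["すし", "すす", "しし", "ぬし"], min_count=2)
print(kept, rare)
```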

#9 RE: Japanese Scrabble

I was thinking along the lines of actually invoking a pre-war convention and switching all Hiragana to Katakana, because apart from the Katakana words enlarging the list, they also tend to be longer. But the word list you mentioned seems a lot bigger. It might be an option to simply use 3 kana as a minimum length, though I'm not sure if that would fix the stair issue.

#10 RE: Japanese Scrabble

xyz, thanks for that. Not sure about the copyright issues on that one. I think while we're trying to work out if it's feasible or not, we're better off sticking to the open source file.

Right, with the help of our code mage, I have done a merged list - all in Katakana. Here's in detail what I did:
* Converted all Hiragana only words to Katakana
* Converted mixed Hiragana/Kanji words to Katakana
* Converted mixed Katakana/Kanji words to Katakana
* Converted all Kanji to Katakana

There's a bit of cleaning up to do. I used an automatic converter (with some spot checking) so overall I'm fairly confident the quality is good.
- there are strings which contain a space, these need to be chucked out
- there are strings which contain items outside the Katakana range (some Hiragana, some Kanji; I think these are rare combinations the converter was not familiar with)
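The Hiragana to Katakana step and the two clean-up checks listed above can be sketched as follows. This is not the converter that was actually used: it relies on the Unicode layout, where the Hiragana letters ぁ..ゖ (U+3041..U+3096) sit exactly 0x60 below their Katakana counterparts, and the function names are made up. Converting Kanji needs a reading dictionary and is not covered here.

```python
KANA_OFFSET = 0x60  # distance between ぁ (U+3041) and ァ (U+30A1)


def hiragana_to_katakana(text: str) -> str:
    """Shift every Hiragana letter to its Katakana counterpart."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def is_clean_katakana(word: str) -> bool:
    """True if the word is non-empty and purely Katakana.

    Allowed: the letters ァ..ヺ (U+30A1..U+30FA) and the long-vowel mark ー
    (U+30FC). Spaces, Hiragana and Kanji all make the check fail.
    """
    return bool(word) and all(
        "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc" for ch in word
    )


print(hiragana_to_katakana("まんが"))
print([w for w in ["スシ", "ス シ", "虫バ", "コーヒー"] if is_clean_katakana(w)])
```

Running `is_clean_katakana` over the merged list would flag both kinds of messy entries (strings with a space, strings with leftover Hiragana or Kanji) in one pass.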

At the moment the list is just short of 120k, I think once we've thrown out the messy ones, we should be on 100k or so, which seems reasonable.

If this gives us a reasonably balanced game, we could run it as a beta and see if we can attract some Japanese people to clean up the list etc etc.

Admin: Attachment deleted

Scotty, Administrator, Posts: 3,790, 05.03.2014 08:50

#11 RE: Japanese Scrabble

Zip file seems to be broken, please reup it.

#12 RE: Japanese Scrabble

Hiragana should be used, even though it's doubtful whether Japanese adults would even play it. Katakana is used only for foreign loan words and for emphasis. Kids learn Japanese with hiragana.

If it is possible to use the EDICT file and have/make a Hiragana->Kanji->English index for it, Kanji and English could then be shown in the tooltip for the word. That would be great even for learning Japanese!

The free use of EDICT seems clear to me from http://www.edrdg.org/
"EDICT can be freely used provided satisfactory acknowledgement is made in any software product, server, etc. that uses it."

I sent email to Jim Breen about the possible use.

There is a physical Scrabble in Japanese, it's in romaji. I'll try to find the photo.

#13 RE: Japanese Scrabble

I'll email you the file Scotty!

That's the way Kana are used today but it wasn't always so. Prior to WW2, much of what is written today in Hiragana was written in Katakana only. Also, Hiragana words are sometimes written in Katakana for emphasis so it's not totally unheard of even today.

There are also practical reasons related to the use of the chōonpu (the long-vowel mark ー), i.e. you can easily convert a Hiragana list to Katakana but not the other way round. Given the issues we have with stairs etc. in the game, I think that is (for now) the key factor; having an unplayable game is not much use to anyone.

#14 RE: Japanese Scrabble

Scrabble in Japanese romaji: http://translate.google.com/translate?hl...%2F%3Fno%3D1586
"There is no Japanese scrabble": http://translate.google.com/translate?hl...-ninja%C2%A0%2F

#15 RE: Japanese Scrabble

Here is Jim Breen's reply:

...
Let me explain briefly what my dictionary files contain. I am sure
you can reformat the contents to make an index file of the type you seek.

I'll illustrate this using a Japanese word for tooth cavities. The word is
usually pronounced mushiba, and more rarely kushi or ushi. It's
commonly written 虫歯, but is also written 齲歯 or 齲. (Yes, complicated
but Japanese is like that.)

My main dictionary distribution format is the XML version (JMdict). In
this format the entry is:
<entry>
<ent_seq>1604850</ent_seq>
<k_ele>
<keb>虫歯</keb>
<ke_pri>ichi1</ke_pri>
<ke_pri>news1</ke_pri>
<ke_pri>nf17</ke_pri>
</k_ele>
<k_ele>
<keb>齲歯</keb>
</k_ele>
<k_ele>
<keb>齲</keb>
</k_ele>
<r_ele>
<reb>むしば</reb>
<re_pri>ichi1</re_pri>
<re_pri>news1</re_pri>
<re_pri>nf17</re_pri>
</r_ele>
<r_ele>
<reb>うし</reb>
<re_restr>齲歯</re_restr>
</r_ele>
<r_ele>
<reb>くし</reb>
<re_restr>齲歯</re_restr>
</r_ele>
<info>
<audit>
<upd_date>2012-09-05</upd_date>
<upd_detl>Entry created</upd_detl>
</audit>
<audit>
<upd_date>2012-09-05</upd_date>
<upd_detl>Entry amended</upd_detl>
</audit>
<audit>
<upd_date>2012-09-05</upd_date>
<upd_detl>Entry amended</upd_detl>
</audit>
</info>
<sense>
<pos>&n;</pos>
<pos>&adj-no;</pos>
<gloss>cavity</gloss>
<gloss>tooth decay</gloss>
<gloss>decayed tooth</gloss>
<gloss>caries</gloss>
</sense>
</entry>

That's quite complex, but it can be parsed, etc.

There are two simpler formats. One is the EDICT2 one:

虫歯(P);齲歯;齲 [むしば(P);うし(齲歯);くし(齲歯)] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/(P)/

It has the hiragana in [...], in this case showing some restrictions.

Then there is the old legacy EDICT format. This can only have one kanji form
and one hiragana form per line, so it gets split up:

虫歯 [むしば] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/(P)/
齲 [むしば] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/
齲歯 [うし] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/
齲歯 [くし] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/
齲歯 [むしば] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/

The JMdict version is in UTF-8. The other two are in EUC-JP.
You can convert them to UTF-8, e.g.
iconv -f EUC-JP -t UTF-8 EDICT2 > EDICT2_UTF8 (on a Unix/Linux system).

All of these can be downloaded from
http://ftp.monash.edu.au/pub/nihongo/ Get the files
JMdict_e.gz, edict2.gz and edict.gz

I hope this helps.
...
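Going by the legacy EDICT lines quoted in the reply, a Hiragana to Kanji to English index (e.g. for showing Kanji and English in a tooltip) could be built along these lines. It is a sketch under assumptions: the regular expression is inferred only from the sample lines above, entries without a `[...]` part are assumed to be kana-only, the `(P)` marker is simply discarded, and the function names are made up.

```python
import re
from collections import defaultdict

# headword, optional [reading], then the slash-delimited glosses
EDICT_LINE = re.compile(
    r"^(?P<head>\S+)\s+(?:\[(?P<reading>[^\]]+)\]\s+)?/(?P<glosses>.*)/\s*$"
)


def parse_edict_line(line: str):
    """Return (reading, headword, glosses), or None if the line does not match."""
    match = EDICT_LINE.match(line)
    if match is None:
        return None
    head = match["head"]
    reading = match["reading"] or head  # assumed: no brackets means kana-only
    glosses = [g for g in match["glosses"].split("/") if g and g != "(P)"]
    return reading, head, glosses


def build_reading_index(lines):
    """Map each reading to a list of (headword, glosses) pairs."""
    index = defaultdict(list)
    for line in lines:
        parsed = parse_edict_line(line)
        if parsed is not None:
            reading, head, glosses = parsed
            index[reading].append((head, glosses))
    return dict(index)


sample = [
    "虫歯 [むしば] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/(P)/",
    "齲歯 [うし] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/",
]
for reading, entries in build_reading_index(sample).items():
    print(reading, entries)
```

For the real file, the lines would come from something like `open("edict", encoding="euc_jp")`, since the reply states that this format is distributed in EUC-JP; the file name here is just the one mentioned in the reply after unpacking.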
