This topic has 23 replies
and has been viewed 6,498 times
 More languages
akerbeltzalba
Posts: 142

21.02.2014 02:04
#1 Japanese/Hiragana

Hm, the problem with syllabic scripts is that you never have enough syllables on your rack; you'd need something massive. It would probably be a lot of extra work, but if you used only the Hangul components, then it might work. As in, Unicode has this massive Hangul syllables block, but the script is actually composed of 24 basic letters (it is an alphabetic script, but because the letters are grouped into syllable blocks it comes across as being syllabic).

So 을 (ŭl), for example, is composed of ㅇ, ㅡ and ㄹ. But how you would convert them is tricky. If there is a reliable converter, maybe.
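On the converter question: precomposed Hangul syllables are laid out arithmetically in Unicode (starting at U+AC00), so splitting a block into its letters and putting it back together needs no lookup service. A minimal sketch, assuming the rack would show compatibility jamo such as ㅇ, ㅡ, ㄹ; the names `decompose` and `compose` are made up for illustration and are not part of Scrabble3D:

```python
# Precomposed syllables: code = 0xAC00 + (lead * 21 + vowel) * 28 + tail
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # "" = no final


def decompose(syllable: str) -> list[str]:
    """Split one precomposed Hangul syllable into its letters."""
    code = ord(syllable) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    lead, rest = divmod(code, 21 * 28)
    vowel, tail = divmod(rest, 28)
    letters = [LEADS[lead], VOWELS[vowel]]
    if tail:
        letters.append(TAILS[tail])
    return letters


def compose(lead: str, vowel: str, tail: str = "") -> str:
    """Inverse of decompose: build the syllable block from its letters."""
    code = (LEADS.index(lead) * 21 + VOWELS.index(vowel)) * 28 + TAILS.index(tail)
    return chr(0xAC00 + code)


print(decompose("을"))
print(compose(*decompose("을")))
```

Double consonants and consonant clusters are kept as single letters here; whether a game would split them further is a design decision this sketch does not answer.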

I think of all the Asian scripts the only one which is realistic would be Japanese if you did a Hiragana based Scrabble.

*******************************
Do, or do not. There is no try.

Scotty
Administrator
Posts: 3,788

21.02.2014 10:06
#2 RE: Korean Scrabble

Some kind of letter composer between the rack and the actual tile that is placed... sounds feasible. But no, we need more flexibility on the board. Tricky. Some day I'll go ahead with Asian languages.


Download: Sourceforge.net | Help: Wiki | Discussion: Forum | News: Twitter | Fanship: Facebook

akerbeltzalba
Posts: 142

21.02.2014 22:30
#3 RE: Korean Scrabble

So why don't we do Japanese as a proof of concept for Asian languages? It should be fairly easy. There are three scripts: one is Chinese-derived (Kanji), the other two are syllabic (Hiragana and Katakana) but much more limited in scope than Korean. To begin with, we can dispense with Katakana; the main difference is purpose, i.e. foreign terms are usually written in Katakana, native terms in Hiragana. There are around 100 or so characters that you need, but I don't think it requires much modification of the game itself. So for example sushi is written as すし (su + shi), manga is まんが (ma + n + ga) (n is the only consonant-only symbol, before you ask).

The main trick would be to find a Hiragana wordlist.


akerbeltzalba
Posts: 142

26.02.2014 02:08
#4 RE: Korean Scrabble

Ah, I found one here. If we only take Hiragana, we would need lines 6 to 18230, although some lines containing Kanji (i.e. the complex Chinese loan characters) would still have to be thrown out, and that is beyond me technically. One could perhaps build a table of the wanted Hiragana characters and then throw out everything that contains characters not in the list, or possibly do it via the Unicode ranges.

All lines that contain only a single character would have to be thrown out too.

Otherwise we'd then have a workable concept for Scrabble3D in Japanese (Hiragana). But these are more thought experiments; I think we'd need at least one speaker of Japanese.
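The Unicode-range idea can be sketched in a few lines. This assumes the wordlist has one candidate word per line and that "Hiragana" means the code points ぁ (U+3041) to ん (U+3093); the function names are invented for this example.

```python
def is_hiragana_word(word: str) -> bool:
    """True if the word has at least two chars, all in ぁ (U+3041) .. ん (U+3093)."""
    return len(word) > 1 and all("\u3041" <= ch <= "\u3093" for ch in word)


def filter_wordlist(lines):
    """Yield only the lines that are pure Hiragana words of length >= 2."""
    for line in lines:
        word = line.strip()
        if is_hiragana_word(word):
            yield word


sample = ["すし", "まんが", "す", "寿司", "スシ", "お茶"]
print(list(filter_wordlist(sample)))
```

Single characters, Kanji, Katakana and mixed entries all fall out through the same range check, so no separate table of wanted characters is needed.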

It would appear, however, that Hiragana is the tentative script of choice for Japanese Scrabble: I did some searching and found this image of someone who did a paper version, and Scrabble-ish games also use Hiragana. Maybe one could do a beta, put it out there and wait for a response?


Scotty
Administrator
Posts: 3,788

27.02.2014 14:45
#5 Japanese Scrabble

I collected all words that consist only of chars in the range ぁ to ん and have more than one char. The result is a list of 16277 words. The letter distribution, i.e. how many times a particular char is found in the list, is as follows:

ぁ 1
あ 1284
ぃ 8
い 2760
ぅ 0
う 2408
ぇ 5
え 776
ぉ 0
お 1444
か 2597
が 1059
き 2268
ぎ 487
く 1883
ぐ 345
け 1074
げ 400
こ 1594
ご 431
さ 1153
ざ 292
し 2786
じ 871
す 761
ず 574
せ 646
ぜ 175
そ 652
ぞ 129
た 1615
だ 725
ち 1265
ぢ 33
っ 912
つ 1570
づ 203
て 896
で 264
と 1619
ど 584
な 1457
に 560
ぬ 230
ね 437
の 1164
は 785
ば 619
ぱ 121
ひ 673
び 470
ぴ 66
ふ 550
ぶ 459
ぷ 35
へ 135
べ 205
ぺ 36
ほ 318
ぼ 368
ぽ 114
ま 1509
み 1243
む 452
め 895
も 940
ゃ 349
や 720
ゅ 268
ゆ 307
ょ 682
よ 712
ら 1533
り 2473
る 461
れ 950
ろ 664
ゎ 0
わ 988
ゐ 1
ゑ 79
を 33
ん 2232
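A table like the one above can be reproduced with a simple counter. This is only a sketch: it assumes the count is the total number of occurrences of each char across all words (rather than the number of words containing it), and it is not the code that produced the figures above.

```python
from collections import Counter


def letter_distribution(words):
    """Count every char over all words; report all of ぁ..ん, zeros included."""
    counts = Counter()
    for word in words:
        counts.update(word)
    return {chr(cp): counts.get(chr(cp), 0) for cp in range(0x3041, 0x3094)}


dist = letter_distribution(["すし", "しし", "まんが"])
for char, n in dist.items():
    if n:
        print(char, n)
```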

The attached dictionary contains the letter set. That means if you load the dic you will be asked whether the set should be applied.

Finally, the game looks similar to Korean. Very short words, many stairways.



Attachment:
Hiragana.zip

xyz
Posts: 69

03.03.2014 17:04
#6 RE: Japanese Scrabble

From http://www002.upp.so-net.ne.jp/kei-k/dic.htm you can download file edict-ejje-All-hiraindex.zip

It is from 1998, in EUC-JP encoding. I opened it as Shift-JIS, which seemed to work.
/// EDICT 26JUN98 V98-002, Main Japanese-English Electronic Dictionary File, Copyright J.W. Breen - 1998

It has a hiragana index with 53158 entries:
katakana-english 13567
hiragana-(kanji)-en 1493+51665=53158

-> for each hiragana entry there are the kanji and the English meaning.

http://www.edrdg.org/jmdict/edict.html
...
Is it Public Domain?

EDICT can be freely used provided satisfactory acknowledgement is made in any software product, server, etc. that uses it. There are a few other conditions relating to distributing copies of EDICT with or without modification. Copyright is vested in the EDRG (Electronic Dictionary Research Group). You can see the specific licence statement at the Group's site.
...

The current version can be downloaded at http://www.edrdg.org/jmdict/edict_doc.html but I couldn't access it. If it doesn't have a hiragana index, they can probably produce one easily. You could ask Jim Breen http://www.csse.monash.edu.au/~jwb/

xyz
Posts: 69

03.03.2014 17:06
#7 RE: Japanese Scrabble

The related JMdict project http://www.edrdg.org/jmdict/j_jmdict.html has a UTF-8 encoded file.

xyz
Posts: 69

03.03.2014 17:14
#8 RE: Japanese Scrabble

If you drop the rarest hiragana from the letter set, for example all with fewer than 150 occurrences in the dictionary, the game will be more playable. Because all kanji entries were dropped from the original file, the Hiragana.zip list with 16277 words contains proportionally more particles, inflections and suffixes.
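As a rough illustration of the threshold idea, using a handful of counts copied from the distribution in post #5. The function names are invented, and removing every word that needs a dropped letter is an assumption about what "dropping" would have to mean for the dictionary.

```python
# A few counts taken from the distribution table in post #5
counts = {"い": 2760, "し": 2786, "ぬ": 230, "ぞ": 129, "ぱ": 121, "ゑ": 79, "ぢ": 33, "を": 33}


def split_letter_set(counts, min_count=150):
    """Return (kept letters with their counts, sorted list of dropped letters)."""
    kept = {ch: n for ch, n in counts.items() if n >= min_count}
    dropped = sorted(set(counts) - set(kept))
    return kept, dropped


def playable_words(words, kept):
    """Keep only words that can be spelled with the remaining letters."""
    return [w for w in words if all(ch in kept for ch in w)]


kept, dropped = split_letter_set(counts)
print(sorted(kept), dropped)
print(playable_words(["いし", "いぬ", "ぱい"], kept))
```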

akerbeltzalba
Posts: 142

04.03.2014 00:31
#9 RE: Japanese Scrabble

I was thinking along the lines of invoking a pre-war convention and switching all Hiragana to Katakana, because apart from increasing the list by adding the Katakana words, those also tend to be longer. But the wordlist you mentioned seems a lot bigger. It might be an option to simply use 3 kana as a minimum length, though I'm not sure if that would fix the stair issue.


akerbeltzalba
Posts: 142

05.03.2014 01:18
#10 RE: Japanese Scrabble

xyz, thanks for that. Not sure about the copyright issues on that one. I think while we're trying to work out whether it's feasible or not, we're better off sticking to the open source file.

Right, with the help of our code mage, I have done a merged list - all in Katakana. Here's in detail what I did:
* Converted all Hiragana only words to Katakana
* Converted mixed Hiragana/Kanji words to Katakana
* Converted mixed Katakana/Kanji words to Katakana
* Converted all Kanji to Katakana

There's a bit of cleaning up to do. I used an automatic converter (with some spot checking) so overall I'm fairly confident the quality is good.
- there are strings which contain a space; these need to be chucked out
- there are strings which contain items outside the Katakana range (some Hiragana, some Kanji; I think these are rare combinations the converter was not familiar with)
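Both clean-up steps come down to one range check, since a space is outside the Katakana range as well. A minimal sketch, assuming the merged list is one entry per line and that "Katakana range" means ァ (U+30A1) to ヺ (U+30FA) plus the long-vowel mark ー (U+30FC); the function names are made up.

```python
def is_clean_katakana(entry: str) -> bool:
    """True if every char is Katakana (ァ..ヺ) or the long-vowel mark ー."""
    return bool(entry) and all(
        "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc" for ch in entry
    )


def split_clean(entries):
    """Separate usable entries from the ones that need a manual look."""
    clean, rejected = [], []
    for entry in entries:
        (clean if is_clean_katakana(entry) else rejected).append(entry)
    return clean, rejected


clean, rejected = split_clean(["スシ", "コーヒー", "マン ガ", "すし", "虫バ"])
print(clean)
print(rejected)
```

Keeping the rejected entries in a separate list instead of discarding them makes it easier to see which combinations the converter could not handle.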

At the moment the list is just short of 120k, I think once we've thrown out the messy ones, we should be on 100k or so, which seems reasonable.

If this gives us a reasonably balanced game, we could run it as a beta and see if we can attract some Japanese people to clean up the list etc etc.

Admin: Attachment deleted


Scotty
Administrator
Posts: 3,788

05.03.2014 08:50
#11 RE: Japanese Scrabble

Zip file seems to be broken, please reup it.



xyz
Posts: 69

05.03.2014 11:58
#12 RE: Japanese Scrabble

Hiragana should be used, even though it's doubtful whether Japanese adults would play even that. Katakana is used only for foreign loan words and for emphasis. Kids learn Japanese with hiragana.

If it is possible to use the EDICT file and have/make a Hiragana->Kanji->English index for it, Kanji and English could then be shown in the tooltip for the word. That would be great even for learning Japanese!

The free use of EDICT seems clear to me from http://www.edrdg.org/
"EDICT can be freely used provided satisfactory acknowledgement is made in any software product, server, etc. that uses it."

I sent email to Jim Breen about the possible use.

There is a physical Scrabble in Japanese, it's in romaji. I'll try to find the photo.

akerbeltzalba
Posts: 142

05.03.2014 15:14
#13 RE: Japanese Scrabble

I'll email you the file Scotty!

That's the way Kana are used today but it wasn't always so. Prior to WW2, much of what is written today in Hiragana was written in Katakana only. Also, Hiragana words are sometimes written in Katakana for emphasis so it's not totally unheard of even today.

There are also practical reasons related to the use of the choonpu (the long-vowel mark ー), i.e. you can easily convert a Hiragana list to Katakana but not the other way round. Given the issues we have with stairs etc. in the game, I think that is (for now) the key factor; having an unplayable game is not much use to anyone.
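The easy direction is a fixed offset: each Hiragana code point in U+3041..U+3096 has its Katakana counterpart exactly 0x60 higher. A small sketch (the function name is invented for this example):

```python
# Hiragana ぁ..ゖ (U+3041..U+3096) -> Katakana ァ..ヶ (U+30A1..U+30F6)
HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def hiragana_to_katakana(text: str) -> str:
    """Convert Hiragana to Katakana; all other chars pass through unchanged."""
    return text.translate(HIRA_TO_KATA)


print(hiragana_to_katakana("すし"))
print(hiragana_to_katakana("まんが"))
```

Going back is where the choonpu gets in the way: ー has no code point 0x60 lower to map to, and in Hiragana spelling a long vowel is normally written out with a vowel kana, so a reverse converter would have to guess which one was meant.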


xyz
Posts: 69

06.03.2014 17:47
#14 RE: Japanese Scrabble

xyz
Posts: 69

07.03.2014 17:26
#15 RE: Japanese Scrabble

Here is Jim Breen's reply:

...
Let me explain briefly what my dictionary files contain. I am sure
you can reformat the contents to make an index file of the type you seek.

I'll illustrate this using a Japanese word for tooth cavities. The word is
usually pronounced mushiba, and more rarely kushi or ushi. It's
commonly written 虫歯, but is also written 齲歯 or 齲. (Yes, complicated
but Japanese is like that.)

My main dictionary distribution format is the XML version (JMdict). In
this format the entry is:
<entry>
<ent_seq>1604850</ent_seq>
<k_ele>
<keb>虫歯</keb>
<ke_pri>ichi1</ke_pri>
<ke_pri>news1</ke_pri>
<ke_pri>nf17</ke_pri>
</k_ele>
<k_ele>
<keb>齲歯</keb>
</k_ele>
<k_ele>
<keb>齲</keb>
</k_ele>
<r_ele>
<reb>むしば</reb>
<re_pri>ichi1</re_pri>
<re_pri>news1</re_pri>
<re_pri>nf17</re_pri>
</r_ele>
<r_ele>
<reb>うし</reb>
<re_restr>齲歯</re_restr>
</r_ele>
<r_ele>
<reb>くし</reb>
<re_restr>齲歯</re_restr>
</r_ele>
<info>
<audit>
<upd_date>2012-09-05</upd_date>
<upd_detl>Entry created</upd_detl>
</audit>
<audit>
<upd_date>2012-09-05</upd_date>
<upd_detl>Entry amended</upd_detl>
</audit>
<audit>
<upd_date>2012-09-05</upd_date>
<upd_detl>Entry amended</upd_detl>
</audit>
</info>
<sense>
<pos>&n;</pos>
<pos>&adj-no;</pos>
<gloss>cavity</gloss>
<gloss>tooth decay</gloss>
<gloss>decayed tooth</gloss>
<gloss>caries</gloss>
</sense>
</entry>

That's quite complex, but it can be parsed, etc.
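For what it's worth, pulling the readings out of such an entry is not much code with Python's standard XML parser. The sketch below runs on a trimmed copy of the entry above; the `<pos>` elements are left out because `&n;` and `&adj-no;` are entities that a bare fragment does not define. The function name and the shape of the result are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the entry shown above (priority, audit and pos elements removed)
SAMPLE = """<entry>
<ent_seq>1604850</ent_seq>
<k_ele><keb>虫歯</keb></k_ele>
<k_ele><keb>齲歯</keb></k_ele>
<k_ele><keb>齲</keb></k_ele>
<r_ele><reb>むしば</reb></r_ele>
<r_ele><reb>うし</reb><re_restr>齲歯</re_restr></r_ele>
<r_ele><reb>くし</reb><re_restr>齲歯</re_restr></r_ele>
<sense><gloss>cavity</gloss><gloss>tooth decay</gloss></sense>
</entry>"""


def readings_index(entry):
    """Map each reading to (kanji forms it applies to, English glosses)."""
    all_kanji = [k.text for k in entry.findall("k_ele/keb")]
    glosses = [g.text for g in entry.findall("sense/gloss")]
    index = {}
    for r_ele in entry.findall("r_ele"):
        restricted = [x.text for x in r_ele.findall("re_restr")]
        # A reading without re_restr applies to every kanji form of the entry
        index[r_ele.findtext("reb")] = (restricted or all_kanji, glosses)
    return index


for reading, (kanji, glosses) in readings_index(ET.fromstring(SAMPLE)).items():
    print(reading, kanji, glosses)
```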

There are two simpler formats. One is the EDICT2 one:

虫歯(P);齲歯;齲 [むしば(P);うし(齲歯);くし(齲歯)] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/(P)/

It has the hiragana in [...], in this case showing some restrictions.

Then there is the old legacy EDICT format. This can only have one kanji form
and one hiragana form per line, so it gets split up:

虫歯 [むしば] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/(P)/
齲 [むしば] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/
齲歯 [うし] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/
齲歯 [くし] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/
齲歯 [むしば] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/
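The legacy format lends itself to the Hiragana -> Kanji -> English index discussed in post #12, because every line carries exactly one reading. A sketch under the assumption that every line has the shape `KANJI [READING] /gloss/gloss/`; lines of any other shape are silently skipped, and the names are hypothetical.

```python
import re
from collections import defaultdict

# One kanji form, one reading in [...], then slash-separated glosses
LEGACY_RE = re.compile(r"^(\S+) \[([^\]]+)\] /(.+)/$")


def build_index(lines):
    """Map reading -> list of (kanji form, [glosses])."""
    index = defaultdict(list)
    for line in lines:
        match = LEGACY_RE.match(line.strip())
        if match:
            kanji, reading, glosses = match.groups()
            index[reading].append((kanji, glosses.split("/")))
    return index


lines = [
    "虫歯 [むしば] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/(P)/",
    "齲 [むしば] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/",
    "齲歯 [うし] /(n,adj-no) cavity/tooth decay/decayed tooth/caries/",
]
for reading, entries in build_index(lines).items():
    print(reading, [kanji for kanji, _ in entries])
```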

The JMdict version is in UTF-8. The other two are in EUC-JP.
You can convert them to UTF-8, e.g.
iconv -f EUC-JP -t UTF-8 EDICT2 > EDICT2_UTF8 (on a Unix/Linux system).
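Where iconv is not at hand (e.g. on Windows), the same conversion can be sketched in Python, whose standard library includes an `euc_jp` codec. The function name is invented and the file names are only examples.

```python
def convert_to_utf8(src: str, dst: str, src_encoding: str = "euc_jp") -> None:
    """Re-encode a text file to UTF-8, line by line, leaving line endings alone."""
    with open(src, encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)


# Example call (file names as in the iconv command above):
# convert_to_utf8("EDICT2", "EDICT2_UTF8")
```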

All of these can be downloaded from
http://ftp.monash.edu.au/pub/nihongo/ Get the files
JMdict_e.gz, edict2.gz and edict.gz

I hope this helps.
...
