Hi - it looks like Michael Bauer introduced me in the Scottish Gaelic thread.
I'd like to create a version for Irish. I created the open source Irish spellchecker "GaelSpell", so I have a quality word list with good coverage, ready to use.
I'm inclined to try the digraph approach that Michael is using for Scottish though. Processing my word list this way yields the following character frequencies:
A 338240 I 260387 R 162701 N 151440 L 121075 E 120140 O 116712 Í 109779 S 102242 T 91160 D 75868 Ċ 73325 M 69310 C 62305 Á 61244 G 58405 F 46855 Ṫ 41950 Ó 37388 Ḃ 32064 U 31698 Ḋ 31298 Ṁ 29715 B 26442 Ú 24377 Ġ 24179 É 23497 P 19668 Ṡ 12369 Ḟ 12124 H 11051 Ṗ 6530
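For anyone who wants to reproduce counts like these, here is a minimal sketch of the digraph idea in Python. It assumes that each lenited consonant plus "h" is counted as a single dotted tile (bh as Ḃ, ch as Ċ, and so on), which is what the list above suggests. The mapping and the function names `to_tiles` and `tile_frequencies` are only illustrative and are not taken from GaelSpell or Scrabble3D.

```python
from collections import Counter

# Assumed mapping: lenited consonant + "h" becomes one dotted tile,
# matching the dotted letters in the frequency list above.
LENITED = {
    "BH": "Ḃ", "CH": "Ċ", "DH": "Ḋ", "FH": "Ḟ", "GH": "Ġ",
    "MH": "Ṁ", "PH": "Ṗ", "SH": "Ṡ", "TH": "Ṫ",
}

def to_tiles(word):
    """Split a word into tile symbols, treating consonant+h as one tile."""
    w = word.upper()
    tiles = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in LENITED:
            tiles.append(LENITED[pair])
            i += 2
        else:
            tiles.append(w[i])
            i += 1
    return tiles

def tile_frequencies(words):
    """Count how often each tile symbol occurs across the word list."""
    counts = Counter()
    for word in words:
        counts.update(to_tiles(word))
    return counts
```

With this, `to_tiles("maith")` gives M, A, I, Ṫ, while an "h" that does not follow one of the nine consonants (as in "hata") stays a plain H.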
Any advice on tile distributions/point values would be appreciated; I've never played seriously in Irish, only English.
Taking the digraph approach, "H" becomes a bit of a special case - most words in Irish that start with a vowel permit free addition of an H at the beginning. This makes it a bit boring for Scrabble play, so I'm inclined to just drop H entirely.
Also, I've translated about 90% of the .lang file. Shall I email that to the devs directly when it's complete?
Yes, indeed, Michael (akerbeltzalba) told me that you would join us here in the Scrabble3D forum. That's why I have already prepared a separate forum especially for the Irish language!
I'm very, very happy to have you here, and I welcome you warmly! I hope you will enjoy Scrabble3D and our forum. And so will we, once Scrabble3D is available in Irish!!! That's really great!!!
Welcome, Kevin! Be careful about relying on letter frequency alone. IMHO it's better to take pairs of directly adjacent letters into account. But it's a difficult estimation, and the common set is being discussed for German as well. Linhart is our specialist for that topic. Once the lang file is complete you can send it to my address as given in the about dialogue. I apologize in advance: the lang file will change within one or two weeks - I hope to finish patch 24 soon. Some entries changed id, some have been removed and some were added. I believe fewer than 50 in total.
Ok, thanks. The lang file is done now, a bit more proofreading and I'll send it off to you. Is the latest english.lang available in SVN or anywhere online? That would be nice for localizers in terms of keeping up with string changes on an ongoing basis.
Here's what I did for Gaelic: I calculated the %, compared it to English (i.e. how many tiles per %) and then distributed the tiles proportionally. I made a slight error in forgetting it's 100-2 (jokers), but if you apply the same approach your set should be close to good. Remember we ditched ng and discovered there were not enough a, i and e tiles for Gaelic, but Irish has slightly different conventions, so I'd suggest you start with the basic ratio and then tweak it.
Letter  Count   %       Points  Tiles
a       37147   21.55%  1       14
i       18305   10.62%  1       9
e       10876   6.31%   1       9
n       10656   6.18%   1       4
r       10532   6.11%   1       4
s       8978    5.21%   1       4
o       7489    4.34%   1       4
ch      6789    3.94%   2       4
u       5865    3.40%   2       3
l       5760    3.34%   2       4
d       5600    3.25%   2       4
dh      5185    3.01%   2       4
g       4413    2.56%   3       3
t       4359    2.53%   3       3
m       4227    2.45%   3       2
th      3334    1.93%   3       3
c       3242    1.88%   3       3
nn      2945    1.71%   4       2
bh      2382    1.38%   2       2
b       2144    1.24%   3       2
à       1786    1.04%   5       2
gh      1405    0.82%   5       2
mh      1311    0.76%   5       1
f       1195    0.69%   5       1
ò       1065    0.62%   6       1
ù       983     0.57%   6       1
è       939     0.54%   6       1
ì       911     0.53%   6       1
p       755     0.44%   6       1
fh      648     0.38%   5       1
rr      633     0.37%   8       1
ll      410     0.24%   8       1
ng      110     0.06%   10      1
Urk the table is a bit of a mess, I'll email it to you.
[EDIT] Bussinchen: I have fixed it. Your table is fine!
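The proportional step Michael describes can be sketched roughly as follows. This is not his exact calculation: it assumes 98 lettered tiles (100 minus the 2 jokers) and uses largest-remainder rounding, which is my own choice for making the counts add up exactly. The function name `allocate_tiles` is hypothetical.

```python
def allocate_tiles(counts, total_tiles=98):
    """Distribute total_tiles across letters in proportion to their counts.

    Uses largest-remainder rounding (an assumption, not Michael's method)
    so that the result sums exactly to total_tiles.
    """
    grand_total = sum(counts.values())
    # Exact (fractional) share of the tiles for each letter.
    quotas = {k: total_tiles * v / grand_total for k, v in counts.items()}
    # Start from the rounded-down share.
    tiles = {k: int(q) for k, q in quotas.items()}
    leftover = total_tiles - sum(tiles.values())
    # Hand the remaining tiles to the letters with the largest fractions.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - tiles[k], reverse=True)
    for k in by_remainder[:leftover]:
        tiles[k] += 1
    return tiles
```

Note that a purely proportional split can give rare letters zero tiles, so the result is only a starting point for the manual tweaking mentioned above.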
Quote from kscanne: ...Is the latest english.lang available in SVN or anywhere online?
English.lang is created by the program itself if no lang file is found. Actually, the English default text is filled into every lang file, regardless of its name, if a caption, label, or resource is not available. For the upcoming changes I'll create a special thread in the announcement section where I post changes.
Some time ago I tried to improve the German tile set, as Scotty has indicated already. Here are my main considerations:
I think it is not the best way to start with the overall frequency of letters in the word list. In a typical Scrabble play, the proportion of 2- or 3-letter words is very much higher than in the word list. So I propose to calculate for each letter the relative frequency (in %) in the list of 2-letter words first, then in the list of 3-letter words and so on, up to 8 or 9 letters. Finally you take the mean of these values as a basis for your tile set.
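A minimal Python sketch of this calculation is below. It assumes a plain list of words and treats each character as one letter, so for digraph tiles the words would need to be split into tiles first. The function name `mean_frequency_by_length` and its parameters are only illustrative, and lengths with no words in the list are simply left out of the mean.

```python
from collections import Counter, defaultdict

def mean_frequency_by_length(words, min_len=2, max_len=8):
    """Average each letter's relative frequency (in %) over word lengths.

    For every word length from min_len to max_len, compute the share of
    each letter among all letters of words with that length, then take
    the unweighted mean of those shares across the lengths present.
    """
    per_length = defaultdict(Counter)
    for word in words:
        if min_len <= len(word) <= max_len:
            per_length[len(word)].update(word.lower())

    letters = {ch for counter in per_length.values() for ch in counter}
    result = {}
    for ch in letters:
        shares = [100 * counter[ch] / sum(counter.values())
                  for counter in per_length.values()]
        result[ch] = sum(shares) / len(shares)
    return result
```

Because every length counts equally, the few 2- and 3-letter words carry as much weight as the many longer ones, which is the point of the proposal.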
Of course, there may be further arguments, depending on peculiarities of the language. For instance, in English you can add an S to most of the words. But it was decided that the tile set should contain fewer S in order to make backhooks a bit more difficult.
Thanks linhart for your input! I computed the proportions for words of different lengths and the results were quite interesting. I've revised a few of the tile counts accordingly - posting the new numbers in the other thread.