Quote: The word count is limited by your RAM (or maybe 2GB). The German dic has about 600k entries, several with paraphrases. Its size is about 10MB, and it loads into the program in 2 or 3 seconds.
OK, that seems roughly comparable to what our spellchecking dictionaries have (bearing in mind some of the morphology rules the .aff file implements). Now that doing this is a serious possibility, I'll put my head together with my co-conspirator and thrash out some rules on how to deal with some of the dictionary issues. Having read some of the German and Latin threads, there are a few things we need to figure out too.
Getting technical here so she can chip in. There's a rule in the other dic files that non-independent forms are not allowed. That makes sense, but it means we need to remove the following from the .dic:
- any hyphenated word (as we can't do hyphens anyway)
- any word with an apostrophe: m', d'
- some prefixes (we need a full list): co-, comh-, dì, seann-, droch-, deagh-
- prefixes that are also independent words are ok: ro- (ro)
- no single letter words: a, m', d'
- no emphatic affixes (with or without hyphen): -sa, -san, -se, e
If we work off the root forms prior to generating them. On the bright side, I think the only rule we need to implement is the one for lenition. The rules for adding h-, t-, n- we don't want anyway.
Just thinking out loud for now, we need a full list. Any categories I missed?
> - any hyphenated word (as we can't do hyphens anyway)
> - some prefixes (we need a full list): co-, comh-, dì, seann-, droch-, deagh-,
Kicking these out could be handled with regular expressions.
> - any word with an apostrophe: m', d'
> - no single letter words: a, m', d'
If this is a short list, a simple search and replace will do.
> - prefixes that are also independent words are ok: ro- (ro),
Let's keep the ro and kick out the ro-
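A minimal Python sketch of the filtering discussed above, assuming the word list is plain text with one entry per line. The function name `keep_entry` is hypothetical, and the rules cover only what has been mentioned so far in this thread (hyphens, apostrophes, single letters); the emphatic affixes are left out because that question is still open.

```python
import re

# Matches any entry containing a hyphen or an apostrophe (straight or curly).
HYPHEN_OR_APOSTROPHE = re.compile(r"[-'\u2019]")


def keep_entry(word: str) -> bool:
    """Return True if the entry may stay in the Scrabble word list."""
    word = word.strip()
    if len(word) <= 1:
        # Drops empty lines and single-letter words such as "a".
        return False
    if HYPHEN_OR_APOSTROPHE.search(word):
        # Drops hyphenated words, prefixes like "ro-" and forms like "m'".
        return False
    return True


words = ["ro", "ro-", "m'", "a", "taigh", "deoch-bhiugh", "comh-"]
kept = [w for w in words if keep_entry(w)]
print(kept)  # ['ro', 'taigh']
```

Because every entry with a hyphen is rejected, the independent word ro survives while the prefix ro- is kicked out, without needing a separate prefix list.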
> - no emphatic affixes (with or without hyphen): -sa, -san, -se, e
Why not the ones without the hyphen? They would be an affix just like e.g. a plural affix.
> If we work off the root forms prior to generating them. On the bright side, I think the only rule we need to implement is the one for lenition. The rules for adding h-, t-, n- we don't want anyway.
I will answer later, when I am at home again.
I must say that I don't know anything about the Gaelic language, not even whether it has many inflected forms, as German or Italian do, or few, as in English (i.e. whether Gaelic is a more synthetic or a more analytic language). I think I must google a little bit in order to understand better how Celtic languages are structured. I must learn what lenition means exactly for Gaelic word stems. Interesting and exciting!
Spellchecker lists usually also contain abbreviations. These have to be eliminated, since they are not allowed in Scrabble. This is no problem if they have a full stop at the end. But in German, for example, there are also many abbreviations which are not marked in this way (chemical elements, physical units etc.).
In other languages, such as German, Swedish, Italian and Spanish, a well-known dictionary serves as the reference for the validity of the placed words. In German, the Duden (Rechtschreibduden) is used officially, and all the lemmata in that dictionary are valid Scrabble words. In Swedish, it is the SAOL (Svenska Akademiens ordlista, the word list of the Royal Swedish Academy). In Spanish, it is the list of the Royal Spanish Academy. In Italian, it is the Zingarelli.
Which dictionary could it be for Gaelic Scrabble?
As a Gaelic Scrabble does not exist yet, you have to decide your own rules. It would be desirable to choose a Gaelic dictionary to be the reference dictionary for the validity of the Gaelic words.
Unfortunately I cannot really help you, because I cannot read a word in Gaelic. But I have found some links that could be useful:
But I think you know that already, because there I have found a link to akerbeltzalba's website, too.
If you would like to create a really excellent Gaelic Scrabble word list, you should compare the lemmata in the reference dictionary with your own word list, which could be any spellchecker list or something else. You should eliminate all the words that are not found in the official reference dictionary. But this is a huge job that may take several years.
But I think you can start with your own list and, as a first step, eliminate all proper nouns, all trademarks, all abbreviations, all words with hyphens and apostrophes and so on.
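The comparison itself can be automated once both lists exist as plain text. A rough Python sketch, assuming one word per line in each list and a case-insensitive match; `split_by_reference` is a hypothetical helper, and the slow part (checking the rejected words by hand) is not something code can replace.

```python
def split_by_reference(candidates, reference):
    """Split candidate words into those found in the reference list and the rest."""
    reference_set = {w.strip().lower() for w in reference}
    accepted, rejected = [], []
    for word in candidates:
        key = word.strip().lower()
        if key in reference_set:
            accepted.append(word)
        else:
            rejected.append(word)
    return accepted, rejected


# Placeholder data: "xyz" stands for any entry missing from the reference.
accepted, rejected = split_by_reference(
    ["glas", "taigh", "xyz"],
    ["glas", "taigh", "fuar"],
)
print(accepted)  # ['glas', 'taigh']
print(rejected)  # ['xyz']
```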
We'll use Am Faclair Beag (http://www.faclair.com) as a standard reference, it contains the closest thing Gaelic has to a Duden and a modern dictionary alongside. As this is also the source for the OO dictionary files, it's a close match :)
Abbreviations will be easy to catch, fortunately, by stripping all caps.
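One possible reading of "stripping all caps", sketched in Python: drop every entry that is written entirely in capital letters, plus anything ending in a full stop as mentioned earlier in the thread. This is an assumption about what the source list looks like; the sample entries below are placeholders, not real Gaelic abbreviations.

```python
def looks_like_abbreviation(entry: str) -> bool:
    """Heuristic: all-caps entries or entries ending in a full stop."""
    entry = entry.strip()
    # str.isupper() is True only if there is at least one cased character
    # and every cased character is upper case.
    return entry.endswith(".") or entry.isupper()


entries = ["ABC", "taigh", "xyz.", "glas"]
kept = [e for e in entries if not looks_like_abbreviation(e)]
print(kept)  # ['taigh', 'glas']
```

Proper nouns with only an initial capital are not caught by this check and would need a separate rule.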
Gaelic Scrabble does not exist yet, and there are no official rules. Therefore feel free to do what you think will be the best solution for Gaelic Scrabble!
Question: you mentioned this, but I misplaced the bit of paper I put it on. We had a Facebook debate about the word sets and whether to allow words from Dwelly's classic dictionary. Opinions being evenly split, I'm leaning towards offering a choice of "modern only" and "with Dwelly".
That requires two .dic files, yes? One where all are marked [=AFB] and one with [=Dwelly], I think that's what you said, wasn't it?
And a bit of exciting news, Kevin, our black magic man, would also like to do Irish. So perhaps we should change this Forum thread into Scottish Gaelic & Irish :) (I don't think a totally separate subforum is needed)
In Scrabble3D dictionaries it is possible to create different dic-categories within the same file (= one file only!).
Let's take Gero's deutsch.dic as an example, because deutsch.dic has different categories. You cannot see the entries in the downloaded deutsch.dic, because that file is encrypted, but Gero always sends me an unencrypted version of his German SuperDic as a backup of his work. In German, Rechtschreibduden (RD) is the standard category. It is always active and cannot be unchecked, which is why it is grey in the settings. Universalduden (UD) and Freestyle are supplementary categories that do not conform to the official rules, but they can be checked (or unchecked) if the player wants to do so. For more information about Gero's dic, please read here (in German): Geros SuperDic - Tipps und Tricks zum Umgang mit dem deutschen Wörterbuch
[Header]
Version=Superdic Stand 06.08.11
StandardCategory=Rechtschreibduden
[Categories]
1=Universalduden
2=Freestyle und Duden-Oldies
[Words]
AA=KINDGERECHTE UMSCHREIBUNG FÜR MENSCHLICHEN KOT
AACHENER
AACHENERIN
AACHENERINNEN
AACHENERN
AACHENERS
AAK=AAK - FLACHES RHEINFRACHTSCHIFF - QUELLE FREMDWÖRTERDUDEN;2
AAKE=AAK - FLACHES RHEINFRACHTSCHIFF - QUELLE FREMDWÖRTERDUDEN;2
AAKEN=AAK - FLACHES RHEINFRACHTSCHIFF - QUELLE FREMDWÖRTERDUDEN;2
AAKES=AAK - FLACHES RHEINFRACHTSCHIFF - QUELLE FWD;2
AAKS=AAK - FLACHES RHEINFRACHTSCHIFF - QUELLE FWD;2
AAL
AALE
AALEN
AALEND
AALENDE
AALENDEM
AALENDEN
AALENDER
AALENDES
AALENS
AALES
AALEST
AALET
AALFANG=VON AALFANG;1
AALFANGE=VON AALFANG;1
AALFANGES=VON AALFANG;1
AALFANGS=VON AALFANG;1
AALGLATT
AALGLATTE
AALGLATTEM
AALGLATTEN
AALGLATTER
AALGLATTES
AALKASTEN=EINE FANGVORRICHTUNG UD 6. AUFL.;2
AALKASTENS=EINE FANGVORRICHTUNG UD 6. AUFL.;2
AALKORB=VON AALKORB;1
AALKORBE=VON AALKORB;1
AALKORBES=VON AALKORB;1
AALKORBS=VON AALKORB;1
AALKÄSTEN=EINE FANGVORRICHTUNG - PL. UD 6. AUFL.;2
AALKÖRBE=VON AALKORB;1
AALKÖRBEN=VON AALKORB;1
AALLEITER=FISCHPASS FÜR AALE;1
AALLEITERN=FISCHPASS FÜR AALE - PL.;1
AALMUTTER=IN KALTEN MEEREN, TEILWEISE IN GROSSEN TIEFEN LEBENDER FISCH, DER LEBENDE JUNGE ZUR WELT BRINGT;1
AALMUTTERN=IN KALTEN MEEREN, TEILWEISE IN GROSSEN TIEFEN LEBENDER FISCH, DER LEBENDE JUNGE ZUR WELT BRINGT - KORREKTER PL.!;1
AALQUAPPE=;1
AALQUAPPEN=;1
AALRAUPE=IM SÜSSWASSER LEBENDER, GROSSER RAUBFISCH;1
AALRAUPEN=IM SÜSSWASSER LEBENDER, GROSSER RAUBFISCH;1
AALREUSE=;1
AALREUSEN=;1
AALS
AALSPEER=;1
AALSPEERE=;1
AALSPEEREN=;1
AALSPEERES=;1
AALSPEERS=;1
AALST
AALSTECHEN=;1
AALSTECHENS=;1
AALSTRICH=AALSTRICH IST DER LÄNGS ÜBER DIE RÜCKENMITTE VERLAUFENDE DUNKLE STREIFEN IM FELL VON DIVERSEN SÄUGETIEREN;1
AALSTRICHE=AALSTRICH IST DER LÄNGS ÜBER DIE RÜCKENMITTE VERLAUFENDE DUNKLE STREIFEN IM FELL VON DIVERSEN SÄUGETIEREN;1
AALSTRICHEN=AALSTRICH IST DER LÄNGS ÜBER DIE RÜCKENMITTE VERLAUFENDE DUNKLE STREIFEN IM FELL VON DIVERSEN SÄUGETIEREN;1
AALSTRICHES=AALSTRICH IST DER LÄNGS ÜBER DIE RÜCKENMITTE VERLAUFENDE DUNKLE STREIFEN IM FELL VON DIVERSEN SÄUGETIEREN;1
AALSTRICHS=AALSTRICH IST DER LÄNGS ÜBER DIE RÜCKENMITTE VERLAUFENDE DUNKLE STREIFEN IM FELL VON DIVERSEN SÄUGETIEREN;1
AALSUPPE=;1
AALSUPPEN=;1
AALT
...
...
If you look at the header of that file, you find this:
[Categories]
1=Universalduden
2=Freestyle und Duden-Oldies
So for Gaelic you can specify e.g. the following optional categories:
[Categories]
1=Dwelly
2=XYZ Gaelic word list
3=...
Now your header must be like this:
[Header]
Version=100001
Author=akerbeltzalba
StandardCategory=Am Faclair Beag
Licence=CC-N3, any commercial use is prohibited.
Comment=Gaelic Dic 28.08.11, encrypted
Key=?????????
[Replace]
[Categories]
1=Dwelly
2=...
3=...
[Words]
...
...
Don't worry about the encryption key. I believe that it is generated automatically by the program, when encryption is done. Keep the encryption key empty: Key=
In the word list, however, you must write like this:
AALFANG=VON AALFANG;1
AALKASTEN=EINE FANGVORRICHTUNG UD 6. AUFL.;2
i.e. Gaelic word, equal sign (=), definition/explanation/comment, semicolon, number of the category
If you don't have any definitions/explanations/comments yet, you must write like this:
AALSTECHEN=;1
So remember that it is important not to use any semicolon within the definitions, because everything you write after the semicolon will not be shown in the tooltip or the word search hits. Semicolon is a category marker only.
If you want words to be found in the standard category, you only write like this (with definition or without definition) (no semicolon, no number):
AA=KINDGERECHTE UMSCHREIBUNG FÜR MENSCHLICHEN KOT
AACHENER
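To make the three entry forms concrete, here is a small Python sketch that splits one [Words] line into word, definition and category. It is based only on the format described in this post, not on the actual Scrabble3D source code, and the names `Entry` and `parse_entry` are made up for the illustration.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    word: str
    definition: str
    category: Optional[int]  # None means the standard category


def parse_entry(line: str) -> Entry:
    """Parse one line of the form WORD[=DEFINITION][;CATEGORY]."""
    line = line.strip()
    # Everything before the first "=" is the word itself.
    word, _, rest = line.partition("=")
    # The semicolon separates the definition from the category number,
    # which is why definitions must not contain a semicolon.
    definition, _, category = rest.partition(";")
    return Entry(word, definition, int(category) if category else None)


print(parse_entry("AALFANG=VON AALFANG;1"))
print(parse_entry("AALSTECHEN=;1"))
print(parse_entry("AACHENER"))
```

The first line yields category 1 with a definition, the second category 1 with an empty definition, and the third falls into the standard category because it has neither a semicolon nor a number.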
Feel free to contact me or Gero, whenever you have more questions!
Quote from akerbeltzalba: And a bit of exciting news, Kevin, our black magic man, would also like to do Irish. So perhaps we should change this Forum thread into Scottish Gaelic & Irish :) (I don't think a totally separate subforum is needed)
What's the more sensible option? For example, let's say we have a 5 word dic file:
glas
taigh
fuar
muirsgian
deoch-bhiugh
Let's say the last two are Dwelly words but the others occur both in Dwelly and in AFB. If we define Dwelly=2 and AFB=1, does the list look like this:
glas;1;2
taigh;1;2
fuar;1;2
muirsgian;2
deoch-bhiugh;2
Or is it better to define AFB+Dwelly=1 and AFBonly=2, resulting in:
glas;2
taigh;2
fuar;2
muirsgian;1
deoch-bhiugh;1
The German dic isn't the best example for your task. English has two lists, SOWPODS and TWL, where TWL is a subset of SOWPODS [1]. So the dic has TWL as its standard category, and players can add the SOWPODS words, which are marked by ;1. A word cannot be in two or more categories, and duplicate entries are not allowed. Putting it all together, the list looks like this:
[Header]
StandardCategory=Dwelly
[Categories]
1=ABC
[Words]
glas
taigh
fuar
muirsgian=;1
deoch-bhiugh=;1
Dwelly is always active and ABC can optionally be enabled.
Ah ok, I'm with you now. Except I probably didn't explain it so well so it's the other way round I think. The idea is to have a smaller, more modern wordset (let's call this AFB=1) and a bigger wordset containing the modern wordset PLUS older words in it too for advanced players (let's call this combined set AFB+Dwelly=2 - I'll think of better names eventually).
So we get
[Header]
StandardCategory=AFB
[Categories]
1=Dwelly
[Words]
glas
taigh
fuar
muirsgian=;1
deoch-bhiugh=;1
This means that in a standard game, players only get the StandardCategory words, so the last two are not allowed. But if they select the bigger wordset, they get everything, including the words marked ;1.
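Generating the [Words] section from the two source lists could look roughly like this in Python. It is only a sketch under the assumptions of this thread: AFB is the standard category, Dwelly is category 1, and a word found in both lists counts as AFB. The function name `build_word_lines` is made up, and the entries are sorted only to make the output predictable, not because the program is known to require it.

```python
def build_word_lines(afb_words, dwelly_words):
    """Return [Words] lines: AFB words unmarked, Dwelly-only words marked ;1."""
    afb = set(afb_words)
    # A word may be in one category only, so words in both lists stay in AFB.
    dwelly_only = set(dwelly_words) - afb
    lines = []
    for word in sorted(afb | dwelly_only):
        lines.append(f"{word}=;1" if word in dwelly_only else word)
    return lines


afb_list = ["glas", "taigh", "fuar"]
dwelly_list = ["glas", "taigh", "fuar", "muirsgian", "deoch-bhiugh"]
word_lines = build_word_lines(afb_list, dwelly_list)
print("\n".join(word_lines))
```

Using sets also takes care of the no-duplicates rule, since each word ends up in the output exactly once.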