Quote from xyz in post #12: The Kotus list that I edited for suomi.dic contained 94110 words, and the version I sent (years ago) also seems to have the right number (see Finnisch Archiv 2008).
But suomi.dic contains 93695 words, so about 415 words are missing.
Quote from Scotty in post #14: Your dictionary contains 93695 unique words (the rest of the 94110 entries are duplicates) with several strange letters: ´," ",-,',¢,4,A,Á,À,Å,Ä,B,C,D,E,É,È,Ê,F,G,H,I,Î,J,K,L,M,N,O,Ö,P,Q,R,S,Š,T,U,Û,V,W,X,Y,Z,Ž,
I don't remember exactly how we managed to create the current list but I guess it would be the result after removing all these problems.
This means that maybe no update of suomi.dic would be necessary for the "strange" words you have added. We have to check this. But a better header would be perfect indeed.
Yep, but ´," ",-,',¢,4 are most probably not valid characters.
Quote: Some examples from the attached file (with line numbers):
30 AALTOILU
31 -AALTOINEN
32 AALTOLEVY
46540 NELJÄ
46541 4H-KERHO
46542 4H-KERHOLAINEN
...
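A quick way to see which characters a word list really contains is to build a character inventory and compare it with the intended letter set. The following is only a minimal sketch: it assumes the entries are already available as plain strings, and the function names and `FINNISH_LETTERS` constant are made up for illustration (the 24 letters are the ones named later in this thread as the "official" Finnish Scrabble letters).

```python
from collections import Counter

# Assumption: the "official" Finnish Scrabble letters as listed in this thread.
FINNISH_LETTERS = set("ABCDEFGHIJKLMNOPRSTUVYÄÖ")


def letter_inventory(words):
    """Count how often each character occurs across all entries."""
    counts = Counter()
    for word in words:
        counts.update(word)
    return counts


def unexpected_characters(words, allowed=FINNISH_LETTERS):
    """Return, sorted, the characters used in the entries but not in `allowed`."""
    return sorted(set(letter_inventory(words)) - set(allowed))


# Sample entries taken from the examples quoted in this thread.
sample = ["AALTOILU", "-AALTOINEN", "NELJÄ", "4H-KERHO", "A PRIORI"]
print(unexpected_characters(sample))  # [' ', '-', '4']
```

Characters such as "´" or "¢" showing up in this inventory would point to an encoding problem rather than to real dictionary content.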
I downloaded the latest list I sent with 94110 words (UTF-8 without BOM), and it still seems to have some wrong characters. They showed ok before I zipped it (I used Notepad++ for editing). I'll try once more soon. Is UTF-8 the right encoding for Scrabble3D, or does it matter?
Quote from Scotty in post #14: Your dictionary contains 93695 unique words (the rest of the 94110 entries are duplicates) with several strange letters: ´," ",-,',¢,4,A,Á,À,Å,Ä,B,C,D,E,É,È,Ê,F,G,H,I,Î,J,K,L,M,N,O,Ö,P,Q,R,S,Š,T,U,Û,V,W,X,Y,Z,Ž,
The first ("´") and fifth ("¢") characters are wrong, due to my(?) encoding error; the rest are as they should be. See the thread "Finnish letter set".
There are no duplicates, unless you can prove it, of course :)
All the words from the original list should be in suomi.dic, even the ones containing characters not in the Finnish alphabet, because they have been deemed to be Finnish (loan) words by the Institute for the Languages of Finland. The playing rules can be stricter, but if you want to loosen the rules, you would need to redo the dictionary.
Personally I like to play Scrabble with inflected word forms allowed; it is a better game, and better for imagination and creativity. It also works excellently for handicapping the computer in human-computer games!
I have no experience with licences (and little interest). Is "my" licence totally unrelated to the Scrabble3D licence?
Quote from Gast in post #18: There are no duplicates, unless you can prove it, of course :)
Nothing could be simpler...
116 AARNIO
117 AARNIO
451 AHTAUS
452 AHTAUS
...
I was using a compiler switch to debug dictionaries for publishing. I'll make it a hidden setting in the next release (v28h) so you can test it yourself.
To list double entries and used letters, add a line under section General in Scrabble3D.ini as follows:
Quote from Scrabble3D.ini:
[General]
DebugDictionary=1
You need to have debug information active (the switch is in the rightmost tab of the configuration menu). Loading is slowed down significantly in debug mode, and it will not run in threads.
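Outside the program, the same duplicate check can be approximated with a few lines of script. This is a minimal sketch, not what Scrabble3D does internally: it assumes the list has been reduced to one entry per line (header removed), and `find_duplicates` is a hypothetical helper name.

```python
from collections import defaultdict


def find_duplicates(lines):
    """Map each repeated entry to the 1-based line numbers where it occurs."""
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        word = line.strip()
        if word:  # ignore empty lines
            positions[word].append(number)
    return {word: nums for word, nums in positions.items() if len(nums) > 1}


# Small made-up sample; with a real file, pass the lines read from it instead.
sample = ["AARNIO", "AARNIO", "AHDAS", "AHTAUS", "AHTAUS"]
for word, line_numbers in find_duplicates(sample).items():
    print(word, line_numbers)
print("entries:", len(sample), "unique:", len(set(sample)))
```

For the list discussed here, the difference between entries and unique words would be 94110 - 93695 = 415.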
You are right about the duplicates! In the original list, homonyms may have several entries. That explains why Scrabble3D reports fewer entries. The original Kotus list contains inflection information which might be used some day, for example to categorize the words (verbs/nouns etc.), and homonyms may have different inflections. The program seems to be able to skip the duplicates, so is it OK to leave the duplicates in the list?
Sorry I couldn't respond earlier. I was able to test the dictionary just by copying it over the setup dictionary, editing the letter set to include all the new letters, and saving it as a new preset. But perhaps debugging gives more info? I ran the demo with a small number of letters in the rack, and it seems to hang sometimes, not in every game, perhaps when no words are found.
The non-Finnish alphabet letters are not used in many words, and it doesn't work well to have these letters among the computer's choices in the letter set. If they are played, it is next to impossible to find a word in the other dimension using that letter. But humans should be able to play them. Is it possible to play a word with diacritic characters using the corresponding character in the Finnish alphabet without changing the entries in the dictionary, or maybe by adding the word without diacritic marks to the dictionary? Hasn't some other language solved this kind of problem?
If you have useful pointers on the licence choices in plain (non-lawyer) language, please share them; you have some experience with this.
Duplicates: Just change something (e.g. add/delete a word) and confirm the save-dictionary prompt when the program is closed.
Licence: I don't want to spend much time on that topic. If you want only restricted copies of your list to circulate, for instance on request, state that. If you think your list should be spread around the world without any restriction (even for commercial use), use GPLv3. A third option is CC NC3, for free but non-commercial copies. There are a lot of other licences. The basic idea is that you establish yourself as the originator of the work and define how others may deal with it. If the original list is distributed under the GPL, yours must be published under the same licence.
Non-Finnish alphabet letters: I believe some characters are not valid, and therefore those words aren't either. For instance, T-Shirt and E-Mail are valid German words (listed in Duden) but may not be placed in Scrabble.
Diacritic characters: Bussinchen is the expert on that topic. I hope she will answer the question.
Now I have checked our current suomi.dic, which I had never done before. I see that at the beginning, from line 7 (i.e. after the header) to line 616, all the words begin with a hyphen. I believe this means that those words cannot be used on their own, but only as parts of compound nouns. If I am right, all those words must either be eliminated from suomi.dic or be written in their compound forms, which would be the better solution, but which is a lot of work, of course.
After these hyphen-words, on lines 617-623, there are the following entries:
4H-KERHO
4H-KERHOLAINEN
4H-NEUVOJA
4H-TOIMINTA
A PRIORI
A-OIKEUDET
A-PYLVÄS
Numbers are not valid as letters, neither are hyphens, so 4H-KERHO, 4H-KERHOLAINEN, 4H-NEUVOJA, 4H-TOIMINTA, A-OIKEUDET, A-PYLVÄS are not valid words and have to be deleted from suomi.dic.
A PRIORI
If we apply the Swedish rules, A and PRIORI are not valid words in Scrabble, because they are never used separately, but only together in exactly that phrase:
Quote from the Swedish rules on http://www.scrabbleforbundet.se/index.ph...id=16&Itemid=39 (translated from Swedish): • An entry in SAOL that is a phrase consisting of several words does not define any permitted words. For example, the entry "silk screen [sil'k skri'n] s. metod för tryckning" (noun: a printing method) makes neither *SILK nor *SCREEN a permitted word. For the same reason *KRETI is not permitted, because the entry in SAOL is "kreti och pleti". PLETI, on the other hand, is a permitted word because of the entry "pleti se kreti och pleti" ("pleti: see kreti och pleti").
The same rule exists also in German Scrabble, see the screenshot which is linked to the online pdf-file with the German rules about validity and non-validity of words (see page 2).
In my opinion, we should apply this international rule for Finnish Scrabble as well.
Quote from Scotty in post #21: Licence: I don't want to spend much time on that topic. If you want only restricted copies of your list to circulate, for instance on request, state that. If you think your list should be spread around the world without any restriction (even for commercial use), use GPLv3. A third option is CC NC3, for free but non-commercial copies. There are a lot of other licences. The basic idea is that you establish yourself as the originator of the work and define how others may deal with it. If the original list is distributed under the GPL, yours must be published under the same licence.
This means that you, dear xyz, don't need to worry any more about which licence to apply. As Scotty said, it must be the GNU General Public License anyway. And it is not only a "must": it is a common licence in any open-source context, and it fits well with Scrabble3D itself, which is also published under the GNU GPL.
Quote from xyz in post #20: 1. Is it possible to play a word with diacritic characters using the corresponding character in the Finnish alphabet without changing the entries in the dictionary,
2. or maybe adding the word without diacritic marks to the dictionary?
1.
No.
Example: If the entry in the dic is ŠHAKKI, a placed word SHAKKI will not be recognized as valid. SHAKKI will be an unknown, wrong word in this case. So the entries absolutely have to be adapted to the letter set in use: ŠHAKKI must be replaced by SHAKKI.
If we decide to apply the Swedish rule (= the French or Italian rule) for letters with diacritical signs (see my post #22 on Finnish letter set), these letters should be replaced by the same letters without diacritical signs.
Example: Each Š has to be replaced by S in suomi.dic.
2.
If you keep ŠHAKKI and add SHAKKI, both variants of that word could be played during the same game. But I would not like that, because it would be an inconsistent solution, if both words with diacritical signs and words without diacritical signs were placed on the board during the same game. We have to define one rule, not mix two different rules in one game.
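If the single rule chosen is to drop the diacritical signs, the replacement from answer 1 can be done mechanically. The sketch below is an illustration only: the folding table is an assumption built from the accented letters mentioned in this thread, Ä, Ö and Å are deliberately left alone because they are letters of their own, and the helper names are hypothetical. Entries that become identical after folding (ŠHAKKI and SHAKKI) are merged, so both variants cannot coexist.

```python
# Assumption: only these accented letters are folded; Ä, Ö and Å stay as they are.
FOLD_TABLE = str.maketrans({
    "Š": "S", "Ž": "Z",
    "À": "A", "Á": "A", "Â": "A",
    "È": "E", "É": "E", "Ê": "E",
    "Î": "I", "Ô": "O", "Û": "U",
})


def fold_diacritics(word):
    """Upper-case the entry and replace the accented letters listed above."""
    return word.upper().translate(FOLD_TABLE)


def fold_word_list(words):
    """Fold every entry and drop duplicates created by folding, keeping order."""
    seen = set()
    result = []
    for word in words:
        folded = fold_diacritics(word)
        if folded not in seen:
            seen.add(folded)
            result.append(folded)
    return result


print(fold_word_list(["ŠHAKKI", "SHAKKI", "A-PYLVÄS"]))
```

Note that folding Ž to Z only helps if Z is part of the letter set, which is the decision discussed next.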
If we decide to add the letters Q W X Z Å with number 0 and value 0 to the Finnish letter set, we must keep in suomi.dic the words containing those letters. If we decide not to add the letters Q W X Z Å with number 0 and value 0 to the Finnish letter set, we have to delete the words containing those letters in suomi.dic.
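Either decision can be applied to the word list with a simple filter. This is a rough sketch under stated assumptions: the base letters are the 24 "official" letters named in this thread, the sample entries are only illustrative, and `filter_by_letter_set` is a hypothetical helper, not part of Scrabble3D.

```python
# Assumption: base set = the "official" Finnish Scrabble letters from this thread.
BASE_LETTERS = set("ABCDEFGHIJKLMNOPRSTUVYÄÖ")
# The optional letters under discussion (number 0, value 0 in the letter set).
EXTRA_LETTERS = set("QWXZÅ")


def filter_by_letter_set(words, include_extras):
    """Keep only the entries that can be spelled with the chosen letter set."""
    allowed = (BASE_LETTERS | EXTRA_LETTERS) if include_extras else BASE_LETTERS
    return [word for word in words if set(word) <= allowed]


# Illustrative entries only.
sample = ["SHAKKI", "WATTI", "ÅNGSTRÖM", "4H-KERHO"]
print(filter_by_letter_set(sample, include_extras=True))
print(filter_by_letter_set(sample, include_extras=False))
```

Entries with digits, hyphens or spaces fall out in both cases, because those characters are in neither set.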
Quote from xyz in post #20: You are right about the duplicates! In the original list, homonyms may have several entries. That explains why Scrabble3D reports fewer entries. The original Kotus list contains inflection information which might be used some day, for example to categorize the words (verbs/nouns etc.), and homonyms may have different inflections. The program seems to be able to skip the duplicates, so is it OK to leave the duplicates in the list?
Quote from Gast in post #18: Personally I like to play Scrabble with inflected word forms allowed; it is a better game, and better for imagination and creativity. It also works excellently for handicapping the computer in human-computer games!
I agree. For instance in French, Italian, Spanish and German games, inflected forms are played, and this is what makes the game much more interesting.
As Finnish is an agglutinative language, the question of which inflected forms should be included in the word list is very difficult.
In 2011 we already had a discussion (in German) about agglutination in the Hungarian language and inflected forms in magyar.dic (see Agglutination und Zulässigkeit von Wortformen im Hinblick aufs magyar.dic), but we did not really find a solution, and we could not get a good electronic list of the Hungarian dictionary either. That's why we don't have a magyar.dic for Scrabble3D yet. The discussion stopped, and unfortunately the Hungarian-speaking forum members are no longer active.
I have edited the Finnish dictionary to have these word categories according to the letters contained in the entry:
[Categories]
1=Q,W,X,Z,Å (149 entries)
2=Š,Ž (69)
3=À,Á,Â,È,É,Ê,Î,Ô (17)
4=- (966)
5=' (17)
6=0-9 (4)
7=Entries with spaces (255)
8=Incomplete words (start or end with -) (802)
For entries with a space, I used the non-breaking space, Unicode 160 (U+00A0), in the dictionary and the letter set, because the ordinary space character couldn't be saved in the letter set.
The rest, i.e. the uncategorized words in the list, use only the "official" Finnish Scrabble letters: ABCDEFGHIJKLMNOPRSTUVYÄÖ
With a letter set that has all these characters (as a new preset), it is now possible to play Finnish standard Scrabble or "anything-in-the-dictionary-is-ok" and almost anything in between by selecting/unselecting categories. With all the categories selected, the computer can also play any of the dictionary entries if it scores well enough. And all this without changing the source code!
Some categories overlap, in which case the entry is placed in the higher-numbered category. The order roughly reflects decreasing Finnishness from the top category down. It is easy to change or recombine the categories if necessary. Another possible order would be 5,4,.. because these categories are mostly pure Finnish, not loan-words. But perhaps the order is not important.
The uncategorized "official" entries can perhaps later be categorized into verbs, nouns etc. possibly for language learning, according to information in the original list.
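The category scheme above, including the "higher-numbered category wins" rule for overlaps, can be sketched as a small function. This is only an illustration of the rule as described in the post, not the way the dictionary was actually edited; returning 0 for uncategorized entries is a convention made up here, and both the ordinary space and the non-breaking space are treated as "space".

```python
# Categories 1-7 are defined by characters contained in the entry.
LETTER_CATEGORIES = {
    1: set("QWXZÅ"),
    2: set("ŠŽ"),
    3: set("ÀÁÂÈÉÊÎÔ"),
    4: {"-"},
    5: {"'"},
    6: set("0123456789"),
    7: {" ", "\u00a0"},  # ordinary space and non-breaking space (Unicode 160)
}


def categorize(entry):
    """Return the category number of an entry, or 0 if it is uncategorized."""
    # Category 8: incomplete words that start or end with a hyphen.
    if entry.startswith("-") or entry.endswith("-"):
        return 8
    chars = set(entry)
    # Check from the highest number down so that overlaps go to the higher category.
    for number in sorted(LETTER_CATEGORIES, reverse=True):
        if chars & LETTER_CATEGORIES[number]:
            return number
    return 0  # assumption: 0 marks the "official letters only" entries


for entry in ["AALTOILU", "-AALTOINEN", "4H-KERHO", "A PRIORI", "A-PYLVÄS", "ŠHAKKI"]:
    print(entry, categorize(entry))
```

Recombining categories (for example 2+3) would then only mean merging the corresponding character sets.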
Quote from xyz in post #28: I have edited the Finnish dictionary to have these word categories according to the letters contained in the entry:...
Sounds reasonable to me, but consider combining some categories (e.g. 2+3 or 4+8). If you want to make this dictionary and the matching letter set public, you should find other players who support this idea, or at least check whether j24 can live with this solution and Bussinchen (or anyone else) has no objections. Then you can send me your dic and letter distribution, and I'll upload them to SF.net to spread them around the world.
As I said before: I support neither non-letters like apostrophes, hyphens, numbers and spaces as "letters" in the letter set, nor incomplete words as valid words. This is not Scrabble any more.
In my opinion this will spoil the reputation of Scrabble3D in Finland.
We don't know who the guest J24 is - J24 is not a registered forum member. So we cannot ask him/her. Nobody else but me is discussing this question here in the forum. I have told you my opinion, then xyz can do what he wants. But I don't like it. This is not professional.