Look, I don't want to get into a heated argument when we don't even have a working prototype. Fact is, Japanese people DO use Katakana for words normally written in Hiragana, both historically (remember Hiragana started out as women's writing) and today - mainly for emphasis. Just open any old Manga and look for the whoosh-bang-ouch type words. All in Katakana.
The primary concern just now is to see if using either form of Kana or some sort of hybrid we can create a playable game. The initial outcome was that Hiragana alone produces too many ladders and blocks because of the frequency of two-Kana words which means it's not really playable. Yes, it's a property of the language, I'm not suggesting the Japanese language be changed to fit the game but it may mean that Kana is simply not suited to a Scrabble-type game. So since we're experimenting, I'm trying to see if Katakana produces something more workable, alright? I'm not suggesting we force anyone to do anything. I think you'll find Japanese people a bit more flexible in relation to Kana than you might think they are...
Take this mag cover:
where the word otokonoko 'boy' (the big yellow Kana), which is normally written in Kanji and Hiragana 男の子 or Hiragana おとこのこ, is written in Katakana オトコノコ. In fact, though it's a bit hard to read, I can't see much in the way of Hiragana ANYwhere on the cover where you'd normally expect it.
When I started experimenting with Gaelic many years ago I didn't think it would work either because I started out treating h as a letter which made the game unplayable but when I reverted to the historic digraphs, suddenly it worked great. Sometimes thinking outside the box is a good idea. And yes, sometimes even thinking outside the box fails :)
Now regarding the range issue... it should go as far as 30FC (the ー length mark) - not including that will have removed a lot of words. I'm not entirely sure about the rest - is it possible that there is an end-of-line space character in a lot of lines? That would account for the small number of words in the output.
Scotty, how do you define characters during filtering? By Unicode code point range, I'm guessing? I think it's probably just a character range definition issue. And yes, all lines which contain at least one space character need to go.
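Here's a minimal sketch in Python of the kind of range check I have in mind. The names `is_katakana_word` and `filter_katakana` are made up for illustration (this is not Scotty's actual filter), and I'm assuming one word per line. It strips trailing whitespace first in case there is a stray end-of-line space, and keeps only lines made up entirely of code points U+30A1 to U+30FA plus U+30FC. Leaving out U+30FB (the ・ middle dot) is my assumption.

```python
def is_katakana_char(ch: str) -> bool:
    """True for Katakana letters U+30A1..U+30FA and the length mark U+30FC."""
    cp = ord(ch)
    return 0x30A1 <= cp <= 0x30FA or cp == 0x30FC


def is_katakana_word(word: str) -> bool:
    """True if the word is non-empty and every character is Katakana."""
    return bool(word) and all(is_katakana_char(ch) for ch in word)


def filter_katakana(lines):
    """Yield cleaned words, skipping lines with spaces or out-of-range characters."""
    for line in lines:
        word = line.rstrip()  # drop the newline and any end-of-line whitespace
        # An internal space is outside the range, so such lines fail the check too.
        if is_katakana_word(word):
            yield word


sample = ["オトコノコ \n", "コーヒー\n", "アイス クリーム\n", "男ノ子\n", "すし\n"]
print(list(filter_katakana(sample)))
```

With the sample above only オトコノコ and コーヒー survive; the line with a trailing space is kept once the space is stripped, which is the behaviour I suspect is missing at the moment.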
xyz, there's an issue with the Romaji version - it contains a lot of acronyms by the looks of it, you played LL, RF, CM etc. And the Hiragana still has an over-abundance of two-Kana words.
I'm not saying I'm necessarily right but I would like to try the Katakana version first before we decide on a way forward.
That's the way Kana are used today but it wasn't always so. Prior to WW2, much of what is written today in Hiragana was written in Katakana only. Also, Hiragana words are sometimes written in Katakana for emphasis so it's not totally unheard of even today.
There are also practical reasons related to the use of the choonpu i.e. you can easily convert a Hiragana list to Katakana but not the other way round. Given the issues we have with stairs etc in the game, I think that is (for now) the key factor, having an unplayable game is not much use to anyone.
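To show why the Hiragana to Katakana direction is the easy one, here is a small Python sketch. It relies on the fact that the Hiragana letters U+3041 to U+3096 sit exactly 0x60 below their Katakana counterparts in Unicode; the function name `hira_to_kata` is just something I made up for the example.

```python
# Hiragana U+3041..U+3096 map onto Katakana U+30A1..U+30F6 by a fixed offset.
HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def hira_to_kata(text: str) -> str:
    """Replace every Hiragana character with its Katakana counterpart."""
    return text.translate(HIRA_TO_KATA)


for word in ["おとこのこ", "すし", "まんが"]:
    print(word, "->", hira_to_kata(word))
```

Going the other way is not a plain offset, as far as I can see: the ー in a Katakana word does not say which vowel Kana the Hiragana spelling would use, so you'd need a dictionary lookup rather than a character table.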
xyz, thanks for that. Not sure about the copyright issues on that one - I think while we're trying to work out if it's feasible or not, we're better off sticking to the open source file.
Right, with the help of our code mage, I have done a merged list - all in Katakana. Here's in detail what I did:
* Converted all Hiragana only words to Katakana
* Converted mixed Hiragana/Kanji words to Katakana
* Converted mixed Katakana/Kanji words to Katakana
* Converted all Kanji to Katakana
There's a bit of cleaning up to do. I used an automatic converter (with some spot checking) so overall I'm fairly confident the quality is good.
- there are strings which contain a space, these need to be chucked out
- there are strings which contain items outside the Katakana range (some Hiragana, some Kanji, I think these are rare combinations the converter was not familiar with)
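Before throwing the messy strings out, it might be worth counting why each one fails. A rough Python sketch, under my own assumptions: `reject_reason` and `tally` are hypothetical names, Kanji is taken to mean the main CJK Unified Ideographs block (U+4E00 to U+9FFF), and the Katakana range is U+30A1 to U+30FA plus the ー mark.

```python
from collections import Counter


def reject_reason(entry: str) -> str:
    """Return 'ok', or the first reason the entry would be thrown out."""
    if not entry:
        return "empty"
    if any(ch.isspace() for ch in entry):
        return "space"
    for ch in entry:
        cp = ord(ch)
        if 0x3041 <= cp <= 0x3096:
            return "hiragana"
        if 0x4E00 <= cp <= 0x9FFF:  # assumption: main CJK ideograph block only
            return "kanji"
        if not (0x30A1 <= cp <= 0x30FA or cp == 0x30FC):
            return "other"
    return "ok"


def tally(entries):
    """Count how many entries fall under each reason."""
    return Counter(reject_reason(e) for e in entries)


sample = ["オトコノコ", "アイス クリーム", "男ノ子", "たべモノ", "コーヒー", "ABC"]
print(tally(sample))
```

If the "hiragana" and "kanji" counts turn out to be small, that would fit my guess that these are just rare combinations the converter didn't know.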
At the moment the list is just short of 120k, I think once we've thrown out the messy ones, we should be on 100k or so, which seems reasonable.
If this gives us a reasonably balanced game, we could run it as a beta and see if we can attract some Japanese people to clean up the list etc etc.
I was thinking along the lines of actually invoking a pre-war convention and switching all Hiragana to Katakana, because apart from increasing the list by using Katakana, they also tend to be longer. But the wordlist you listed seems a lot bigger. It might be an option to simply use 3-kana as a minimum length though I'm not sure if that would fix the stair issue.
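To get a feel for how much a 3-kana minimum would change things, something like the following Python sketch could be run over the list. The function names are hypothetical, and I'm assuming every code point counts as one kana (so small kana and ー each count as one), which may not match how the tiles end up being defined.

```python
from collections import Counter


def length_histogram(words):
    """Map word length (in kana) to the number of words of that length."""
    return Counter(len(w) for w in words)


def drop_short(words, min_len=3):
    """Keep only words with at least min_len kana."""
    return [w for w in words if len(w) >= min_len]


words = ["スシ", "マンガ", "ネコ", "オトコノコ", "コーヒー"]
hist = length_histogram(words)
share_of_two = hist[2] / len(words)
print(sorted(hist.items()), f"two-kana share: {share_of_two:.0%}")
print(drop_short(words))
```

Comparing the two-kana share before and after would at least tell us whether the cut is big enough to matter; whether it actually fixes the stairs can only be seen by playing.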
Hm, the problem with syllabic scripts is that you never have enough syllables on your bar, you'd need something massive. It would probably be a lot of extra work but if you used only the Hangul components, then it might work. As in, Unicode has this massive Hangul block but it's actually composed of 24 letters (it actually is an alphabetic script but because the letters are grouped in blocks of 2 to 4 symbols it comes across as being syllabic).
So 을 (ŭl) for example is composed of ㅇ and ㅡ and ㄹ. But how you would convert them is tricky. If there is a reliable converter, maybe.
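For what it's worth, the precomposed syllables in the Unicode Hangul block are laid out arithmetically (19 initials × 21 vowels × 28 finals, counting "no final"), so a converter can be sketched in a few lines of Python. The function name `decompose` is made up, and I'm mapping to the standalone compatibility letters like ㅇ rather than the conjoining forms. Note that doubled consonants and compound vowels such as ㄲ or ㅘ come out as single symbols here, so this does not get all the way down to the 24 basic letters.

```python
S_BASE, S_END = 0xAC00, 0xD7A3  # first and last precomposed Hangul syllable

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 leading consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels
# Index 0 means "no final consonant", followed by the 27 possible finals.
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(text):
    """Split each Hangul syllable block into its component letters."""
    out = []
    for ch in text:
        cp = ord(ch)
        if S_BASE <= cp <= S_END:
            index = cp - S_BASE
            out.append(INITIALS[index // 588])       # 588 = 21 * 28
            out.append(MEDIALS[(index % 588) // 28])
            final = FINALS[index % 28]
            if final:
                out.append(final)
        else:
            out.append(ch)  # leave anything that is not a syllable block alone
    return out


print(decompose("을"))
print(decompose("한글"))
```

So the conversion itself looks doable; the open question is more whether a game built on the components would still feel natural to Korean speakers.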
I think of all the Asian scripts the only one which is realistic would be Japanese if you did a Hiragana based Scrabble.
Ah, I found one here. If we take only Hiragana, we'd need lines 6 to 18230, though a few lines containing Kanji (i.e. the complex Chinese loan characters) would still have to be thrown out, and that's beyond me technically. We could perhaps draw up a table of the wanted Hiragana characters and then throw out everything that contains characters not on the list, or perhaps do it via the Unicode ranges.
All lines which contain only a single character would have to be thrown out too.
But otherwise we'd then have a workable concept for Scrabble3D in Japanese (Hiragana). But this is more of a thought experiment, I think we'd need at least one speaker of Japanese for that.
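The table idea could look roughly like this in Python. It's only a sketch under my own assumptions: the name `hiragana_words` is made up, the file is taken to have one entry per line, and the "table" is simply every Hiragana letter from U+3041 to U+3096.

```python
# Assumption: the table of wanted characters is every Hiragana letter.
ALLOWED = {chr(cp) for cp in range(0x3041, 0x3097)}


def hiragana_words(lines, first=6, last=18230):
    """Filter lines first..last (1-based, inclusive) down to Hiragana-only words."""
    words = []
    for line in lines[first - 1:last]:
        word = line.strip()
        # Drop single-character lines and anything with a character off the table.
        if len(word) > 1 and set(word) <= ALLOWED:
            words.append(word)
    return words


lines = ["header"] * 5 + ["すし", "か", "食べる", "まんが"]
print(hiragana_words(lines))
```

A hand-made table instead of the full range would work the same way; you'd just replace `ALLOWED` with the characters that are actually meant to be tiles.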
It would appear, however, that Hiragana versions are the tentative script of choice for Japanese Scrabble, I did some searching and found this image of someone who did a paper version and Scrabble-ish games also use Hiragana. Maybe one could do a beta and put it out there and wait for a response?
So why don't we do Japanese as a proof of concept for Asian languages? It should be fairly easy. There are three scripts, one is like Chinese, the other two are syllabic (Hiragana and Katakana) but much more limited in scope than Korean. To begin with, we can dispense with Katakana, the main difference is purpose i.e. foreign terms are usually written with Katakana, native terms with Hiragana. There's around 100 or so symbols that you need but I don't think it needs much in the way of modifying the game itself. So for example sushi is written as すし (su + shi), manga is まんが (ma + n + ga) (n is the only consonant-only symbol, before you ask).
The main trick would be to find a Hiragana wordlist.
Hm, I just checked but they all seem fine (well, there was a linguistic fix I did but no plural). Not every word has to change between the forms ONE, TWO, FEW and OTHER. It all depends on how the plural is formed and what letter the word starts with. For example, it's 1 chat, 2 chat but 1 taigh, 2 thaigh.
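For anyone wondering which numbers land in which form, here is a small Python sketch of the category selection. It assumes the CLDR plural rule for Scottish Gaelic as I understand it (ONE for 1 and 11, TWO for 2 and 12, FEW for 3 to 10 and 13 to 19, OTHER for everything else), and the function name is made up. It only picks the category; whether the word itself changes (chat vs taigh/thaigh) is up to the translation.

```python
def gd_plural_category(n: int) -> str:
    """Pick the plural category for a whole number, assuming the CLDR rule for gd."""
    if n in (1, 11):
        return "one"
    if n in (2, 12):
        return "two"
    if 3 <= n <= 10 or 13 <= n <= 19:
        return "few"
    return "other"


for n in (1, 2, 3, 11, 12, 19, 20):
    print(n, gd_plural_category(n))
```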
Topic by akerbeltzalba in the forum General Questions Abou...
If you're working offline in the new po format, you may find that in Virtaal (for example) you're told the file is complete but then Transifex tells you you're at 99% or something. That's most likely down to a transitional issue i.e. Virtaal is looking at an old string which contained Simple Plurals like You received %d point(s) and thinks the source has been translated but Transifex checks all fields and tells you they're incomplete.
Quickest fix is to do a search for the strings Transifex is flagging and fixing them either offline or online.
My thinking is to take the English file and make it match the German translation, which is very good (I think Bussinchen did it). I could do that if you tell me what the best way of doing that is. I mean, which file should I proofread?
I think it's important, Transifex is a much more public place and we will probably attract other translators who know little about the project, so they need good source strings.
I know this has been bandied around but having just gone through most of the file again, the English really needs fixing, at least the strings which are given out to the localizers.
I'm a tad tuned out, I believe we had just the Default language but now we have Default and English? Or is that the British English file on Transifex? And if so, should we not be using those? There's some really crazy strings offered up for translation like tsBoardConfig.Caption which is "tsBoardConfig" or tsWordCheckMode.Caption which is given as "tsWordCheckMode"
Normally on other projects any change in the source makes the original translation go to fuzzy or blank (depending on how big the change is). Most annoyingly, that happens when the source corrects a typo or something, but on the bright side, using a translation memory (which is now possible since we're on po) this is usually very quick to fix.
Thanks for setting up the project on Transifex. A couple of things:
1) A lot of the source strings seem to have lost the accelerators (e.g. New Game, for which we had the translation &Geama ùr), is that deliberate? If yes, I will remove them from the Gaelic po
2) There seems to have been some odd language crossover on some strings. &Word search for example had ended up being Italian in the Gaelic po. I cannot see enough to make that a problem (so far only 3 strings) but it might be worthwhile for translators to do a quick gross error check on the old translations.
3) Are the source strings here the old, slightly fishy English translations or is this the translation which Bussinchen et al proofread at some point? If it's the proofread version, I will give the old strings a quick look to make sure it's all nice and tidy.
Yikes the top level title for the Fula forum is long ;)
Regarding the po, I'm getting slightly annoyed with Transifex because they take forever fixing stuff or adding a new language and as a result, the problem with the Gaelic plural bug is getting really annoying. And there was another problem I can't remember off the top of my head. But I'll keep looking.
The thing is, hosting the po files on the web really only becomes very important when you have teams working on each language. At the moment I think virtually every translation is handled by a single translator, so translating the po or lang files offline isn't really much of an issue.
As long as someone has a hosting package, it's fairly easy to install MediaWiki (even I managed to do that!). I'm not sure though if it is possible to have two wikis on the same hosting package, I will have to check. But if someone else has a hosting package, that would be an option too.