Wall - Tatoeba

Wall (6,976 threads)

Tips

Answers to some frequently asked questions can be found here.

We want to maintain a pleasant atmosphere in which civil discussions take place; see the rules against bad behavior.

blay_paul 28 February 2010 at 10:29:25 UTC

Favourite'd sentences not working.

I set a number of sentences as 'favourite', but when I use the link in my account page it says "This user does not have any favorites."

sysko 6 March 2010 at 16:05:43 UTC

Tatoeba has been updated; this bug should be fixed by now.

xtofu80 9 March 2010 at 19:58:16 UTC

I can only see the first ten favourite sentences. The other sentences are not accessible, as far as I can see.

sysko 28 February 2010 at 12:19:44 UTC

Yep, the issue has been fixed and favourites will work again in the next release, which should not be long now. Sorry to make you wait.

xtofu80 4 March 2010 at 12:21:03 UTC

I found some incorrect sentences linked together, but don't know how to resolve the problem:
Sentence nº313285 "no-smoking area" is correctly translated in
Sentence nº90428 as "禁煙区域" but incorrectly in
Sentence nº90427 as "禁猟区", which means no-hunting zone.

The best solution would be to cut the link between nº313285 and nº90427. Or should I delete nº90427 and add it again as the correct translation of my German sentence about the hunting zone?
Greetings

blay_paul 4 March 2010 at 12:35:11 UTC

I think the Japanese sentence with 禁猟区 is a kind of typo and I suggested it be deleted. I added a new pair of sentences for 禁猟区 to replace it.

If you think the German sentence is worth keeping then you can do so, but I think the Japanese version was a bit odd.

Nemo 1 March 2010 at 03:54:10 UTC

I've looked through a lot of contributions, and I've come to the realization that there are a LOT of contributions in English made by non-native speakers. I assume the same is the case for other languages, especially Japanese. There needs to be some sort of indicator for each sentence on whether or not the last editor was a native speaker. I've seen a lot of English sentences that are perfectly grammatical, with no errors at all, that I have never in my entire life heard someone utter -- correct or not, a native speaker would never say them.

sysko 1 March 2010 at 09:05:58 UTC

As the major part of both the Japanese and the English comes from the Tanaka Corpus, I can understand that the English is not really reliable, but for most other languages (Spanish, German, Polish, Chinese), I can say that 99% has been added by native speakers.

I agree we need a way to specify whether a sentence has been added or reviewed by a native speaker. We're currently thinking about a nice way to do that, maybe something to tag some sentences as "trust".

Anyway, for the moment one can assume that sentences which belong to someone are much more reliable than orphans.

blay_paul 1 March 2010 at 11:32:44 UTC

> As the major part of both japanese and english come from
> the tanaka corpus, I can understand that the english is
> not really reliable.

The Tanaka Corpus was, initially, generated by students submitting pairs of sentences with the intent that the Japanese and English meant the same thing. So the Japanese is marginally more reliable than the English because the person entering it was Japanese.

However you cannot assume that the Japanese is correct and the English unreliable all the time. It's more complicated than that.

xtofu80 6 March 2010 at 16:09:00 UTC

Being a native German speaker, I came across both Japanese and English sentences which I felt were not correct; however, I was not 100% sure.
It would be a cool feature if non-natives could mark a sentence as "questionable", and then this sentence could be checked and corrected or verified by a native speaker. I suppose this would be rather easy to implement using the word list feature. So a non-native speaker would not correct a sentence which he is not 100% sure about, but put it into this list, and native speakers could occasionally go through the list and check for grammatical errors. This would drastically improve the quality of the sentences, if the feature is known and used by most users.

blay_paul 6 March 2010 at 16:54:55 UTC

Just post saying that you're not sure they are correct. There are enough native speakers of English to check the English sentences (and, though I may be biased, I think I'm good enough at Japanese to usually have a good idea as to whether a sentence is OK).

blay_paul 26 February 2010 at 14:15:19 UTC

Romaji generator.

Are the tool and data used available online anywhere?

sysko 28 February 2010 at 00:03:05 UTC

yep it's from the kakasi project
http://kakasi.namazu.org/
as said before, the project seems more than dead :(

blay_paul 28 February 2010 at 09:12:54 UTC

Oooh yes. I remember this now.

The bad news is that kakasi probably isn't really fixable. I think you'd need to rewrite the code in a major way, not just add a few lines to the dictionary, to fix it.

The good news? Removing the line
ぜつ絶
from the file 'kakasidict' may correct one romaji error in generated romaji.

sysko 28 February 2010 at 12:21:50 UTC

I think we can also try to find out whether there are people motivated to start a project for automatic romanization of Japanese, or look for an embryo of such a project and see how we can help.

contour 28 February 2010 at 17:54:22 UTC

For now, if there was the possibility to enter the romaji explicitly, and if manually entered and automatically generated romaji could be separated, that should make for a good test set for evaluating different methods for automatic generation.

I think that ideally one would start with a mature project, and automatically add corrections to the training set.
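The test-set idea above can be sketched in a few lines of Python. This is only an illustration: `accuracy`, `GOLD`, and `toy_generator` are hypothetical names, the romaji in `GOLD` are illustrative manual entries, and a real comparison would plug in an actual converter instead of the stand-in.

```python
def accuracy(generate, gold):
    """Share of sentences where generate(sentence) equals the manual romaji."""
    if not gold:
        return 0.0
    hits = sum(1 for sentence, manual in gold if generate(sentence) == manual)
    return hits / len(gold)


# Hypothetical test set: (sentence, manually entered romaji).
GOLD = [
    ("僕は市場へ行った", "boku wa ichiba e itta"),
    ("禁煙区域", "kin'en kuiki"),
]


def toy_generator(sentence):
    # Stand-in for an automatic converter; it only knows one sentence.
    known = {"僕は市場へ行った": "boku wa ichiba e itta"}
    return known.get(sentence, "")


score = accuracy(toy_generator, GOLD)
```

Keeping manual and generated romaji in separate fields is what makes this comparison possible; each candidate method would be scored against the same manual entries.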

blay_paul 28 February 2010 at 19:20:08 UTC

> I think that ideally one would start with a mature
> project, and automatically add corrections to the
> training set.

There are six main approaches that could be taken.
1. Drop romaji support.
2. Allow manual correction of romaji.
3. Develop romaji generation code that uses the WWWJDIC index line.
4. Further develop kakasi.
5. Look for alternative romaji conversion software.
6. Develop romaji conversion software from scratch.

I would recommend 1, 2, or 3.

Option 4 could be done, but I think you would soon reach the limits of what is achievable.

sysko 1 March 2010 at 00:09:20 UTC

(I don't speak Japanese at all, so excuse me if I'm talking nonsense.)

Nemo talked about JUMAN as a replacement for kakasi; it can output kana.
Isn't kana better? By restricting the "reading" part to kana characters, we're sure that people who can't write Japanese will not "accidentally" mess up the "romanization", and we're also sure that people use the same convention, as there's only one kana per "sound".
(Trang always talks about the different ways to write romaji.)

What do you think? Trang?

Nemo 1 March 2010 at 03:11:15 UTC

I should give a little more information than I have in past posts, I think, because there seems to have been little progress. I don't really want to come off as harsh, but the reality is that KAKASI is a lost cause. Whoever coded the program did so in a very naive way, and using sed to correct its errors would take an inordinate amount of both human and CPU time, and in the best-case scenario it would put such undue load on the server as to make Tatoeba unusable. I've gotten the impression that KAKASI was chosen with little to no consideration of other options (cf. below), despite the fact that there are ways to accurately dissect Japanese text into parseable units, which could then be converted into romaji. The reality is, KAKASI is nowhere near mature enough to produce accurate results, and as an abandoned project there is little hope of it reaching that maturity -- its output will never get any better than it is. In contrast, JUMAN seems to be near-perfect, though I will admit that I have not tried the other romanizers suggested in the blog post, nor have I done extensive testing of JUMAN. Regardless, JUMAN seems to be acceptable, even optimal. KAKASI falls so far short of the mark that I'm not sure why it is even in use. I would even go so far as to say that if KAKASI remains the method of conversion, then by the time Tatoeba becomes popular, Greasemonkey scripts will be produced which correct romaji via some other means, if that's even feasible. (Here's the blog post I referenced: http://blog.tatoeba.org/2009/02...anization.html )

TRANG 1 March 2010 at 20:37:21 UTC

Yes, to be honest, KAKASI was chosen with no consideration of other options. It was the first one I found when I searched for a romaji converter, so I picked it.

Only later did I write this blog post, where I actually searched and listed other solutions. Solutions that I should have explored but never had the time to =/
I completely agree with you that KAKASI is not the long-term solution.

Anyway, considering you have been taking the time to write all these posts, I will take a look at JUMAN ;). But if you can just tell me quickly what command to use to get a Japanese sentence parsed and converted into kana, that would save me some time going through the documentation. Ah, but does JUMAN support UTF-8...?

contour 1 March 2010 at 22:19:40 UTC

From a quick look, it looks like you have to convert to and from EUC-JP. Piping a sentence through "juman -b -c" gives one line per word, with readings in the second of the space-separated columns.
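A rough Python sketch of that pipeline follows. The `juman -b -c` invocation, the EUC-JP round trip, and the "reading in the second space-separated column" layout are taken from the post above; the function names, the `EOS` terminator line, and the shortened `SAMPLE` output are assumptions made for illustration, so check them against a real JUMAN install.

```python
import subprocess


def run_juman(sentence: str) -> str:
    """Pipe one sentence through `juman -b -c`, converting to and from EUC-JP.

    Needs a local JUMAN install, so it is not called in the example below.
    """
    result = subprocess.run(
        ["juman", "-b", "-c"],
        input=sentence.encode("euc_jp") + b"\n",
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("euc_jp")


def readings(juman_output: str) -> list[str]:
    """Collect the second space-separated column of each word line."""
    found = []
    for line in juman_output.splitlines():
        columns = line.split(" ")
        # Skip the assumed end-of-sentence marker and malformed lines.
        if line == "EOS" or len(columns) < 2:
            continue
        found.append(columns[1])
    return found


# Hypothetical, shortened output: real lines carry more columns.
SAMPLE = "僕 ぼく 僕 名詞\nは は は 助詞\nEOS"
kana_sentence = " ".join(readings(SAMPLE))
```

Joining the readings with spaces gives a kana sentence with word boundaries, which is the form later posts in this thread ask for.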

Nemo 2 March 2010 at 02:03:07 UTC

There's a powerpoint tutorial, I'll look at it when I have time and translate it. The translated user guide focuses on the whole idea behind the system, and why it was/how it was developed, and then when it comes to the syntax, it's just a bunch of "I don't know this word" and "If you break this down, it would mean something like..."

Nemo 1 March 2010 at 02:51:08 UTC

If Juman's kana/categorization output is accurate, it can produce 100% perfect romaji output. Kana give a representation of how something is said, along with its syntactical representation. There are ambiguities in kana, but JUMAN gives enough information that the pronunciation and syntax can be reconciled to provide a perfect, phonetic, romanization.

TRANG 1 March 2010 at 22:00:51 UTC

My ideal approach would be using WWWJDIC indices, combined with better software for conversion into romaji or kana.

As for making romaji editable, if we were to make anything editable, I'd rather it be kana, like what sysko suggested.

If the purpose is to provide something useful for learning, then it's obviously better to have a sentence in kana, with spaces so that the learner knows how the sentence is composed. And of course we can use the sentence in kana to generate correct romaji.

TRANG 23 February 2010 at 21:25:16 UTC

That took me forever to write but hopefully it will prevent us from explaining certain things over and over again: http://blog.tatoeba.org/2010/02...n-tatoeba.html

sysko 24 February 2010 at 00:36:04 UTC

WOooOW, I haven't read it entirely yet, but you've done a really damn good job. Many thanks to Trang :)

lilygilder 24 February 2010 at 01:15:26 UTC

Thanks Trang, this clears up a lot of problems. Very helpful. =) And kudos for writing all of this.

blay_paul 22 February 2010 at 20:04:06 UTC

Autolinking broken (See latest comment in http://tatoeba.org/eng/sentence...81800#comments )

link to
http://mitleid.cool.ne.jp/tonegawa.htm
ends up pointing to
http://tatoeba.org/eng/sentence...gawa.htm%5C%27
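The trailing `%5C%27` percent-decodes to a backslash followed by a single quote, which suggests an escaped quote was swept into the link. The actual cause inside Tatoeba is not stated in this thread, so the following Python sketch is only one possible way an autolinker could avoid the symptom: `URL_RE` and `find_urls` are hypothetical names, and the pattern simply stops a URL at whitespace, quotes, backslashes, and angle brackets.

```python
import re
from urllib.parse import unquote

# A URL ends at whitespace, a quote, a backslash, or an angle bracket.
URL_RE = re.compile(r"""https?://[^\s'"\\<>]+""")


def find_urls(text):
    """Return every URL-like substring found in a comment."""
    return URL_RE.findall(text)


# What the stray suffix in the broken link decodes to.
suffix = unquote("%5C%27")

# A comment whose link is followed by an escaped quote.
comment = "see http://mitleid.cool.ne.jp/tonegawa.htm\\' for details"
urls = find_urls(comment)
```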

sysko 22 February 2010 at 20:58:34 UTC

Yep, the issue is known and already fixed; the fix will ship in the next release (which will come soon) :)

blay_paul 22 February 2010 at 12:35:42 UTC

Here's another suggestion.

There's some space on the right hand side of the 'home' page. I suggest you use it to show the most recent posts in the 'Wall'. Probably best if you just show the first line or two and make it a link to the #-anchor of the message in question.

blay_paul 22 February 2010 at 11:13:20 UTC

Simple suggestion.

In a break from the difficult and / or controversial suggestions I have one simple one to offer.

I suggest that the sentence list pages (e.g.
http://tatoeba.org/eng/sentences_lists/edit/24
) should use their description as their page title (e.g.
Sentence lists: jpn->eng translations needed
instead of just
Sentence lists
)

blay_paul 21 February 2010 at 13:45:36 UTC

Seriously - romaji editing now. ;-)

I don't think there's any point in waiting for "a serious Japanese contributor". Most of the romaji errors are very obvious and either I, or half a dozen or so regulars here, would be well able to correct them if they had the chance.

I would go as far as to say that it would be better not to have romaji AT ALL rather than leave them in the current state.

TRANG 21 February 2010 at 18:16:09 UTC

Then it may be no romaji at all... But I want opinion from more users first. Is it better to have no romaji at all, or is it better to have something even if it's not 100% correct?

I know Nemo is against romaji as well, but I added it because more than two people had requested it in the past.

Regarding editable romaji, I'd rather avoid having people waste time on correcting romaji, which is why I don't want to make it editable.
Most of the time it's a systematic error that can be found in more than 100 other sentences. If I made romaji editable, you'd have to edit them one by one.
You'd also have to make sure everyone agrees on the romanization rules and follows them, which is again more work.

I think it's better to improve the software (not necessarily KAKASI) to the point where it can't get any better. It would save time for so many other people in the world...

Perhaps there is someone out there who is actively developing an open source Japanese parser and furigana-romaji converter. I haven't had time to search, but if you do find one (and by "you" I mean anyone who is reading this), by all means, let me know.

blay_paul 21 February 2010 at 20:55:46 UTC

> I'd rather avoid having people to waste time on
> correcting romaji which is why I don't want to make
> it editable.

That's basically another way of saying that the romaji isn't important. If the romaji isn't important I'd rather it wasn't there than be there and often incorrect. ;-)

Having a kana version or furigana would be a nice alternative. kana would get rid of the
o / wo
e / he
wa / ha
confusion. Note that a combination of Edict and the Index information could be used to generate pretty-much-correct furigana or kana. (Not that easy, but doable)

JeroenHoek 1 March 2010 at 09:39:11 UTC

I agree with Paul that furigana might be preferable to broken rōmaji. Learning the basics of kana shouldn't take you more than a month or two; after that, kanji readings become the hard part. Furigana should, in my opinion, eliminate the need for rōmaji for learners of Japanese.

Rōmaji is mostly useful for transcribing Japanese for a public that cannot read any Japanese at all. Also, the rōmaji generated by Kakasi is wāpuro-rōmaji.

Nemo 21 February 2010 at 20:13:59 UTC

I'm for having all the romaji on the site be accurate. If the best way to do that is deleting all of the romaji, then I'd say do that. If you really want an accurate romaji representation, it will probably need to be written ad hoc. I don't think this should be too hard though, so long as it is written for this project specifically, and it is done soon. This site is currently comprised mostly of the Tanaka Corpus, so far as I am aware, so almost every word in the Japanese examples should be also present in EDICT, which has the reading of every word in it in kana. If there are multiple readings, I would just make the output something like:
僕は市場へ行った
*** boku wa (shijyou | ichiba) e itta

So that the edge cases could be fixed. It's still a lot of work, but it's doable. (In this case, the difference is irrelevant, but in many it could be relevant). You could then dump the database into a text file of all beginning with ***. I believe EDICT even has the readings listed in order of frequency, so if you wanted to you could have it just guess the first one every time, and fixing the few that got put in incorrectly would not be a huge ordeal. I would recommend keeping some automatic conversion in place, and storing things in the database as:
僕は市場へ行った
ぼくはしじょうへいった
and having the conversion take place from the kana to romaji on-the-fly. Also, force those editing the romaji to use kana. Basically introduce a learning curve that will discourage those who don't know better from thinking they do. Also, changes in romanization could be implemented very easily. I personally use wapuro romaji whenever I do, which is rare still, but I know this is less than ideal for learning.
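The "list every reading, flag the ambiguous lines" step described above could look roughly like this in Python. It is a sketch under stated assumptions: `READINGS` is a hypothetical hand-written stand-in for a dictionary lookup (it is not EDICT data or its format), the sentence is assumed to be already split into words, and `candidate_line` and `first_guess` are made-up names.

```python
# Hypothetical mini-dictionary, most frequent reading first.
READINGS = {
    "僕": ["boku"],
    "は": ["wa"],
    "市場": ["shijyou", "ichiba"],
    "へ": ["e"],
    "行った": ["itta"],
}


def candidate_line(words, readings=READINGS):
    """Join readings; show alternatives and prefix '*** ' when any word is ambiguous."""
    parts = []
    ambiguous = False
    for word in words:
        options = readings.get(word, ["?"])
        if len(options) > 1:
            ambiguous = True
            parts.append("(" + " | ".join(options) + ")")
        else:
            parts.append(options[0])
    prefix = "*** " if ambiguous else ""
    return prefix + " ".join(parts)


def first_guess(words, readings=READINGS):
    """Always pick the first listed reading for every word."""
    return " ".join(readings.get(word, ["?"])[0] for word in words)


words = ["僕", "は", "市場", "へ", "行った"]
flagged = candidate_line(words)
guess = first_guess(words)
```

Lines starting with `***` are the ones a reviewer would still need to resolve by hand; `first_guess` shows the "just take the most frequent reading" shortcut mentioned in the post.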

Nemo 21 February 2010 at 20:24:15 UTC

My whole post is a waste of time, lol. The software you are using has an output to kana mode, which would not be subject to the pitfalls that romaji is. I suggest we use that. Kana is not that difficult to learn, and there's no sense in learning grammar/sentences before kana anyway.

Nemo 21 February 2010 at 20:33:34 UTC

We need post editing, haha. JUMAN does exactly what you need. It converts from kanji to hiragana, and labels each word with what it is. So, if it says は is a 助詞 (particle), you can output wa, and the same for all of the others. I'm not sure that it outputs romaji (The sample set-up does not), but with kana and part of speech, romaji is just a lookup table away.
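To make the "lookup table" point concrete, here is a minimal Python sketch. It assumes the input is already a list of (kana reading, part of speech) pairs; the `"particle"` label, the function names, and the tiny `KANA` table (which covers only the kana of one example sentence) are all hypothetical, and a real table would need every kana and digraph.

```python
# Tiny kana table: only what the example sentence needs.
KANA = {
    "ぼ": "bo", "く": "ku", "は": "ha", "し": "shi", "う": "u",
    "へ": "he", "い": "i", "た": "ta", "を": "wo",
    "じょ": "jo",
}

# Particles whose pronunciation differs from their spelling.
PARTICLES = {"は": "wa", "へ": "e", "を": "o"}


def kana_to_romaji(kana):
    """Convert a kana string with the table; a small っ doubles the next consonant."""
    out = []
    double_next = False
    i = 0
    while i < len(kana):
        if kana[i] == "っ":
            double_next = True
            i += 1
            continue
        # Prefer a two-character digraph such as じょ when the table has one.
        chunk = kana[i:i + 2] if kana[i:i + 2] in KANA else kana[i]
        roma = KANA[chunk]  # raises KeyError for kana missing from the table
        if double_next:
            roma = roma[0] + roma
            double_next = False
        out.append(roma)
        i += len(chunk)
    return "".join(out)


def romanize(tokens):
    """tokens: list of (kana reading, part of speech) pairs."""
    words = []
    for reading, pos in tokens:
        if pos == "particle" and reading in PARTICLES:
            words.append(PARTICLES[reading])
        else:
            words.append(kana_to_romaji(reading))
    return " ".join(words)


tokens = [("ぼく", "noun"), ("は", "particle"), ("しじょう", "noun"),
          ("へ", "particle"), ("いった", "verb")]
romaji = romanize(tokens)
```

Because the part of speech decides between "ha" and "wa", the same table can serve both ordinary words and particles, and switching romanization style only means swapping the table.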

cburgmer 19 February 2010 at 20:10:36 UTC

May I disturb the silence once again. Consider the situation where somebody translates sentence A into B, then somebody later comes along to translate B into C. It then turns out that B is wrong and is subsequently changed. This invalidates C. Any ideas/plans for that?

TRANG 19 February 2010 at 20:37:46 UTC

Part of the answer is in the comment I wrote here:
http://tatoeba.org/eng/sentences/show/126

And in my todo list for the weekend: write some guidelines so that users know how to contribute correctly.

cburgmer 19 February 2010 at 20:46:25 UTC

Thx to both of you. I see the whole system is well thought-out.

sysko 19 February 2010 at 20:37:57 UTC

This has already been discussed ^^

Soon we will add an unlink feature, so you will be able to say "these sentences are no longer translations of each other".

So, what to do in your case:

In fact, you're not supposed to change sentence B as long as it is correct by itself, because, as you've said, that would make the translations of B erroneous too.

So you just add a sentence B2 and add a note that sentence B needs to be unlinked from sentence A.
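As a rough illustration of that workflow, the Python sketch below models translation links as an undirected set of pairs. The `TranslationLinks` class and its methods are hypothetical and say nothing about how Tatoeba actually stores links; the sentence names A, B, B2 and C follow the example in this thread.

```python
class TranslationLinks:
    """Undirected 'is a translation of' links between sentence ids."""

    def __init__(self):
        self._links = set()

    def link(self, a, b):
        if a == b:
            raise ValueError("a sentence cannot be linked to itself")
        self._links.add(frozenset((a, b)))

    def unlink(self, a, b):
        # Removing a link that does not exist is a no-op.
        self._links.discard(frozenset((a, b)))

    def translations(self, s):
        """Sorted ids of all sentences directly linked to s."""
        return sorted(next(iter(pair - {s})) for pair in self._links if s in pair)


links = TranslationLinks()
links.link("A", "B")    # B was added as a translation of A
links.link("B", "C")    # C was translated from B
links.link("A", "B2")   # B2 is the corrected translation of A
links.unlink("A", "B")  # B stays as a sentence, but no longer translates A
```

Since B itself is left unchanged, C remains a valid translation of B, which is the point of adding B2 instead of editing B.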
