Notes
The data you will find here will NOT be useful unless you are coding a language tool or processing data.
If you simply want sentences that you can use to learn a language, check out the sentence lists. You can build your own, or view the ones that others have created. The lists can be downloaded and printed.
General information about the files
Many of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.
Creative Commons
These files are released under CC BY 2.0 FR.
Some of our sentences are also available under CC0 1.0.
Audio licenses
The license covering an audio file is chosen by the contributor, and is indicated on the page that lists the audio files that he or she has contributed.
More questions?
If you have questions or requests, feel free to contact us. In general, we answer quickly.
Downloads
Use this tool to generate and download customized exports on demand.
Download all sentences in language A that are translated into language B, along with the translations.
Sentences
- Filename
- Depends on the selection made on the download page: all languages, or only the sentences in one chosen language.
- File description
- Contains all the sentences in the selected language. Each sentence is associated with a unique id and an ISO 639-3 language code.
- Fields and structure
- Sentence id [tab] Language [tab] Text
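The three-column layout above can be read with a few lines of Python. This is a minimal sketch, not an official reader: the names `Sentence` and `parse_sentence_line` and the sample rows are illustrative, and it assumes UTF-8 text with one sentence per line.

```python
from typing import NamedTuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 code, per the file description
    text: str


def parse_sentence_line(line: str) -> Sentence:
    """Split one line of the sentences export into its three fields."""
    # maxsplit=2 keeps the text whole even if it happened to contain a tab
    # (an assumption; the page does not say whether that can occur).
    sentence_id, lang, text = line.rstrip("\n").split("\t", 2)
    return Sentence(int(sentence_id), lang, text)


# Hypothetical sample rows in the documented layout.
sample = ["1\teng\tHello.\n", "2\tfra\tBonjour.\n"]
sentences = [parse_sentence_line(row) for row in sample]
```

For a real export, the same function can be applied to each line of the extracted file opened with `encoding="utf-8"`.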
Detailed sentences
- Filename
- Depends on the selection made on the download page: all languages, or only the sentences in one chosen language.
- File description
- Contains additional fields for each sentence (owner name, date created/modified).
- Fields and structure
- Sentence id [tab] Language [tab] Text [tab] Username [tab] Date added [tab] Date last modified
Original and Translated Sentences
- Filename
- sentences_base.tar.bz2
- File description
- Each sentence is listed as original or a translation of another. The "base" field can have the following values:
- zero: The sentence is original, not a translation of another.
- greater than zero: The id of the sentence from which it was translated.
- \N: Unknown (rare).
- Fields and structure
- Sentence id [tab] Base field
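The three possible values of the base field can be told apart as follows. This is a sketch only; the function name `parse_base` and the labels it returns are made up for illustration.

```python
from typing import Optional, Tuple


def parse_base(value: str) -> Tuple[str, Optional[int]]:
    """Classify a base field as 'original', 'translation', or 'unknown'."""
    if value == r"\N":
        return ("unknown", None)
    base_id = int(value)
    if base_id == 0:
        return ("original", None)
    # Any value greater than zero is the id of the source sentence.
    return ("translation", base_id)


# Hypothetical row: sentence 77 was translated from sentence 1.
sentence_id, base = "77\t1".split("\t")
kind, source_id = parse_base(base)
```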
Sentences (CC0)
- Filename
- Depends on the selection made on the download page: all languages, or only the sentences in one chosen language (the selector offers fewer languages here than for the full export).
- File description
- Contains all the sentences available under CC0.
- Fields and structure
- Sentence id [tab] Language [tab] Text [tab] Date last modified
Links
- Filename
- links.tar.bz2
- File description
- Contains the links between the sentences. 1 [tab] 77 means that sentence #77 is the translation of sentence #1. The reciprocal link is also present, so the file will also contain a line that says 77 [tab] 1.
- Fields and structure
- Sentence id [tab] Translation id
Tags
- Filename
- tags.tar.bz2
- File description
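Because every link is stored in both directions, a single pass over the links is enough to collect translations from one language into another. The sketch below assumes the sentences have already been loaded into a dictionary keyed by id; `translation_pairs` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def translation_pairs(
    sentences: Dict[int, Tuple[str, str]],  # id -> (language code, text)
    links: Iterable[Tuple[int, int]],       # (sentence id, translation id)
    source_lang: str,
    target_lang: str,
) -> List[Tuple[str, str]]:
    """Return (source text, target text) for each link from source to target."""
    pairs = []
    for sentence_id, translation_id in links:
        source = sentences.get(sentence_id)
        target = sentences.get(translation_id)
        if source is None or target is None:
            continue  # the link refers to a sentence that was not loaded
        if source[0] == source_lang and target[0] == target_lang:
            pairs.append((source[1], target[1]))
    return pairs


# Hypothetical data: sentence 1 has a French and a German translation.
sentences = {1: ("eng", "Hello."), 77: ("fra", "Bonjour."), 80: ("deu", "Hallo.")}
links = [(1, 77), (77, 1), (1, 80), (80, 1)]
eng_fra = translation_pairs(sentences, links, "eng", "fra")
```

The reciprocal line `77 [tab] 1` is what makes the reverse query (`"fra"` to `"eng"`) work without any extra bookkeeping.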
- Contains the list of tags associated with each sentence. 381279 [tab] proverb means that sentence #381279 has been assigned the "proverb" tag.
- Fields and structure
- Sentence id [tab] Tag name
Lists
- Filename
- user_lists.tar.bz2
- File description
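Since a sentence can carry several tags, one line per tag, it is often convenient to group the lines by sentence id. A minimal sketch follows; `tags_by_sentence` is an illustrative name, and apart from "proverb" the tag names in the sample are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def tags_by_sentence(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Group tag names by sentence id, preserving file order."""
    grouped = defaultdict(list)
    for line in lines:
        # maxsplit=1 so that a tag name is never cut short.
        sentence_id, tag = line.rstrip("\n").split("\t", 1)
        grouped[int(sentence_id)].append(tag)
    return dict(grouped)


# Hypothetical sample rows.
sample = ["381279\tproverb\n", "381279\tidiom\n", "5\tquestion\n"]
tags = tags_by_sentence(sample)
```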
- Contains the list of sentence lists.
- Fields and structure
- List id [tab] Username [tab] Date created [tab] Date last modified [tab] List name [tab] Editable by
Sentences in lists
- Filename
- sentences_in_lists.tar.bz2
- File description
- Indicates which sentences each list contains. 13 [tab] 381279 means that sentence #381279 is contained in the list with id 13.
- Fields and structure
- List id [tab] Sentence id
Japanese indices
- Filename
- jpn_indices.tar.bz2
- File description
- Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. Sentence id refers to the id of the Japanese sentence. Meaning id refers to the id of the English sentence.
- Fields and structure
- Sentence id [tab] Meaning id [tab] Text
Sentences with audio
- Filename
- sentences_with_audio.tar.bz2
- File description
- Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
- Downloading audio
- A single sentence can have one or more audio recordings, each from a different voice. To download a particular recording, use its audio id to compute the download URL. For example, to download the audio with the id 1234, the URL is https://tatoeba.org/audio/download/1234.
- Fields and structure
- Sentence id [tab] Audio id [tab] Username [tab] License [tab] Attribution URL
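The URL rule and the license rule above can be combined when deciding which recordings to fetch. The sketch below is illustrative only: the function names and sample rows are hypothetical, and treating `\N` as an empty license is an assumption (the page only says "empty").

```python
AUDIO_DOWNLOAD_BASE = "https://tatoeba.org/audio/download/"


def audio_download_url(audio_id: int) -> str:
    """Build the download URL for one audio id, as described above."""
    return f"{AUDIO_DOWNLOAD_BASE}{audio_id}"


def reusable_outside_tatoeba(license_field: str) -> bool:
    """An empty license means the audio may not be reused elsewhere."""
    # Assumption: a literal \N is treated like an empty field.
    return license_field.strip() not in ("", r"\N")


def parse_audio_line(line: str) -> dict:
    sentence_id, audio_id, username, license_field, attribution_url = (
        line.rstrip("\n").split("\t")
    )
    return {
        "sentence_id": int(sentence_id),
        "url": audio_download_url(int(audio_id)),
        "username": username,
        "reusable": reusable_outside_tatoeba(license_field),
        "attribution_url": attribution_url,
    }


# Hypothetical rows: one licensed recording, one with no license.
licensed = parse_audio_line("10\t1234\talice\tCC BY 4.0\thttps://example.com/alice\n")
unlicensed = parse_audio_line("20\t1235\tbob\t\t\n")
```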
User skill level per language
- Filename
- user_languages.tar.bz2
- File description
- Indicates the self-reported skill levels of members in individual languages.
- Fields and structure
- Language [tab] Skill level [tab] Username [tab] Details
Users' sentence reviews
- Filename
- users_sentences.csv
- File description
- Contains sentences reviewed by users. The value of the review can be -1 (sentence not OK), 0 (undecided or unsure), or 1 (sentence OK). Warning: this data is still experimental.
- Fields and structure
- Username [tab] Sentence id [tab] Review [tab] Date added [tab] Date last modified
Transcriptions
- Filename
- Depends on the selection made on the download page: all languages, or only the sentences in one of Japanese, Cantonese, Mandarin Chinese, or Uzbek.
- File description
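One way to use the review values is to tally them per sentence. This is a sketch under assumptions: fields are taken to be tab-separated as listed above, and the label names, function name, and sample rows (including the date format) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Labels are illustrative; the numeric values come from the description above.
REVIEW_LABELS = {-1: "not_ok", 0: "unsure", 1: "ok"}


def tally_reviews(lines: Iterable[str]) -> Dict[int, Counter]:
    """Count 'ok', 'unsure' and 'not_ok' reviews for each sentence id."""
    tallies: Dict[int, Counter] = {}
    for line in lines:
        username, sentence_id, review, *_dates = line.rstrip("\n").split("\t")
        label = REVIEW_LABELS[int(review)]
        tallies.setdefault(int(sentence_id), Counter())[label] += 1
    return tallies


# Hypothetical sample rows.
sample = [
    "alice\t42\t1\t2020-01-01 00:00:00\t2020-01-01 00:00:00\n",
    "bob\t42\t1\t2020-01-02 00:00:00\t2020-01-02 00:00:00\n",
    "carol\t42\t-1\t2020-01-03 00:00:00\t2020-01-03 00:00:00\n",
    "alice\t7\t0\t2020-01-04 00:00:00\t2020-01-04 00:00:00\n",
]
tallies = tally_reviews(sample)
```

Given the warning that this data is experimental, any threshold built on these counts should be treated as provisional.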
- Contains all transcriptions in auxiliary or alternative scripts. A username associated with a transcription indicates the user who last reviewed and possibly modified it. A transcription without a username has not been marked as reviewed. The script name is defined according to the ISO 15924 standard.
- Fields and structure
- Sentence id [tab] Language [tab] Script name [tab] Username [tab] Transcription
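The username column doubles as a review marker, so a reader may want to separate reviewed transcriptions from unreviewed ones. A minimal sketch: `parse_transcription_line` and the sample rows are hypothetical, and treating `\N` like an empty username is an assumption.

```python
def parse_transcription_line(line: str) -> dict:
    """Split one transcription row and flag whether it has been reviewed."""
    # maxsplit=4 keeps the transcription text whole.
    sentence_id, lang, script, username, text = line.rstrip("\n").split("\t", 4)
    return {
        "sentence_id": int(sentence_id),
        "lang": lang,
        "script": script,  # ISO 15924 script name, e.g. "Latn"
        # Assumption: an empty username or a literal \N means "not reviewed".
        "reviewed": username not in ("", r"\N"),
        "text": text,
    }


# Hypothetical rows: one unreviewed, one reviewed by a user.
unreviewed = parse_transcription_line("100\tjpn\tHrkt\t\tこんにちは\n")
reviewed = parse_transcription_line("101\tcmn\tLatn\talice\tnǐ hǎo\n")
```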