Note

The data you will find here will NOT be useful unless you are coding a language tool or processing data.

If you simply want sentences that you can use to learn a language, check out the sentence lists. You can build your own, or view the ones that others have created. The lists can be downloaded and printed.

General information about the files

Many of the Japanese and English sentences are from the Tanaka Corpus, which is in the public domain.

Creative commons

These files are released under CC BY 2.0 FR.

Some of our sentences are also available under CC0 1.0.

Licenses covering audio

The license covering an audio file is chosen by the contributor, and is indicated on the page that lists the audio files that he or she has contributed.

Questions?

If you have questions or requests, feel free to contact us. In general, we answer quickly.

Downloads

Use this tool to generate and download customized exports on demand.


Sentence pairs

Download all sentences in language A with translations in language B

Download all sentences in language A that are translated into language B, along with the translations.

Sentence language:

Translation language:

The files provided below are updated every Saturday at 6:30 a.m. (UTC).

Sentences

Filename: {{sentences | filename}}
All languages
Only sentences in: Adyghe Afrihili Afrikaans Ainu Aklanon Albanian Algerian Arabic Amharic Ancient Greek Ancient Hebrew Anglica Arabica Aragonese Assamese Assyrian Neo-Aramaic Asturian Awadhi Aymara Azerbaijani Balinese Bambara Banjar Bashkir Basque Batavica Bavarian Baybayanon Belarusian Berber Berom Bhojpuri Bislama Bodo Bosnian Breton Brithenig Bulgarian Burmese Cantonese Catalan Cayuga Cebuano Central Bikol Central Dusun Central Huasteca Nahuatl Central Kanuri Central Kurdish (Soranî) Central Mnong Chagatai Chamorro Chavacano Chechen Cherokee Chinese Pidgin English Chinook Jargon Chinyanja Choctaw Chuvash Coastal Kadazan Congo Swahili Coreana Cornish Corsican Croatian Cuyonon CycL Czech Danish Drents Dunganica Dutton World Speedwords Eastern Armenian Egyptian Arabic Emilian Erromintxela Erzya Esperantica Estonian Ewe Extremaduran Faroese Fiji Hindi Fijian Finnish Frisian Friulian Ga Gagauz Galician Gallica Gan Chinese Georgian Germanica Gheg Albanian Gilbertese Gothic Graeca Gronings Guadeloupean Creole French Guarani Guerrero Nahuatl Gujarati Gulf Arabic Haitian Creole Hakka Chinese Hausa Hawaiian Hebraica Hiligaynon Hindi Hispanica Hitchiti Hmong Daw (White) Hmong Njua (Green) Hungarian Hunsrik Iaponica Iban Icelandic Ido Igbo Indonesiana Ingrian Interglossa Interlingua Interlingue Inuktitut Iraqi Arabic Irish Isan Italica Javanese Jewish Babylonian Aramaic Jewish Palestinian Aramaic Jin Chinese Juhuri (Judeo-Tat) K'iche' Kabyle Kalmyk Kamba Kannada Karachay-Balkar Karakhanid Karelian Kashmiri Kashubian Kazakh Kekchi (Q'eqchi') Kelantan-Pattani Malay Keningau Murut Khalaj Khasi Khmer Kinyarwanda Kirundi Klingon Kölsch Komi-Permyak Komi-Zyrian Konkani (Goan) Kotava Kumyk Kven Finnish Kyrgyz Láadan Ladino Lakota Lao Latgalian Latina Laz Libyan Arabic Ligurian Lingala Lingua Abasca lingua Avarica Lingua Baluchica lingua bengalica Lingua Borussica Lingua Buriatica Lingua Cabardino-Circassica Lingua Castellana antiqua Lingua Chakassica Lingua Dhivehi Lingua 
Evenkensis Lingua Franca Nova Lingua Francogallica antiqua Lingua Garhvali Lingua Groenlandica Lingua Ho lingua Iacutica Lingua Ilocana Lingua Karakalpakensis Lingua Ladina Lingua Lettonica lingua limburgica Lingua Mari montana Lingua Mari pratensis lingua Meitei lingua Neapolitana Lingua Ojibwayensis Lingua Ossetica Lingua Palica Lingua Phoenicia Lingua Quenya lingua Santali lingua Saraiki Lingua Saxonica antiqua lingua Silati lingua Silesica Lingua Sindarin Lingua Sinensis Lingua Slavica orientalis antiqua lingua Suebica Lingua Syriaca Lingua Talossana Lingua Tatarica Crimensis Lingua Toki Pona Lingua Tschuctschica Lingua Zingarica Literary Chinese Lithuanian Livonian Lojban Lombard Louisiana Creole Low German (Low Saxon) Lower Sorbian Luganda Lushootseed Lusitanica Luxembourgish Macedonian Madurese Mahasu Pahari Maithili Malagasy Malay (Vernacular) Malayalam Malayana Maltese Mambae Manchu Manx Maori Mapuche Marathi Marshallese Mi'kmaq Middle English Middle French Middle Persian (Pahlavi) Min Nan Chinese Minangkabau Mingrelian Mirandese Mohawk Moksha Mon Mongolian Mono (USA) Morisyen Moroccan Arabic Muskogee (Creek) Naga (Tangshang) Nahuatl Nande Nauruan Navajo Nepalensis Newari Ngeq Nigerian Fulfulde Niuean Nogai North Frisian North Levantine Arabic North Moluccan Malay Northern Haida Northern Kurdish (Kurmancî) Northern Sami Northern Zaza (Kirmanjki) Norwegian Bokmål Norwegian Nynorsk Novial Nuer Nuosu Nyungar O'odham Occitan Odia (Oriya) Okinawan Old Aramaic Old English Old Frisian Old Norse Old Tupi Old Turkish Orizaba Nahuatl Ottoman Turkish Palatine German Palauan pampangense Pangasinan Papiamento Pashto Patois Iamaicanus Pennsylvania German Persica Picard Piedmontese Pipil Plains Cree Polonica lingua Pulaar Punjabi (Eastern) Punjabi (Western) Qashqai Quechua Rapa Nui Rendille Rohingya Romanian Romansh Rusyn Ruthenica Samoan Samogitian Sango Sanskrit Sardinian Saterland Frisian Scots Scottish Gaelic Serbian Setswana Seychellois Creole Shanghainese Shona 
Shuswap Sicilian Sindhi Sinhala Slovak Slovenian Somali South Levantine Arabic Southern Altai Southern Haida Southern Kurdish Southern Sami Southern Sotho Southern Subanen Southern Zaza (Dimli) Sranan Tongo Standard Moroccan Tamazight Suahili Sumerian Sundanese Swazi Swedish Swiss German Tachawit Tagal Murut Tagalog Tahaggart Tamahaq Tahitian Tajik Talysh Tamil Tarifit Tashelhit Tatar Telugu Temuan Tetun Thai Tibetan Tigre Tigrinya Tok Pisin Tokelauane Tonga (Zambezi) Tongan Tsonga Tumbuka Turkish Turkmen Tuvaluan Tuvinian Uab Meto Udmurt Ukrainian Umbundu Upper Sorbian Urdu Urhobo Uyghur Uzbek Venetian Veps Vietnamica Volapük Võro Walloon Waray Wayuu Welsh Western Armenian Wolof Xhosa Xiang Chinese Yiddish Yoruba Yucatec Maya Zaza Zeelandic Zulu Unknown language
File description: Contains all the sentences in the selected language. Each sentence is associated with a unique id and an ISO 639-3 language code.
Fields and structure: Sentence id [tab] Lang [tab] Text
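
As an illustration of the three-field layout above, here is a minimal Python sketch that reads such lines. The names Sentence and read_sentences and the two sample rows are made up for this example; they are not part of the export itself.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 code
    text: str


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield one Sentence per well-formed tab-separated line."""
    for line in lines:
        # Split on the first two tabs only, so the text field is kept whole.
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        sentence_id, lang, text = fields
        yield Sentence(int(sentence_id), lang, text)


# Made-up sample rows standing in for an extracted export file.
sample = io.StringIO('1\teng\tShe said "hello".\n2\tfra\tBonjour.\n')
sentences = list(read_sentences(sample))
```

With a real export, the same function could be fed an open file object for the extracted file, opened with encoding="utf-8".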

Detailed Sentences

Filename: {{sentencesDetailed | filename}}
All languages, or only sentences in a single language (the same languages as listed for Sentences above).
File description: Contains additional fields for each sentence (owner name, date created/modified).
Fields and structure: Sentence id [tab] Lang [tab] Text [tab] Username [tab] Date added [tab] Date last modified

Original and Translated Sentences

Filename: sentences_base.tar.bz2
File description: Each sentence is listed as original or a translation of another. The "base" field can have the following values:
zero: The sentence is original, not a translation of another.
greater than zero: The id of the sentence from which it was translated.
\N: Unknown (rare).
Fields and structure: Sentence id [tab] Base field
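
The three possible values of the base field can be handled as in the following Python sketch. The function names parse_base and describe are hypothetical, and the sample lines are invented for illustration.

```python
from typing import Optional


def parse_base(field: str) -> Optional[int]:
    """Return 0 for an original sentence, the source sentence id for a
    translation, or None when the base is unknown (\\N)."""
    if field == r"\N":
        return None
    return int(field)


def describe(line: str) -> str:
    """Turn one 'Sentence id [tab] Base field' line into a readable label."""
    sentence_id, base = line.rstrip("\n").split("\t")
    base_id = parse_base(base)
    if base_id is None:
        return f"{sentence_id}: unknown"
    if base_id == 0:
        return f"{sentence_id}: original"
    return f"{sentence_id}: translated from {base_id}"


# Invented sample lines covering the three cases.
labels = [describe(l) for l in ["10\t0\n", "11\t10\n", "12\t\\N\n"]]
```

Mapping \N to None rather than to 0 keeps "unknown" distinct from "original" in later processing.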

Sentences (CC0)

Filename: {{sentencesCC0 | filename}}
All languages
Only sentences in: Algerian Arabic Ancient Greek Ancient Hebrew Anglica Arabica Batavica Belarusian Berber Cantonese Catalan Czech Danish Esperantica Finnish Gallica Germanica Hebraica Hindi Hispanica Hungarian Iaponica Ido Interlingua Italica Jewish Babylonian Aramaic Jewish Palestinian Aramaic Kabyle Karelian Klingon Konkani (Goan) Kven Finnish Láadan Ladino Latina Ligurian lingua bengalica Lingua Ho Lingua Phoenicia lingua Santali lingua Silati Lingua Sinensis Lingua Toki Pona Literary Chinese Lusitanica Middle English Norwegian Bokmål Nyungar Old Aramaic Old Frisian Old Norse Polonica lingua Ruthenica Standard Moroccan Tamazight Swedish Tachawit Ukrainian Volapük Welsh Yiddish Unknown language
File description: Contains all the sentences available under CC0.
Fields and structure: Sentence id [tab] Lang [tab] Text [tab] Date last modified

Lists

Filename: user_lists.tar.bz2
File description: Contains the list of sentence lists.
Fields and structure: List id [tab] Username [tab] Date created [tab] Date last modified [tab] List name [tab] Editable by

Sentences in lists

Filename: sentences_in_lists.tar.bz2
File description: Indicates which sentences belong to which lists. For example, 13 [tab] 381279 means that sentence #381279 is in the list whose id is 13.
Fields and structure: List id [tab] Sentence id
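
A minimal Python sketch of how these pairs might be grouped by list follows. The function name sentences_by_list is an assumption made for this example; the first sample row is the one quoted in the description, the other two are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def sentences_by_list(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each list id to the set of sentence ids it contains."""
    lists: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        list_id, sentence_id = fields
        lists[int(list_id)].add(int(sentence_id))
    return dict(lists)


rows = ["13\t381279\n", "13\t42\n", "20\t42\n"]
grouped = sentences_by_list(rows)
```

Here grouped[13] holds both sentence ids of list 13, and sentence 42 appears under both lists, since a sentence can belong to more than one list.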

Japanese indices

Filename: jpn_indices.tar.bz2
File description: Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. Sentence id refers to the id of the Japanese sentence. Meaning id refers to the id of the English sentence.
Fields and structure: Sentence id [tab] Meaning id [tab] Text

Sentences with audio

Filename: sentences_with_audio.tar.bz2
File description: Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
Downloading audio: A single sentence can have one or more audio recordings, each from a different voice. To download a particular recording, use its audio id to build the download URL. For example, the recording with the id 1234 is available at https://tatoeba.org/audio/download/1234.
Fields and structure: Sentence id [tab] Audio id [tab] Username [tab] License [tab] Attribution URL
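
The URL rule and the license rule above can be combined as in this Python sketch. The function name reusable_audio_urls and the sample rows are invented. How an empty license appears in the file is not stated here, so the code assumes it may be either an empty string or \N; that is an assumption, not documented behaviour.

```python
from typing import Iterable, Iterator, Tuple

AUDIO_URL = "https://tatoeba.org/audio/download/{audio_id}"


def reusable_audio_urls(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (sentence id, download URL) for audio that has a license."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        sentence_id, audio_id, _username, license_name, _attribution = fields
        # Assumption: an empty license is written as "" or as \N.
        if license_name in ("", r"\N"):
            continue  # not reusable outside the Tatoeba project
        yield int(sentence_id), AUDIO_URL.format(audio_id=audio_id)


# Invented sample rows: one licensed recording, one without a license.
rows = [
    "100\t1234\talice\tCC BY 4.0\thttps://example.org/alice\n",
    "101\t1235\tbob\t\t\n",
]
urls = list(reusable_audio_urls(rows))
```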

User skill level per language

Filename: user_languages.tar.bz2
File description: Indicates the self-reported skill levels of members in individual languages.
Fields and structure: Lang [tab] Skill level [tab] Username [tab] Details

Users' sentence reviews

Filename: users_sentences.csv
File description: Contains sentences reviewed by users. The value of the review can be -1 (sentence not OK), 0 (undecided or unsure), or 1 (sentence OK). Warning: this data is still experimental.
Fields and structure: Username [tab] Sentence id [tab] Review [tab] Date added [tab] Date last modified
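
As a sketch of how the review values might be aggregated, the following Python counts the -1, 0 and 1 reviews per sentence. The function name tally_reviews, the usernames and the date strings in the sample rows are all invented; the date fields are ignored because their format is not described here.

```python
from collections import Counter
from typing import Dict, Iterable


def tally_reviews(lines: Iterable[str]) -> Dict[int, Counter]:
    """Count reviews of each value (-1, 0, 1) per sentence id."""
    totals: Dict[int, Counter] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        _username, sentence_id, review = fields[:3]
        totals.setdefault(int(sentence_id), Counter())[int(review)] += 1
    return totals


# Invented sample rows; the two trailing fields stand in for the dates.
rows = [
    "alice\t7\t1\td1\td2\n",
    "bob\t7\t-1\td1\td2\n",
    "carol\t7\t1\td1\td2\n",
    "alice\t8\t0\td1\td2\n",
]
totals = tally_reviews(rows)
```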

Transcriptions

Filename: {{transcriptions | filename}}
All languages
Only sentences in: Cantonese Iaponica Lingua Sinensis Uzbek
File description: Contains all transcriptions in auxiliary or alternative scripts. A username associated with a transcription indicates the user who last reviewed and possibly modified it. A transcription without a username has not been marked as reviewed. The script name is defined according to the ISO 15924 standard.
Fields and structure: Sentence id [tab] Lang [tab] Script name [tab] Username [tab] Transcription
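
The reviewed/unreviewed distinction described above could be modelled as in this Python sketch. The names Transcription and parse_transcription and the sample rows are invented, and treating both an empty string and \N as "no username" is an assumption about the file's notation.

```python
from typing import NamedTuple, Optional


class Transcription(NamedTuple):
    sentence_id: int
    lang: str
    script: str  # ISO 15924 script name
    reviewer: Optional[str]  # None when not marked as reviewed
    text: str


def parse_transcription(line: str) -> Transcription:
    """Parse one five-field, tab-separated transcription line."""
    sentence_id, lang, script, username, text = line.rstrip("\n").split("\t", 4)
    # Assumption: a missing username is written as "" or as \N.
    reviewer = None if username in ("", r"\N") else username
    return Transcription(int(sentence_id), lang, script, reviewer, text)


# Invented sample rows: one unreviewed, one reviewed.
unreviewed = parse_transcription("1\tjpn\tHrkt\t\tこんにちは\n")
reviewed = parse_transcription("2\tjpn\tHrkt\talice\tありがとう\n")
```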
