Note

The data you will find here will NOT be useful unless you are coding a language tool or processing data.

If you simply want sentences that you can use to learn a language, check out the sentence lists. You can build your own, or view the ones that others have created. The lists can be downloaded and printed.

General information about the files

Many of the Japanese and English sentences come from the Tanaka Corpus, which is in the public domain.

Creative commons

These files are released under CC BY 2.0 FR.

Some of our sentences are also available under CC0 1.0.

Licenses covering audio

The license covering an audio file is chosen by the contributor and is indicated on the page that lists the audio files they have contributed.

Questions?

If you have questions or requests, feel free to contact us. In general, we answer quickly.

Downloads

Custom exports

Use this tool to generate and download customized exports on demand.

Sentence pairs
Download all sentences in language A that are translated into language B, along with the translations.

Weekly exports

The files provided below are updated every Saturday at 6:30 a.m. (UTC).
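A tool that mirrors these files may want to know when the next refresh is due. The following is a minimal sketch of that date arithmetic, assuming only the schedule stated above; the function name `next_export` is hypothetical and not part of any Tatoeba tooling.

```python
from datetime import datetime, timedelta, timezone

SATURDAY = 5  # datetime.weekday() numbers Monday as 0


def next_export(now: datetime) -> datetime:
    """Return the first Saturday 06:30 UTC strictly after `now`.

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    days_ahead = (SATURDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=6, minute=30, second=0, microsecond=0
    )
    if candidate <= now:
        # Already past this week's export time, so move to next week.
        candidate += timedelta(days=7)
    return candidate


wednesday = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
print(next_export(wednesday).isoformat())
```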

Sentences

Filename

{{sentences | filename}}

All languages
Only sentences in: Adyghe Afrihili Afrikaans Ainu Aklanon Albania Altai Selatan Amharic Arab Arab Algeria Arab Iraq Arab Libya Arab Maghribi Arab Mesir Arab Syam Selatan Aragon Aramaic Tua Aramaic Yahudi Palestin Assam Assyrian Neo-Aramaic Asturia Avar Awadhi Aymara Azerbaijan Bahasa Abkhaz Bahasa Arab Syam Utara Bahasa Armenia Barat Bahasa Banjar Bahasa Chagatai Bahasa Chukchi Bahasa Divehi Bahasa Inggeris Pertengahan bahasa Jerman Pennsylvania bahasa Kadazan Bahasa Kapampangan Bahasa Khakas Bahasa Kirundi Bahasa Laz Bahasa Lombard Bahasa Manchu bahasa Melayu Pattani Bahasa Mon Bahasa Nauru Bahasa Pali Bahasa Phoenicia Bahasa Quenya Bahasa Silesia Bahasa Suryani Bahasa Swati bahasa Temuan Bahasa Turki Uthmaniyyah Bahasa Tuvalu Bali Baluchi Bambara Bashkir Basque Bavaria Baybayan Belanda Belarus Benggali Berber Berom Bhojpuri Bikol Tengah Bislama Bodo Bokmål Norway Bosnia Breton Brithenig Bulgaria Burma Buryat Catalonia Cayuga Cebuano Central Huasteca Nahuatl Central Kanuri Central Mnong Chamorro Chavacano Chechen Cherokee Chinook Jargon Chinyanja Choctaw Chuvash Cina Gan Cina Hakka Cina Hokkien Cina Mandarin Cina Sastera Cina Shanghai Cina Xiang Congo Swahili Cornish Corsica Crimean Tatar Croatia Cuyonon CycL Czech Denmark Drents Dungan Dusun Tengah Dutton World Speedwords Eastern Armenian Emilian Erromintxela Erzya Esperanto Estonia Evenki Ewe Extremaduran Faroe Fiji Fiji Hindi Finland Frisia Frisia Tua Friulian Ga Gagauz Galicia Garhwali Georgia Gheg Albanian Gothic Greek Purba Greenlandic Gronings Guadeloupean Creole French Guarani Guerrero Nahuatl Gujerat Haiti Hausa Hawaii Hiligaynon Hill Mari Hindi Hitchiti Hmong Daw (White) Hmong Njua (Green) Ho Hungary Hunsrik Iban Ibrani Ibrani Purba Iceland Ido Igbo Ilocano Indonesia Inggeris Inggeris Kuno Inggeris Pidgin Cina Ingria Interglossa Interlingua Interlingue Inuktitut Ireland Isan Itali Jamaican Patois Jawa Jepun Jerman Jerman Hilir (Sachsen Hilir) Jerman Palatin Jerman Switzerland Jewish Babylonian Aramaic 
Jin Chinese Juhuri (Judeo-Tat) K'iche' Kabardia Kabyle Kalmyk Kamba Kannada Kantonis Karachay-Balkar Karakalpak Karakhanid Karelian Kashmir Kashubia Kazakhstan Kekchi (Q'eqchi') Keningau Murut Khalaj Khasi Khmer Kinyarwanda Kirghiz Kiribati Klingon Kolsh Komi-Permyak Komi-Zyrian Konkani (Goan) Korea Kotava Kreol Louisiana Kumyk Kurdi Tengah (Sorani) Kurdi Utara (Kurmancî) Kurdish Selatan Kven Finnish Láadan Ladin Ladino Lakota Laos Latgalian Latin Latvia Ligurian Limburgish Lingala Lingua Franca Nova Lithuania Livonia Lojban Luganda Lushootseed Luxembourg Macedonia Madura Mahasu Pahari Maithili Malagasy Malayalam Malta Mambae Manx Maori Mapuche Marathi Marshall Meadow Mari Meitei Melayu Melayu (Vernakular) Melayu Maluku Utara Micmac Middle Persian (Pahlavi) Minangkabau Mingrelian Mirandese Mohawk Moksha Mongolia Mono (USA) Morisyen Muskogee (Creek) Naga (Tangshang) Nahuatl Nande Navajo Neapolitan Nepal Newari Ngeq Nigerian Fulfulde Niu Nogai Norse Tua North Frisian Northern Haida Novial Nuer Nuosu Nynorsk Norway Nyungar O'odham Occitania Odia (Oriya) Ojibwe Okinawa Orizaba Nahuatl Ossetia Palauan Pangasinan Papiamento Parsi Pashto Perancis Perancis Tengah Perancis Tua Picard Piedmontese Pipil Plains Cree Poland Portugis Prussia Tua Pulaar Punjabi (Barat) Punjabi (Barat) Qashqai Quechua Rapa Nui Rendille Rohingya Romani Romania Romansh Rusia Rusyn Sami Selatan Sami Utara Samoa Samogitia Sango Sanskrit Santali Saraiki Sardinia Saterland Frisian Saxon Tua Scots Scots Gaelic Sepanyol Sepanyol Tua Serbia Setswana Seychellois Creole Shona Shuswap Sicili Sindarin Sindhi Sinhala Slavik Timur Lama Slovak Slovenia Somali Sorbia Hilir Sorbia Hulu Sotho Selatan Southern Haida Sranan Tongo Subanun Selatan Sumeria Sunda Swabia Swahili Sweden Sylheti Tachawit Tagal Murut Tagalog Tahaggart Tamahaq Tahiti Tajik Talossa Talysh Tamazight Maghribi Standard Tamil Tarifit Tashelhit Tatar Telugu Teluk Arab Tetun Thai Tibet Tigre Tigrinya Tok Pisin Tokelau Toki Pona Tonga Tonga (Zambezi) 
Tsonga Tumbuka Tupi Tua Turki Turki Tua Turkmen Tuvinian Uab Meto Udmurt Ukraine Umbundu Urdu Urhobo Uyghur Uzbekistan Venetian Veps Vietnam Volapük Võro Wales Walloon Waray Wayuu Wolof Xhosa Yakut Yiddish Yoruba Yucatec Maya Yunani Zaza Zaza Selatan (Dimli) Zaza Utara (Kirmanjki) Zeelandic Zulu Unknown language
File description
Contains all the sentences in the selected language. Each sentence is associated with a unique id and an ISO 639-3 language code.
Fields and structure
Sentence id [tab] Language [tab] Text
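The three-field layout above can be read with a standard tab-separated parser. This is a sketch, not an official reader: the names `Sentence` and `parse_sentences` are hypothetical, the sample rows are made up, and it assumes that quote characters inside sentence text are literal rather than CSV quoting.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Sentence(NamedTuple):
    sentence_id: int
    lang: str  # ISO 639-3 code, per the file description
    text: str


def parse_sentences(stream) -> Iterator[Sentence]:
    """Yield one Sentence per tab-separated line of `stream`."""
    # QUOTE_NONE: assume quotation marks in the text are ordinary characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed lines rather than guess at their meaning
        yield Sentence(int(row[0]), row[1], row[2])


# Made-up sample rows for illustration, not real export data.
sample = io.StringIO('1\tcmn\t我們試試看！\n2\teng\tHe said "hi".\n')
sentences = list(parse_sentences(sample))
```

With a real export, `stream` would be the extracted file opened as UTF-8 text; the extracted file name is not stated on this page, so it is left out here.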

Detailed Sentences

Filename

{{sentencesDetailed | filename}}

All languages, or only sentences in a single language (the same language choices as for the first file above).
File description
Contains additional fields for each sentence (owner name, date created/modified).
Fields and structure
Sentence id [tab] Language [tab] Text [tab] Username [tab] Date added [tab] Date last modified

Original and Translated Sentences

Filename
sentences_base.tar.bz2
File description
Each sentence is listed as original or a translation of another. The "base" field can have the following values:
  • zero: The sentence is original, not a translation of another.
  • greater than zero: The id of the sentence from which it was translated.
  • \N: Unknown (rare).
Fields and structure
Sentence id [tab] Base field
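The three possible values of the base field map naturally onto a small classifier. The sketch below assumes each line holds exactly two tab-separated fields; `parse_base` and the status labels are hypothetical names chosen for illustration, and the sample ids are made up.

```python
from typing import Optional, Tuple


def parse_base(line: str) -> Tuple[int, Optional[int], str]:
    """Return (sentence_id, base_id, status) for one line.

    status is "original", "translation", or "unknown".
    """
    sentence_id, base = line.rstrip("\n").split("\t")
    if base == r"\N":
        return int(sentence_id), None, "unknown"
    base_id = int(base)
    if base_id == 0:
        return int(sentence_id), None, "original"
    return int(sentence_id), base_id, "translation"


# Made-up sample lines covering the three documented cases.
for line in ["10\t0\n", "11\t10\n", "12\t\\N\n"]:
    print(parse_base(line))
```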

Sentences (CC0)

Filename

{{sentencesCC0 | filename}}

All languages
Only sentences in: Arabic, Algerian Arabic, Old Aramaic, Jewish Palestinian Aramaic, Middle English, Phoenician, Dutch, Belarusian, Bengali, Berber, Norwegian Bokmål, Catalan, Mandarin Chinese, Literary Chinese, Czech, Danish, Esperanto, Finnish, Old Frisian, Ancient Greek, Hindi, Ho, Hungarian, Hebrew, Ancient Hebrew, Ido, English, Interlingua, Italian, Japanese, German, Jewish Babylonian Aramaic, Kabyle, Cantonese, Karelian, Klingon, Konkani (Goan), Kven Finnish, Láadan, Ladino, Latin, Ligurian, Old Norse, Nyungar, French, Polish, Portuguese, Russian, Santali, Spanish, Swedish, Sylheti, Tachawit, Standard Moroccan Tamazight, Toki Pona, Ukrainian, Volapük, Welsh, Yiddish, Unknown language
File description
Contains all the sentences available under CC0.
Fields and structure
Sentence id [tab] Language [tab] Text [tab] Date last modified

Links

Filename
links.tar.bz2
File description
Contains the links between the sentences. 1 [tab] 77 means that sentence #77 is the translation of sentence #1. The reciprocal link is also present, so the file will also contain a line that says 77 [tab] 1.
Fields and structure
Sentence id [tab] Translation id
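Because every link appears in both directions, code that only needs to know which sentences are linked can collapse each reciprocal pair into one entry. A minimal sketch, with `unique_pairs` as a hypothetical helper; the 1/77 link is the example from the description above and the second link is made up.

```python
def unique_pairs(lines):
    """Collapse reciprocal links into a set of unordered (low_id, high_id) pairs."""
    pairs = set()
    for line in lines:
        a, b = (int(field) for field in line.split("\t"))
        pairs.add((min(a, b), max(a, b)))
    return pairs


sample = ["1\t77", "77\t1", "1\t2481", "2481\t1"]
pairs = unique_pairs(sample)
print(sorted(pairs))
```

Storing pairs as sorted tuples halves the memory needed, at the cost of losing nothing, since direction carries no extra information in this file.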

Tags

Filename
tags.tar.bz2
File description
Contains the list of tags associated with each sentence. 381279 [tab] proverb means that sentence #381279 has been assigned the "proverb" tag.
Fields and structure
Sentence id [tab] Tag name
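A common use of this file is looking up all sentences that carry a given tag. The sketch below builds such an index in memory; `index_tags` is a hypothetical helper, and apart from the 381279/proverb row quoted above, the sample rows are invented.

```python
from collections import defaultdict


def index_tags(lines):
    """Map each tag name to the set of sentence ids that carry it."""
    by_tag = defaultdict(set)
    for line in lines:
        # Split once only, in case a tag name itself contains a tab (assumption).
        sentence_id, tag = line.rstrip("\n").split("\t", 1)
        by_tag[tag].add(int(sentence_id))
    return by_tag


sample = ["381279\tproverb", "381280\tproverb", "381279\tidiom"]
by_tag = index_tags(sample)
print(sorted(by_tag["proverb"]))
```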

Lists

Filename
user_lists.tar.bz2
File description
Contains the list of sentence lists.
Fields and structure
List id [tab] Username [tab] Date created [tab] Date last modified [tab] List name [tab] Editable by

Sentences in lists

Filename
sentences_in_lists.tar.bz2
File description
Indicates which sentences belong to which lists. 13 [tab] 381279 means that sentence #381279 belongs to the list with id 13.
Fields and structure
List id [tab] Sentence id

Japanese indices

Filename
jpn_indices.tar.bz2
File description
Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See this page for the format. Each entry is associated with a pair of Japanese/English sentences. Sentence id refers to the id of the Japanese sentence. Meaning id refers to the id of the English sentence.
Fields and structure
Sentence id [tab] Meaning id [tab] Text

Sentences with audio

Filename
sentences_with_audio.tar.bz2
File description
Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
Downloading audio
A single sentence can have one or more audio recordings, each from a different voice. To download a particular recording, use its audio id to compute the download URL. For example, to download the audio with the id 1234, the URL is https://tatoeba.org/audio/download/1234.
Fields and structure
Sentence id [tab] Audio id [tab] Username [tab] License [tab] Attribution URL
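The URL rule and the license rule above can be combined into a filter that keeps only recordings that may be reused. This is a sketch under stated assumptions: the helper names are hypothetical, the sample rows and license string are invented, and treating a literal `\N` as an empty license is an assumption borrowed from the base-field convention, not something this page states for this file. Nothing is downloaded; the code only builds URLs.

```python
AUDIO_URL = "https://tatoeba.org/audio/download/{audio_id}"


def audio_download_url(audio_id: int) -> str:
    """Build the download URL for one audio id, following the pattern above."""
    return AUDIO_URL.format(audio_id=audio_id)


def reusable_audio(lines):
    """Yield (sentence_id, audio_id, url) for rows whose license is not empty."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip malformed rows
        sentence_id, audio_id, _username, license_name, _attribution = fields
        # Assumption: an empty license may appear as "" or as the marker \N.
        if license_name.strip() in ("", r"\N"):
            continue
        yield int(sentence_id), int(audio_id), audio_download_url(int(audio_id))


# Invented sample rows: one licensed recording, one without a license.
sample = [
    "100\t1234\talice\tCC BY 4.0\thttps://example.com/alice",
    "101\t1235\tbob\t\t",
]
result = list(reusable_audio(sample))
print(result)
```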

User skill level per language

Filename
user_languages.tar.bz2
File description
Indicates the self-reported skill levels of members in individual languages.
Fields and structure
Language [tab] Skill level [tab] Username [tab] Details

Users' sentence reviews

Filename
users_sentences.csv
File description
Contains sentences reviewed by users. The value of the review can be -1 (sentence not OK), 0 (undecided or unsure), or 1 (sentence OK). Warning: this data is still experimental.
Fields and structure
Username [tab] Sentence id [tab] Review [tab] Date added [tab] Date last modified
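Review values can be tallied per sentence to see how many users marked it OK, not OK, or undecided. The sketch below is illustrative only: `tally_reviews` is a hypothetical name, the rows are invented, and the date fields are opaque placeholders because this page does not specify their format.

```python
from collections import Counter, defaultdict


def tally_reviews(lines):
    """Map sentence id to a Counter of review values (-1, 0, 1)."""
    tallies = defaultdict(Counter)
    for line in lines:
        _username, sentence_id, review, *_dates = line.rstrip("\n").split("\t")
        value = int(review)
        if value not in (-1, 0, 1):
            continue  # ignore values outside the documented range
        tallies[int(sentence_id)][value] += 1
    return tallies


# Invented rows; "DATE" stands in for the unspecified date format.
sample = [
    "alice\t42\t1\tDATE\tDATE",
    "bob\t42\t1\tDATE\tDATE",
    "carol\t42\t-1\tDATE\tDATE",
    "alice\t43\t0\tDATE\tDATE",
]
tallies = tally_reviews(sample)
print(dict(tallies[42]))
```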

Transcriptions

Filename

{{transcriptions | filename}}

All languages
Only sentences in: Mandarin Chinese, Japanese, Cantonese, Uzbek
File description
Contains all transcriptions in auxiliary or alternative scripts. A username associated with a transcription indicates the user who last reviewed and possibly modified it. A transcription without a username has not been marked as reviewed. The script name is defined according to the ISO 15924 standard.
Fields and structure
Sentence id [tab] Language [tab] Script name [tab] Username [tab] Transcription
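Since a missing username marks a transcription as unreviewed, a reader can expose that as an explicit field. A minimal sketch, assuming the missing username appears as an empty field; `parse_transcription` is a hypothetical helper and both sample rows are invented (`Latn` is the ISO 15924 code for Latin script).

```python
def parse_transcription(line: str) -> dict:
    """Split one line into its five fields and flag unreviewed entries."""
    # Split at most 4 times so any tab inside the transcription text is kept.
    sentence_id, lang, script, username, text = line.rstrip("\n").split("\t", 4)
    return {
        "sentence_id": int(sentence_id),
        "lang": lang,
        "script": script,  # ISO 15924 script code
        "reviewed_by": username or None,  # None means not marked as reviewed
        "text": text,
    }


reviewed = parse_transcription("2\tcmn\tLatn\talice\tNǐ hǎo.")
unreviewed = parse_transcription("3\tcmn\tLatn\t\tXièxie.")
print(reviewed["reviewed_by"], unreviewed["reviewed_by"])
```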