This dataset provides information on Toki Pona words, Sitelen Pona glyphs and fonts, and LPSL signs. It is used by the dictionaries linku.la and nimi.li, the wiki about Toki Pona, the Linku Discord bot, and more.
This is a collaborative project brought to you by the Toki Pona community.
Your help is welcome! Feel free to submit pull requests if you find anything that needs improvement.
Want to help but you're new to GitHub? We can help!
You can also join the Discord and talk to the maintainers!
Steps for everyone, including the prerequisites of validation. These only produce changes if the inputs have changed, and have static outputs otherwise:
- in `./api`, `pnpm run generate` (makes JSON schemas from zod schemas)
- in `./`, `pnpm dlx @taplo/cli check` (type checks all TOML files using the aforementioned JSON schemas)
- in `./`, `python ./.github/workflows/validate_refs.py` (checks that references between types are valid)
- in `./`, `python ./.github/workflows/package_data.py` (copies all TOML data to JSON)
- in `./api`, `bash ../.github/workflows/validate_generated_json.sh` (checks typing of all JSON files)
cd api && pnpm run generate && cd .. && pnpm dlx @taplo/cli check && python ./.github/workflows/validate_refs.py && python ./.github/workflows/package_data.py && cd api && bash ../.github/workflows/validate_generated_json.sh
Separately, there is:
- in `./`, `python ./.github/workflows/upsync_translations.py` (synchronizes translation files of all types with their source; adds missing keys, overwrites empty keys, deletes spare keys in the translation)
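The sync behaviour described above (add missing keys, overwrite empty keys, delete spare keys) can be sketched roughly as follows. This is not the actual `upsync_translations.py`; the function name `upsync` and the choice to fill empty entries with the source text are assumptions made for illustration.

```python
def upsync(source: dict, translation: dict) -> dict:
    """Return a copy of `translation` aligned to the keys of `source`.

    Hypothetical helper, not taken from the repository.
    """
    synced = {}
    for key, source_value in source.items():
        current = translation.get(key)
        if current is None or current == "":
            # Missing or empty entry: fall back to the source text (assumption).
            synced[key] = source_value
        else:
            # Existing non-empty translation is kept untouched.
            synced[key] = current
    # Keys that exist only in `translation` are never copied, so they are dropped.
    return synced
```

Building a new mapping from the source keys keeps the three rules in one loop and avoids mutating the translation while iterating over it.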
And then helpers:
- in `./`, `python ./.github/workflows/update_schemas.py` (corrects the schema line of any type-checked TOML files)
- New endpoint `/v2/glyphs` which serves sitelen pona glyph metadata
- New endpoint `/v2/sandbox/glyphs` which serves sitelen pona sandbox glyph metadata
- New endpoint `/v2/sandbox/words` which serves toki pona word metadata, previously `/v1/sandbox`
- Languages may only be served one at a time via a single `lang` URL parameter
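As a rough illustration of the single `lang` parameter, the sketch below builds request URLs for the endpoints listed above. The base URL is a placeholder and the helper name `endpoint_url` is hypothetical; only the endpoint paths and the `lang` parameter name come from the list.

```python
from urllib.parse import urlencode

# Placeholder host (assumption); substitute the real API base URL.
BASE_URL = "https://api.example.org"


def endpoint_url(path: str, lang: str = "en") -> str:
    """Build a URL for one endpoint with exactly one language code."""
    if not isinstance(lang, str):
        # Only one language may be requested at a time.
        raise TypeError("lang must be a single language code string")
    return f"{BASE_URL}{path}?{urlencode({'lang': lang})}"


glyphs_url = endpoint_url("/v2/glyphs", "fr")
sandbox_words_url = endpoint_url("/v2/sandbox/words")
```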
- Translation strings are now under `translations->[field]` rather than `translations->[langcode]->[field]`
  - Key `translations` contains `definition`, `etymology`, `commentary`
  - Key `sp_etymology` moved to corresponding sitelen pona glyph
  - Root key `etymology` and translation key `etymology` merged into single translatable field `etymology` under `translations`
- Key `author_verbatim_source` renamed `author_source`
- Key `coined_year` renamed `creation_date`, and can be accurate to the day if known
- New keys `image` and `svg` which link to an image of the word's primary sitelen pona
- New key `glyph_ids` which refers to all glyphs above Sandbox that may represent this word
- New key `primary_glyph_id` which refers to the glyph in Linku most used to write this word, such as `akesi-2` for `akesi`
- New key `parent_id` which is the ID of a synonymous word, generally from the lesser to the more popular word; may optionally be used to merge or hide entries
- Referential fields such as `see_also` can no longer refer to sandbox data
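For consumers moving from the old shape to the new one, the flattening and the two renames above might look like this. It is a minimal sketch over plain dictionaries; the function name `migrate_word` and the sample record are hypothetical, and it deliberately ignores the `etymology` merge, the `sp_etymology` move, and the new keys.

```python
RENAMED = {
    "author_verbatim_source": "author_source",
    "coined_year": "creation_date",
}


def migrate_word(old: dict, langcode: str) -> dict:
    """Convert an old-style word record to the new shape for one language."""
    new = {}
    for key, value in old.items():
        if key == "translations":
            continue  # handled separately below
        # Apply the renames; all other keys are copied unchanged.
        new[RENAMED.get(key, key)] = value
    # translations->[langcode]->[field] becomes translations->[field]
    new["translations"] = dict(old.get("translations", {}).get(langcode, {}))
    return new


# Hypothetical sample record, for illustration only.
old_record = {
    "word": "akesi",
    "coined_year": "2001",
    "author_verbatim_source": "example source",
    "translations": {"en": {"definition": "reptile, amphibian"}},
}
new_record = migrate_word(old_record, "en")
```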
- Key `word` refers to the latin script word this glyph writes
- Key `word_id` refers to a corresponding word in Linku by its id, which is often (but not always) the same as `word`
- Key `usage_category` functions as in `/v1/words` and `/v2/words`
  - Though there are a handful of overrides, from sandbox to obscure or vice versa, depending on the parent word
- Key `primary` indicates whether a glyph is primarily used to write its word; must match that word's `primary_glyph_id` field
- Key `deprecated` indicates whether a glyph is considered deprecated by its creator
- Key `usage` functions as in `/v1/words` and `/v2/words`
- Key `translations` contains `etymology`, `commentary`, and an array `names`
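The constraint that a glyph's `primary` flag must match its word's `primary_glyph_id` could be checked along these lines. This is only a sketch: it assumes words and glyphs are held as dictionaries keyed by id, and the function name and the id `akesi-1` are hypothetical.

```python
def primary_mismatches(words: dict, glyphs: dict) -> list:
    """Return ids of glyphs whose `primary` flag disagrees with their word."""
    problems = []
    for glyph_id, glyph in glyphs.items():
        word = words.get(glyph["word_id"])
        if word is None:
            continue  # dangling word_id; a separate reference check would catch it
        should_be_primary = word.get("primary_glyph_id") == glyph_id
        if bool(glyph["primary"]) != should_be_primary:
            problems.append(glyph_id)
    return sorted(problems)


# Hypothetical sample data, for illustration only.
words = {"akesi": {"primary_glyph_id": "akesi-2"}}
glyphs = {
    "akesi-1": {"word_id": "akesi", "primary": False},
    "akesi-2": {"word_id": "akesi", "primary": True},
}
```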
- `api/raw` is now split into `v1` and `v2`, which have the respective packaged data from Linku taken from `words/`, `luka_pona/`, `fonts/`, `languages/`, and `glyphs/` for `v2`.
  - `sandbox` is now nested as `sandbox/words` and `sandbox/glyphs`
  - `languages` is now split among all languages rather than having a single file, for parity with other types
- `src/lib` is now split into `v1` and `v2`, which have the respective type definitions for each API.
- `api/generated` is now split into `v1` and `v2`, which have the respective JSON schema type definitions for each API created from the type definitions in `src/lib`.
- New and updated scripts all in `./.github/workflows/`
  - `update_schemas.py`: Update the schemas of every outstanding TOML data file so they can be properly checked by taplo
  - `upsync_translations.py`: Sync keys from a source file to all translation files, overwriting empty keys and removing spare keys in the destination
  - `validate_refs.py`: Check referential data in all data files to confirm correctness (e.g. main data sources do not refer to sandbox)
  - `fetch_langs.py`: Now fetches individual language files
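To make the referential check concrete, here is a small sketch of the "main data does not refer to sandbox" rule. It is not the real `validate_refs.py`; the function name, the in-memory data layout, and the sample ids are assumptions.

```python
def sandbox_refs(words: dict, sandbox_ids: set, field: str = "see_also") -> dict:
    """Map each main word id to the sandbox ids it wrongly refers to."""
    offending = {}
    for word_id, word in words.items():
        hits = [ref for ref in word.get(field, []) if ref in sandbox_ids]
        if hits:
            offending[word_id] = hits
    return offending


# Hypothetical sample data, for illustration only.
main_words = {
    "word_a": {"see_also": ["word_b", "sandbox_word"]},
    "word_b": {"see_also": ["word_a"]},
}
report = sandbox_refs(main_words, {"sandbox_word"})
```

An empty result means no main entry points into the sandbox; a validation script would typically exit non-zero when the mapping is non-empty.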
- We are no longer updating data for `v1` of the Linku API, because the TOML files that make up our "database" have changed in an incompatible way.
- We are also no longer updating our type definitions for `v1` of the Linku API. These are still possible to update, though there is no longer any reason to. This means `src/lib/v1` and `api/generated/v1` will be static from now on.
sona is a collaborative, open dataset for and by the toki pona community. It is the successor of jasima, and aims to replace it. If you are looking for the data from jasima, see here. Note it is no longer being updated.
- `words` is word data that does not require translation, such as year of creation, author, and ku data.
  - `[word].toml`: Each specific word file, identified by name and an optional number for later coinings of the same name.
- `lukapona` is sign data that does not require translation, such as glosses, signwriting, and reference videos.
  - `[gloss].toml`: Each specific sign file, identified by its gloss.
- `source`: The original English for all translatable data.
  - `definitions.toml`: Word definitions.
  - `commentary.toml`: Relevant context and nuance about a word.
  - `etymology.toml`: The source word or words, their languages, and their definitions for a given toki pona word.
  - `sp_etymology.toml`: The source symbol or symbols for a given sitelen pona glyph.
  - `lukapona_icons.toml`: A description of a given sign by what it represents.
  - `lukapona_parameters.toml`: A set of descriptions for how to form a given sign. Note: `handshape` is here but does not translate.
- `schemas`
  - `src`
    - `index.ts`: Static descriptions and validators for each type of data in this repo.
    - `utils.ts`: Commonly used functions in `index.ts`.
- `translations`: Translated fields from `source`, automatically sent from Crowdin.
  - `[langcode]`: Each langcode directory has the same files as `source`.
- `raw`: All data from all TOML files assembled into a JSON blob.
- `schemas`
  - `generated`: Generated descriptions of the expected format of each TOML file in the repo.
Please visit our Crowdin project to contribute translations.
Contributing other kinds of metadata is simple:
To add new fonts to sona, please fork the repo, edit the fonts.toml file, and submit a pull request. Examples of various existing fonts can be found in the file.
To edit information about Toki Pona words or Luka Pona signs, please fork the repo, edit the words.toml file, and submit a pull request.
sona Linku is licensed under CC-BY-SA-4.0.