sona Linku: Toki Pona dataset

This dataset provides information on Toki Pona words, Sitelen Pona glyphs and fonts, and LPSL signs. It is used by the dictionaries linku.la and nimi.li, the Toki Pona wiki, the Linku Discord bot, and more.

Contributing

This is a collaborative project brought to you by the Toki Pona community.

Your help is welcome! Feel free to submit pull requests if you find anything that needs improvement.

Want to help but you're new to GitHub? We can help!

You can also join the Discord and talk to the maintainers!

Details

The following steps are for everyone. They include the prerequisites of validation, which only produce changes when their inputs change and otherwise give the same output every time.

In ./api: pnpm run generate (generates JSON schemas from the zod schemas)
In ./: pnpm dlx @taplo/cli check (type-checks all TOML files against those JSON schemas)
In ./: python ./.github/workflows/validate_refs.py (checks that references between types are valid)
In ./: python ./.github/workflows/package_data.py (copies all TOML data to JSON)
In ./api: bash ../.github/workflows/validate_generated_json.sh (checks the typing of all JSON files)

To run the whole sequence as one command from the repository root:

cd api && pnpm run generate && cd .. && pnpm dlx @taplo/cli check && python ./.github/workflows/validate_refs.py && python ./.github/workflows/package_data.py && cd api && bash ../.github/workflows/validate_generated_json.sh
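The same sequence can be sketched as a small Python driver. This is a hypothetical helper (`run_pipeline`, `STEPS`), not a script in the repo; it assumes it is started from the repository root and, like the `&&` chain, stops at the first failing step. The command runner is a parameter so the ordering logic can be exercised without pnpm or the workflow scripts installed.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# (working directory, command) pairs in the order listed above.
# Directories are relative to the repository root (an assumption of this sketch).
STEPS: List[Tuple[str, List[str]]] = [
    ("api", ["pnpm", "run", "generate"]),
    (".", ["pnpm", "dlx", "@taplo/cli", "check"]),
    (".", ["python", "./.github/workflows/validate_refs.py"]),
    (".", ["python", "./.github/workflows/package_data.py"]),
    ("api", ["bash", "../.github/workflows/validate_generated_json.sh"]),
]

Runner = Callable[[str, List[str]], int]


def subprocess_runner(cwd: str, cmd: List[str]) -> int:
    """Run one command in `cwd` and return its exit code."""
    return subprocess.run(cmd, cwd=cwd).returncode


def run_pipeline(steps: Sequence[Tuple[str, List[str]]], runner: Runner = subprocess_runner) -> int:
    """Run each step in order, stopping at the first non-zero exit code (like `&&`)."""
    for cwd, cmd in steps:
        code = runner(cwd, cmd)
        if code != 0:
            print(f"step failed with exit code {code}: {' '.join(cmd)} (in {cwd})")
            return code
    return 0
```

From the repository root, `run_pipeline(STEPS)` runs the real commands and returns the first non-zero exit code, or 0 if every step passed.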

Separately, there is:

In ./: python ./.github/workflows/upsync_translations.py (synchronizes the translation files of all types with their source: adds missing keys, overwrites empty keys, and deletes spare keys in the translation)
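The three rules of upsync_translations.py can be illustrated on plain dictionaries. This is a minimal sketch, not the script itself: `upsync` is a hypothetical name, it assumes the TOML files have already been parsed into nested dicts, and it assumes missing or empty entries are filled with the source text (the real script may fill them differently).

```python
from typing import Any, Dict


def upsync(source: Dict[str, Any], translation: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `translation` whose keys exactly match `source`.

    - keys missing from the translation are added (assumed: with the source value)
    - keys whose translated value is empty are overwritten with the source value
    - keys that are not in the source are dropped
    Nested tables are handled recursively.
    """
    result: Dict[str, Any] = {}
    for key, src_value in source.items():
        tr_value = translation.get(key)
        if isinstance(src_value, dict):
            nested = tr_value if isinstance(tr_value, dict) else {}
            result[key] = upsync(src_value, nested)
        elif tr_value is None or tr_value == "":
            result[key] = src_value  # missing or empty: take the source text
        else:
            result[key] = tr_value  # keep the existing translation
    return result
```

Because the result is built only from the source's keys, spare keys in the translation disappear without a separate deletion pass.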

There is also a helper:

In ./: python ./.github/workflows/update_schemas.py (corrects the schema line of any type-checked TOML files)
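taplo picks up a schema from a `#:schema` comment directive at the top of a TOML file, which is presumably the "schema line" this script corrects. A minimal sketch, assuming the directive sits on the first line and the schema path is written relative to the TOML file; `set_schema_line`, `schema_path_for`, and the example paths are hypothetical.

```python
import posixpath

SCHEMA_DIRECTIVE = "#:schema"


def schema_path_for(toml_file: str, schema_file: str) -> str:
    """Path to `schema_file` relative to the directory containing `toml_file`."""
    return posixpath.relpath(schema_file, posixpath.dirname(toml_file) or ".")


def set_schema_line(toml_text: str, schema_path: str) -> str:
    """Replace an existing first-line `#:schema` directive, or insert one."""
    directive = f"{SCHEMA_DIRECTIVE} {schema_path}"
    lines = toml_text.splitlines()
    if lines and lines[0].startswith(SCHEMA_DIRECTIVE):
        lines[0] = directive  # correct an outdated path
    else:
        lines.insert(0, directive)  # add a missing directive
    return "\n".join(lines) + "\n"
```

For example, a hypothetical words/toki.toml checked against api/generated/v2/word.json would get the line `#:schema ../api/generated/v2/word.json`.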

Changes in API from v1 to v2

General

New endpoint /v2/glyphs, which serves sitelen pona glyph metadata
New endpoint /v2/sandbox/glyphs, which serves sitelen pona sandbox glyph metadata
New endpoint /v2/sandbox/words, which serves sandbox toki pona word metadata (previously /v1/sandbox)
Languages may only be served one at a time via a single lang URL parameter
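Because v2 accepts a single lang parameter, a client that used to request several languages at once now makes one request per language. A minimal sketch of building request URLs, assuming the API is hosted at https://api.linku.la (check this before use); `v2_url` is a hypothetical helper, and rejecting comma-separated values is a choice of this sketch, not documented API behavior.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.linku.la"  # assumption: verify the current API host


def v2_url(endpoint: str, lang: Optional[str] = None) -> str:
    """Build a v2 URL such as /v2/words or /v2/sandbox/glyphs, with at most one language."""
    url = f"{BASE_URL}/v2/{endpoint.strip('/')}"
    if lang is not None:
        if "," in lang:
            # v2 serves one language per request, so split multi-language fetches.
            raise ValueError("v2 accepts a single lang value; make one request per language")
        url += "?" + urlencode({"lang": lang})
    return url
```

Fetching two languages then becomes two URLs, e.g. `[v2_url("words", code) for code in ("en", "de")]`.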

/v1/words -> /v2/words and /v1/sandbox -> /v2/sandbox/words

Translation strings are now under translations -> [field] rather than translations -> [langcode] -> [field]
- Key translations contains definition, etymology, commentary
- Key sp_etymology moved to corresponding sitelen pona glyph
- Root key etymology and translation key etymology merged into a single translatable field, etymology, under translations
Key author_verbatim_source renamed to author_source
Key coined_year renamed to creation_date, which can be accurate to the day if known
New keys image and svg, which link to an image of the word's primary sitelen pona glyph
New key glyph_ids which refers to all glyphs above Sandbox that may represent this word
New key primary_glyph_id which refers to the glyph in Linku most used to write this word, such as akesi-2 for akesi
New key parent_id which is the ID of a synonymous word, generally from the lesser to the more popular word; may optionally be used to merge or hide entries
Referential fields such as see_also can no longer refer to sandbox data
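On the client side, these changes mean a definition is read directly from translations (the language having been chosen by the lang parameter), and parent_id can be followed to merge synonyms. A minimal sketch over already-decoded JSON; the helper names and the tiny word records are made up for illustration, and only the key names come from the list above.

```python
from typing import Any, Dict, List

Word = Dict[str, Any]


def definition(word: Word) -> str:
    """v2 layout: translations -> definition (v1 was translations -> langcode -> definition)."""
    return word["translations"]["definition"]


def resolve_parent(words: Dict[str, Word], word_id: str) -> str:
    """Follow parent_id links to the top-level word, stopping on unknown IDs or cycles."""
    seen = {word_id}
    current = word_id
    while True:
        parent = words[current].get("parent_id")
        if not parent or parent not in words or parent in seen:
            return current
        seen.add(parent)
        current = parent


def group_by_parent(words: Dict[str, Word]) -> Dict[str, List[str]]:
    """Group word IDs under the ID their parent_id chain leads to."""
    groups: Dict[str, List[str]] = {}
    for word_id in words:
        groups.setdefault(resolve_parent(words, word_id), []).append(word_id)
    return groups
```

Whether to merge, hide, or simply link the grouped entries is left to the client, as the parent_id description says.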

/v2/glyphs and /v2/sandbox/glyphs

Key word refers to the Latin-script word this glyph writes
Key word_id refers to a corresponding word in Linku by its id, which is often (but not always) the same as word
Key usage_category functions as in /v1/words and /v2/words
- Though there are a handful of overrides, from sandbox to obscure or vice versa, depending on the parent word
Key primary indicates whether a glyph is primarily used to write its word; must match that word's primary_glyph_id field
Key deprecated indicates whether a glyph is considered deprecated by its creator
Key usage functions as in /v1/words and /v2/words
Key translations contains etymology, commentary, and names (an array)
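The rule that primary must agree with the word's primary_glyph_id lends itself to a cross-check between the two datasets. A sketch of such a check, assuming both have been loaded into dicts keyed by ID; this is not the repo's validate_refs.py, and `check_primary_glyphs` is a hypothetical name.

```python
from typing import Any, Dict, List


def check_primary_glyphs(words: Dict[str, Dict[str, Any]], glyphs: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return one message per glyph whose `primary` flag disagrees with its word."""
    errors: List[str] = []
    for glyph_id, glyph in glyphs.items():
        word_id = glyph["word_id"]
        word = words.get(word_id)
        if word is None:
            errors.append(f"{glyph_id}: unknown word_id {word_id!r}")
            continue
        # A glyph is primary exactly when its word names it as primary_glyph_id.
        expected = word.get("primary_glyph_id") == glyph_id
        actual = bool(glyph.get("primary", False))
        if actual != expected:
            errors.append(
                f"{glyph_id}: primary={actual}, but {word_id}.primary_glyph_id is {word.get('primary_glyph_id')!r}"
            )
    return errors
```

An empty list means the flags are consistent in both directions.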

Changes in repo from v1 to v2

Metadata

api/raw is now split into v1 and v2, which hold the respective packaged data from Linku, taken from words/, luka_pona/, fonts/, and languages/, plus glyphs/ for v2.
sandbox is now nested as sandbox/words and sandbox/glyphs
languages is now split among all languages rather than having a single file, for parity with other types

Types

src/lib is now split into v1 and v2, which have the respective type definitions for each API.
api/generated is now split into v1 and v2, which have the respective JSON schema type definitions for each API created from the type definitions in src/lib.

Supporting Scripts

New and updated scripts, all in ./.github/workflows/:
update_schemas.py: Update the schemas of every outstanding toml data file so they can be properly checked by taplo
upsync_translations.py: Sync keys from a source file to all translation files, overwriting empty keys and removing spare keys in the destination
validate_refs.py: Check referential data in all data files to confirm correctness (e.g. main data sources do not refer to sandbox)
fetch_langs.py: Now fetches individual language files
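As an illustration of the kind of check validate_refs.py performs, here is a sketch that flags references from main word data into the sandbox, or to IDs that do not exist. It assumes words are dicts keyed by ID and only inspects see_also; the real script checks more fields and types, and `find_bad_refs` is a hypothetical name.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical subset: the real data has more referential fields.
REFERENTIAL_FIELDS = ("see_also",)


def find_bad_refs(main_words: Dict[str, Dict[str, Any]], sandbox_ids: Set[str]) -> List[Tuple[str, str, str, str]]:
    """Return (word_id, field, ref, reason) for each disallowed reference."""
    problems: List[Tuple[str, str, str, str]] = []
    for word_id, word in main_words.items():
        for field in REFERENTIAL_FIELDS:
            for ref in word.get(field, []):
                if ref in sandbox_ids:
                    problems.append((word_id, field, ref, "sandbox"))  # main data may not point into sandbox
                elif ref not in main_words:
                    problems.append((word_id, field, ref, "missing"))  # dangling ID
    return problems
```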

Other

We are no longer updating data for v1 of the Linku API, because the TOML files that make up our "database" have changed in an incompatible way.
We are also no longer updating our type definitions for v1 of the Linku API. These are still possible to update, though there is no longer any reason to. This means src/lib/v1 and api/generated/v1 will be static from now on.

Overview

sona is a collaborative, open dataset for and by the toki pona community. It is the successor of jasima and aims to replace it. If you are looking for the data from jasima, see here. Note that it is no longer being updated.

Directories

Editable

words: Word data that does not require translation, such as year of creation, author, and ku data.
- [word].toml: Each specific word file, identified by name and an optional number for later coinings of the same name.
lukapona: Sign data that does not require translation, such as glosses, signwriting, and reference videos.
- [gloss].toml: Each specific sign file, identified by its gloss.
source: The original English for all translatable data.
- definitions.toml: Word definitions.
- commentary.toml: Relevant context and nuance about a word.
- etymology.toml: The source word or words, their languages, and their definitions for a given toki pona word.
- sp_etymology.toml: The source symbol or symbols for a given sitelen pona glyph.
- lukapona_icons.toml: A description of a given sign by what it represents.
- lukapona_parameters.toml: A set of descriptions for how to form a given sign. Note: handshape is included here but is not translated.
schemas
- src
  - index.ts: Static descriptions and validators for each type of data in this repo.
  - utils.ts: Commonly used functions in index.ts.

Automated

translations: Translated fields from source, automatically sent from Crowdin.
- [langcode]: Each langcode directory has the same files as source.
raw: All data from all toml files assembled into a JSON blob.
schemas
- generated: Generated descriptions of the expected format of each TOML file in the repo.

Contributing

Translating

Please visit our Crowdin project to contribute translations.

Dictionary Data and Fonts

Contributing other kinds of metadata is simple:

To add new fonts to sona, please fork the repo, edit the fonts.toml file, and submit a pull request. Examples of various existing fonts can be found in the file.

To edit information about Toki Pona words or Luka Pona signs, please fork the repo, edit the words.toml file, and submit a pull request.

License

sona Linku is licensed under CC-BY-SA-4.0.
