Quick start

Cluster segments during CLDF export

When building a CLDF forms.csv, merge adjacent consonants into clusters for the alignment columns:

from loanpy import Cluster

segments = "f l a".split()
cv = ["C", "C", "V"]
clusters = Cluster.cv(segments, cv)
# ['f.l', 'a']
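
The idea behind CV clustering can be illustrated with a small run-grouping sketch. This is a hypothetical helper (cluster_by_cv is not part of loanpy) and it assumes that every run of identical CV labels is joined with "."; unlike Cluster.cv, which may treat vowels differently, this sketch also merges vowel runs.

```python
from itertools import groupby


def cluster_by_cv(segments, cv, sep="."):
    """Join runs of adjacent segments sharing a CV label (hypothetical helper)."""
    if len(segments) != len(cv):
        raise ValueError("segments and cv must have the same length")
    clusters = []
    # groupby only groups *consecutive* items with the same key (the CV label)
    for _, run in groupby(zip(segments, cv), key=lambda pair: pair[1]):
        clusters.append(sep.join(seg for seg, _ in run))
    return clusters


print(cluster_by_cv("f l a".split(), ["C", "C", "V"]))  # ['f.l', 'a']
print(cluster_by_cv("s t r a i".split(), ["C", "C", "C", "V", "V"]))  # ['s.t.r', 'a.i']
```

Because groupby only merges neighbours, a CVC word keeps its two consonants as separate clusters.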

Mine sound correspondences

Given cognate rows that alternate between descendant and ancestor forms:

import csv
from loanpy import get_sound_correspondences

with open("cognates.csv", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
stats = get_sound_correspondences(rows, aligned_col="Uralign", sep=" < ")
scorer = stats["AbsoluteFrequency"]
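
What "absolute frequency" of a correspondence means can be sketched with collections.Counter. This is a hypothetical helper, not get_sound_correspondences itself; it assumes rows come in (descendant, ancestor) pairs, that the aligned column holds space-separated alignments of equal length, and it keys counts as (ancestor, descendant) tuples. The rows are toy data.

```python
from collections import Counter


def count_correspondences(rows, aligned_col):
    """Count (ancestor, descendant) segment pairs over alternating rows (hypothetical)."""
    if len(rows) % 2:
        raise ValueError("rows must come in descendant/ancestor pairs")
    counts = Counter()
    # rows[0::2] are descendant rows, rows[1::2] the matching ancestor rows
    for desc_row, anc_row in zip(rows[0::2], rows[1::2]):
        desc = desc_row[aligned_col].split()
        anc = anc_row[aligned_col].split()
        if len(desc) != len(anc):
            raise ValueError("alignment lengths differ")
        counts.update(zip(anc, desc))  # one count per aligned position
    return counts


toy_rows = [
    {"Uralign": "ɟ ŋ"}, {"Uralign": "j ŋ"},
    {"Uralign": "k ŋ"}, {"Uralign": "k ŋ"},
]
counts = count_correspondences(toy_rows, "Uralign")
print(counts[("ŋ", "ŋ")])  # 2
print(counts[("j", "ɟ")])  # 1
```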

Score an alignment

from loanpy import Uralign

# scorer is the frequency table mined in the previous step
seq_d = ["ɟ", "ŋ"]
seq_a = ["j", "ŋ"]
alm_d, alm_a = Uralign.hu(seq_d.copy(), seq_a.copy(), "C", "C")
score = Uralign.get_score(alm_d, alm_a, scorer, freq_filter=2)
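
One plausible reading of frequency-based scoring is sketched below: sum the corpus frequency of each aligned pair and ignore pairs seen fewer than freq_filter times. Both the summing rule and this interpretation of freq_filter are assumptions for illustration; score_alignment is hypothetical and does not reproduce Uralign.get_score. Counts are keyed as (ancestor, descendant) tuples and are toy values.

```python
from collections import Counter


def score_alignment(alm_d, alm_a, counts, freq_filter=0):
    """Sum frequencies of aligned (ancestor, descendant) pairs (hypothetical rule)."""
    if len(alm_d) != len(alm_a):
        raise ValueError("alignments must have equal length")
    score = 0
    for d, a in zip(alm_d, alm_a):
        freq = counts.get((a, d), 0)
        # rare correspondences (below freq_filter) contribute nothing
        if freq >= freq_filter:
            score += freq
    return score


toy_counts = Counter({("j", "ɟ"): 3, ("ŋ", "ŋ"): 10, ("k", "ɟ"): 1})
print(score_alignment(["ɟ", "ŋ"], ["j", "ŋ"], toy_counts, freq_filter=2))  # 13
print(score_alignment(["ɟ", "ŋ"], ["k", "ŋ"], toy_counts, freq_filter=2))  # 10
```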

Adapt donor segments

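from loanpy import Adapt

# donor_set, recipient_set, distance_fn, donor_segments, cv_profile and
# phonotactic_templates stand for project-specific inputs defined elsewhere
ad = Adapt()
ad.get_substitutions(donor_set, recipient_set, distance_fn, extra={})
adapted = ad.substitute(donor_segments)
repaired = ad.repair(adapted, cv_profile, phonotactic_templates)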
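
A distance-driven substitution step can be sketched as "map each donor segment to its closest recipient segment". Everything here is hypothetical: the three-feature vectors are a toy stand-in for a real distance_fn, and get_substitutions/substitute below are plain functions, not the Adapt methods. Ties are broken by sorting the recipient inventory so results are deterministic.

```python
# Toy feature vectors (voiced, labial, fricative); illustrative only.
FEATURES = {
    "f": (0, 1, 1),
    "v": (1, 1, 1),
    "p": (0, 1, 0),
    "b": (1, 1, 0),
}


def feature_distance(x, y):
    """Count differing toy features between two segments."""
    return sum(a != b for a, b in zip(FEATURES[x], FEATURES[y]))


def get_substitutions(donor_set, recipient_set, distance_fn):
    """Map each donor segment to its closest recipient segment (hypothetical)."""
    candidates = sorted(recipient_set)  # sorted so ties resolve deterministically
    return {seg: min(candidates, key=lambda r: distance_fn(seg, r)) for seg in donor_set}


def substitute(segments, table):
    """Replace segments via the table; unknown segments pass through unchanged."""
    return [table.get(seg, seg) for seg in segments]


table = get_substitutions({"f", "v", "b"}, {"p", "b"}, feature_distance)
print(table["f"], table["v"])  # p b
print(substitute(["v", "a", "f"], table))  # ['b', 'a', 'p']
```

A repair step would then check the substituted sequence against the recipient language's phonotactic templates; that part is left out here.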

Typical integrations

  • CLDF conversion: cldfbench dataset modules call Cluster and Uralign when writing segmentation and alignment columns.

  • Loanword detection pipelines: analysis scripts combine Adapt, Uralign, and get_sound_correspondences over wordlist/cognate tables.

These patterns apply to any paired descendant–ancestor data; language names and file layouts are project-specific.