Package overview

Loanpy — linguistic toolkit for loanword detection and sound change.

Version 4 provides segmentation clustering, alignment scoring, sound-correspondence mining, edit-distance utilities, and adaptation (substitution plus phonotactic repair).

class loanpy.Adapt

Bases: object

Map donor segments onto a recipient inventory and repair prosody.

Typical pipeline: learn substitutions from segment inventories, apply them to donor segments, then repair the CV profile against a phonotactic template list.

Examples

In a loanword-detection loop over two wordlists:

ad = Adapt()
ad.get_substitutions(donor_phonemes, recipient_phonemes, distance_fn, extra={})
adapted = ad.substitute(donor_segments)
repaired = ad.repair(adapted, cv_profile, phonotactic_templates)

Notes

Used in loanword-detection pipelines (e.g. Indo-Iranian–Hungarian make_results.py inside find_loanwords): donor segments are substituted toward a recipient inventory, optionally repaired to legal CV templates, then aligned and scored.

get_substitutions(donor_inventory: set[str], recipient_inventory: set[str], distance_func: Callable[[str, str], float], extra: dict[str, str]) → None

Learn one-to-one donor→recipient substitutions by minimum distance.

For each donor phoneme not in the recipient inventory, pick the recipient phoneme with the smallest distance_func(donor, recipient). The learned mapping is then merged with extra (manual overrides) into substitutions.

Parameters

donor_inventory, recipient_inventory:

Segment inventories (sets of phoneme symbols).

distance_func:

Callable returning a numeric distance (e.g. feature-based).

extra:

Fixed substitutions applied on top of learned ones.
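
The minimum-distance rule described above can be sketched in a few lines. This is a standalone illustration, not loanpy's implementation: learn_substitutions and toy_distance are hypothetical names, the tie-breaking by sorted order is an assumption, and the mapping is returned rather than stored on an Adapt instance.

```python
from typing import Callable


def learn_substitutions(
    donor_inventory: set[str],
    recipient_inventory: set[str],
    distance_func: Callable[[str, str], float],
    extra: dict[str, str],
) -> dict[str, str]:
    """Hypothetical stand-in for the mapping that get_substitutions() builds."""
    learned = {}
    # Only donor phonemes missing from the recipient inventory need a substitute.
    for donor in sorted(donor_inventory - recipient_inventory):
        # sorted() makes tie-breaking deterministic in this sketch (an assumption).
        learned[donor] = min(
            sorted(recipient_inventory),
            key=lambda recipient: distance_func(donor, recipient),
        )
    # Manual overrides win over learned substitutions.
    return {**learned, **extra}


def toy_distance(a: str, b: str) -> float:
    """Toy distance on code points; a real pipeline would compare features."""
    return abs(ord(a) - ord(b))


substitutions = learn_substitutions(
    {"a", "b", "q"}, {"a", "p", "r"}, toy_distance, {"b": "p"}
)
# Unmapped segments pass through unchanged, as described for substitute().
adapted = [substitutions.get(seg, seg) for seg in ["q", "a", "b"]]
print(substitutions)  # {'b': 'p', 'q': 'p'}
print(adapted)        # ['p', 'a', 'p']
```

Here the learned choice for b would have been a, but the entry in extra replaces it.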

repair(segments: list[str], cv_profile: list[str], phonotactic_inventory: list[str], extra_repair: dict[str, str] | None = None) → list[str]

Align segments to the closest legal CV template via edit operations.

Parameters

segments:

Segment list (often after substitute()).

cv_profile:

Parallel C/V profile for segments.

phonotactic_inventory:

Allowed templates (see expand_phonotactics()).

extra_repair:

Optional map from joined CV strings to fixed templates, bypassing nearest-neighbour search.

Returns

list[str]

Segments after applying insert/delete/substitute operations implied by the CV-profile edit path (may include "C" / "V" placeholders).

Notes

make_results.py may post-process placeholder vowels/consonants for vowel harmony; loanpy only returns the structurally repaired sequence.

substitute(segments: list[str]) → list[str]

Replace segments using substitutions (identity if unmapped).

Parameters

segments:

Donor segment list.

Returns

list[str]

Substituted segments.

class loanpy.Cluster

Bases: object

Static helpers for segment clustering in CLDF pipelines.

Clustering reduces fine-grained segment lists to coarser units used in alignment and correspondence mining (e.g. f.l for consonant clusters, a.ʊ for vowel sequences).

Examples

Typical workflow during CLDF conversion:

segments = form.split()
cv = dataset.get_cv_profile(form)
clusters = Cluster.cv(segments, cv)
glides = Cluster.glides(segments, cv)

After pairwise alignment, gaps may be collapsed:

alm_a, alm_b = Cluster.gaps(alm_a, alm_b)

Notes

Used in CLDF conversion scripts (cldfbench_*.py) for datasets such as UEW-hu, SeimaTurbino-hu, UESz-year-origin, and WestOldTurkic, where clustered segments are written to forms.csv columns like Clusters or Cluster_cv.

static cv(segments: list[str], cv_profile: list[str]) → list[str]

Join adjacent segments that share the same C/V class.

Parameters

segments:

IPA (or other) segments, one symbol per list element.

cv_profile:

Parallel list of "C" and "V" labels.

Returns

list[str]

Clustered segments joined with "." within each run of C or V.
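
As a rough illustration of this grouping, the following standalone sketch joins each run of identical C/V labels with ".". The name cluster_cv is hypothetical, and the code only mirrors the behaviour described here; it is not taken from loanpy.

```python
from itertools import groupby


def cluster_cv(segments: list[str], cv_profile: list[str]) -> list[str]:
    """Hypothetical re-implementation of the grouping described for Cluster.cv."""
    pairs = zip(segments, cv_profile)
    # groupby() yields one group per maximal run of identical C/V labels.
    return [
        ".".join(seg for seg, _ in run)
        for _, run in groupby(pairs, key=lambda pair: pair[1])
    ]


print(cluster_cv(["f", "l", "a", "ʊ"], ["C", "C", "V", "V"]))  # ['f.l', 'a.ʊ']
```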

Examples

>>> Cluster.cv(["f", "l", "a"], ["C", "C", "V"])
['f.l', 'a']

static gaps(seqA: list[str], seqB: list[str]) → tuple[list[str], list[str]]

Collapse consecutive gaps on seqB into a single gap per position.

When two adjacent positions in seqB are gaps ("-"), the matching symbol in seqA is merged into the previous token. Trailing gaps may introduce a "+" marker in seqA.

Parameters

seqA, seqB:

Parallel aligned token lists.

Returns

tuple[list[str], list[str]]

Collapsed alignment pair.

Notes

Used in CLDF conversion for WestOldTurkic (Monogap alignments) after global pairwise alignment.

static glides(segments: list[str], cv_profile: list[str], cluster_between_vowels: tuple[str, ...] = ('ɣ', 'w', 'v', 'β', 'ð'), cluster_after_l: tuple[str, ...] = ('t͡ʃ', 'd')) → list[str]

Cluster glides/liquids between vowels and selected consonants after l.

Parameters

segments, cv_profile:

Parallel segment and C/V lists (same length).

cluster_between_vowels:

Segments to attach to a preceding vowel cluster when sandwiched by vowels.

cluster_after_l:

Segments to attach when immediately following l.

Returns

list[str]

Further clustered segment list.

Raises

ValueError

If segments and cv_profile differ in length.

Notes

Used in CLDF conversion (e.g. UESz-year-origin Cluster_glide column, WestOldTurkic and koeblergothic Clusters). Default glide symbols include Gothic intervocalic β and ð.

class loanpy.Uralign

Bases: object

Sequential alignment and scoring for etymological comparison.

The API is language-pair agnostic: method names such as hu reflect historical use (Hungarian vs. proto-Uralic) but accept any two segment lists with CV profiles.

Examples

In a loanword-detection pipeline, align donor and recipient segments then score against mined correspondences:

alm_d, alm_a = Uralign.hu(seg_d, seg_a, cv_d[0], cv_a[0])
score = Uralign.get_score(alm_d, alm_a, scorer, freq_filter=2)

Notes

  • CLDF conversion: Uralign.hu writes Uralign / Uralign_cluster columns in cognate tables (UEW-hu, SeimaTurbino-hu).

  • Quantitative analysis: loanword-detection pipelines (e.g. Indo-Iranian–Hungarian make_results.py) use Uralign.hu and Uralign.get_score with correspondence scorers from get_sound_correspondences().

static get_score(seqA: list[str], seqB: list[str], scorer: dict[tuple[str, str], float], freq_filter: int = 2) → int

Sum correspondence scores along an alignment.

For each aligned pair (a, b) the key (a, b) is looked up in scorer. Pairs below freq_filter incur a large penalty.

Parameters

seqA, seqB:

Parallel aligned token lists.

scorer:

Mapping from correspondence keys to weights (often absolute frequencies from get_sound_correspondences()).

freq_filter:

Minimum scorer weight for a pair to count positively; pairs below it are penalised.

Returns

int

Aggregate alignment score.

Notes

Used in make_results.py together with scores from get_sound_correspondences().
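
The scoring rule can be pictured with a small standalone sketch. The function name alignment_score and the penalty parameter are hypothetical: the size of the "large penalty" is not stated here, so 100 is only a placeholder, and treating a missing pair as weight 0 is likewise an assumption.

```python
def alignment_score(seq_a, seq_b, scorer, freq_filter=2, penalty=100):
    """Hypothetical sketch of the scoring rule described for get_score."""
    total = 0
    for pair in zip(seq_a, seq_b):
        # Assumption: pairs absent from the scorer behave like weight 0.
        weight = scorer.get(pair, 0)
        # Pairs below the filter are penalised instead of being added.
        total += weight if weight >= freq_filter else -penalty
    return total


scorer = {("k", "k"): 12, ("a", "ɑ"): 5, ("t", "d"): 1}
score = alignment_score(["k", "a", "t"], ["k", "ɑ", "d"], scorer)
print(score)  # 12 + 5 - 100 = -83
```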

static hu(seqHU: list[str], seqPU: list[str], seqHU_cv0: str, seqPU_cv0: str, initial_gap: bool = True, final_gap: bool = True) → tuple[list[str], list[str]]

Align two segment sequences with optional initial and final gap rules.

Parameters

seqHU, seqPU:

Segment lists (modified in place when gaps are inserted).

seqHU_cv0, seqPU_cv0:

Word-initial C/V labels for gap decisions.

initial_gap:

If True and the descendant begins with a vowel, prepend #- / - markers.

final_gap:

If True, pad or cluster the longer sequence at the word edge.

Returns

tuple[list[str], list[str]]

Aligned segment pair.

Notes

Used in CLDF conversion and in make_results.py (loanword scoring).

loanpy.add_separator(correspondences: dict[str, dict], sep: str = ' < ') → dict[str, dict]

Return a copy of correspondences with tuple pair keys as "a < b" strings.

Use when writing TOML (string keys only). In-memory scorers from get_sound_correspondences() use (descendant, ancestor) tuple keys.
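
A minimal sketch of this key conversion, assuming every tuple key holds only strings and that non-tuple keys should pass through unchanged. The name add_separator_sketch is hypothetical and the code is not taken from loanpy.

```python
def add_separator_sketch(correspondences, sep=" < "):
    """Return a copy in which tuple keys of the inner dicts become strings."""
    return {
        name: {
            # ("k", "k") becomes "k < k"; plain string keys are kept as they are.
            sep.join(key) if isinstance(key, tuple) else key: value
            for key, value in inner.items()
        }
        for name, inner in correspondences.items()
    }


stats = {"AbsoluteFrequency": {("k", "k"): 12, ("a", "ɑ"): 5}}
toml_ready = add_separator_sketch(stats)
print(toml_ready)  # {'AbsoluteFrequency': {'k < k': 12, 'a < ɑ': 5}}
```

The input dictionary is left untouched, so the tuple-keyed version can still serve as an in-memory scorer.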

loanpy.apply_edit(word: Iterable[str], editops: list[str]) → list[str]

Apply human-readable edit operations to a sequence of segments.

Operations are strings such as "keep a", "delete b", "insert x", or "substitute a by x" (see path_to_edit_operations()).

Parameters

word:

Input segments (characters or phoneme symbols).

editops:

Operations produced by path_to_edit_operations().

Returns

list[str]

Transformed segment list.

Notes

Called by repair() when aligning a word’s CV profile to the closest legal template.
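
To make the four operation strings concrete, here is a standalone interpreter for them. It is a sketch under the assumption that operations are applied positionally from left to right; apply_edit_sketch is a hypothetical name, not loanpy's code.

```python
def apply_edit_sketch(word, editops):
    """Hypothetical interpreter for the four operation strings listed above."""
    segments = iter(word)
    out = []
    for op in editops:
        action, _, rest = op.partition(" ")
        if action == "keep":
            out.append(next(segments))  # copy the current input segment
        elif action == "delete":
            next(segments)  # consume the segment without copying it
        elif action == "insert":
            out.append(rest)  # emit a new symbol, consume nothing
        elif action == "substitute":
            next(segments)  # consume the old segment
            out.append(rest.split(" by ", 1)[1])  # emit its replacement
        else:
            raise ValueError(f"unknown edit operation: {op!r}")
    return out


ops = ["keep C", "keep V", "delete C", "insert V"]
result = apply_edit_sketch(["k", "a", "t"], ops)
print(result)  # ['k', 'a', 'V']
```

Because the operations come from CV profiles, an inserted slot shows up as a bare "V" or "C" placeholder in the output.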

loanpy.edit_distance_matrix(target: Iterable, source: Iterable) → list[list[int]]

Build the minimum edit-distance matrix (insert/delete cost 1 each).

Both sequences are prefixed with #. Matching symbols cost 0 on the diagonal; mismatches use unit-cost insertions or deletions.

Parameters

target, source:

Segment sequences to align (e.g. CV profiles).

Returns

list[list[int]]

Dynamic-programming table.

Notes

Used by repair() together with shortest_edit_path() and path_to_edit_operations().
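
The table can be illustrated with a standard dynamic-programming sketch for insert/delete-only distance. The name indel_matrix is hypothetical, and the orientation (rows follow target, columns follow source) is an assumption rather than something stated here.

```python
def indel_matrix(target, source):
    """Hypothetical DP table for insert/delete-only distance; rows follow target."""
    target = ["#", *target]  # the shared "#" prefix anchors the (0, 0) cell
    source = ["#", *source]
    rows, cols = len(target), len(source)
    mtx = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        mtx[i][0] = i  # i unit-cost steps along the first column
    for j in range(cols):
        mtx[0][j] = j  # j unit-cost steps along the first row
    for i in range(1, rows):
        for j in range(1, cols):
            if target[i] == source[j]:
                mtx[i][j] = mtx[i - 1][j - 1]  # match: free diagonal step
            else:
                # mismatch: one unit-cost step from the cell above or to the left
                mtx[i][j] = 1 + min(mtx[i - 1][j], mtx[i][j - 1])
    return mtx


table = indel_matrix("CVCV", "CVC")
print(table[-1][-1])  # 1: the two profiles differ by a single V
```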

loanpy.edit_distance_with2ops(string1: str, string2: str, w_del: int | float = 1, w_ins: int | float = 1) → int | float

Edit distance using only insertions and deletions (no substitutions).

The cost is (len(string1) - LCS) * w_del + (len(string2) - LCS) * w_ins, where LCS is the length of the longest common subsequence.

Parameters

string1, string2:

Comparable strings (often CV profiles or orthographic forms).

w_del, w_ins:

Costs for unmatched symbols in string1 and string2.

Returns

int or float

Weighted edit distance.

Examples

>>> edit_distance_with2ops("CVC", "CVCCV", w_ins=100)
200

See also

get_closest_phonotactics() — picks a template by minimising this distance.

Notes

Used indirectly by Adapt via phonotactic repair (get_closest_phonotactics()).
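
The cost formula for edit_distance_with2ops can be checked with a short standalone sketch. The names lcs_length and indel_distance are hypothetical; only the formula itself comes from the description above.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence, via a rolling DP row."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(s1, s2, w_del=1, w_ins=1):
    """Hypothetical re-implementation of the cost formula quoted above."""
    lcs = lcs_length(s1, s2)
    # Unmatched symbols of s1 are deletions; unmatched symbols of s2 are insertions.
    return (len(s1) - lcs) * w_del + (len(s2) - lcs) * w_ins


print(indel_distance("CVC", "CVCCV", w_ins=100))  # 200: no deletions, two insertions
```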

loanpy.expand_phonotactics(formula: str) → list[str]

Expand a prosodic formula into space-separated CV templates.

(C) marks an optional consonant slot; bare C and V are required. Syllables are joined with +.

Parameters

formula:

Prosodic string, e.g. "(C)V(C)+CV(C)+CV".

Returns

list[str]

All expanded templates with spaces between C/V symbols.

Examples

>>> templates = expand_phonotactics("(C)V+CV")
>>> "V C V" in templates
True

Notes

Useful when building a recipient-language phonotactic inventory from a compact reconstruction (e.g. proto-language syllable structure in a paper). Inventories built this way feed get_closest_phonotactics() and Adapt.
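
One way to picture the expansion is as a Cartesian product over the optional slots. The sketch below is a hypothetical standalone version (expand_formula is not a loanpy name), and it assumes the formula contains only "(C)", "C", "V" and "+".

```python
import re
from itertools import product


def expand_formula(formula: str) -> list[str]:
    """Hypothetical expansion of optional (C) slots into concrete templates."""
    # "+" only separates syllables, so it is dropped before tokenising.
    slots = re.findall(r"\(C\)|C|V", formula.replace("+", ""))
    # An optional slot contributes either "C" or nothing; other slots are fixed.
    choices = [("C", "") if slot == "(C)" else (slot,) for slot in slots]
    templates = []
    for combo in product(*choices):
        template = " ".join(symbol for symbol in combo if symbol)
        if template not in templates:  # keep each template once, in first-seen order
            templates.append(template)
    return templates


print(expand_formula("(C)V+CV"))  # ['C V C V', 'V C V']
```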

loanpy.get_closest_phonotactics(cv_profile: list[str], phonotactic_inventory: list[str]) → str

Pick the inventory template closest to a CV profile (insert/delete only).

Parameters

cv_profile:

List of "C" / "V" symbols for one word.

phonotactic_inventory:

Legal templates (spaces allowed, e.g. "C V C V").

Returns

str

Best-matching template without spaces (e.g. "CVCV").

Notes

Called by repair(). Insertions are penalised heavily (w_ins=100) so that expanding a profile prefers extra consonant slots over spurious vowels.
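
The selection step might look like the following standalone sketch. Both function names are hypothetical. Two further assumptions are made: the word's profile is passed as the first string, so template symbols missing from the word count as insertions, and ties go to the first template in the inventory.

```python
def weighted_indel(s1, s2, w_del=1, w_ins=1):
    """Insert/delete-only distance computed from the LCS length."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    lcs = prev[-1]
    return (len(s1) - lcs) * w_del + (len(s2) - lcs) * w_ins


def closest_template(cv_profile, phonotactic_inventory):
    """Hypothetical selection step: the cheapest template wins."""
    profile = "".join(cv_profile)
    templates = [t.replace(" ", "") for t in phonotactic_inventory]
    # Assumption: the profile is the first argument, so added slots cost 100 each.
    return min(templates, key=lambda t: weighted_indel(profile, t, w_ins=100))


best = closest_template(["C", "V", "C", "C"], ["C V C V", "C V C"])
print(best)  # CVC: one deletion (cost 1) beats one deletion plus one insertion (cost 101)
```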

loanpy.get_sound_correspondences(table: Sequence[Mapping[str, str]], aligned_col: str, prefix_descendant: str = '', prefix_ancestor: str = '') → dict[str, dict]

Extract segment correspondences from paired cognate alignment rows.

Expects table to list cognate rows in descendant, ancestor, descendant, ancestor, … order (same convention as many CLDF cognates.csv exports). Each consecutive pair of rows is zipped segment-wise along aligned_col.

Parameters

table:

Sequence of row dicts (e.g. from csv.DictReader).

aligned_col:

Column with space-separated aligned segments (e.g. "Uralign").

prefix_descendant, prefix_ancestor:

Optional prefixes prepended to segment tokens in pair keys and examples.

Returns

dict

Keys:

  • SoundCorrespondences: descendant segment → ranked ancestor segments

  • AbsoluteFrequency: (desc, anc) → count

  • Cognateset_IDs: (desc, anc) → cognate set ids

  • Examples: (desc, anc) → example alignment strings

Examples

Build a frequency table for alignment scoring:

rows = list(csv.DictReader(open("cognates.csv", encoding="utf-8")))
stats = get_sound_correspondences(rows, "Uralign")
scorer = stats["AbsoluteFrequency"]

Notes

  • Quantitative analysis: make_results.py in the Indo-Iranian–Hungarian study calls this on CLDF cognate tables to build TOML scorers and in-memory weights for Uralign.

  • CLDF workflows: training data from any wordlist with alternating descendant/ancestor rows and an alignment column can be passed in; no hard-coded language names are required.
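
The pairing of alternating rows can be sketched as follows. This standalone example covers only the frequency count, not the other three result tables. The name count_pairs and the toy rows are hypothetical, and prefixes are ignored.

```python
from collections import Counter


def count_pairs(table, aligned_col):
    """Hypothetical core of the pairing step: count (descendant, ancestor) pairs."""
    counts = Counter()
    # Rows alternate descendant, ancestor, so step through them two at a time.
    for desc_row, anc_row in zip(table[0::2], table[1::2]):
        desc = desc_row[aligned_col].split()
        anc = anc_row[aligned_col].split()
        counts.update(zip(desc, anc))  # one count per aligned segment pair
    return dict(counts)


rows = [
    {"Uralign": "k a t"}, {"Uralign": "k ɑ d"},  # first descendant/ancestor pair
    {"Uralign": "k o"}, {"Uralign": "k u"},      # second descendant/ancestor pair
]
freq = count_pairs(rows, "Uralign")
print(freq[("k", "k")])  # 2
```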

loanpy.path_to_edit_operations(op_list: list[tuple[int, int]], s1: str, s2: str) → list[str]

Convert matrix path coordinates to human-readable edit operations.

Parameters

op_list:

Path from shortest_edit_path() (grid coordinates).

s1, s2:

Target and source strings (CV profiles without spaces).

Returns

list[str]

Operations understood by apply_edit().

loanpy.shortest_edit_path(mtx: list[list[int]]) → list[tuple[int, int]] | None

Find a lowest-cost edit path through a distance matrix.

Moves are right, down, or diagonal; a diagonal step is allowed only when the matrix value is unchanged across it.

Parameters

mtx:

Table from edit_distance_matrix().

Returns

list[tuple[int, int]] or None

Coordinate path from (0, 0) to the bottom-right corner, or None if no path exists.

loanpy.substitute_operations(operations: list[str]) → list[str]

Merge adjacent delete/insert pairs into substitute operations (in place).

Parameters

operations:

List of operation strings; modified in place and also returned.

Returns

list[str]

The same list, with adjacent delete/insert pairs merged into "substitute a by x" operations where possible.
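
The merge can be illustrated with a small standalone sketch. It assumes that only a delete immediately followed by an insert is merged; whether the reverse order also qualifies is not stated here. The name merge_substitutions is hypothetical.

```python
def merge_substitutions(operations: list[str]) -> list[str]:
    """Hypothetical in-place merge of adjacent delete/insert pairs."""
    i = 0
    while i < len(operations) - 1:
        first, second = operations[i], operations[i + 1]
        if first.startswith("delete ") and second.startswith("insert "):
            old = first.split(" ", 1)[1]
            new = second.split(" ", 1)[1]
            # Replace the two operations with a single substitution.
            operations[i : i + 2] = [f"substitute {old} by {new}"]
        i += 1
    return operations


ops = ["keep C", "delete V", "insert C", "keep V"]
merged = merge_substitutions(ops)
print(merged)  # ['keep C', 'substitute V by C', 'keep V']
```

Since the list is edited in place, ops and merged refer to the same object afterwards.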

Phoneme clustering: CV grouping, glide clustering, and gap collapsing.

class loanpy.cluster.Cluster

Bases: object

Static helpers for segment clustering in CLDF pipelines.

Clustering reduces fine-grained segment lists to coarser units used in alignment and correspondence mining (e.g. f.l for consonant clusters, a.ʊ for vowel sequences).

Examples

Typical workflow during CLDF conversion:

segments = form.split()
cv = dataset.get_cv_profile(form)
clusters = Cluster.cv(segments, cv)
glides = Cluster.glides(segments, cv)

After pairwise alignment, gaps may be collapsed:

alm_a, alm_b = Cluster.gaps(alm_a, alm_b)

Notes

Used in CLDF conversion scripts (cldfbench_*.py) for datasets such as UEW-hu, SeimaTurbino-hu, UESz-year-origin, and WestOldTurkic, where clustered segments are written to forms.csv columns like Clusters or Cluster_cv.

static cv(segments: list[str], cv_profile: list[str]) → list[str]

Join adjacent segments that share the same C/V class.

Parameters

segments:

IPA (or other) segments, one symbol per list element.

cv_profile:

Parallel list of "C" and "V" labels.

Returns

list[str]

Clustered segments joined with "." within each run of C or V.

Examples

>>> Cluster.cv(["f", "l", "a"], ["C", "C", "V"])
['f.l', 'a']

static gaps(seqA: list[str], seqB: list[str]) → tuple[list[str], list[str]]

Collapse consecutive gaps on seqB into a single gap per position.

When two adjacent positions in seqB are gaps ("-"), the matching symbol in seqA is merged into the previous token. Trailing gaps may introduce a "+" marker in seqA.

Parameters

seqA, seqB:

Parallel aligned token lists.

Returns

tuple[list[str], list[str]]

Collapsed alignment pair.

Notes

Used in CLDF conversion for WestOldTurkic (Monogap alignments) after global pairwise alignment.

static glides(segments: list[str], cv_profile: list[str], cluster_between_vowels: tuple[str, ...] = ('ɣ', 'w', 'v', 'β', 'ð'), cluster_after_l: tuple[str, ...] = ('t͡ʃ', 'd')) → list[str]

Cluster glides/liquids between vowels and selected consonants after l.

Parameters

segments, cv_profile:

Parallel segment and C/V lists (same length).

cluster_between_vowels:

Segments to attach to a preceding vowel cluster when sandwiched by vowels.

cluster_after_l:

Segments to attach when immediately following l.

Returns

list[str]

Further clustered segment list.

Raises

ValueError

If segments and cv_profile differ in length.

Notes

Used in CLDF conversion (e.g. UESz-year-origin Cluster_glide column, WestOldTurkic and koeblergothic Clusters). Default glide symbols include Gothic intervocalic β and ð.

Descendant–ancestor alignment and correspondence-based scoring.

class loanpy.uralign.Uralign

Bases: object

Sequential alignment and scoring for etymological comparison.

The API is language-pair agnostic: method names such as hu reflect historical use (Hungarian vs. proto-Uralic) but accept any two segment lists with CV profiles.

Examples

In a loanword-detection pipeline, align donor and recipient segments then score against mined correspondences:

alm_d, alm_a = Uralign.hu(seg_d, seg_a, cv_d[0], cv_a[0])
score = Uralign.get_score(alm_d, alm_a, scorer, freq_filter=2)

Notes

  • CLDF conversion: Uralign.hu writes Uralign / Uralign_cluster columns in cognate tables (UEW-hu, SeimaTurbino-hu).

  • Quantitative analysis: loanword-detection pipelines (e.g. Indo-Iranian–Hungarian make_results.py) use Uralign.hu and Uralign.get_score with correspondence scorers from get_sound_correspondences().

static get_score(seqA: list[str], seqB: list[str], scorer: dict[tuple[str, str], float], freq_filter: int = 2) → int

Sum correspondence scores along an alignment.

For each aligned pair (a, b) the key (a, b) is looked up in scorer. Pairs below freq_filter incur a large penalty.

Parameters

seqA, seqB:

Parallel aligned token lists.

scorer:

Mapping from correspondence keys to weights (often absolute frequencies from get_sound_correspondences()).

freq_filter:

Minimum scorer weight for a pair to count positively; pairs below it are penalised.

Returns

int

Aggregate alignment score.

Notes

Used in make_results.py together with scores from get_sound_correspondences().

static hu(seqHU: list[str], seqPU: list[str], seqHU_cv0: str, seqPU_cv0: str, initial_gap: bool = True, final_gap: bool = True) → tuple[list[str], list[str]]

Align two segment sequences with optional initial and final gap rules.

Parameters

seqHU, seqPU:

Segment lists (modified in place when gaps are inserted).

seqHU_cv0, seqPU_cv0:

Word-initial C/V labels for gap decisions.

initial_gap:

If True and the descendant begins with a vowel, prepend #- / - markers.

final_gap:

If True, pad or cluster the longer sequence at the word edge.

Returns

tuple[list[str], list[str]]

Aligned segment pair.

Notes

Used in CLDF conversion and in make_results.py (loanword scoring).

Loanword adaptation: substitution and phonotactic repair.

class loanpy.adapt.Adapt

Bases: object

Map donor segments onto a recipient inventory and repair prosody.

Typical pipeline: learn substitutions from segment inventories, apply them to donor segments, then repair the CV profile against a phonotactic template list.

Examples

In a loanword-detection loop over two wordlists:

ad = Adapt()
ad.get_substitutions(donor_phonemes, recipient_phonemes, distance_fn, extra={})
adapted = ad.substitute(donor_segments)
repaired = ad.repair(adapted, cv_profile, phonotactic_templates)

Notes

Used in loanword-detection pipelines (e.g. Indo-Iranian–Hungarian make_results.py inside find_loanwords): donor segments are substituted toward a recipient inventory, optionally repaired to legal CV templates, then aligned and scored.

get_substitutions(donor_inventory: set[str], recipient_inventory: set[str], distance_func: Callable[[str, str], float], extra: dict[str, str]) → None

Learn one-to-one donor→recipient substitutions by minimum distance.

For each donor phoneme not in the recipient inventory, pick the recipient phoneme with the smallest distance_func(donor, recipient). The learned mapping is then merged with extra (manual overrides) into substitutions.

Parameters

donor_inventory, recipient_inventory:

Segment inventories (sets of phoneme symbols).

distance_func:

Callable returning a numeric distance (e.g. feature-based).

extra:

Fixed substitutions applied on top of learned ones.

repair(segments: list[str], cv_profile: list[str], phonotactic_inventory: list[str], extra_repair: dict[str, str] | None = None) → list[str]

Align segments to the closest legal CV template via edit operations.

Parameters

segments:

Segment list (often after substitute()).

cv_profile:

Parallel C/V profile for segments.

phonotactic_inventory:

Allowed templates (see expand_phonotactics()).

extra_repair:

Optional map from joined CV strings to fixed templates, bypassing nearest-neighbour search.

Returns

list[str]

Segments after applying insert/delete/substitute operations implied by the CV-profile edit path (may include "C" / "V" placeholders).

Notes

make_results.py may post-process placeholder vowels/consonants for vowel harmony; loanpy only returns the structurally repaired sequence.

substitute(segments: list[str]) → list[str]

Replace segments using substitutions (identity if unmapped).

Parameters

segments:

Donor segment list.

Returns

list[str]

Substituted segments.

Sound correspondences from aligned cognate tables.

loanpy.correspondences.add_separator(correspondences: dict[str, dict], sep: str = ' < ') → dict[str, dict]

Return a copy of correspondences with tuple pair keys as "a < b" strings.

Use when writing TOML (string keys only). In-memory scorers from get_sound_correspondences() use (descendant, ancestor) tuple keys.

loanpy.correspondences.get_sound_correspondences(table: Sequence[Mapping[str, str]], aligned_col: str, prefix_descendant: str = '', prefix_ancestor: str = '') → dict[str, dict]

Extract segment correspondences from paired cognate alignment rows.

Expects table to list cognate rows in descendant, ancestor, descendant, ancestor, … order (same convention as many CLDF cognates.csv exports). Each consecutive pair of rows is zipped segment-wise along aligned_col.

Parameters

table:

Sequence of row dicts (e.g. from csv.DictReader).

aligned_col:

Column with space-separated aligned segments (e.g. "Uralign").

prefix_descendant, prefix_ancestor:

Optional prefixes prepended to segment tokens in pair keys and examples.

Returns

dict

Keys:

  • SoundCorrespondences: descendant segment → ranked ancestor segments

  • AbsoluteFrequency: (desc, anc) → count

  • Cognateset_IDs: (desc, anc) → cognate set ids

  • Examples: (desc, anc) → example alignment strings

Examples

Build a frequency table for alignment scoring:

rows = list(csv.DictReader(open("cognates.csv", encoding="utf-8")))
stats = get_sound_correspondences(rows, "Uralign")
scorer = stats["AbsoluteFrequency"]

Notes

  • Quantitative analysis: make_results.py in the Indo-Iranian–Hungarian study calls this on CLDF cognate tables to build TOML scorers and in-memory weights for Uralign.

  • CLDF workflows: training data from any wordlist with alternating descendant/ancestor rows and an alignment column can be passed in; no hard-coded language names are required.

Sequence edit distance and edit-operation utilities.

loanpy.edit.apply_edit(word: Iterable[str], editops: list[str]) → list[str]

Apply human-readable edit operations to a sequence of segments.

Operations are strings such as "keep a", "delete b", "insert x", or "substitute a by x" (see path_to_edit_operations()).

Parameters

word:

Input segments (characters or phoneme symbols).

editops:

Operations produced by path_to_edit_operations().

Returns

list[str]

Transformed segment list.

Notes

Called by repair() when aligning a word’s CV profile to the closest legal template.

loanpy.edit.edit_distance_matrix(target: Iterable, source: Iterable) → list[list[int]]

Build the minimum edit-distance matrix (insert/delete cost 1 each).

Both sequences are prefixed with #. Matching symbols cost 0 on the diagonal; mismatches use unit-cost insertions or deletions.

Parameters

target, source:

Segment sequences to align (e.g. CV profiles).

Returns

list[list[int]]

Dynamic-programming table.

Notes

Used by repair() together with shortest_edit_path() and path_to_edit_operations().

loanpy.edit.edit_distance_with2ops(string1: str, string2: str, w_del: int | float = 1, w_ins: int | float = 1) → int | float

Edit distance using only insertions and deletions (no substitutions).

The cost is (len(string1) - LCS) * w_del + (len(string2) - LCS) * w_ins, where LCS is the length of the longest common subsequence.

Parameters

string1, string2:

Comparable strings (often CV profiles or orthographic forms).

w_del, w_ins:

Costs for unmatched symbols in string1 and string2.

Returns

int or float

Weighted edit distance.

Examples

>>> edit_distance_with2ops("CVC", "CVCCV", w_ins=100)
200

See also

get_closest_phonotactics() — picks a template by minimising this distance.

Notes

Used indirectly by Adapt via phonotactic repair (get_closest_phonotactics()).

loanpy.edit.path_to_edit_operations(op_list: list[tuple[int, int]], s1: str, s2: str) → list[str]

Convert matrix path coordinates to human-readable edit operations.

Parameters

op_list:

Path from shortest_edit_path() (grid coordinates).

s1, s2:

Target and source strings (CV profiles without spaces).

Returns

list[str]

Operations understood by apply_edit().

loanpy.edit.shortest_edit_path(mtx: list[list[int]]) → list[tuple[int, int]] | None

Find a lowest-cost edit path through a distance matrix.

Moves are right, down, or diagonal; a diagonal step is allowed only when the matrix value is unchanged across it.

Parameters

mtx:

Table from edit_distance_matrix().

Returns

list[tuple[int, int]] or None

Coordinate path from (0, 0) to the bottom-right corner, or None if no path exists.

loanpy.edit.substitute_operations(operations: list[str]) → list[str]

Merge adjacent delete/insert pairs into substitute operations (in place).

Parameters

operations:

List of operation strings; modified in place and also returned.

Returns

list[str]

The same list, with adjacent delete/insert pairs merged into "substitute a by x" operations where possible.

Prosodic template expansion and closest-template matching.

loanpy.phonotactics.expand_phonotactics(formula: str) → list[str]

Expand a prosodic formula into space-separated CV templates.

(C) marks an optional consonant slot; bare C and V are required. Syllables are joined with +.

Parameters

formula:

Prosodic string, e.g. "(C)V(C)+CV(C)+CV".

Returns

list[str]

All expanded templates with spaces between C/V symbols.

Examples

>>> templates = expand_phonotactics("(C)V+CV")
>>> "V C V" in templates
True

Notes

Useful when building a recipient-language phonotactic inventory from a compact reconstruction (e.g. proto-language syllable structure in a paper). Inventories built this way feed get_closest_phonotactics() and Adapt.

loanpy.phonotactics.get_closest_phonotactics(cv_profile: list[str], phonotactic_inventory: list[str]) → str

Pick the inventory template closest to a CV profile (insert/delete only).

Parameters

cv_profile:

List of "C" / "V" symbols for one word.

phonotactic_inventory:

Legal templates (spaces allowed, e.g. "C V C V").

Returns

str

Best-matching template without spaces (e.g. "CVCV").

Notes

Called by repair(). Insertions are penalised heavily (w_ins=100) so that expanding a profile prefers extra consonant slots over spurious vowels.