Package overview
Loanpy — linguistic toolkit for loanword detection and sound change.
Version 4 provides segmentation clustering, alignment scoring, sound-correspondence mining, edit-distance utilities, and adaptation (substitution plus phonotactic repair).
- class loanpy.Adapt
Bases: object
Map donor segments onto a recipient inventory and repair prosody.
Typical pipeline: learn substitutions from segment inventories, apply them to donor segments, then repair the CV profile against a phonotactic template list.
Examples
In a loanword-detection loop over two wordlists:
ad = Adapt()
ad.get_substitutions(donor_phonemes, recipient_phonemes, distance_fn, extra={})
adapted = ad.substitute(donor_segments)
repaired = ad.repair(adapted, cv_profile, phonotactic_templates)
Notes
Used in loanword-detection pipelines (e.g. Indo-Iranian–Hungarian make_results.py inside find_loanwords): donor segments are substituted toward a recipient inventory, optionally repaired to legal CV templates, then aligned and scored.
- get_substitutions(donor_inventory: set[str], recipient_inventory: set[str], distance_func: Callable[[str, str], float], extra: dict[str, str]) -> None
Learn one-to-one donor→recipient substitutions by minimum distance.
For each donor phoneme not in the recipient inventory, pick the recipient phoneme with the smallest distance_func(donor, recipient). Merges with extra (manual overrides) into substitutions.
Parameters
- donor_inventory, recipient_inventory:
Segment inventories (sets of phoneme symbols).
- distance_func:
Callable returning a numeric distance (e.g. feature-based).
- extra:
Fixed substitutions applied on top of learned ones.
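The selection rule described above can be sketched in plain Python. This is a minimal illustration and makes no claim about loanpy's internals: `learn_substitutions` and `toy_distance` are hypothetical names, and the sketch returns the mapping, whereas the documented method returns None and merges into `substitutions`.

```python
from typing import Callable, Dict, Set


def learn_substitutions(
    donor_inventory: Set[str],
    recipient_inventory: Set[str],
    distance_func: Callable[[str, str], float],
    extra: Dict[str, str],
) -> Dict[str, str]:
    """Hypothetical stand-in: map each foreign donor phoneme to its nearest recipient phoneme."""
    learned = {}
    candidates = sorted(recipient_inventory)  # sorted for deterministic tie-breaking
    for donor in sorted(donor_inventory):
        if donor in recipient_inventory:
            continue  # already legal in the recipient language
        learned[donor] = min(candidates, key=lambda r: distance_func(donor, r))
    learned.update(extra)  # manual overrides replace learned entries
    return learned


def toy_distance(a: str, b: str) -> float:
    """Toy metric (code-point difference); a real pipeline would use a feature-based distance."""
    return abs(ord(a) - ord(b))


subs = learn_substitutions({"a", "b", "f"}, {"a", "p", "v"}, toy_distance, extra={"f": "p"})
print(subs)  # {'b': 'a', 'f': 'p'}
```

Here "a" is skipped because it already exists in the recipient inventory, "b" is mapped to its nearest neighbour under the toy metric, and the learned mapping for "f" is replaced by the manual override.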
- repair(segments: list[str], cv_profile: list[str], phonotactic_inventory: list[str], extra_repair: dict[str, str] | None = None) -> list[str]
Align segments to the closest legal CV template via edit operations.
Parameters
- segments:
Segment list (often after substitute()).
- cv_profile:
Parallel C/V profile for segments.
- phonotactic_inventory:
Allowed templates (see expand_phonotactics()).
- extra_repair:
Optional map from joined CV strings to fixed templates, bypassing nearest-neighbour search.
Returns
- list[str]
Segments after applying insert/delete/substitute operations implied by the CV-profile edit path (may include "C"/"V" placeholders).
Notes
make_results.py may post-process placeholder vowels/consonants for vowel harmony; loanpy only returns the structurally repaired sequence.
- class loanpy.Cluster
Bases: object
Static helpers for segment clustering in CLDF pipelines.
Clustering reduces fine-grained segment lists to coarser units used in alignment and correspondence mining (e.g. f.l for consonant clusters, a.ʊ for vowel sequences).
Examples
Typical workflow during CLDF conversion:
segments = form.split()
cv = dataset.get_cv_profile(form)
clusters = Cluster.cv(segments, cv)
glides = Cluster.glides(segments, cv)
After pairwise alignment, gaps may be collapsed:
alm_a, alm_b = Cluster.gaps(alm_a, alm_b)
Notes
Used in CLDF conversion scripts (cldfbench_*.py) for datasets such as UEW-hu, SeimaTurbino-hu, UESz-year-origin, and WestOldTurkic, where clustered segments are written to forms.csv columns like Clusters or Cluster_cv.
- static cv(segments: list[str], cv_profile: list[str]) -> list[str]
Join adjacent segments that share the same C/V class.
Parameters
- segments:
IPA (or other) segments, one symbol per list element.
- cv_profile:
Parallel list of "C" and "V" labels.
Returns
- list[str]
Clustered segments joined with "." within each run of C or V.
Examples
>>> Cluster.cv(["f", "l", "a"], ["C", "C", "V"])
['f.l', 'a']
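The run-joining behaviour shown in the doctest can be reproduced with `itertools.groupby`. The sketch below is an illustration only: `cluster_cv` is a hypothetical name, and the library's own implementation may differ.

```python
from itertools import groupby
from typing import List


def cluster_cv(segments: List[str], cv_profile: List[str]) -> List[str]:
    """Hypothetical sketch: join neighbouring segments that share a C/V label with '.'."""
    labelled = zip(cv_profile, segments)
    # groupby yields one group per maximal run of identical labels
    return [
        ".".join(segment for _, segment in run)
        for _, run in groupby(labelled, key=lambda pair: pair[0])
    ]


clusters = cluster_cv(["f", "l", "a"], ["C", "C", "V"])
print(clusters)  # ['f.l', 'a']
```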
- static gaps(seqA: list[str], seqB: list[str]) -> tuple[list[str], list[str]]
Collapse consecutive gaps on seqB into a single gap per position.
When two adjacent positions in seqB are gaps ("-"), the matching symbol in seqA is merged into the previous token. Trailing gaps may introduce a "+" marker in seqA.
Parameters
- seqA, seqB:
Parallel aligned token lists.
Returns
- tuple[list[str], list[str]]
Collapsed alignment pair.
Notes
Used in CLDF conversion for WestOldTurkic (Monogap alignments) after global pairwise alignment.
- static glides(segments: list[str], cv_profile: list[str], cluster_between_vowels: tuple[str, ...] = ('ɣ', 'w', 'v', 'β', 'ð'), cluster_after_l: tuple[str, ...] = ('t͡ʃ', 'd')) -> list[str]
Cluster glides/liquids between vowels and selected consonants after l.
Parameters
- segments, cv_profile:
Parallel segment and C/V lists (same length).
- cluster_between_vowels:
Segments to attach to a preceding vowel cluster when sandwiched by vowels.
- cluster_after_l:
Segments to attach when immediately following l.
Returns
- list[str]
Further clustered segment list.
Raises
- ValueError
If segments and cv_profile differ in length.
Notes
Used in CLDF conversion (e.g. UESz-year-origin Cluster_glide column, WestOldTurkic and koeblergothic Clusters). Default glide symbols include Gothic intervocalic β and ð.
- class loanpy.Uralign
Bases: object
Sequential alignment and scoring for etymological comparison.
The API is language-pair agnostic: method names such as hu reflect historical use (Hungarian vs. proto-Uralic) but accept any two segment lists with CV profiles.
Examples
In a loanword-detection pipeline, align donor and recipient segments then score against mined correspondences:
alm_d, alm_a = Uralign.hu(seg_d, seg_a, cv_d[0], cv_a[0])
score = Uralign.get_score(alm_d, alm_a, scorer, freq_filter=2)
Notes
CLDF conversion: Uralign.hu writes Uralign / Uralign_cluster columns in cognate tables (UEW-hu, SeimaTurbino-hu).
Quantitative analysis: loanword-detection pipelines (e.g. Indo-Iranian–Hungarian make_results.py) use Uralign.hu and Uralign.get_score with correspondence scorers from get_sound_correspondences().
- static get_score(seqA: list[str], seqB: list[str], scorer: dict[tuple[str, str], float], freq_filter: int = 2) -> int
Sum correspondence scores along an alignment.
For each aligned pair (a, b) the key (a, b) is looked up in scorer. Pairs below freq_filter incur a large penalty.
Parameters
- seqA, seqB:
Parallel aligned token lists.
- scorer:
Mapping from correspondence keys to weights (often absolute frequencies from get_sound_correspondences()).
- freq_filter:
Minimum score for a pair to count positively.
Returns
- int
Aggregate alignment score.
Notes
Used in make_results.py together with scores from get_sound_correspondences().
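As a rough illustration of the scoring rule, the sketch below sums scorer weights over aligned pairs and applies a penalty to pairs below the threshold. The function name `alignment_score`, the treatment of unseen pairs as weight 0, and the penalty value of -100 are all assumptions made for this example; the documentation only says that the penalty is "large".

```python
from typing import Dict, List, Tuple


def alignment_score(
    seq_a: List[str],
    seq_b: List[str],
    scorer: Dict[Tuple[str, str], float],
    freq_filter: int = 2,
    penalty: float = -100,  # assumed value, not taken from loanpy
) -> float:
    """Hypothetical sketch: add up pair weights, penalising rare or unseen pairs."""
    total = 0
    for pair in zip(seq_a, seq_b):
        weight = scorer.get(pair, 0)  # assumption: unseen pairs count as weight 0
        total += weight if weight >= freq_filter else penalty
    return total


toy_scorer = {("k", "k"): 5, ("a", "o"): 1}
score = alignment_score(["k", "a"], ["k", "o"], toy_scorer)
print(score)  # -95: 5 for (k, k), then the penalty for the rare pair (a, o)
```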
- static hu(seqHU: list[str], seqPU: list[str], seqHU_cv0: str, seqPU_cv0: str, initial_gap: bool = True, final_gap: bool = True) -> tuple[list[str], list[str]]
Align two segment sequences with optional initial and final gap rules.
Parameters
- seqHU, seqPU:
Segment lists (modified in place when gaps are inserted).
- seqHU_cv0, seqPU_cv0:
Word-initial C/V labels for gap decisions.
- initial_gap:
If True and the descendant begins with a vowel, prepend #- / - markers.
- final_gap:
If True, pad or cluster the longer sequence at the word edge.
Returns
- tuple[list[str], list[str]]
Aligned segment pair.
Notes
Used in CLDF conversion and in make_results.py (loanword scoring).
- loanpy.add_separator(correspondences: dict[str, dict], sep: str = ' < ') -> dict[str, dict]
Return a copy of correspondences with tuple pair keys as "a < b" strings.
Use when writing TOML (string keys only). In-memory scorers from get_sound_correspondences() use (descendant, ancestor) tuple keys.
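The key conversion can be pictured as follows. This is a sketch under the assumption that only tuple keys are rewritten and string keys pass through unchanged; `with_string_keys` is a hypothetical name, not the library function.

```python
from typing import Dict


def with_string_keys(correspondences: Dict[str, dict], sep: str = " < ") -> Dict[str, dict]:
    """Hypothetical sketch: copy nested tables, joining tuple keys into 'a < b' strings."""
    converted = {}
    for name, table in correspondences.items():
        converted[name] = {
            (sep.join(key) if isinstance(key, tuple) else key): value
            for key, value in table.items()
        }
    return converted


stats = {
    "SoundCorrespondences": {"k": ["k"]},
    "AbsoluteFrequency": {("k", "k"): 3},
}
toml_ready = with_string_keys(stats)
print(toml_ready["AbsoluteFrequency"])  # {'k < k': 3}
```

The input mapping is left untouched, so the tuple-keyed version remains usable as an in-memory scorer.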
- loanpy.apply_edit(word: Iterable[str], editops: list[str]) -> list[str]
Apply human-readable edit operations to a sequence of segments.
Operations are strings such as "keep a", "delete b", "insert x", or "substitute a by x" (see path_to_edit_operations()).
Parameters
- word:
Input segments (characters or phoneme symbols).
- editops:
Operations produced by path_to_edit_operations().
Returns
- list[str]
Transformed segment list.
Notes
Called by repair() when aligning a word’s CV profile to the closest legal template.
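To make the operation strings concrete, here is a small interpreter for them. It is a sketch, not loanpy's code: `apply_operations` is a hypothetical name, and it assumes that "keep" copies the next input segment, while "insert" and "substitute" emit the symbol named in the operation. Under that assumption, CV-level operations applied to real segments leave "C"/"V" placeholders, as the repair() documentation describes.

```python
from typing import Iterable, List


def apply_operations(word: Iterable[str], editops: List[str]) -> List[str]:
    """Hypothetical sketch: walk the input once, applying one operation per step."""
    remaining = iter(word)
    result = []
    for op in editops:
        action, _, argument = op.partition(" ")
        if action == "keep":
            result.append(next(remaining))  # copy the actual segment
        elif action == "delete":
            next(remaining)  # consume and drop the segment
        elif action == "insert":
            result.append(argument)  # emit the named symbol (e.g. a placeholder)
        elif action == "substitute":
            next(remaining)  # drop the old segment
            result.append(argument.split(" by ")[1])
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


repaired = apply_operations(["k", "t", "a"], ["keep C", "insert V", "keep C", "keep V"])
print(repaired)  # ['k', 'V', 't', 'a']
```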
- loanpy.edit_distance_matrix(target: Iterable, source: Iterable) -> list[list[int]]
Build the minimum edit-distance matrix (insert/delete cost 1 each).
Both sequences are prefixed with #. Matching symbols cost 0 on the diagonal; mismatches use unit-cost insertions or deletions.
Parameters
- target, source:
Segment sequences to align (e.g. CV profiles).
Returns
- list[list[int]]
Dynamic-programming table.
Notes
Used by repair() together with shortest_edit_path() and path_to_edit_operations().
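A standard dynamic-programming table for insert/delete-only distance looks like the sketch below. The orientation (rows for the source, columns for the target) and the name `indel_matrix` are assumptions for illustration; the library may lay the table out differently.

```python
from typing import Iterable, List


def indel_matrix(target: Iterable[str], source: Iterable[str]) -> List[List[int]]:
    """Hypothetical sketch: insert/delete-only distance table with '#' boundary symbols."""
    cols = ["#"] + list(target)
    rows = ["#"] + list(source)
    mtx = [[0] * len(cols) for _ in rows]
    for j in range(len(cols)):
        mtx[0][j] = j  # first row: j unit-cost steps
    for i in range(len(rows)):
        mtx[i][0] = i  # first column: i unit-cost steps
    for i in range(1, len(rows)):
        for j in range(1, len(cols)):
            if rows[i] == cols[j]:
                mtx[i][j] = mtx[i - 1][j - 1]  # match: free diagonal step
            else:
                mtx[i][j] = 1 + min(mtx[i - 1][j], mtx[i][j - 1])  # no substitution move
    return mtx


table = indel_matrix("CVC", "CVCV")
print(table[-1][-1])  # 1: the two profiles differ by a single symbol
```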
- loanpy.edit_distance_with2ops(string1: str, string2: str, w_del: int | float = 1, w_ins: int | float = 1) -> int | float
Edit distance using only insertions and deletions (no substitutions).
The cost is (len(string1) - LCS) * w_del + (len(string2) - LCS) * w_ins, where LCS is the length of the longest common subsequence.
Parameters
- string1, string2:
Comparable strings (often CV profiles or orthographic forms).
- w_del, w_ins:
Costs for unmatched symbols in string1 (w_del) and string2 (w_ins).
Returns
- int or float
Weighted edit distance.
Examples
>>> edit_distance_with2ops("CVC", "CVCCV", w_ins=100)
200
See also
get_closest_phonotactics(): picks a template by minimising this distance.
Notes
Used indirectly by Adapt via phonotactic repair (get_closest_phonotactics()).
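The formula above can be checked with a short LCS-based sketch, which also shows how a template might be chosen by minimising the distance. The names `lcs_length`, `indel_distance`, and `closest_template` are hypothetical, and passing the word's profile as the first argument and the template as the second is an assumption about the call order.

```python
from typing import List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(string1: str, string2: str, w_del: float = 1, w_ins: float = 1) -> float:
    """Weighted insert/delete distance following the documented formula."""
    lcs = lcs_length(string1, string2)
    return (len(string1) - lcs) * w_del + (len(string2) - lcs) * w_ins


def closest_template(cv_profile: List[str], inventory: List[str], w_ins: float = 100) -> str:
    """Hypothetical sketch: pick the template with the smallest weighted distance."""
    profile = "".join(cv_profile)
    templates = [template.replace(" ", "") for template in inventory]
    return min(templates, key=lambda t: indel_distance(profile, t, w_ins=w_ins))


print(indel_distance("CVC", "CVCCV", w_ins=100))  # 200
print(closest_template(["C", "V", "C", "C"], ["C V C", "C V C C V"]))  # CVC
```

With the heavy insertion weight, dropping the final consonant (cost 1) beats adding a vowel (cost 100), so the shorter template wins.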
- loanpy.expand_phonotactics(formula: str) -> list[str]
Expand a prosodic formula into space-separated CV templates.
(C) marks an optional consonant slot; bare C and V are required. Syllables are joined with +.
Parameters
- formula:
Prosodic string, e.g. "(C)V(C)+CV(C)+CV".
Returns
- list[str]
All expanded templates with spaces between C/V symbols.
Examples
>>> templates = expand_phonotactics("(C)V+CV")
>>> "V C V" in templates
True
Notes
Useful when building a recipient-language phonotactic inventory from a compact reconstruction (e.g. proto-language syllable structure in a paper). Inventories built this way feed get_closest_phonotactics() and Adapt.
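One way to picture the expansion is as a Cartesian product over slots, where each optional slot contributes either a consonant or nothing. The sketch below assumes that only "(C)" is optional, as stated above; `expand_formula` is a hypothetical name, and the ordering and de-duplication of the real function's output are not specified here.

```python
import re
from itertools import product
from typing import List


def expand_formula(formula: str) -> List[str]:
    """Hypothetical sketch: expand optional (C) slots into every concrete CV template."""
    slots = re.findall(r"\(C\)|C|V", formula.replace("+", ""))
    # an optional slot offers "C" or nothing; a required slot offers only itself
    choices = [("C", "") if slot == "(C)" else (slot,) for slot in slots]
    templates = {
        " ".join(symbol for symbol in combo if symbol)
        for combo in product(*choices)
    }
    return sorted(templates)


templates = expand_formula("(C)V+CV")
print(templates)  # ['C V C V', 'V C V']
```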
- loanpy.get_closest_phonotactics(cv_profile: list[str], phonotactic_inventory: list[str]) -> str
Pick the inventory template closest to a CV profile (insert/delete only).
Parameters
- cv_profile:
List of "C"/"V" symbols for one word.
- phonotactic_inventory:
Legal templates (spaces allowed, e.g. "C V C V").
Returns
- str
Best-matching template without spaces (e.g. "CVCV").
Notes
Called by repair(). Insertions are penalised heavily (w_ins=100) so that expanding a profile prefers extra consonant slots over spurious vowels.
- loanpy.get_sound_correspondences(table: Sequence[Mapping[str, str]], aligned_col: str, prefix_descendant: str = '', prefix_ancestor: str = '') -> dict[str, dict]
Extract segment correspondences from paired cognate alignment rows.
Expects table to list cognate rows in descendant, ancestor, descendant, ancestor, … order (same convention as many CLDF cognates.csv exports). Each consecutive pair of rows is zipped segment-wise along aligned_col.
Parameters
- table:
Sequence of row dicts (e.g. from csv.DictReader).
- aligned_col:
Column with space-separated aligned segments (e.g. "Uralign").
- prefix_descendant, prefix_ancestor:
Optional prefixes prepended to segment tokens in pair keys and examples.
Returns
- dict
Keys:
- SoundCorrespondences: descendant segment → ranked ancestor segments
- AbsoluteFrequency: (desc, anc) → count
- Cognateset_IDs: (desc, anc) → cognate set ids
- Examples: (desc, anc) → example alignment strings
Examples
Build a frequency table for alignment scoring:
import csv
with open("cognates.csv", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
stats = get_sound_correspondences(rows, "Uralign")
scorer = stats["AbsoluteFrequency"]
Notes
Quantitative analysis: make_results.py in the Indo-Iranian–Hungarian study calls this on CLDF cognate tables to build TOML scorers and in-memory weights for Uralign.
CLDF workflows: training data from any wordlist with alternating descendant/ancestor rows and an alignment column can be passed in; no hard-coded language names are required.
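The counting step can be sketched with `collections.Counter`. This illustration covers only two of the four documented keys and ignores prefixes and cognate-set ids; `count_correspondences` is a hypothetical name, and the rows are made-up toy tokens rather than real etymologies.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Mapping, Sequence


def count_correspondences(table: Sequence[Mapping[str, str]], aligned_col: str) -> Dict[str, dict]:
    """Hypothetical sketch: count (descendant, ancestor) segment pairs over alternating rows."""
    freq = Counter()
    # even-indexed rows are descendants, odd-indexed rows their ancestors
    for desc_row, anc_row in zip(table[0::2], table[1::2]):
        desc = desc_row[aligned_col].split()
        anc = anc_row[aligned_col].split()
        freq.update(zip(desc, anc))
    ranked: Dict[str, List[str]] = defaultdict(list)
    for (desc_seg, anc_seg), _count in freq.most_common():
        ranked[desc_seg].append(anc_seg)  # most frequent ancestor first
    return {"SoundCorrespondences": dict(ranked), "AbsoluteFrequency": dict(freq)}


toy_rows = [
    {"Uralign": "h a l"}, {"Uralign": "k a l"},
    {"Uralign": "h a d"}, {"Uralign": "k u n"},
]
toy_stats = count_correspondences(toy_rows, "Uralign")
print(toy_stats["AbsoluteFrequency"][("h", "k")])  # 2
```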
- loanpy.path_to_edit_operations(op_list: list[tuple[int, int]], s1: str, s2: str) -> list[str]
Convert matrix path coordinates to human-readable edit operations.
Parameters
- op_list:
Path from shortest_edit_path() (grid coordinates).
- s1, s2:
Target and source strings (CV profiles without spaces).
Returns
- list[str]
Operations understood by apply_edit().
- loanpy.shortest_edit_path(mtx: list[list[int]]) -> list[tuple[int, int]] | None
Find a lowest-cost edit path through a distance matrix.
Moves are right, down, or diagonal when the matrix value is unchanged on the diagonal step.
Parameters
- mtx:
Table from edit_distance_matrix().
Returns
- list[tuple[int, int]] or None
Coordinate path from (0, 0) to the bottom-right corner, or None if no path exists.
- loanpy.substitute_operations(operations: list[str]) -> list[str]
Merge adjacent delete/insert pairs into substitute operations (in place).
Parameters
- operations:
List of operation strings; modified in place and also returned.
Returns
- list[str]
The same list, with merged substitute … by … operations where possible.
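The merge can be illustrated with a single pass over the operation list. Note the differences from the documented function: this sketch builds a new list instead of modifying its argument, and it assumes that a delete/insert pair counts as adjacent in either order. `merge_substitutions` is a hypothetical name.

```python
from typing import List


def merge_substitutions(operations: List[str]) -> List[str]:
    """Hypothetical sketch: fuse neighbouring delete/insert operations into substitutions."""
    merged = []
    i = 0
    while i < len(operations):
        current = operations[i]
        following = operations[i + 1] if i + 1 < len(operations) else ""
        if current.startswith("delete ") and following.startswith("insert "):
            old, new = current[len("delete "):], following[len("insert "):]
        elif current.startswith("insert ") and following.startswith("delete "):
            old, new = following[len("delete "):], current[len("insert "):]
        else:
            merged.append(current)  # nothing to merge at this position
            i += 1
            continue
        merged.append(f"substitute {old} by {new}")
        i += 2  # both operations of the pair are consumed
    return merged


ops = merge_substitutions(["keep C", "delete V", "insert C", "keep V"])
print(ops)  # ['keep C', 'substitute V by C', 'keep V']
```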
Phoneme clustering: CV grouping, glide clustering, and gap collapsing.
- class loanpy.cluster.Cluster
Bases: object
Static helpers for segment clustering in CLDF pipelines.
Clustering reduces fine-grained segment lists to coarser units used in alignment and correspondence mining (e.g. f.l for consonant clusters, a.ʊ for vowel sequences).
Examples
Typical workflow during CLDF conversion:
segments = form.split()
cv = dataset.get_cv_profile(form)
clusters = Cluster.cv(segments, cv)
glides = Cluster.glides(segments, cv)
After pairwise alignment, gaps may be collapsed:
alm_a, alm_b = Cluster.gaps(alm_a, alm_b)
Notes
Used in CLDF conversion scripts (cldfbench_*.py) for datasets such as UEW-hu, SeimaTurbino-hu, UESz-year-origin, and WestOldTurkic, where clustered segments are written to forms.csv columns like Clusters or Cluster_cv.
- static cv(segments: list[str], cv_profile: list[str]) -> list[str]
Join adjacent segments that share the same C/V class.
Parameters
- segments:
IPA (or other) segments, one symbol per list element.
- cv_profile:
Parallel list of "C" and "V" labels.
Returns
- list[str]
Clustered segments joined with "." within each run of C or V.
Examples
>>> Cluster.cv(["f", "l", "a"], ["C", "C", "V"])
['f.l', 'a']
- static gaps(seqA: list[str], seqB: list[str]) -> tuple[list[str], list[str]]
Collapse consecutive gaps on seqB into a single gap per position.
When two adjacent positions in seqB are gaps ("-"), the matching symbol in seqA is merged into the previous token. Trailing gaps may introduce a "+" marker in seqA.
Parameters
- seqA, seqB:
Parallel aligned token lists.
Returns
- tuple[list[str], list[str]]
Collapsed alignment pair.
Notes
Used in CLDF conversion for WestOldTurkic (Monogap alignments) after global pairwise alignment.
- static glides(segments: list[str], cv_profile: list[str], cluster_between_vowels: tuple[str, ...] = ('ɣ', 'w', 'v', 'β', 'ð'), cluster_after_l: tuple[str, ...] = ('t͡ʃ', 'd')) -> list[str]
Cluster glides/liquids between vowels and selected consonants after l.
Parameters
- segments, cv_profile:
Parallel segment and C/V lists (same length).
- cluster_between_vowels:
Segments to attach to a preceding vowel cluster when sandwiched by vowels.
- cluster_after_l:
Segments to attach when immediately following l.
Returns
- list[str]
Further clustered segment list.
Raises
- ValueError
If segments and cv_profile differ in length.
Notes
Used in CLDF conversion (e.g. UESz-year-origin Cluster_glide column, WestOldTurkic and koeblergothic Clusters). Default glide symbols include Gothic intervocalic β and ð.
Descendant–ancestor alignment and correspondence-based scoring.
- class loanpy.uralign.Uralign
Bases: object
Sequential alignment and scoring for etymological comparison.
The API is language-pair agnostic: method names such as hu reflect historical use (Hungarian vs. proto-Uralic) but accept any two segment lists with CV profiles.
Examples
In a loanword-detection pipeline, align donor and recipient segments then score against mined correspondences:
alm_d, alm_a = Uralign.hu(seg_d, seg_a, cv_d[0], cv_a[0])
score = Uralign.get_score(alm_d, alm_a, scorer, freq_filter=2)
Notes
CLDF conversion: Uralign.hu writes Uralign / Uralign_cluster columns in cognate tables (UEW-hu, SeimaTurbino-hu).
Quantitative analysis: loanword-detection pipelines (e.g. Indo-Iranian–Hungarian make_results.py) use Uralign.hu and Uralign.get_score with correspondence scorers from get_sound_correspondences().
- static get_score(seqA: list[str], seqB: list[str], scorer: dict[tuple[str, str], float], freq_filter: int = 2) -> int
Sum correspondence scores along an alignment.
For each aligned pair (a, b) the key (a, b) is looked up in scorer. Pairs below freq_filter incur a large penalty.
Parameters
- seqA, seqB:
Parallel aligned token lists.
- scorer:
Mapping from correspondence keys to weights (often absolute frequencies from get_sound_correspondences()).
- freq_filter:
Minimum score for a pair to count positively.
Returns
- int
Aggregate alignment score.
Notes
Used in make_results.py together with scores from get_sound_correspondences().
- static hu(seqHU: list[str], seqPU: list[str], seqHU_cv0: str, seqPU_cv0: str, initial_gap: bool = True, final_gap: bool = True) -> tuple[list[str], list[str]]
Align two segment sequences with optional initial and final gap rules.
Parameters
- seqHU, seqPU:
Segment lists (modified in place when gaps are inserted).
- seqHU_cv0, seqPU_cv0:
Word-initial C/V labels for gap decisions.
- initial_gap:
If True and the descendant begins with a vowel, prepend #- / - markers.
- final_gap:
If True, pad or cluster the longer sequence at the word edge.
Returns
- tuple[list[str], list[str]]
Aligned segment pair.
Notes
Used in CLDF conversion and in make_results.py (loanword scoring).
Loanword adaptation: substitution and phonotactic repair.
- class loanpy.adapt.Adapt
Bases: object
Map donor segments onto a recipient inventory and repair prosody.
Typical pipeline: learn substitutions from segment inventories, apply them to donor segments, then repair the CV profile against a phonotactic template list.
Examples
In a loanword-detection loop over two wordlists:
ad = Adapt()
ad.get_substitutions(donor_phonemes, recipient_phonemes, distance_fn, extra={})
adapted = ad.substitute(donor_segments)
repaired = ad.repair(adapted, cv_profile, phonotactic_templates)
Notes
Used in loanword-detection pipelines (e.g. Indo-Iranian–Hungarian make_results.py inside find_loanwords): donor segments are substituted toward a recipient inventory, optionally repaired to legal CV templates, then aligned and scored.
- get_substitutions(donor_inventory: set[str], recipient_inventory: set[str], distance_func: Callable[[str, str], float], extra: dict[str, str]) -> None
Learn one-to-one donor→recipient substitutions by minimum distance.
For each donor phoneme not in the recipient inventory, pick the recipient phoneme with the smallest distance_func(donor, recipient). Merges with extra (manual overrides) into substitutions.
Parameters
- donor_inventory, recipient_inventory:
Segment inventories (sets of phoneme symbols).
- distance_func:
Callable returning a numeric distance (e.g. feature-based).
- extra:
Fixed substitutions applied on top of learned ones.
- repair(segments: list[str], cv_profile: list[str], phonotactic_inventory: list[str], extra_repair: dict[str, str] | None = None) -> list[str]
Align segments to the closest legal CV template via edit operations.
Parameters
- segments:
Segment list (often after substitute()).
- cv_profile:
Parallel C/V profile for segments.
- phonotactic_inventory:
Allowed templates (see expand_phonotactics()).
- extra_repair:
Optional map from joined CV strings to fixed templates, bypassing nearest-neighbour search.
Returns
- list[str]
Segments after applying insert/delete/substitute operations implied by the CV-profile edit path (may include "C"/"V" placeholders).
Notes
make_results.py may post-process placeholder vowels/consonants for vowel harmony; loanpy only returns the structurally repaired sequence.
Sound correspondences from aligned cognate tables.
- loanpy.correspondences.add_separator(correspondences: dict[str, dict], sep: str = ' < ') -> dict[str, dict]
Return a copy of correspondences with tuple pair keys as "a < b" strings.
Use when writing TOML (string keys only). In-memory scorers from get_sound_correspondences() use (descendant, ancestor) tuple keys.
- loanpy.correspondences.get_sound_correspondences(table: Sequence[Mapping[str, str]], aligned_col: str, prefix_descendant: str = '', prefix_ancestor: str = '') -> dict[str, dict]
Extract segment correspondences from paired cognate alignment rows.
Expects table to list cognate rows in descendant, ancestor, descendant, ancestor, … order (same convention as many CLDF cognates.csv exports). Each consecutive pair of rows is zipped segment-wise along aligned_col.
Parameters
- table:
Sequence of row dicts (e.g. from csv.DictReader).
- aligned_col:
Column with space-separated aligned segments (e.g. "Uralign").
- prefix_descendant, prefix_ancestor:
Optional prefixes prepended to segment tokens in pair keys and examples.
Returns
- dict
Keys:
- SoundCorrespondences: descendant segment → ranked ancestor segments
- AbsoluteFrequency: (desc, anc) → count
- Cognateset_IDs: (desc, anc) → cognate set ids
- Examples: (desc, anc) → example alignment strings
Examples
Build a frequency table for alignment scoring:
import csv
with open("cognates.csv", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
stats = get_sound_correspondences(rows, "Uralign")
scorer = stats["AbsoluteFrequency"]
Notes
Quantitative analysis: make_results.py in the Indo-Iranian–Hungarian study calls this on CLDF cognate tables to build TOML scorers and in-memory weights for Uralign.
CLDF workflows: training data from any wordlist with alternating descendant/ancestor rows and an alignment column can be passed in; no hard-coded language names are required.
Sequence edit distance and edit-operation utilities.
- loanpy.edit.apply_edit(word: Iterable[str], editops: list[str]) -> list[str]
Apply human-readable edit operations to a sequence of segments.
Operations are strings such as "keep a", "delete b", "insert x", or "substitute a by x" (see path_to_edit_operations()).
Parameters
- word:
Input segments (characters or phoneme symbols).
- editops:
Operations produced by path_to_edit_operations().
Returns
- list[str]
Transformed segment list.
Notes
Called by repair() when aligning a word’s CV profile to the closest legal template.
- loanpy.edit.edit_distance_matrix(target: Iterable, source: Iterable) -> list[list[int]]
Build the minimum edit-distance matrix (insert/delete cost 1 each).
Both sequences are prefixed with #. Matching symbols cost 0 on the diagonal; mismatches use unit-cost insertions or deletions.
Parameters
- target, source:
Segment sequences to align (e.g. CV profiles).
Returns
- list[list[int]]
Dynamic-programming table.
Notes
Used by repair() together with shortest_edit_path() and path_to_edit_operations().
- loanpy.edit.edit_distance_with2ops(string1: str, string2: str, w_del: int | float = 1, w_ins: int | float = 1) -> int | float
Edit distance using only insertions and deletions (no substitutions).
The cost is (len(string1) - LCS) * w_del + (len(string2) - LCS) * w_ins, where LCS is the length of the longest common subsequence.
Parameters
- string1, string2:
Comparable strings (often CV profiles or orthographic forms).
- w_del, w_ins:
Costs for unmatched symbols in string1 (w_del) and string2 (w_ins).
Returns
- int or float
Weighted edit distance.
Examples
>>> edit_distance_with2ops("CVC", "CVCCV", w_ins=100)
200
See also
get_closest_phonotactics(): picks a template by minimising this distance.
Notes
Used indirectly by Adapt via phonotactic repair (get_closest_phonotactics()).
- loanpy.edit.path_to_edit_operations(op_list: list[tuple[int, int]], s1: str, s2: str) -> list[str]
Convert matrix path coordinates to human-readable edit operations.
Parameters
- op_list:
Path from shortest_edit_path() (grid coordinates).
- s1, s2:
Target and source strings (CV profiles without spaces).
Returns
- list[str]
Operations understood by apply_edit().
- loanpy.edit.shortest_edit_path(mtx: list[list[int]]) -> list[tuple[int, int]] | None
Find a lowest-cost edit path through a distance matrix.
Moves are right, down, or diagonal when the matrix value is unchanged on the diagonal step.
Parameters
- mtx:
Table from edit_distance_matrix().
Returns
- list[tuple[int, int]] or None
Coordinate path from (0, 0) to the bottom-right corner, or None if no path exists.
- loanpy.edit.substitute_operations(operations: list[str]) -> list[str]
Merge adjacent delete/insert pairs into substitute operations (in place).
Parameters
- operations:
List of operation strings; modified in place and also returned.
Returns
- list[str]
The same list, with merged substitute … by … operations where possible.
Prosodic template expansion and closest-template matching.
- loanpy.phonotactics.expand_phonotactics(formula: str) -> list[str]
Expand a prosodic formula into space-separated CV templates.
(C) marks an optional consonant slot; bare C and V are required. Syllables are joined with +.
Parameters
- formula:
Prosodic string, e.g. "(C)V(C)+CV(C)+CV".
Returns
- list[str]
All expanded templates with spaces between C/V symbols.
Examples
>>> templates = expand_phonotactics("(C)V+CV")
>>> "V C V" in templates
True
Notes
Useful when building a recipient-language phonotactic inventory from a compact reconstruction (e.g. proto-language syllable structure in a paper). Inventories built this way feed get_closest_phonotactics() and Adapt.
- loanpy.phonotactics.get_closest_phonotactics(cv_profile: list[str], phonotactic_inventory: list[str]) -> str
Pick the inventory template closest to a CV profile (insert/delete only).
Parameters
- cv_profile:
List of "C"/"V" symbols for one word.
- phonotactic_inventory:
Legal templates (spaces allowed, e.g. "C V C V").
Returns
- str
Best-matching template without spaces (e.g. "CVCV").
Notes
Called by repair(). Insertions are penalised heavily (w_ins=100) so that expanding a profile prefers extra consonant slots over spurious vowels.