pig.synth
This module contains components related to the synthesis APIs of the PIG system. It covers code synthesis, the handling of synthesis tasks, and utilities that support API migration.
call
- pig.synth.call.FindCParent(parent, node)
Find the nearest enclosing class of node, if any.
Traverses the parent mapping upward until an ast.ClassDef is found. Returns None if the traversal reaches an ast.Module without encountering a class.
- Parameters:
parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.
node – The AST node whose enclosing class is to be found.
- Returns:
The nearest enclosing ast.ClassDef, or None if node is not nested inside any class.
- Return type:
ast.ClassDef | None
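The upward traversal described above can be sketched with the standard ast module alone. This is a minimal illustration, not the module's implementation: build_parent_map, direct_parent, and enclosing_class are hypothetical names, and the parent-to-children shape of the mapping is taken from the ParentAst() description.

```python
import ast

def build_parent_map(root):
    # Assumed shape of the ParentAst() result: parent node -> set of direct children.
    return {n: set(ast.iter_child_nodes(n)) for n in ast.walk(root)}

def direct_parent(parent_map, node):
    # Linear search for the key whose child set contains `node`.
    for candidate, children in parent_map.items():
        if node in children:
            return candidate
    return None

def enclosing_class(parent_map, node):
    # Climb until a ClassDef is found; give up at the module root.
    current = direct_parent(parent_map, node)
    while current is not None and not isinstance(current, ast.Module):
        if isinstance(current, ast.ClassDef):
            return current
        current = direct_parent(parent_map, current)
    return None

tree = ast.parse(
    "class A:\n    def f(self):\n        return 1\n\ndef g():\n    return 2\n"
)
pm = build_parent_map(tree)
ret_in_method = tree.body[0].body[0].body[0]  # `return 1` inside A.f
ret_in_func = tree.body[1].body[0]            # `return 2` inside g
```

Here enclosing_class(pm, ret_in_method) yields the ClassDef for A, while enclosing_class(pm, ret_in_func) yields None because the climb reaches the module first.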
- pig.synth.call.FindExprParent(parent, node)
Find the nearest ast.Attribute or ast.Call ancestor of node.
Traverses the parent mapping upward until an ast.Attribute or ast.Call node is encountered. This is useful for resolving the outermost expression context of a name reference within a call chain.
- Parameters:
parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.
node – The AST node whose nearest attribute or call ancestor is to be found.
- Returns:
The nearest enclosing ast.Attribute or ast.Call node, or node itself if no such ancestor exists.
- Return type:
ast.Attribute | ast.Call | ast.AST
- pig.synth.call.FindFCParent(parent, node, depth=1)
Find the nearest function-or-class-container ancestor of node.
Traverses the parent mapping upward, skipping nodes that are not function or class containers (i.e. not in llm_pre.stmtInFuncClass), until a qualifying ancestor is found. Unlike FindSSParent(), this function considers only ast.FunctionDef, ast.AsyncFunctionDef, and ast.ClassDef as valid container types.
- Parameters:
parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.
node – The AST node whose function-or-class ancestor is to be found. Returns None immediately if node is an ast.Module.
depth (int) – The number of function-or-class container hops to traverse. depth=1 returns the immediate enclosing function or class.
- Returns:
The function-or-class container ancestor at the requested depth, or None if node is a module or no such ancestor exists.
- Return type:
ast.FunctionDef | ast.AsyncFunctionDef | ast.ClassDef | None
- pig.synth.call.FindFParent(parent, node, depth=1)
Find the nearest enclosing function of node at the given depth.
Traverses the parent mapping upward, skipping nodes that are not ast.FunctionDef or ast.AsyncFunctionDef, until a function ancestor is found. Unlike FindFCParent(), class definitions are not considered valid container boundaries.
- Parameters:
parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.
node – The AST node whose enclosing function is to be found. Returns None immediately if node is an ast.Module.
depth (int) – The number of function-level hops to traverse. depth=1 returns the immediate enclosing function.
- Returns:
The enclosing ast.FunctionDef or ast.AsyncFunctionDef at the requested depth, or None if node is a module or no enclosing function exists.
- Return type:
ast.FunctionDef | ast.AsyncFunctionDef | None
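One way to read the depth parameter is as a count of enclosing functions to climb through, with classes skipped. The sketch below follows that reading; it is an assumption about the semantics, and enclosing_function is a hypothetical name. For brevity it builds a child-to-parent lookup directly instead of searching the parent-to-children mapping that ParentAst() is described as returning.

```python
import ast

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)

def enclosing_function(root, node, depth=1):
    # child -> parent lookup (the inverse of the mapping described for ParentAst()).
    child_to_parent = {
        child: parent
        for parent in ast.walk(root)
        for child in ast.iter_child_nodes(parent)
    }
    if isinstance(node, ast.Module):
        return None
    current = child_to_parent.get(node)
    while current is not None:
        # Only function definitions count as a hop; classes are skipped.
        if isinstance(current, FUNC_TYPES):
            depth -= 1
            if depth == 0:
                return current
        current = child_to_parent.get(current)
    return None

src = "def outer():\n    class C:\n        def inner(self):\n            return 1\n"
tree = ast.parse(src)
outer = tree.body[0]
inner = outer.body[0].body[0]
ret = inner.body[0]
```

With this tree, depth=1 resolves the return statement to inner, depth=2 to outer (the class C in between is not counted), and depth=3 to None.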
- pig.synth.call.FindParent(parent, node)
Find the direct parent of node regardless of its type.
Searches the parent mapping for the key whose child set contains node. Unlike FindRealParent(), this function makes no distinction between expression-type and statement-type nodes.
- Parameters:
parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.
node – The AST node whose parent is to be found.
- Returns:
The direct parent node, or node itself if no parent is found.
- Return type:
ast.AST
- pig.synth.call.FindRealParent(parent, node, depth)
Find the nearest statement-type ancestor of node at the given depth.
Traverses the parent mapping upward, skipping expression-type intermediate nodes, until a statement-type (stmt_type) ancestor is found. The depth parameter controls how many statement-level hops to take.
- Parameters:
parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.
node – The AST node whose statement ancestor is to be found.
depth (int) – The number of statement-level ancestors to traverse. depth=1 returns the immediate statement ancestor of node.
- Returns:
The statement-type ancestor at the requested depth, the ast.Module if the root is reached, or None if no statement ancestor exists.
- Return type:
ast.AST | None
- pig.synth.call.FindSSParent(parent, node, depth=1)
Find the nearest statement-container ancestor of node at the given depth.
Traverses the parent mapping upward, skipping nodes that are not statement-containers (i.e. not in llm_pre.stmtInstmt), until a container-type ancestor is found. Unlike FindRealParent(), this function considers only nodes that can themselves contain statements (e.g. ast.FunctionDef, ast.If, ast.For).
- Parameters:
parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.
node – The AST node whose statement-container ancestor is to be found.
depth (int) – The number of statement-container hops to traverse. depth=1 returns the immediate container ancestor of node.
- Returns:
The statement-container ancestor at the requested depth, or None if no such ancestor exists.
- Return type:
ast.AST | None
- pig.synth.call.FunctionDefs(root, ParentO)
Collect all user-defined functions and async functions in the AST.
Walks the entire AST and maps each function’s name to its defining node. If a function is defined inside a class, it is also registered under the 'self.<name>' key to allow resolution of method calls on self.
- Parameters:
root (ast.AST) – The root AST node to walk.
ParentO – The parent-resolver object used to locate the enclosing class via FindCParent().
- Returns:
A mapping from function name (and 'self.<name>' for methods) to the corresponding ast.FunctionDef or ast.AsyncFunctionDef node.
- Return type:
Dict[str, ast.FunctionDef | ast.AsyncFunctionDef]
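A rough sketch of the name-to-definition mapping described above. collect_function_defs is a hypothetical helper; it tracks class membership with a flag during a recursive walk rather than through a parent-resolver object, and it is an assumption that any function nested anywhere below a class receives the 'self.<name>' key.

```python
import ast

def collect_function_defs(root):
    defs = {}

    def visit(node, in_class):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defs[child.name] = child
                if in_class:
                    # Extra key so that calls written as self.<name>() resolve too.
                    defs[f"self.{child.name}"] = child
            # Once inside a class, everything below it counts as class-nested.
            visit(child, in_class or isinstance(child, ast.ClassDef))

    visit(root, False)
    return defs

src = "class A:\n    def m(self):\n        pass\n\nasync def top():\n    pass\n"
defs = collect_function_defs(ast.parse(src))
```

For this input the mapping holds the keys 'm', 'self.m', and 'top'; the two method keys point at the same node.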
- class pig.synth.call.NameExtractor(check=False, check1=False, libo='qwer')
Extract name references from an AST subtree.
Walks an AST node and collects identifiers into categorised lists, with optional filtering for self-prefixed attribute access and library-namespaced names.
- Parameters:
check (bool) – If True, suppress collection of bare attribute names (i.e. only self.attr forms are collected for attributes).
check1 (bool) – If True, treat self.<attr> access as a single qualified name (e.g. 'self.foo') rather than resolving the constituent parts separately.
libo (str) – The library name to exclude from name collection. Any ast.Attribute node whose unparsed form contains this string is skipped entirely.
- Variables:
list (list[str]) – Collected variable and attribute name references.
constants (list) – Collected constant values.
types (list[str]) – Collected annotation type names from ast.AnnAssign nodes.
- pig.synth.call.ParentAst(root)
Build a mapping from each AST node to its direct children.
Walks the entire AST and collects, for each node, the set of its immediate child nodes as returned by ast.iter_child_nodes().
- Parameters:
root (ast.AST) – The root AST node to walk.
- Returns:
A mapping from each parent node to the set of its direct children.
- Return type:
dict[ast.AST, set[ast.AST]]
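The mapping described here can be reproduced in a few lines with the standard library. This is a sketch under the assumption that the result is a plain dict keyed by node; build_parent_map is a hypothetical stand-in, not the function itself.

```python
import ast

def build_parent_map(root):
    # Every node reachable from `root` becomes a key; leaves map to an empty set.
    return {node: set(ast.iter_child_nodes(node)) for node in ast.walk(root)}

tree = ast.parse("class A:\n    def f(self):\n        return g(1)\n")
parent_map = build_parent_map(tree)
class_node = tree.body[0]
func_node = class_node.body[0]
```

Because the mapping runs from parent to children, finding the parent of a given node means searching for the key whose set contains it, which is linear in the number of nodes.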
- class pig.synth.call.Preparation(func, apios=[])
Build a call-relation table and collect API usage nodes from the AST.
Visits the AST to produce two main data structures:
tableM: a call-relation mapping of the form {callee: {caller, ...}}, expressing which functions must be available for a given call to succeed.
nodes: a mapping from each API name in apios to the set of AST nodes where it is referenced.
Function and class definitions are also catalogued in funcdefs and classdefs respectively for later use by dependency resolution.
- Parameters:
func (list) – The list of function names to track in the call-relation table.
apios (list) – The list of old API names whose usage sites are to be collected. Defaults to an empty list.
- Variables:
tableM (Dict[str, Set[str]]) – Call-relation table mapping each callee name to the set of caller names that depend on it.
nodes (Dict[str, set[ast.AST]]) – Mapping from each API name to the set of AST nodes where it is referenced.
funcdefs (dict) – Mapping from function name to its defining ast.FunctionDef or ast.AsyncFunctionDef node.
classdefs (dict) – Mapping from class name to its defining ast.ClassDef node.
apios (list) – The list of old API names being tracked.
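To make the {callee: {caller, ...}} shape concrete, the following sketch builds such a table with an ast.NodeVisitor. CallTable is a hypothetical class, not Preparation itself, and it is an assumption that only direct calls by bare name from inside a function body are recorded.

```python
import ast
from collections import defaultdict

class CallTable(ast.NodeVisitor):
    def __init__(self, tracked):
        self.tracked = set(tracked)
        self.table = defaultdict(set)  # callee name -> set of caller names
        self._stack = []               # names of the functions currently being visited

    def visit_FunctionDef(self, node):
        self._stack.append(node.name)
        self.generic_visit(node)
        self._stack.pop()

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Call(self, node):
        # Record `caller` under `callee` for tracked, directly named calls only.
        if isinstance(node.func, ast.Name) and node.func.id in self.tracked and self._stack:
            self.table[node.func.id].add(self._stack[-1])
        self.generic_visit(node)

src = "def a():\n    b()\n\ndef b():\n    c()\n\ndef c():\n    pass\n"
visitor = CallTable(["a", "b", "c"])
visitor.visit(ast.parse(src))
table = dict(visitor.table)
```

Reading the result, table["c"] contains "b" because b calls c, and table["b"] contains "a" because a calls b.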
fix_import
- pig.synth.fix_import.ImportFindPath(libo, libn, v1, nodes, apis, cmp=None)
Resolve the correct import statement for a variable in the new library.
Searches the new library’s API map to find all candidate import paths for v1, then validates each candidate against the actual usage pattern in nodes to determine the most appropriate import form. Duplicate candidates are resolved via duplicate_imports_resolve() or, as a last resort, by string similarity against cmp.
The resolution process is handled by three inner helpers:
find(v1): scans apis to collect all candidate paths where v1 appears as a class, function, constant, or module name.
pmaker(cand_path): expands a single candidate into the set of concrete import forms it could take (e.g. import A.B and from A import B).
check(nodes, cand_path): inspects how v1 is actually used in each node and selects the import form whose path components match the usage pattern.
- Parameters:
libo (str) – The name of the original library.
libn (str) – The name of the new library.
v1 (str) – The variable name whose import statement is to be resolved.
nodes (set) – The set of AST nodes where v1 is referenced, used to infer the correct import form from actual usage.
apis – The API map of the new library, structured as {module_path: (classes, _, functions, _, constants)}.
cmp (ast.Import | ast.ImportFrom | None) – An optional existing import node from the LLM-generated code, used as a similarity reference when duplicate candidates remain after resolution. Defaults to None.
- Returns:
A set containing the resolved import node(s). Normally contains a single ast.Import or ast.ImportFrom node; may contain multiple if resolution is inconclusive.
- Return type:
set
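The expansion step attributed to pmaker can be illustrated as follows. import_forms is a hypothetical helper that covers only the two forms named in the description (import A.B and from A import B); whether the real helper produces further variants is not stated in this reference. It requires Python 3.9 or later for ast.unparse.

```python
import ast

def import_forms(dotted_path):
    # Form 1: import the full dotted path as a module.
    forms = [ast.Import(names=[ast.alias(name=dotted_path, asname=None)])]
    # Form 2: import the last component from its parent, when there is a parent.
    module, _, name = dotted_path.rpartition(".")
    if module:
        forms.append(
            ast.ImportFrom(module=module, names=[ast.alias(name=name, asname=None)], level=0)
        )
    return [ast.unparse(ast.fix_missing_locations(form)) for form in forms]

two_part = import_forms("A.B")
one_part = import_forms("A")
```

A single-component path has no from-import form, so only the plain import is produced for it.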
- pig.synth.fix_import.Importfind(code, nodes, var, libo, libn, apis, check=True)
Resolve and validate the import statement for a given variable name.
Searches code for existing import statements that define var, then verifies and corrects the import path against the actual library structure. If check is True, the resolved import is validated and potentially rewritten to point to the correct new library path using is_total_import(), llm_pre.libname(), and ImportFindPath(). If check is False, the existing import is returned as-is.
- Parameters:
code (ast.AST) – The AST of the file being analysed, used to extract existing import statements.
nodes (set) – The set of AST nodes where var is referenced, used to resolve the fully qualified import path via is_total_import().
var (str) – The variable name whose import statement is to be resolved.
libo (str) – The name of the original library.
libn (str) – The name of the new library.
apis – The API mapping passed through to ImportFindPath() for path resolution.
check (bool) – If True, validate and rewrite the import path to match the correct new library. If False, return the existing import statement unchanged. Defaults to True.
- Returns:
A tuple (import_nodes, resolved_vars) where import_nodes is a set of corrected ast.Import or ast.ImportFrom nodes, and resolved_vars is the set of variable names that were successfully resolved.
- Return type:
tuple[set, set]
- pig.synth.fix_import.check_available_import(import_node, libn)
Check whether an import node resolves to a real path in the library source.
Converts the dotted module path of import_node into a file system path and verifies that it exists within the library’s source tree. For ast.ImportFrom nodes, additionally checks that the imported name is actually accessible at that path via get_accessible_apis().
Import nodes that do not reference libn at all are considered valid and return True immediately.
- Parameters:
import_node (Union[ast.Import, ast.ImportFrom]) – The import statement to validate.
libn (str) – The name of the new library, used to locate its source root via GIT_LOC.
- Returns:
True if the import path exists in the library source tree and (for ast.ImportFrom) the imported name is accessible there; False otherwise.
- Return type:
bool
- Raises:
ValueError – If import_node is neither an ast.Import nor an ast.ImportFrom.
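The conversion from a dotted module path to a file system check might look like the sketch below. module_exists is a hypothetical helper and the lookup rules are assumptions (a module is either a .py file or a directory with an __init__.py); the GIT_LOC lookup and the accessibility check via get_accessible_apis() are not modelled.

```python
import tempfile
from pathlib import Path

def module_exists(source_root, dotted):
    # "lib.sub.mod" -> <root>/lib/sub/mod, then test both module layouts.
    base = Path(source_root).joinpath(*dotted.split("."))
    return base.with_suffix(".py").is_file() or (base / "__init__.py").is_file()

# Build a throwaway source tree to exercise the check.
with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / "lib" / "sub"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "mod.py").write_text("x = 1\n")
    found_module = module_exists(root, "lib.sub.mod")
    found_package = module_exists(root, "lib.sub")
    found_missing = module_exists(root, "lib.sub.nope")
```

In this throwaway tree the module and the package are both found, and the nonexistent path is rejected.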
- pig.synth.fix_import.duplicate_imports_resolve(imps, nodes, libn, var, cmp=None)
Resolve a set of duplicate import candidates down to a single correct import.
Given multiple candidate import statements for the same variable var, inspects how var is actually used in nodes and cross-references each candidate against the new library’s source files to determine which import is valid. Duplicates that survive the initial check are further resolved by module path depth or import type counts.
The resolution process is handled by several inner helpers:
api_type(): determines whether var is used as an ast.Attribute, ast.Call, or bare ast.Name by tallying occurrences across nodes.
find_next_attribute(nodes, var): identifies the most common attribute accessed on var (e.g. var.attr), used when usage type is 'Attribute'.
find_args(node): extracts the positional and keyword argument counts from an ast.Call node.
find_last_call(node): returns the final name or attribute in a call expression (e.g. c from a.b.c()).
check(...): validates a single candidate import against the library’s accessible APIs and the observed usage type, returning (True, import_node) if valid or (False, None) otherwise.
- Parameters:
imps (set[ast.Import | ast.ImportFrom]) – The set of candidate ast.Import or ast.ImportFrom nodes to resolve.
nodes (set[ast.AST]) – The AST nodes where var is referenced, used to infer usage type and argument signatures.
libn (str) – The name of the new library, used to locate its source files via GIT_LOC.
var (str) – The variable name whose import is being resolved.
cmp (ast.Import | ast.ImportFrom | None) – An optional existing import node from the LLM-generated code, used as a string-similarity reference if duplicates remain after all other resolution steps. Defaults to None.
- Returns:
A set containing a single resolved import node. May contain more than one entry if resolution is inconclusive.
- Return type:
set[ast.Import | ast.ImportFrom]
Extract all names referenced in an API-related expression.
Visits an AST subtree and collects every ast.Name identifier and ast.Attribute name that appears within expressions involving apio. For ast.Call nodes, only the arguments, or the function expression itself, that contain apio are traversed.
- Parameters:
apio (str) – The old API name used to filter which call sub-expressions are visited.
- Variables:
names (set[str]) – The set of collected name strings.
- pig.synth.fix_import.get_accessible_apis(_path, libn, name=None, dir=False)
Extract publicly accessible API names and their signatures from a library path.
Parses the Python source file or directory at _path and collects all top-level classes, functions, annotated assignments, global variables, and re-exported names. If name is specified, only the entry matching that name is returned, with the source path appended for import resolution.
The extraction is handled by three inner helpers:
get_apis(path, stack): recursively parses files and directories, populating the result dicts with discovered API entries.
get_func_args(node): extracts the full argument signature of a function, including positional, keyword-only, default, and variadic arguments.
get_class_args(node): extracts the __init__ signature of a class, falling back to (0, …) if no __init__ is present and no base classes exist, or ('inf', …) if base classes are present.
Each API entry is stored as a list of the form:
[type, min_args, min_kwargs, max_args, max_kwargs, default_names, kw_names, ordinary_names]
where type is one of 'class', 'func', or 'var'.
- Parameters:
_path (Path) – Path to the library source file (.py) or directory to inspect.
libn (str) – The name of the library being inspected (reserved for future use).
name (str | None) – If provided, only the API entry matching this name is returned, with [path, 'imp'] appended to its value.
dir (bool) – If True, treat _path as a directory and enumerate its submodules instead of parsing a single file.
- Returns:
A tuple (result, result2) where result maps each API name to its signature list, and result2 maps names discovered in __init__.py (for directory paths) to their signatures.
- Return type:
tuple[dict, dict]
- pig.synth.fix_import.is_total_import(root, var, libn)
Resolve the fully qualified import path of a variable within a library.
Walks the AST to reconstruct the attribute access chain leading to var (e.g. torch.nn.Module → ['torch', 'nn', 'Module']), then verifies each component against the library’s file system to determine how deep the chain corresponds to a real module path.
- Parameters:
root (ast.AST) – The AST node to search for the variable reference.
var (str) – The name of the variable or attribute whose import path is to be resolved.
libn (str) – The name of the target library, used to look up the library’s root path from GIT_LOC.
- Returns:
The fully qualified dotted path of var up to the deepest resolvable module component (e.g. 'torch.nn').
- Return type:
str
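The first half of this process, reconstructing the attribute chain, can be sketched with the standard ast module. attribute_chain is a hypothetical helper; the subsequent check of each component against the library's file system is not shown.

```python
import ast

def attribute_chain(expr):
    # Peel ast.Attribute layers from the outside in, then reverse the parts.
    parts = []
    while isinstance(expr, ast.Attribute):
        parts.append(expr.attr)
        expr = expr.value
    if isinstance(expr, ast.Name):
        parts.append(expr.id)
        return parts[::-1]
    return []  # the chain is rooted in something other than a plain name

chain = attribute_chain(ast.parse("torch.nn.Module", mode="eval").body)
rooted_in_call = attribute_chain(ast.parse("f().x", mode="eval").body)
```

The chain for torch.nn.Module comes out as ['torch', 'nn', 'Module'], matching the example in the description; a chain rooted in a call yields an empty list in this sketch.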
llm_pre
- class pig.synth.llm_pre.DefUseGraph(imps={})
- pig.synth.llm_pre.DupImpSolver(code)
Remove duplicate import statements and reinsert them as deduplicated entries.
Uses an inner ast.NodeTransformer (ImpDupRemover) to strip all ast.Import and ast.ImportFrom nodes from the module body while collecting their unique aliases. A second inner helper (ImpDupSolver) then reinserts the deduplicated imports at the top of the module.
Deduplication is keyed on (name, asname) for ast.Import and on (module, level) → {(name, asname), ...} for ast.ImportFrom, so multiple from X import a, b statements for the same module are merged into a single node.
- Parameters:
code (ast.Module) – The AST module whose import statements are to be deduplicated.
- Returns:
The modified AST module with all duplicate imports removed and unique imports reinserted at the top of the module body.
- Return type:
ast.AST
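The deduplication keys described above can be demonstrated with a simplified sketch. dedupe_imports is a hypothetical function that handles top-level statements only and uses a plain loop instead of the two inner transformer classes; it requires Python 3.9 or later for ast.unparse.

```python
import ast

def dedupe_imports(module):
    plain = {}     # (name, asname) -> None; dict keys keep first-seen order
    from_map = {}  # (module, level) -> {(name, asname): None}
    rest = []
    for stmt in module.body:
        if isinstance(stmt, ast.Import):
            for alias in stmt.names:
                plain[(alias.name, alias.asname)] = None
        elif isinstance(stmt, ast.ImportFrom):
            bucket = from_map.setdefault((stmt.module, stmt.level), {})
            for alias in stmt.names:
                bucket[(alias.name, alias.asname)] = None
        else:
            rest.append(stmt)
    # Rebuild one node per unique plain import and one merged node per module.
    imports = [ast.Import(names=[ast.alias(name=n, asname=a)]) for n, a in plain]
    imports += [
        ast.ImportFrom(module=m, level=lvl,
                       names=[ast.alias(name=n, asname=a) for n, a in names])
        for (m, lvl), names in from_map.items()
    ]
    module.body = imports + rest
    return ast.fix_missing_locations(module)

src = "import os\nimport os\nfrom a import x\nfrom a import y, x\nprint(1)\n"
cleaned = ast.unparse(dedupe_imports(ast.parse(src)))
```

The repeated import os collapses to one statement and the two from a import ... statements merge into from a import x, y, placed ahead of the remaining code.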
- class pig.synth.llm_pre.ExtractArgs(only_args=False)
- class pig.synth.llm_pre.ExtractVarMap(nodeo)
- pig.synth.llm_pre.FindLastExpr(parent, node, depth)
Find the statement-type ancestor of an expression node at the given depth.
Traverses the parent mapping upward, skipping intermediate expression-type nodes, until a statement-type (stmt_type) ancestor is reached. Similar to FindRealParent(), but returns node itself (rather than None) if no ancestor is found.
- Parameters:
parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.
node (ast.AST) – The AST expression node to start traversal from.
depth (int) – The number of statement-level hops to traverse. depth=1 returns the immediate statement ancestor of node.
- Returns:
The statement-type ancestor at the requested depth, None if an unexpected node type is encountered during traversal, or node itself if no parent is found at all.
- Return type:
ast.AST | None
- class pig.synth.llm_pre.ModUseVars(mapping, funcdefs, ParentO, name_of_nodeo=None)
- pig.synth.llm_pre.check_two_sim(roota, rooto, var, noden, rootc, surnodes)
Rename a variable in noden to match its counterpart in the new library code.
Locates the definition of var in the old API’s AST (roota), finds the corresponding name in the new library’s AST (rootc) via matching.matcher1(), and rewrites all ast.Name references to var in noden and surnodes to use the new name.
- Parameters:
roota (ast.AST) – The AST of the old API usage context, used to locate the definition of var.
rooto (ast.AST) – The original source AST (reserved for future use).
var (str) – The variable name to look up and potentially rename.
noden (ast.AST) – The primary AST node in which var references are to be rewritten.
rootc (str) – The unparsed source string of the new library’s AST, used as the rename target by matching.matcher1().
surnodes (list[ast.AST]) – Additional AST nodes in which var references are also rewritten if a new name is found.
- Returns:
True if at least one reference to var was renamed, False otherwise.
- Return type:
bool
- pig.synth.llm_pre.extract_var_map(nodeo, noden, codeo, coden, parento, b0=False)
Build a variable rename mapping between old and new API usage nodes.
Compares the variables used in nodeo (old API) and noden (new API) via ExtractVarMap and filters the raw candidates down to a mapping of {old_name: new_name} pairs that represent genuine renames.
Filtering is handled by four inner helpers:
check_text_sim(var1, var2): accepts a pair if their string similarity exceeds 0.5 or one is a substring of the other.
check_ast_sim(var1, var2, codeo, coden): locates the assignment nodes for each variable and delegates to matching.single_matcher() to confirm structural similarity.
check_targets(node, var): searches a single AST node for an assignment to var, handling ast.Assign, ast.AugAssign, ast.AnnAssign, and ast.With targets.
filter(codeo, coden, v1, parento, nodeo): excludes pairs where the variable is defined in both the old and new code within the same scope, indicating it is not a rename but a shared local name.
When b0 is True, only AST-similarity is used and constant assignments that directly match their value in the old code are excluded. When b0 is False, both text and AST similarity are applied alongside the scope filter.
- Parameters:
nodeo (ast.AST) – The old API AST node whose variable references are the source of the mapping.
noden (ast.AST) – The new API AST node whose variable references are the rename targets.
codeo (ast.AST) – The full old code AST, used to locate variable definitions for similarity checks.
coden (ast.AST) – The full new code AST, used to locate variable definitions for similarity checks.
parento – The parent mapping of the old code as produced by call.ParentAst(), used for scope resolution.
b0 (bool) – If True, apply AST-similarity-only filtering suitable for direct API node comparisons. If False, apply the full text and AST similarity pipeline with scope filtering. Defaults to False.
- Returns:
A mapping from each old variable name to its corresponding new variable name.
- Return type:
dict[str, str]
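The text-similarity rule attributed to check_text_sim can be approximated as below. The reference does not say which similarity metric is used; difflib.SequenceMatcher from the standard library is an assumption here, and text_similar is a hypothetical name.

```python
from difflib import SequenceMatcher

def text_similar(var1, var2, threshold=0.5):
    # Substring containment is accepted outright.
    if var1 in var2 or var2 in var1:
        return True
    # Otherwise require a similarity ratio strictly above the threshold.
    return SequenceMatcher(None, var1, var2).ratio() > threshold

substring_pair = text_similar("resp", "response")
close_pair = text_similar("data_frame", "dataframe")
unrelated_pair = text_similar("abc", "xyz")
```

The first pair passes on containment, the second on its similarity ratio, and the third fails both tests. A purely textual check like this accepts unrelated names that happen to look alike, which may be why the description pairs it with an AST-level check.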
- pig.synth.llm_pre.is_async(node)
Check whether an AST node contains any asynchronous constructs.
Walks the AST subtree rooted at node and returns True if any ast.Await, ast.AsyncWith, or ast.AsyncFor node is found.
- Parameters:
node (ast.AST) – The AST node to inspect.
- Returns:
True if the node contains asynchronous constructs, False otherwise.
- Return type:
bool
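A check with the behaviour described above fits in one expression. contains_async is a hypothetical name used for illustration only.

```python
import ast

def contains_async(node):
    # True as soon as any await, async with, or async for appears in the subtree.
    return any(
        isinstance(n, (ast.Await, ast.AsyncWith, ast.AsyncFor))
        for n in ast.walk(node)
    )

tree_sync = ast.parse("def f():\n    return 1\n")
tree_await = ast.parse("async def g():\n    await h()\n")
tree_bare_async = ast.parse("async def k():\n    return 1\n")
```

Note that, following the three node types listed, an async def whose body contains none of these constructs is not reported as asynchronous by this sketch.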
- pig.synth.llm_pre.libname(libo)
Resolve the importable top-level package name for a given library identifier.
Looks up libo in the combined git location map (GIT_LOC) and derives the importable name from the final component of its repository path. A small set of known mismatches between PyPI package names and importable names are corrected by hard-coded overrides.
- Parameters:
libo (str) – The library identifier (typically the PyPI package name) to resolve.
- Returns:
The importable top-level package name for libo, or libo itself if it cannot be found in the location map.
- Return type:
str
- pig.synth.llm_pre.scope_name(nodeo, noden, parent)
Return the name of the enclosing function or class scope for a node.
Determines the appropriate scope by inspecting noden: if the new node is itself a function or class definition, the search starts two levels up from nodeo; otherwise it starts one level up. The scope name is resolved via slicing.extract_name().
- Parameters:
nodeo (ast.AST) – The original AST node whose enclosing scope is to be found.
noden (ast.AST) – The new AST node used to decide the traversal depth.
parent (dict) – The parent mapping as produced by call.ParentAst().
- Returns:
The name of the enclosing function or class, or 'module' if the traversal reaches the module root.
- Return type:
str | None
matching
This module matches a target AST node against the corresponding node in the new code.
Input: target node, LLM code.
Output: matched AST node.
- pig.synth.matching.filter_stmt(noden, nodeo, apins, rootn, apio)
Determine whether a new API statement genuinely references a new API.
Checks whether any name referenced in noden corresponds to a known new API entry in apins, after stripping locally assigned names and function arguments that would shadow external references. Several early-exit conditions are also handled:
If noden and nodeo unparse identically and apio is 'get', returns True immediately.
If apio appears as a string constant inside nodeo, returns True immediately.
Names are collected from ast.Name, ast.Attribute, and ast.operator nodes. Import aliases are resolved so that an as-aliased name is expanded to its original name before matching. Locally assigned targets and function arguments within the enclosing function or class scope are removed from the candidate set to avoid false positives.
- Parameters:
noden (ast.AST) – The new API AST node whose name references are inspected.
nodeo (ast.AST) – The old API AST node, used for identity comparison and constant checking.
apins (dict) – A mapping from API path to lists of (name, ...) tuples representing known new API names.
rootn (ast.AST) – The root AST module of the new code, used to resolve import aliases and locate the enclosing scope.
apio (str) – The old API name, used for the identity and constant early-exit checks.
- Returns:
True if any name in noden matches a known new API entry, False otherwise.
- Return type:
bool
- pig.synth.matching.matcher(rootb_str, roota_str, nodeo, roota, rooto, dec=False, api=None, gumtree=True)
Find the new code node that corresponds to an old API call node.
Uses GumTree (via JPype) to compute a tree-diff between the old and new code strings, then walks the sub-expressions of nodeo to locate the best-matching node in roota. Matching is performed by unparsing each sub-expression, finding its character offsets in rootb_str, and querying the Java PMatcher for the corresponding span in roota_str. The candidate with the highest vote count across sub-expressions is returned as the winner via decide_winner().
- Parameters:
rootb_str (str) – The unparsed source string of the old code, used as the baseline for the GumTree diff.
roota_str (str) – The unparsed source string of the new code, used as the revised version for the GumTree diff.
nodeo (ast.AST) – The old API call node to find a match for. If it is an ast.With or ast.AsyncWith, the context expression is used as the matching target.
roota (ast.AST) – The AST of the new code, used to resolve matched character spans back to AST nodes via BestMap() and to build a parent mapping via call.ParentAst().
rooto (ast.AST) – The AST of the old code (reserved for future use).
dec (bool) – If True, return the matched node directly without resolving it to its nearest statement ancestor via call.FindRealParent(). Defaults to False.
api – Reserved for future use. Defaults to None.
gumtree (bool) – If True, use the custom GumTree jar (ours.jar); otherwise use the default jar (default.jar). Also controls whether unmatched sub-expressions are treated as hard failures. Defaults to True.
- Returns:
A tuple (nodeo, noden) where noden is the best-matching AST node found in roota, or None if no match could be determined. For ast.arg nodes with type annotations, returns (old_annotation, new_annotation) instead.
- Return type:
tuple[ast.AST, ast.AST | None]
- pig.synth.matching.single_matcher(rootb_str, roota_str, nodeo, result_noden, roota)
Determine whether two assignment nodes are structurally equivalent.
Uses GumTree (via JPype and ours.jar) to compute a tree-diff between the old and new code strings, then checks whether nodeo and result_noden are matched to each other in the resulting mapping.
The JVM is started on first call if it is not already running.
- Parameters:
rootb_str (str) – The unparsed source string of the old code, passed to the GumTree matcher as the baseline.
roota_str (str) – The unparsed source string of the new code, passed to the GumTree matcher as the revised version.
nodeo (Union[ast.Assign, ast.AnnAssign, ast.AugAssign, ast.Expression]) – The assignment node from the old code to match.
result_noden (Union[ast.Assign, ast.AnnAssign, ast.AugAssign, ast.Expression]) – The assignment node from the new code to match against nodeo.
roota (ast.AST) – The AST of the new code, used to build a parent mapping via call.ParentAst().
- Returns:
True if nodeo and result_noden are identified as a matched pair by GumTree, False otherwise.
- Return type:
bool
- pig.synth.matching.total_mappings(rooto, rootn, codea, parento, mapping, libo, libn, oldapi, nodeo=None, noden=None, name1=None, name2=None)
Resolve the final variable rename mapping between old and new API code.
Aggregates per-scope variable mappings collected across multiple nodes and filters out invalid candidates (such as library names, API names, imported aliases, and names already used as assignment targets in the old code) to produce a clean {old_name: new_name} mapping suitable for applying to the migrated code.
Filtering is performed by the inner helper check(val), which rejects a candidate new name if any of the following hold:
The old or new code imports libo under an alias that matches val.
val coincides with the old API name, library name, or their canonical forms as resolved by llm_pre.libname().
val is already used as an assignment target or function argument in the old code.
If nodeo, noden, name1, and name2 are all provided, node-level mappings extracted via llm_pre.extract_var_map() are incorporated before the scope-level mapping is applied.
- Parameters:
rooto (ast.AST) – The AST of the original (old API) code.
rootn (ast.AST) – The AST of the new (LLM-generated) code.
codea (str) – The unparsed source string of the migrated code, used as context for name resolution.
parento – The parent mapping of the old code as produced by call.ParentAst().
mapping (dict[tuple[str, str], set[str]]) – A mapping from (old_variable_name, scope_name) tuples to the set of candidate new variable names collected across nodes.
libo (str) – The name of the original library.
libn (str) – The name of the new library.
oldapi (str) – The old API name, excluded from rename candidates.
nodeo (ast.AST | None) – The specific old API call node, used for node-level mapping when provided alongside noden, name1, and name2.
noden (ast.AST | None) – The specific new API call node, used for node-level mapping.
name1 (str | None) – The scope name of nodeo, used to restrict node-level mapping to the correct scope.
name2 (str | None) – The scope name of noden, used to restrict node-level mapping to the correct scope.
- Returns:
A mapping from each old variable name to its resolved new name.
- Return type:
dict[str, str]
sketch
- pig.synth.sketch.PreRequired(h, key, val, history, mappings, CENs, UnAssignedVarsO, ParentO, ParentN, coden, FuncDefs, OldApi, libn, libo, apis, b_imports, b_surround, rootb_str, roota_str, rooto, roota, has_dec=False)
Resolve unassigned variables in a migrated node by adding surround nodes and imports.
For a given (key, val) pair representing an old-to-new API node mapping, identifies variables used in val that are not yet assigned in the new code (target_names), then attempts to satisfy them in two stages:
Surround nodes (if b_surround is True): calls synthesis.Surround() to find and insert neighbouring statements from the old code that define the remaining variables.
Import statements (if b_imports is True): calls fix_import.Importfind() with full path-checking for each variable still unresolved after stage 1; otherwise falls back to a direct import lookup (check=False).
Variables that appear in CENs, FuncDefs, UnAssignedVarsO, or the current variable rename mappings are excluded from target_names before resolution begins. If val references the old API name as a load (but not a store), it is added back to target_names to ensure the corresponding import is included.
- Parameters:
h (ast.AST) – The current working AST being built up, modified in-place by synthesis.Surround().
key (ast.AST) – The original old API call node that triggered this migration step.
val (ast.AST) – The new API node that replaces key.
history (dict) – A mutable state dict tracking processed nodes; its 'changes' entry is updated with any newly added surround nodes.
mappings (dict) – The variable rename mapping from (old_name, scope_name) to candidate new names, used to exclude already-resolved variables from target_names.
CENs (set) – The set of built-in and context-defined names to exclude from dependency resolution.
UnAssignedVarsO (dict) – A mapping from scope name to the set of variables that are unassigned in the old code’s scope.
ParentO – The parent mapping of the old code as produced by call.ParentAst().
ParentN – The parent mapping of the new code as produced by call.ParentAst().
coden (ast.AST) – The AST of the new (LLM-generated) code, passed to fix_import.Importfind() for import resolution.
FuncDefs (set) – The set of function names defined in the old code, excluded from target_names.
OldApi (str) – The old API name; if referenced as a load in val, it is added to target_names to ensure its import is retained.
libn (str) – The name of the new library.
libo (str) – The name of the original library.
apis – The API map of the new library, passed through to fix_import.Importfind().
b_imports (bool) – If True, resolve remaining variables to import statements using full path validation; otherwise use direct lookup.
b_surround (bool) – If True, attempt to resolve unassigned variables by inserting surround nodes from the old code before falling back to imports.
rootb_str (str) – The unparsed source string of the old code, passed through to synthesis.Surround() for GumTree matching.
roota_str (str) – The unparsed source string of the new code, passed through to synthesis.Surround() for GumTree matching.
rooto (ast.AST) – The AST of the old code.
roota (ast.AST) – The AST of the new code.
has_dec (bool) – Reserved for future use. Defaults to False.
- Returns:
A tuple (h, NCImport, CENs1, history) where h is the updated working AST, NCImport is the set of newly resolved import nodes, CENs1 is the updated set of resolved import names, and history is the updated history dict.
- Return type:
tuple[ast.AST, set, set, dict]
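The first step described for PreRequired(), computing target_names, can be illustrated with the standard ast module alone. This is a minimal sketch, not the PIG implementation: the unassigned_loads helper is hypothetical, and the excluded set stands in for CENs, FuncDefs, UnAssignedVarsO, and the rename mappings.

```python
import ast
import builtins


def unassigned_loads(node: ast.AST, known: set) -> set:
    """Names read inside `node` that are neither stored there nor in `known`."""
    loads, stores = set(), set()
    for sub in ast.walk(node):
        if isinstance(sub, ast.Name):
            if isinstance(sub.ctx, ast.Load):
                loads.add(sub.id)
            else:  # ast.Store or ast.Del
                stores.add(sub.id)
    return loads - stores - known


# `val` plays the role of the new API node.
val = ast.parse("session = httpx.Client(timeout=limit)").body[0]

# Assumed stand-in for the excluded names (built-ins plus known variables).
excluded = set(dir(builtins)) | {"limit"}
target_names = unassigned_loads(val, excluded)
```

Here httpx is the only name left over, so it is the one that would need a surround node or an import statement.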
- pig.synth.sketch.migrator(OldApi, OCNs, ParentN, ParentO, codeo, coden, libo, libn, history, FuncDefs, UnAssignedVarsO, CENs, OldTree1, codeo_str, coden_str, apis, b_imports=True, b_varmap=True, b_surround=True, b_postprocess=True, gumtree=True)
Migrate all old API call sites in codeo to their new API equivalents.
For each old API call node in OCNs[OldApi], finds the corresponding new API node in coden via GumTree-based tree matching, applies variable rename mappings, resolves unassigned variables through surround nodes and import statements, and returns the fully migrated AST.
The migration proceeds in four stages:
Classification – partitions OCNs[OldApi] into normal statement nodes, decorator nodes, class-base nodes, exception handlers, and type-annotation args.
Matching – for each normal node, uses matching.matcher() (or llm_pre.MatchName() for structural nodes) to find the corresponding new node in coden. Nodes for which no match is found are added to del_nodes_cands for later removal.
Pre-processing – for each (old, new) pair in result, calls PreRequired() to insert surround nodes and import statements that satisfy unresolved variable references.
Post-processing – applies variable rename mappings via total_mappings(), rewrites the working AST h in place, and removes any nodes in del_nodes_cands that were not matched.
Special cases handled during matching:
ast.ExceptHandler nodes are matched at the handler or Try level and stored by their .type attribute.
ast.Name nodes matched to ast.arg are replaced by the argument’s type annotation.
ast.With / ast.AsyncWith nodes are matched as whole context-manager statements.
Decorator nodes that match an ast.ClassDef or ast.FunctionDef have their bodies merged rather than replaced.
- Parameters:
OldApi (str) – The old API name whose call sites are to be migrated.
OCNs – A mapping from API name to the list of AST nodes where that API is used, as produced by call.Preparation.
ParentN (dict) – The parent mapping of the new code as produced by call.ParentAst().
ParentO (dict) – The parent mapping of the old code as produced by call.ParentAst().
codeo (ast.AST) – The AST of the original (old API) code.
coden (ast.AST) – The AST of the new (LLM-generated) code.
libo (str) – The name of the original library.
libn (str) – The name of the new library.
history (dict[str, dict]) – A mutable state dict tracking processed nodes and accumulated changes across multiple SketchMaker calls.
FuncDefs – The set of function names defined in the old code, excluded from variable dependency resolution.
UnAssignedVarsO – A mapping from scope name to the set of variables unassigned in the old code’s scope.
CENs (set) – The set of built-in and context-defined names to exclude from dependency resolution.
OldTree1 – The GumTree representation of the old code, used by matching.var_divide() for sub-expression matching.
ParentO1 – An alternative parent mapping of the old code used for deeper ancestor lookups.
codeo_str (str) – The unparsed source string of the old code.
coden_str (str) – The unparsed source string of the new code.
apis – The API map of the new library, passed to fix_import.Importfind() and matching.filter_stmt().
b_imports (bool) – If True, resolve unassigned variables to import statements using full path validation. Defaults to True.
b_varmap (bool) – If True, apply variable rename mappings to the migrated nodes. Defaults to True.
b_surround (bool) – If True, insert surround nodes from the old code to satisfy unresolved variable references. Defaults to True.
b_postprocess (bool) – If True, run the post-processing step to apply rename mappings and clean up deleted nodes. Defaults to True.
gumtree (bool) – If True, use the custom GumTree matcher (ours.jar); otherwise use the default matcher. Defaults to True.
- Returns:
The migrated AST with all matched old API nodes replaced by their new equivalents, surround nodes inserted, and imports resolved.
- Return type:
ast.AST
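The classification stage of migrator() groups usages of an API by the syntactic position in which they occur. A rough sketch of that idea, using only the standard ast module, is given below. The classify function, its group names, and the matching by bare ast.Name identifier are assumptions for illustration; the real function works on the pre-collected OCNs[OldApi] list.

```python
import ast


def classify(tree: ast.AST, api: str) -> dict:
    """Hypothetical partition of `api` usages by syntactic position."""
    groups = {"decorator": [], "class_base": [], "handler": [],
              "annotation": [], "normal": []}
    special = set()  # ids of nodes already assigned to a special group

    def mentions(node: ast.AST) -> bool:
        return any(isinstance(n, ast.Name) and n.id == api
                   for n in ast.walk(node))

    def record(kind: str, nodes) -> None:
        for n in nodes:
            if n is not None and mentions(n):
                groups[kind].append(n)
                special.update(id(s) for s in ast.walk(n))

    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            record("decorator", node.decorator_list)
        if isinstance(node, ast.ClassDef):
            record("class_base", node.bases)
        if isinstance(node, ast.ExceptHandler):
            record("handler", [node.type])
        if isinstance(node, ast.arg):
            record("annotation", [node.annotation])

    # Everything not captured above counts as a normal usage.
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == api and id(node) not in special:
            groups["normal"].append(node)
    return groups


source = """
@lib
class A(lib): pass
def f(x: lib):
    try:
        lib()
    except lib:
        pass
"""
counts = {k: len(v) for k, v in classify(ast.parse(source), "lib").items()}
```

Each of the five groups receives exactly one node for this input, which shows why the special positions need handling separate from ordinary call statements.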
synthesis
- class pig.synth.synthesis.AsyncFD(NCF, check0, check1)
- class pig.synth.synthesis.FindSurFCs(nv)
- class pig.synth.synthesis.ImportDeleter(libo)
Remove all import statements that reference a given library.
Walks the AST and strips any ast.Import or ast.ImportFrom node whose module or alias name contains libo. A special case handles 'ruamel.yaml' which uses a dotted top-level name.
- Parameters:
libo (str) – The library name to remove from import statements.
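The general technique behind ImportDeleter, an ast.NodeTransformer that returns None for unwanted import nodes, can be sketched as follows. DropImports is a hypothetical stand-in, not the PIG class. It matches the library name exactly or as a dotted prefix, which is stricter than the "contains" test described above, and it omits the 'ruamel.yaml' special case. ast.unparse requires Python 3.9 or later.

```python
import ast


class DropImports(ast.NodeTransformer):
    """Hypothetical stand-in for ImportDeleter: drop imports of `lib`."""

    def __init__(self, lib: str):
        self.lib = lib

    def _matches(self, name) -> bool:
        # Match the library itself or any of its submodules.
        # `name` is None for relative imports such as `from . import x`.
        return name is not None and (
            name == self.lib or name.startswith(self.lib + ".")
        )

    def visit_Import(self, node: ast.Import):
        # Keep unrelated aliases that share the statement, e.g. `import os, lib`.
        node.names = [a for a in node.names if not self._matches(a.name)]
        return node if node.names else None

    def visit_ImportFrom(self, node: ast.ImportFrom):
        return None if self._matches(node.module) else node


source = (
    "import requests\n"
    "import os, requests.adapters\n"
    "from requests import get\n"
    "from os import path\n"
)
tree = DropImports("requests").visit(ast.parse(source))
cleaned = ast.unparse(ast.fix_missing_locations(tree))
```

After the transform, only the two os imports remain in cleaned.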
- class pig.synth.synthesis.NameBool(name, ctx, depth, usedvars=None)
- class pig.synth.synthesis.SynthDel(ONs, UnAssVars, UnUseVars, history=None, replace=False, replacenode=None, usedvars=None, dec=False)
- class pig.synth.synthesis.SynthImport(NCImports)
Prepend a set of new import statements to the module body.
Visits the ast.Module node and inserts all nodes in NCImports at the beginning of node.body.
- Parameters:
NCImports (set[Union[ast.Import, ast.Module, ast.ImportFrom]]) – The set of import nodes to prepend.
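A minimal sketch of the same idea, assuming nothing beyond the standard ast module, is shown below. PrependImports is a hypothetical name, and the sketch takes a list rather than a set so that the resulting import order is deterministic.

```python
import ast


class PrependImports(ast.NodeTransformer):
    """Hypothetical stand-in for SynthImport."""

    def __init__(self, new_imports):
        self.new_imports = list(new_imports)

    def visit_Module(self, node: ast.Module):
        # Insert the new import statements ahead of the existing body.
        node.body = self.new_imports + node.body
        return node


new_imports = ast.parse("import httpx").body
tree = PrependImports(new_imports).visit(ast.parse("client = httpx.Client()"))
result = ast.unparse(ast.fix_missing_locations(tree))
```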
- class pig.synth.synthesis.SynthSame(OCNP, NCNP, history, ParentO, HAS_CB=False, HAS_DEC=False)
- class pig.synth.synthesis.TrimRoot(targets, exception)
Remove a specific set of statement nodes from the AST.
Walks the AST and returns None for any node found in targets, effectively deleting it. ast.With and ast.AsyncWith nodes that match exception are preserved regardless of whether they appear in targets.
Intended for cleaning up With/AsyncWith target statements that have been absorbed into a new context-manager node.
- Parameters:
targets (list[ast.stmt]) – The list of statement nodes to remove.
exception (ast.stmt) – A single node that must never be removed, even if it appears in targets (typically the new With/AsyncWith node that absorbed the targets).
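The removal rule can be sketched with a small ast.NodeTransformer that compares nodes by identity. DropStatements is a hypothetical stand-in for TrimRoot. Leaving the preserved node's children unvisited is an assumption of this sketch, made so that statements already absorbed into the new With body are not deleted from it.

```python
import ast


class DropStatements(ast.NodeTransformer):
    """Hypothetical stand-in for TrimRoot."""

    def __init__(self, targets, exception=None):
        self.targets = list(targets)
        self.exception = exception

    def visit(self, node):
        if node is self.exception:
            return node  # always preserved; its body is left untouched
        if any(node is t for t in self.targets):
            return None  # returning None deletes a statement from its parent
        return self.generic_visit(node)


tree = ast.parse("a = 1\nb = 2\nwith open(p) as f:\n    c = 3\n")
assign_a, assign_b, with_stmt = tree.body

# `with_stmt` is listed as a target but also passed as the exception.
tree = DropStatements([assign_b, with_stmt], exception=with_stmt).visit(tree)
result = ast.unparse(tree)
```

Only b = 2 is removed: the with statement survives because the exception check runs before the target check.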
- class pig.synth.synthesis.UnusedVars(libo=None, name='module')
- class pig.synth.synthesis.VarExtractor(name='module', check=False)
Collect all variable names referenced within an AST, organised by scope.
Walks the AST and records every ast.Name identifier into a per-scope dict keyed by the enclosing function or class name ('module' for top-level code). Import names are collected separately in self.imports.
- Parameters:
name (str) – The initial scope name. Defaults to 'module'.
check (bool) – If True, self.<attr> assignments are tracked as separate qualified names (e.g. 'self.foo').
- Variables:
vars (dict[str, set[str]]) – Mapping from scope name to the set of variable names referenced in that scope.
imports (set[str]) – Set of all imported names and aliases.
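The per-scope collection can be approximated with an ast.NodeVisitor that tracks the current scope name while descending. ScopeNames is a hypothetical, simplified stand-in: it omits the check option and attributes decorators and default values to the inner scope, which may differ from what VarExtractor does.

```python
import ast
from collections import defaultdict


class ScopeNames(ast.NodeVisitor):
    """Hypothetical, simplified stand-in for VarExtractor."""

    def __init__(self, name: str = "module"):
        self.scope = name
        self.vars = defaultdict(set)
        self.imports = set()

    def _enter(self, node):
        # Switch to the function/class scope while visiting its children.
        outer, self.scope = self.scope, node.name
        self.generic_visit(node)
        self.scope = outer

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _enter

    def visit_Name(self, node: ast.Name):
        self.vars[self.scope].add(node.id)

    def visit_Import(self, node):
        for alias in node.names:
            self.imports.add(alias.name)
            if alias.asname:
                self.imports.add(alias.asname)

    visit_ImportFrom = visit_Import


source = "import numpy as np\nx = np.zeros(3)\ndef f(a):\n    return a + x\n"
extractor = ScopeNames()
extractor.visit(ast.parse(source))
```

Function parameters are ast.arg nodes rather than ast.Name nodes, so a appears under scope f only because it is read in the return statement.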
- pig.synth.synthesis.stmt_to_dec(key, val, h, ParentO, funcdefs)
Merge decorator lists from a new function node into the matching function in h.
If the new function name val.name already exists in funcdefs, its decorator list is appended to the matching node found in h. Otherwise, the enclosing function or class of key in the old code is located via call.FindFCParent() and its decorator list is updated instead.
- Parameters:
key (ast.AST) – The old API call node used to locate the enclosing function or class when val.name is not in funcdefs.
val (Union[ast.FunctionDef, ast.AsyncFunctionDef]) – The new function node whose decorator list is to be merged.
h (ast.AST) – The current working AST, walked to find the target function.
ParentO (dict) – The parent mapping of the old code as produced by call.ParentAst().
funcdefs (set) – The set of function names already defined in the working AST.
- Returns:
The modified working AST with the decorator list updated.
- Return type:
ast.AST
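The first branch of stmt_to_dec(), where val.name already exists in the working AST, can be sketched as follows. merge_decorators is a hypothetical helper. Skipping decorators that are already present (compared via ast.dump) is an assumption of this sketch and is not stated in the description above; the fallback through call.FindFCParent() is not covered.

```python
import ast


def merge_decorators(tree: ast.AST, new_func: ast.FunctionDef) -> ast.AST:
    """Hypothetical helper: append new_func's decorators to the same-named function."""
    for node in ast.walk(tree):
        if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                and node.name == new_func.name):
            # ast.dump ignores line/column attributes by default, so
            # structurally equal decorators compare as equal.
            existing = {ast.dump(d) for d in node.decorator_list}
            node.decorator_list += [
                d for d in new_func.decorator_list
                if ast.dump(d) not in existing
            ]
            break
    return tree


tree = ast.parse("@cache\ndef load():\n    return 1\n")
new_func = ast.parse("@retry(3)\n@cache\ndef load():\n    pass\n").body[0]
result = ast.unparse(merge_decorators(tree, new_func))
```

The function body of the working AST is kept as is; only its decorator list grows.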