pig.synth

This module contains the synthesis APIs of the PIG system, including functions for synthesizing code, handling synthesis tasks, and utilities that support API migration.

call

pig.synth.call.FindCParent(parent, node)

Find the nearest enclosing class of node, if any.

Traverses the parent mapping upward until an ast.ClassDef is found. Returns None if the traversal reaches an ast.Module without encountering a class.

Parameters:
  • parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.

  • node – The AST node whose enclosing class is to be found.

Returns:

The nearest enclosing ast.ClassDef, or None if node is not nested inside any class.

Return type:

ast.ClassDef | None
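The upward walk can be sketched as follows. This is a minimal sketch, not the module's implementation: parent_ast, find_parent and find_c_parent are hypothetical names. The child map is assumed to have the shape described for ParentAst() (each node mapped to the set of its direct children), so one upward step means finding the key whose child set contains the node.

```python
import ast
from typing import Dict, Optional, Set

# Hypothetical sketch; names are illustrative, not the pig.synth API.
ChildMap = Dict[ast.AST, Set[ast.AST]]


def parent_ast(root: ast.AST) -> ChildMap:
    # Each node maps to the set of its direct children, as ParentAst() is described.
    return {n: set(ast.iter_child_nodes(n)) for n in ast.walk(root)}


def find_parent(children: ChildMap, node: ast.AST) -> Optional[ast.AST]:
    # The parent is the key whose child set contains node.
    for candidate, kids in children.items():
        if node in kids:
            return candidate
    return None


def find_c_parent(children: ChildMap, node: ast.AST) -> Optional[ast.ClassDef]:
    current = find_parent(children, node)
    # Climb until a class is found; give up at the module root.
    while current is not None and not isinstance(current, ast.Module):
        if isinstance(current, ast.ClassDef):
            return current
        current = find_parent(children, current)
    return None


tree = ast.parse("class A:\n    def f(self):\n        return 1\n\ndef g():\n    return 2\n")
children = parent_ast(tree)
ret_in_method = tree.body[0].body[0].body[0]
ret_in_func = tree.body[1].body[0]
print(find_c_parent(children, ret_in_method).name)  # A
print(find_c_parent(children, ret_in_func))  # None
```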

pig.synth.call.FindExprParent(parent, node)

Find the nearest ast.Attribute or ast.Call ancestor of node.

Traverses the parent mapping upward until an ast.Attribute or ast.Call node is encountered. This is useful for resolving the enclosing expression context of a name reference within a call chain.

Parameters:
  • parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.

  • node – The AST node whose nearest attribute or call ancestor is to be found.

Returns:

The nearest enclosing ast.Attribute or ast.Call node, or node itself if no such ancestor exists.

Return type:

ast.Attribute | ast.Call | ast.AST

pig.synth.call.FindFCParent(parent, node, depth=1)

Find the nearest function-or-class-container ancestor of node.

Traverses the parent mapping upward, skipping nodes that are not function or class containers (i.e. not in llm_pre.stmtInFuncClass), until a qualifying ancestor is found. Unlike FindSSParent(), this function considers only ast.FunctionDef, ast.AsyncFunctionDef, and ast.ClassDef as valid container types.

Parameters:
  • parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.

  • node – The AST node whose function-or-class ancestor is to be found. Returns None immediately if node is an ast.Module.

  • depth (int) – The number of function-or-class container hops to traverse. depth=1 returns the immediate enclosing function or class.

Returns:

The function-or-class container ancestor at the requested depth, or None if node is a module or no such ancestor exists.

Return type:

ast.FunctionDef | ast.AsyncFunctionDef | ast.ClassDef | None

pig.synth.call.FindFParent(parent, node, depth=1)

Find the nearest enclosing function of node at the given depth.

Traverses the parent mapping upward, skipping nodes that are not ast.FunctionDef or ast.AsyncFunctionDef, until a function ancestor is found. Unlike FindFCParent(), class definitions are not considered valid container boundaries.

Parameters:
  • parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.

  • node – The AST node whose enclosing function is to be found. Returns None immediately if node is an ast.Module.

  • depth (int) – The number of function-level hops to traverse. depth=1 returns the immediate enclosing function.

Returns:

The enclosing ast.FunctionDef or ast.AsyncFunctionDef at the requested depth, or None if node is a module or no enclosing function exists.

Return type:

ast.FunctionDef | ast.AsyncFunctionDef | None
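The depth parameter can be read as "count function boundaries and step over everything else". The sketch below assumes that reading; child_map, up and find_f_parent are hypothetical helpers, and class definitions are skipped as the description states.

```python
import ast

# Hypothetical sketch; names are illustrative, not the pig.synth API.
FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def child_map(root):
    return {n: set(ast.iter_child_nodes(n)) for n in ast.walk(root)}


def up(children, node):
    # One raw step: the key whose child set contains node, or None at the root.
    for candidate, kids in children.items():
        if node in kids:
            return candidate
    return None


def find_f_parent(children, node, depth=1):
    if isinstance(node, ast.Module):
        return None
    current = node
    for _ in range(depth):
        current = up(children, current)
        # Skip everything that is not a function, including ClassDef.
        while current is not None and not isinstance(current, FUNC_TYPES):
            current = up(children, current)
        if current is None:
            return None
    return current


tree = ast.parse(
    "def outer():\n"
    "    class K:\n"
    "        def inner(self):\n"
    "            x = 1\n"
)
children = child_map(tree)
assign = tree.body[0].body[0].body[0].body[0]
print(find_f_parent(children, assign).name)           # inner
print(find_f_parent(children, assign, depth=2).name)  # outer (K is skipped)
print(find_f_parent(children, assign, depth=3))       # None
```

With depth=2, the walk from a statement inside a method steps over the class body and lands on the function that defines the class.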

pig.synth.call.FindParent(parent, node)

Find the direct parent of node regardless of its type.

Searches the parent mapping for the key whose child set contains node. Unlike FindRealParent(), this function makes no distinction between expression-type and statement-type nodes.

Parameters:
  • parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.

  • node – The AST node whose parent is to be found.

Returns:

The direct parent node, or node itself if no parent is found.

Return type:

ast.AST

pig.synth.call.FindRealParent(parent, node, depth)

Find the nearest statement-type ancestor of node at the given depth.

Traverses the parent mapping upward, skipping expression-type intermediate nodes, until a statement-type (stmt_type) ancestor is found. The depth parameter controls how many statement-level hops to take.

Parameters:
  • parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.

  • node – The AST node whose statement ancestor is to be found.

  • depth (int) – The number of statement-level ancestors to traverse. depth=1 returns the immediate statement ancestor of node.

Returns:

The statement-type ancestor at the requested depth, the ast.Module if the root is reached, or None if no statement ancestor exists.

Return type:

ast.AST | None

pig.synth.call.FindSSParent(parent, node, depth=1)

Find the nearest statement-container ancestor of node at the given depth.

Traverses the parent mapping upward, skipping nodes that are not statement-containers (i.e. not in llm_pre.stmtInstmt), until a container-type ancestor is found. Unlike FindRealParent(), this function considers only nodes that can themselves contain statements (e.g. ast.FunctionDef, ast.If, ast.For).

Parameters:
  • parent (dict) – The parent mapping as produced by ParentAst(), mapping each node to the set of its direct children.

  • node – The AST node whose statement-container ancestor is to be found.

  • depth (int) – The number of statement-container hops to traverse. depth=1 returns the immediate container ancestor of node.

Returns:

The statement-container ancestor at the requested depth, or None if no such ancestor exists.

Return type:

ast.AST | None

pig.synth.call.FunctionDefs(root, ParentO)

Collect all user-defined functions and async functions in the AST.

Walks the entire AST and maps each function’s name to its defining node. If a function is defined inside a class, it is also registered under the 'self.<name>' key to allow resolution of method calls on self.

Parameters:
  • root (ast.AST) – The root AST node to walk.

  • ParentO – The parent-resolver object used to locate the enclosing class via FindCParent().

Returns:

A mapping from function name (and 'self.<name>' for methods) to the corresponding ast.FunctionDef or ast.AsyncFunctionDef node.

Return type:

Dict[str, ast.FunctionDef | ast.AsyncFunctionDef]
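A sketch of the registration rule, assuming a function counts as a method whenever some ancestor is an ast.ClassDef (the FindCParent() semantics). function_defs is a hypothetical stand-in that builds its own child map instead of taking a ParentO object.

```python
import ast


# Hypothetical sketch; function_defs is illustrative, not the pig.synth API.
def function_defs(root):
    children = {n: set(ast.iter_child_nodes(n)) for n in ast.walk(root)}

    def enclosing_class(node):
        # Climb the child map until a ClassDef or the Module is reached.
        for parent, kids in children.items():
            if node in kids:
                if isinstance(parent, ast.ClassDef):
                    return parent
                if isinstance(parent, ast.Module):
                    return None
                return enclosing_class(parent)
        return None

    defs = {}
    for node in ast.walk(root):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            defs[node.name] = node
            # Methods are also reachable under the self.<name> key.
            if enclosing_class(node) is not None:
                defs["self." + node.name] = node
    return defs


defs = function_defs(ast.parse("class C:\n    def m(self):\n        pass\n\nasync def top():\n    pass\n"))
print(sorted(defs))  # ['m', 'self.m', 'top']
```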

class pig.synth.call.NameExtractor(check=False, check1=False, libo='qwer')

Extract name references from an AST subtree.

Walks an AST node and collects identifiers into categorised lists, with optional filtering for self-prefixed attribute access and library-namespaced names.

Parameters:
  • check (bool) – If True, suppress collection of bare attribute names (i.e. only self.attr forms are collected for attributes).

  • check1 (bool) – If True, treat self.<attr> access as a single qualified name (e.g. 'self.foo') rather than resolving the constituent parts separately.

  • libo (str) – The library name to exclude from name collection. Any ast.Attribute node whose unparsed form contains this string is skipped entirely.

Variables:
  • list (list[str]) – Collected variable and attribute name references.

  • constants (list) – Collected constant values.

  • types (list[str]) – Collected annotation type names from ast.AnnAssign nodes.
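The filtering rules can be illustrated with a reduced ast.NodeVisitor (Python 3.9+ for ast.unparse). SimpleNameExtractor is hypothetical: it models only libo and check1 (not check), records annotation strings with ast.unparse(), and applies the plain substring test on the unparsed attribute that the libo description implies.

```python
import ast


class SimpleNameExtractor(ast.NodeVisitor):
    """Hypothetical, reduced analogue of NameExtractor (libo and check1 only)."""

    def __init__(self, check1=False, libo="qwer"):
        self.check1 = check1
        self.libo = libo
        self.list = []       # variable and attribute names
        self.constants = []  # constant values
        self.types = []      # annotation strings from ast.AnnAssign

    def visit_Name(self, node):
        self.list.append(node.id)

    def visit_Constant(self, node):
        self.constants.append(node.value)

    def visit_AnnAssign(self, node):
        self.types.append(ast.unparse(node.annotation))
        self.generic_visit(node)

    def visit_Attribute(self, node):
        # Skip any attribute whose source text mentions the library name.
        if self.libo in ast.unparse(node):
            return
        # With check1, keep self.<attr> as one qualified name.
        if self.check1 and isinstance(node.value, ast.Name) and node.value.id == "self":
            self.list.append("self." + node.attr)
            return
        self.list.append(node.attr)
        self.generic_visit(node)


ext = SimpleNameExtractor(check1=True, libo="torch")
ext.visit(ast.parse("y: int = self.count + torch.zeros(3)"))
print(ext.list, ext.types, ext.constants)  # ['y', 'int', 'self.count'] ['int'] [3]
```

In this sketch, visiting an annotated assignment also records the annotation's own names (here int) in list; the description does not say whether the real class does the same.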

pig.synth.call.ParentAst(root)

Build a mapping from each AST node to its direct children.

Walks the entire AST and collects, for each node, the set of its immediate child nodes as returned by ast.iter_child_nodes().

Parameters:

root (ast.AST) – The root AST node to walk.

Returns:

A mapping from each parent node to the set of its direct children.

Return type:

dict[ast.AST, set[ast.AST]]
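A sketch of the child map and of the reverse lookup that FindParent() performs on it. parent_ast and find_parent are hypothetical names; returning node itself when no parent exists follows the FindParent() description.

```python
import ast


# Hypothetical sketch; names are illustrative, not the pig.synth API.
def parent_ast(root):
    # Map each node to the set of its direct children.
    mapping = {}
    for node in ast.walk(root):
        mapping[node] = set(ast.iter_child_nodes(node))
    return mapping


def find_parent(mapping, node):
    # FindParent()-style lookup: the key whose child set contains node,
    # falling back to node itself when no parent exists (e.g. the Module).
    for candidate, children in mapping.items():
        if node in children:
            return candidate
    return node


tree = ast.parse("x = f(1)")
mapping = parent_ast(tree)
call = tree.body[0].value
print(type(find_parent(mapping, call)).__name__)  # Assign
print(find_parent(mapping, tree) is tree)         # True
```

Because the mapping runs from parent to children, every lookup scans all entries; inverting it once into a child-to-parent dict would make each lookup constant time. One caveat: CPython's parser reuses context objects such as ast.Load across nodes, so those particular nodes have no unique parent.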

class pig.synth.call.Preparation(func, apios=[])

Build a call-relation table and collect API usage nodes from the AST.

Visits the AST to produce two main data structures:

  • tableM: a call-relation mapping of the form {callee: {caller, ...}}, expressing which functions must be available for a given call to succeed.

  • nodes: a mapping from each API name in apios to the set of AST nodes where it is referenced.

Function and class definitions are also catalogued in funcdefs and classdefs respectively for later use by dependency resolution.

Parameters:
  • func (list) – The list of function names to track in the call-relation table.

  • apios (list) – The list of old API names whose usage sites are to be collected. Defaults to an empty list.

Variables:
  • tableM (Dict[str, Set[str]]) – Call-relation table mapping each callee name to the set of caller names that depend on it.

  • nodes (Dict[str, set[ast.AST]]) – Mapping from each API name to the set of AST nodes where it is referenced.

  • funcdefs (dict) – Mapping from function name to its defining ast.FunctionDef or ast.AsyncFunctionDef node.

  • classdefs (dict) – Mapping from class name to its defining ast.ClassDef node.

  • apios (list) – The list of old API names being tracked.
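A minimal sketch of how a {callee: {caller, ...}} table could be filled. It assumes func lists the callee names to track and that only direct calls by bare name (f(), not obj.f()) count; build_call_table is a hypothetical name. Because ast.walk also descends into nested functions, an outer function inherits the calls of its inner functions here.

```python
import ast
from collections import defaultdict


# Hypothetical sketch; build_call_table is illustrative, not the pig.synth API.
def build_call_table(root, tracked):
    table = defaultdict(set)
    for fn in ast.walk(root):
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Record fn as a caller of every tracked name it calls directly.
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in tracked:
                    table[node.func.id].add(fn.name)
    return dict(table)


src = (
    "def load():\n    return 1\n\n"
    "def run():\n    return load()\n\n"
    "def main():\n    run()\n    load()\n"
)
table = build_call_table(ast.parse(src), {"load", "run"})
print(table)
```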

fix_import

pig.synth.fix_import.ImportFindPath(libo, libn, v1, nodes, apis, cmp=None)

Resolve the correct import statement for a variable in the new library.

Searches the new library’s API map to find all candidate import paths for v1, then validates each candidate against the actual usage pattern in nodes to determine the most appropriate import form. Duplicate candidates are resolved via duplicate_imports_resolve() or, as a last resort, by string similarity against cmp.

The resolution process is handled by three inner helpers:

  • find(v1): scans apis to collect all candidate paths where v1 appears as a class, function, constant, or module name.

  • pmaker(cand_path): expands a single candidate into the set of concrete import forms it could take (e.g. import A.B and from A import B).

  • check(nodes, cand_path): inspects how v1 is actually used in each node and selects the import form whose path components match the usage pattern.

Parameters:
  • libo (str) – The name of the original library.

  • libn (str) – The name of the new library.

  • v1 (str) – The variable name whose import statement is to be resolved.

  • nodes (set) – The set of AST nodes where v1 is referenced, used to infer the correct import form from actual usage.

  • apis – The API map of the new library, structured as {module_path: (classes, _, functions, _, constants)}.

  • cmp (ast.Import | ast.ImportFrom | None) – An optional existing import node from the LLM-generated code, used as a similarity reference when duplicate candidates remain after resolution. Defaults to None.

Returns:

A set containing the resolved import node(s). Normally contains a single ast.Import or ast.ImportFrom node; may contain multiple if resolution is inconclusive.

Return type:

set
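The pmaker() expansion can be sketched for the two forms named in its description (Python 3.9+ for ast.unparse). expand_import_forms is a hypothetical name, and limiting the output to import A.B and from A import B is an assumption; the real helper may generate more variants.

```python
import ast


# Hypothetical sketch; expand_import_forms is illustrative, not the pig.synth API.
def expand_import_forms(cand_path):
    parts = cand_path.split(".")
    # Form 1: import the full dotted path.
    forms = [ast.Import(names=[ast.alias(name=cand_path, asname=None)])]
    # Form 2: import the last component from its parent module.
    if len(parts) > 1:
        forms.append(
            ast.ImportFrom(
                module=".".join(parts[:-1]),
                names=[ast.alias(name=parts[-1], asname=None)],
                level=0,
            )
        )
    return forms


for form in expand_import_forms("A.B"):
    print(ast.unparse(form))
# import A.B
# from A import B
```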

pig.synth.fix_import.Importfind(code, nodes, var, libo, libn, apis, check=True)

Resolve and validate the import statement for a given variable name.

Searches code for existing import statements that define var, then verifies and corrects the import path against the actual library structure. If check is True, the resolved import is validated and potentially rewritten to point to the correct new library path using is_total_import(), llm_pre.libname(), and ImportFindPath(). If check is False, the existing import is returned as-is.

Parameters:
  • code (ast.AST) – The AST of the file being analysed, used to extract existing import statements.

  • nodes (set) – The set of AST nodes where var is referenced, used to resolve the fully qualified import path via is_total_import().

  • var (str) – The variable name whose import statement is to be resolved.

  • libo (str) – The name of the original library.

  • libn (str) – The name of the new library.

  • apis – The API mapping passed through to ImportFindPath() for path resolution.

  • check (bool) – If True, validate and rewrite the import path to match the correct new library. If False, return the existing import statement unchanged. Defaults to True.

Returns:

A tuple (import_nodes, resolved_vars) where import_nodes is a set of corrected ast.Import or ast.ImportFrom nodes, and resolved_vars is the set of variable names that were successfully resolved.

Return type:

tuple[set, set]

pig.synth.fix_import.check_available_import(import_node, libn)

Check whether an import node resolves to a real path in the library source.

Converts the dotted module path of import_node into a file system path and verifies that it exists within the library’s source tree. For ast.ImportFrom nodes, additionally checks that the imported name is actually accessible at that path via get_accessible_apis().

Import nodes that do not reference libn at all are considered valid and return True immediately.

Parameters:
  • import_node (Union[ast.Import, ast.ImportFrom]) – The import statement to validate.

  • libn (str) – The name of the new library, used to locate its source root via GIT_LOC.

Returns:

True if the import path exists in the library source tree and (for ast.ImportFrom) the imported name is accessible there; False otherwise.

Return type:

bool

Raises:

ValueError – If import_node is neither an ast.Import nor an ast.ImportFrom.
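The path check can be sketched with pathlib. This assumes source_root is the directory containing the top-level package (the role GIT_LOC plays in the description) and treats a dotted path as real if it names a package directory or a .py file. module_path_exists is hypothetical and omits the get_accessible_apis() name check and relative imports.

```python
import ast
import tempfile
from pathlib import Path


# Hypothetical sketch; module_path_exists is illustrative, not the pig.synth API.
def module_path_exists(import_node, libn, source_root):
    if isinstance(import_node, ast.Import):
        modules = [alias.name for alias in import_node.names]
    elif isinstance(import_node, ast.ImportFrom):
        modules = [import_node.module or ""]
    else:
        raise ValueError("import_node must be ast.Import or ast.ImportFrom")
    relevant = [m for m in modules if m.split(".")[0] == libn]
    # Imports that never mention libn are accepted immediately.
    if not relevant:
        return True
    for module in relevant:
        target = source_root.joinpath(*module.split("."))
        # A dotted path is real if it is a package directory or a .py module.
        if not (target.is_dir() or target.with_suffix(".py").is_file()):
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "pkg" / "__init__.py").write_text("")
    (root / "pkg" / "sub.py").write_text("def f():\n    pass\n")
    ok = module_path_exists(ast.parse("from pkg.sub import f").body[0], "pkg", root)
    missing = module_path_exists(ast.parse("import pkg.missing").body[0], "pkg", root)
    external = module_path_exists(ast.parse("import os").body[0], "pkg", root)

print(ok, missing, external)  # True False True
```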

pig.synth.fix_import.duplicate_imports_resolve(imps, nodes, libn, var, cmp=None)

Resolve a set of duplicate import candidates down to a single correct import.

Given multiple candidate import statements for the same variable var, inspects how var is actually used in nodes and cross-references each candidate against the new library’s source files to determine which import is valid. Duplicates that survive the initial check are further resolved by module path depth or import type counts.

The resolution process is handled by several inner helpers:

  • api_type(): determines whether var is used as an ast.Attribute, ast.Call, or bare ast.Name by tallying occurrences across nodes.

  • find_next_attribute(nodes, var): identifies the most common attribute accessed on var (e.g. var.attr), used when usage type is 'Attribute'.

  • find_args(node): extracts the positional and keyword argument counts from an ast.Call node.

  • find_last_call(node): returns the final name or attribute in a call expression (e.g. c from a.b.c()).

  • check(...): validates a single candidate import against the library’s accessible APIs and the observed usage type, returning (True, import_node) if valid or (False, None) otherwise.

Parameters:
  • imps (set[ast.Import | ast.ImportFrom]) – The set of candidate ast.Import or ast.ImportFrom nodes to resolve.

  • nodes (set[ast.AST]) – The AST nodes where var is referenced, used to infer usage type and argument signatures.

  • libn (str) – The name of the new library, used to locate its source files via GIT_LOC.

  • var (str) – The variable name whose import is being resolved.

  • cmp (ast.Import | ast.ImportFrom | None) – An optional existing import node from the LLM-generated code, used as a string-similarity reference if duplicates remain after all other resolution steps. Defaults to None.

Returns:

A set containing a single resolved import node. May contain more than one entry if resolution is inconclusive.

Return type:

set[ast.Import | ast.ImportFrom]
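The api_type() tally can be sketched as a majority vote over how each occurrence of var is consumed by its parent node. This api_type is a hypothetical re-implementation; returning None when var never occurs is an assumption.

```python
import ast
from collections import Counter


# Hypothetical sketch of the api_type() helper; not the pig.synth code.
def api_type(nodes, var):
    tally = Counter()
    for root in nodes:
        # Child-to-parent lookup inside this usage site.
        parent_of = {
            child: parent
            for parent in ast.walk(root)
            for child in ast.iter_child_nodes(parent)
        }
        for node in ast.walk(root):
            if isinstance(node, ast.Name) and node.id == var:
                parent = parent_of.get(node)
                if isinstance(parent, ast.Attribute) and parent.value is node:
                    tally["Attribute"] += 1  # var.attr
                elif isinstance(parent, ast.Call) and parent.func is node:
                    tally["Call"] += 1       # var(...)
                else:
                    tally["Name"] += 1       # bare use, e.g. passed as an argument
    return tally.most_common(1)[0][0] if tally else None


sites = [ast.parse("y = np.zeros(3)"), ast.parse("z = np.ones(2)"), ast.parse("f(np)")]
print(api_type(sites, "np"))  # Attribute
```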

Extract all names referenced in an API-related expression.

Visits an AST subtree and collects every ast.Name identifier and ast.Attribute name that appears within expressions involving apio. For ast.Call nodes, only arguments or the function itself that contain apio are traversed.

Parameters:

apio (str) – The old API name used to filter which call sub-expressions are visited.

Variables:

names (set[str]) – The set of collected name strings.

pig.synth.fix_import.get_accessible_apis(_path, libn, name=None, dir=False)

Extract publicly accessible API names and their signatures from a library path.

Parses the Python source file or directory at _path and collects all top-level classes, functions, annotated assignments, global variables, and re-exported names. If name is specified, only the entry matching that name is returned, with the source path appended for import resolution.

The extraction is handled by three inner helpers:

  • get_apis(path, stack): recursively parses files and directories, populating the result dicts with discovered API entries.

  • get_func_args(node): extracts the full argument signature of a function, including positional, keyword-only, default, and variadic arguments.

  • get_class_args(node): extracts the __init__ signature of a class, falling back to (0, …) if no __init__ is present and no base classes exist, or ('inf', …) if base classes are present.

Each API entry is stored as a list of the form: [type, min_args, min_kwargs, max_args, max_kwargs, default_names, kw_names, ordinary_names] where type is one of 'class', 'func', or 'var'.

Parameters:
  • _path (Path) – Path to the library source file (.py) or directory to inspect.

  • libn (str) – The name of the library being inspected (reserved for future use).

  • name (str | None) – If provided, only the API entry matching this name is returned, with [path, 'imp'] appended to its value.

  • dir (bool) – If True, treat _path as a directory and enumerate its submodules instead of parsing a single file.

Returns:

A tuple (result, result2) where result maps each API name to its signature list, and result2 maps names discovered in __init__.py (for directory paths) to their signatures.

Return type:

tuple[dict, dict]
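One plausible way to compute the argument bounds stored in each API entry is sketched below. func_arg_bounds is hypothetical, and the exact meaning of min_args, min_kwargs, max_args and max_kwargs is an assumption: parameters without defaults are required, and *args or **kwargs make the corresponding upper bound unbounded (math.inf here, whereas get_class_args() is described as using the string 'inf').

```python
import ast
import math


# Hypothetical sketch of a get_func_args()-style summary; not the pig.synth code.
def func_arg_bounds(fn):
    a = fn.args
    positional = a.posonlyargs + a.args
    # Positional parameters without a default are required.
    min_args = len(positional) - len(a.defaults)
    max_args = math.inf if a.vararg else len(positional)
    # kw_defaults holds None for keyword-only parameters that have no default.
    min_kwargs = sum(1 for d in a.kw_defaults if d is None)
    max_kwargs = math.inf if a.kwarg else len(a.kwonlyargs)
    return min_args, min_kwargs, max_args, max_kwargs


f = ast.parse("def f(a, b=1, *args, c, d=2, **kw):\n    pass\n").body[0]
g = ast.parse("def g(x, y=0, *, z):\n    pass\n").body[0]
print(func_arg_bounds(f))  # (1, 1, inf, inf)
print(func_arg_bounds(g))  # (1, 1, 2, 1)
```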

pig.synth.fix_import.is_total_import(root, var, libn)

Resolve the fully qualified import path of a variable within a library.

Walks the AST to reconstruct the attribute access chain leading to var (e.g. torch.nn.Module → ['torch', 'nn', 'Module']), then verifies each component against the library’s file system to determine how much of the chain corresponds to a real module path.

Parameters:
  • root (ast.AST) – The AST node to search for the variable reference.

  • var (str) – The name of the variable or attribute whose import path is to be resolved.

  • libn (str) – The name of the target library, used to look up the library’s root path from GIT_LOC.

Returns:

The fully qualified dotted path of var up to the deepest resolvable module component (e.g. 'torch.nn').

Return type:

str
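Reconstructing the access chain amounts to following the .value links of nested ast.Attribute nodes. The sketch covers only that first step (the file-system check against GIT_LOC is omitted), and attribute_chain is a hypothetical name.

```python
import ast


# Hypothetical sketch; attribute_chain is illustrative, not the pig.synth API.
def attribute_chain(root, var):
    for node in ast.walk(root):
        if isinstance(node, ast.Attribute) and node.attr == var:
            parts = [node.attr]
            value = node.value
            # Follow .value links down to the leftmost name.
            while isinstance(value, ast.Attribute):
                parts.append(value.attr)
                value = value.value
            if isinstance(value, ast.Name):
                parts.append(value.id)
            return parts[::-1]
        if isinstance(node, ast.Name) and node.id == var:
            return [var]
    return []


tree = ast.parse("class Net(torch.nn.Module):\n    pass\n")
print(attribute_chain(tree, "Module"))  # ['torch', 'nn', 'Module']
```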

llm_pre

class pig.synth.llm_pre.DefUseGraph(imps={})

pig.synth.llm_pre.DupImpSolver(code)

Remove duplicate import statements and reinsert them as deduplicated entries.

Uses an inner ast.NodeTransformer (ImpDupRemover) to strip all ast.Import and ast.ImportFrom nodes from the module body while collecting their unique aliases. A second inner helper (ImpDupSolver) then reinserts the deduplicated imports at the top of the module.

Deduplication is keyed on (name, asname) for ast.Import and on (module, level) → {(name, asname), ...} for ast.ImportFrom, so multiple from X import a, b statements for the same module are merged into a single node.

Parameters:

code (ast.Module) – The AST module whose import statements are to be deduplicated.

Returns:

The modified AST module with all duplicate imports removed and unique imports reinserted at the top of the module body.

Return type:

ast.AST
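The dedup keys described above can be sketched with insertion-ordered dicts (Python 3.9+ for ast.unparse in the demo). dedup_imports is hypothetical and, unlike the NodeTransformer described, only examines top-level statements; emitting one ast.Import per alias is also a choice of this sketch.

```python
import ast


# Hypothetical sketch; dedup_imports is illustrative, not the pig.synth API.
def dedup_imports(module):
    plain = {}        # (name, asname) -> None; a dict keeps first-seen order
    from_groups = {}  # (module, level) -> {(name, asname): None}
    body = []
    for stmt in module.body:
        if isinstance(stmt, ast.Import):
            for a in stmt.names:
                plain[(a.name, a.asname)] = None
        elif isinstance(stmt, ast.ImportFrom):
            group = from_groups.setdefault((stmt.module, stmt.level), {})
            for a in stmt.names:
                group[(a.name, a.asname)] = None
        else:
            body.append(stmt)
    # Reinsert the unique imports at the top, merging each from-module group.
    imports = [ast.Import(names=[ast.alias(name=n, asname=s)]) for n, s in plain]
    imports += [
        ast.ImportFrom(
            module=mod,
            names=[ast.alias(name=n, asname=s) for n, s in names],
            level=lvl,
        )
        for (mod, lvl), names in from_groups.items()
    ]
    module.body = imports + body
    return ast.fix_missing_locations(module)


src = "import os\nfrom a import x\nimport os\nfrom a import y, x\nprint(os)\n"
print(ast.unparse(dedup_imports(ast.parse(src))))
# import os
# from a import x, y
# print(os)
```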

class pig.synth.llm_pre.ExtractArgs(only_args=False)

class pig.synth.llm_pre.ExtractVarMap(nodeo)

pig.synth.llm_pre.FindLastExpr(parent, node, depth)

Find the statement-type ancestor of an expression node at the given depth.

Traverses the parent mapping upward, skipping intermediate expression-type nodes, until a statement-type (stmt_type) ancestor is reached. Similar to FindRealParent(), but returns node itself (rather than None) if no ancestor is found.

Parameters:
  • parent (dict) – The parent mapping as produced by call.ParentAst(), mapping each node to the set of its direct children.

  • node (ast.AST) – The AST expression node to start traversal from.

  • depth (int) – The number of statement-level hops to traverse. depth=1 returns the immediate statement ancestor of node.

Returns:

The statement-type ancestor at the requested depth, None if an unexpected node type is encountered during traversal, or node itself if no parent is found at all.

Return type:

ast.AST | None

class pig.synth.llm_pre.ModUseVars(mapping, funcdefs, ParentO, name_of_nodeo=None)

pig.synth.llm_pre.check_two_sim(roota, rooto, var, noden, rootc, surnodes)

Rename a variable in noden to match its counterpart in the new library code.

Locates the definition of var in the old API’s AST (roota), finds the corresponding name in the new library’s AST (rootc) via matching.matcher1(), and rewrites all ast.Name references to var in noden and surnodes to use the new name.

Parameters:
  • roota (ast.AST) – The AST of the old API usage context, used to locate the definition of var.

  • rooto (ast.AST) – The original source AST (reserved for future use).

  • var (str) – The variable name to look up and potentially rename.

  • noden (ast.AST) – The primary AST node in which var references are to be rewritten.

  • rootc (str) – The unparsed source string of the new library’s AST, used as the rename target by matching.matcher1().

  • surnodes (list[ast.AST]) – Additional AST nodes in which var references are also rewritten if a new name is found.

Returns:

True if at least one reference to var was renamed, False otherwise.

Return type:

bool

pig.synth.llm_pre.extract_var_map(nodeo, noden, codeo, coden, parento, b0=False)

Build a variable rename mapping between old and new API usage nodes.

Compares the variables used in nodeo (old API) and noden (new API) via ExtractVarMap and filters the raw candidates down to a mapping of {old_name: new_name} pairs that represent genuine renames.

Filtering is handled by four inner helpers:

  • check_text_sim(var1, var2): accepts a pair if their string similarity exceeds 0.5 or one is a substring of the other.

  • check_ast_sim(var1, var2, codeo, coden): locates the assignment nodes for each variable and delegates to matching.single_matcher() to confirm structural similarity.

  • check_targets(node, var): searches a single AST node for an assignment to var, handling ast.Assign, ast.AugAssign, ast.AnnAssign, and ast.With targets.

  • filter(codeo, coden, v1, parento, nodeo): excludes pairs where the variable is defined in both the old and new code within the same scope, indicating it is not a rename but a shared local name.

When b0 is True, only AST-similarity is used and constant assignments that directly match their value in the old code are excluded. When b0 is False, both text and AST similarity are applied alongside the scope filter.

Parameters:
  • nodeo (ast.AST) – The old API AST node whose variable references are the source of the mapping.

  • noden (ast.AST) – The new API AST node whose variable references are the rename targets.

  • codeo (ast.AST) – The full old code AST, used to locate variable definitions for similarity checks.

  • coden (ast.AST) – The full new code AST, used to locate variable definitions for similarity checks.

  • parento – The parent mapping of the old code as produced by call.ParentAst(), used for scope resolution.

  • b0 (bool) – If True, apply AST-similarity-only filtering suitable for direct API node comparisons. If False, apply the full text and AST similarity pipeline with scope filtering. Defaults to False.

Returns:

A mapping from each old variable name to its corresponding new variable name.

Return type:

dict[str, str]
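check_text_sim() can be sketched with difflib. The source does not name its similarity measure; SequenceMatcher.ratio() is a stand-in assumed here, while the 0.5 threshold and the substring rule come from the description.

```python
from difflib import SequenceMatcher


# Hypothetical sketch of check_text_sim(); the similarity metric is an assumption.
def check_text_sim(var1, var2, threshold=0.5):
    # Substring containment counts as similar outright.
    if var1 in var2 or var2 in var1:
        return True
    # Otherwise compare a character-level similarity ratio in [0, 1].
    return SequenceMatcher(None, var1, var2).ratio() > threshold


print(check_text_sim("x", "x_new"))    # True (substring)
print(check_text_sim("img", "image"))  # True (ratio 0.75)
print(check_text_sim("model", "net"))  # False (ratio 0.25)
```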

pig.synth.llm_pre.is_async(node)

Check whether an AST node contains any asynchronous constructs.

Walks the AST subtree rooted at node and returns True if any ast.Await, ast.AsyncWith, or ast.AsyncFor node is found.

Parameters:

node (ast.AST) – The AST node to inspect.

Returns:

True if the node contains asynchronous constructs, False otherwise.

Return type:

bool
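Since is_async() is a plain subtree search, a sketch fits in a few lines. This is a re-implementation for illustration, not the module's code.

```python
import ast

# Hypothetical re-implementation of is_async() for illustration.
ASYNC_NODES = (ast.Await, ast.AsyncWith, ast.AsyncFor)


def is_async(node):
    # True as soon as any node in the subtree is one of the listed async constructs.
    return any(isinstance(n, ASYNC_NODES) for n in ast.walk(node))


print(is_async(ast.parse("async def f():\n    await g()\n")))  # True
print(is_async(ast.parse("async def f():\n    return 1\n")))  # False
```

An async def with no await, async with or async for inside it is not reported, because ast.AsyncFunctionDef is not among the listed node types.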

pig.synth.llm_pre.libname(libo)

Resolve the importable top-level package name for a given library identifier.

Looks up libo in the combined git location map (GIT_LOC) and derives the importable name from the final component of its repository path. A small set of known mismatches between PyPI package names and importable names is corrected by hard-coded overrides.

Parameters:

libo (str) – The library identifier (typically the PyPI package name) to resolve.

Returns:

The importable top-level package name for libo, or libo itself if it cannot be found in the location map.

Return type:

str

pig.synth.llm_pre.scope_name(nodeo, noden, parent)

Return the name of the enclosing function or class scope for a node.

Determines the appropriate scope by inspecting noden: if the new node is itself a function or class definition, the search starts two levels up from nodeo; otherwise it starts one level up. The scope name is resolved via slicing.extract_name().

Parameters:
  • nodeo (ast.AST) – The original AST node whose enclosing scope is to be found.

  • noden (ast.AST) – The new AST node used to decide the traversal depth.

  • parent (dict) – The parent mapping as produced by call.ParentAst().

Returns:

The name of the enclosing function, class, or 'module' if the traversal reaches the module root.

Return type:

str | None

matching

This module matches the target AST node against the corresponding node in the new (LLM-generated) code.

  • Input: the target node and the LLM-generated code.

  • Output: the matched AST node.

pig.synth.matching.filter_stmt(noden, nodeo, apins, rootn, apio)

Determine whether a new API statement genuinely references a new API.

Checks whether any name referenced in noden corresponds to a known new API entry in apins, after stripping locally assigned names and function arguments that would shadow external references. Several early-exit conditions are also handled:

  • If noden and nodeo unparse identically and apio is 'get', returns True immediately.

  • If apio appears as a string constant inside nodeo, returns True immediately.

Names are collected from ast.Name, ast.Attribute, and ast.operator nodes. Import aliases are resolved so that an as-aliased name is expanded to its original name before matching. Locally assigned targets and function arguments within the enclosing function or class scope are removed from the candidate set to avoid false positives.

Parameters:
  • noden (ast.AST) – The new API AST node whose name references are inspected.

  • nodeo (ast.AST) – The old API AST node, used for identity comparison and constant checking.

  • apins (dict) – A mapping from API path to lists of (name, ...) tuples representing known new API names.

  • rootn (ast.AST) – The root AST module of the new code, used to resolve import aliases and locate the enclosing scope.

  • apio (str) – The old API name, used for the identity and constant early-exit checks.

Returns:

True if any name in noden matches a known new API entry, False otherwise.

Return type:

bool

pig.synth.matching.matcher(rootb_str, roota_str, nodeo, roota, rooto, dec=False, api=None, gumtree=True)

Find the new code node that corresponds to an old API call node.

Uses GumTree (via JPype) to compute a tree-diff between the old and new code strings, then walks the sub-expressions of nodeo to locate the best-matching node in roota. Matching is performed by unparsing each sub-expression, finding its character offsets in rootb_str, and querying the Java PMatcher for the corresponding span in roota_str. The candidate with the highest vote count across sub-expressions is returned as the winner via decide_winner().

Parameters:
  • rootb_str (str) – The unparsed source string of the old code, used as the baseline for the GumTree diff.

  • roota_str (str) – The unparsed source string of the new code, used as the revised version for the GumTree diff.

  • nodeo (ast.AST) – The old API call node to find a match for. If it is an ast.With or ast.AsyncWith, the context expression is used as the matching target.

  • roota (ast.AST) – The AST of the new code, used to resolve matched character spans back to AST nodes via BestMap() and to build a parent mapping via call.ParentAst().

  • rooto (ast.AST) – The AST of the old code (reserved for future use).

  • dec (bool) – If True, return the matched node directly without resolving it to its nearest statement ancestor via call.FindRealParent(). Defaults to False.

  • api – Reserved for future use. Defaults to None.

  • gumtree (bool) – If True, use the custom GumTree jar (ours.jar); otherwise use the default jar (default.jar). Also controls whether unmatched sub-expressions are treated as hard failures. Defaults to True.

Returns:

A tuple (nodeo, noden) where noden is the best-matching AST node found in roota, or None if no match could be determined. For ast.arg nodes with type annotations, returns (old_annotation, new_annotation) instead.

Return type:

tuple[ast.AST, ast.AST | None]

pig.synth.matching.single_matcher(rootb_str, roota_str, nodeo, result_noden, roota)

Determine whether two assignment nodes are structurally equivalent.

Uses GumTree (via JPype and ours.jar) to compute a tree-diff between the old and new code strings, then checks whether nodeo and result_noden are matched to each other in the resulting mapping.

The JVM is started on first call if it is not already running.

Parameters:
  • rootb_str (str) – The unparsed source string of the old code, passed to the GumTree matcher as the baseline.

  • roota_str (str) – The unparsed source string of the new code, passed to the GumTree matcher as the revised version.

  • nodeo (Union[ast.Assign, ast.AnnAssign, ast.AugAssign, ast.Expression]) – The assignment node from the old code to match.

  • result_noden (Union[ast.Assign, ast.AnnAssign, ast.AugAssign, ast.Expression]) – The assignment node from the new code to match against nodeo.

  • roota (ast.AST) – The AST of the new code, used to build a parent mapping via call.ParentAst().

Returns:

True if nodeo and result_noden are identified as a matched pair by GumTree, False otherwise.

Return type:

bool

pig.synth.matching.total_mappings(rooto, rootn, codea, parento, mapping, libo, libn, oldapi, nodeo=None, noden=None, name1=None, name2=None)

Resolve the final variable rename mapping between old and new API code.

Aggregates per-scope variable mappings collected across multiple nodes and filters out invalid candidates (such as library names, API names, imported aliases, and names already used as assignment targets in the old code) to produce a clean {old_name: new_name} mapping suitable for applying to the migrated code.

Filtering is performed by the inner helper check(val), which rejects a candidate new name if any of the following hold:

  • The old or new code imports libo under an alias that matches val.

  • val coincides with the old API name, library name, or their canonical forms as resolved by llm_pre.libname().

  • val is already used as an assignment target or function argument in the old code.

If nodeo, noden, name1, and name2 are all provided, node-level mappings extracted via llm_pre.extract_var_map() are incorporated before the scope-level mapping is applied.

Parameters:
  • rooto (ast.AST) – The AST of the original (old API) code.

  • rootn (ast.AST) – The AST of the new (LLM-generated) code.

  • codea (str) – The unparsed source string of the migrated code, used as context for name resolution.

  • parento – The parent mapping of the old code as produced by call.ParentAst().

  • mapping (dict[tuple[str, str], set[str]]) – A mapping from (old_variable_name, scope_name) tuples to the set of candidate new variable names collected across nodes.

  • libo (str) – The name of the original library.

  • libn (str) – The name of the new library.

  • oldapi (str) – The old API name, excluded from rename candidates.

  • nodeo (ast.AST | None) – The specific old API call node, used for node-level mapping when provided alongside noden, name1, and name2.

  • noden (ast.AST | None) – The specific new API call node, used for node-level mapping.

  • name1 (str | None) – The scope name of nodeo, used to restrict node-level mapping to the correct scope.

  • name2 (str | None) – The scope name of noden, used to restrict node-level mapping to the correct scope.

Returns:

A mapping from each old variable name to its resolved new name.

Return type:

dict[str, str]

sketch

pig.synth.sketch.PreRequired(h, key, val, history, mappings, CENs, UnAssignedVarsO, ParentO, ParentN, coden, FuncDefs, OldApi, libn, libo, apis, b_imports, b_surround, rootb_str, roota_str, rooto, roota, has_dec=False)

Resolve unassigned variables in a migrated node by adding surround nodes and imports.

For a given (key, val) pair representing an old-to-new API node mapping, identifies variables used in val that are not yet assigned in the new code (target_names), then attempts to satisfy them in two stages:

  1. Surround nodes (if b_surround is True): calls synthesis.Surround() to find and insert neighbouring statements from the old code that define the remaining variables.

  2. Import statements: for each variable still unresolved after stage 1, calls fix_import.Importfind() with full path checking if b_imports is True; otherwise falls back to a direct import lookup (check=False).

Variables that appear in CENs, FuncDefs, UnAssignedVarsO, or the current variable rename mappings are excluded from target_names before resolution begins. If val references the old API name as a load (but not a store), it is added back to target_names to ensure the corresponding import is included.
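The derivation of target_names described above can be sketched as follows. This is an illustration under stated assumptions, not PreRequired itself: `compute_target_names` is a hypothetical helper, the names assigned in the new code are passed in as a precomputed set, the exclusion sources (CENs, FuncDefs, UnAssignedVarsO, mapped names) are merged into one `excluded` set, the old API is treated as a bare identifier, and the surround and import stages are not modelled.

```python
import ast


def compute_target_names(val: ast.AST, assigned_in_new: set[str],
                         excluded: set[str], old_api: str) -> set[str]:
    """Hypothetical sketch of how target_names could be derived for `val`."""
    loads, stores = set(), set()
    for node in ast.walk(val):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stores.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loads.add(node.id)
    # Names read in val that the new code never assigns, minus excluded names.
    targets = loads - assigned_in_new - excluded
    # Re-add the old API if val reads it without rebinding it,
    # so that its import is still resolved later.
    if old_api in loads and old_api not in stores:
        targets.add(old_api)
    return targets


val = ast.parse("df = DataFrame(rows, columns=cols)").body[0]
targets = compute_target_names(val, assigned_in_new={"cols"},
                               excluded={"DataFrame", "print"},
                               old_api="DataFrame")
print(sorted(targets))  # ['DataFrame', 'rows']
```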

Parameters:
  • h (ast.AST) – The current working AST being built up, modified in-place by synthesis.Surround().

  • key (ast.AST) – The original old API call node that triggered this migration step.

  • val (ast.AST) – The new API node that replaces key.

  • history (dict) – A mutable state dict tracking processed nodes; its 'changes' entry is updated with any newly added surround nodes.

  • mappings (dict) – The variable rename mapping from (old_name, scope_name) to candidate new names, used to exclude already-resolved variables from target_names.

  • CENs (set) – The set of built-in and context-defined names to exclude from dependency resolution.

  • UnAssignedVarsO (dict) – A mapping from scope name to the set of variables that are unassigned in the old code’s scope.

  • ParentO – The parent mapping of the old code as produced by call.ParentAst().

  • ParentN – The parent mapping of the new code as produced by call.ParentAst().

  • coden (ast.AST) – The AST of the new (LLM-generated) code, passed to fix_import.Importfind() for import resolution.

  • FuncDefs (set) – The set of function names defined in the old code, excluded from target_names.

  • OldApi (str) – The old API name; if referenced as a load in val, it is added to target_names to ensure its import is retained.

  • libn (str) – The name of the new library.

  • libo (str) – The name of the original library.

  • apis – The API map of the new library, passed through to fix_import.Importfind().

  • b_imports (bool) – If True, resolve remaining variables to import statements using full path validation; otherwise use direct lookup.

  • b_surround (bool) – If True, attempt to resolve unassigned variables by inserting surround nodes from the old code before falling back to imports.

  • rootb_str (str) – The unparsed source string of the old code, passed through to synthesis.Surround() for GumTree matching.

  • roota_str (str) – The unparsed source string of the new code, passed through to synthesis.Surround() for GumTree matching.

  • rooto (ast.AST) – The AST of the old code.

  • roota (ast.AST) – The AST of the new code.

  • has_dec (bool) – Reserved for future use. Defaults to False.

Returns:

A tuple (h, NCImport, CENs1, history) where h is the updated working AST, NCImport is the set of newly resolved import nodes, CENs1 is the updated set of resolved import names, and history is the updated history dict.

Return type:

tuple[ast.AST, set, set, dict]

pig.synth.sketch.migrator(OldApi, OCNs, ParentN, ParentO, codeo, coden, libo, libn, history, FuncDefs, UnAssignedVarsO, CENs, OldTree1, codeo_str, coden_str, apis, b_imports=True, b_varmap=True, b_surround=True, b_postprocess=True, gumtree=True)

Migrate all old API call sites in codeo to their new API equivalents.

For each old API call node in OCNs[OldApi], finds the corresponding new API node in coden via GumTree-based tree matching, applies variable rename mappings, resolves unassigned variables through surround nodes and import statements, and returns the fully migrated AST.

The migration proceeds in four stages:

  1. Classification – partitions OCNs[OldApi] into normal statement nodes, decorator nodes, class-base nodes, exception handlers, and type-annotation args.

  2. Matching – for each normal node, uses matching.matcher() (or llm_pre.MatchName() for structural nodes) to find the corresponding new node in coden. Nodes for which no match is found are added to del_nodes_cands for later removal.

  3. Pre-processing – for each (old, new) pair in result, calls PreRequired() to insert surround nodes and import statements that satisfy unresolved variable references.

  4. Post-processing – applies variable rename mappings via matching.total_mappings(), rewrites the working AST h in-place, and removes any nodes in del_nodes_cands that were not matched.
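The classification stage can be pictured with a sketch that sorts old API uses by their syntactic position. It is a hypothetical re-creation: `build_child_to_parent` and `classify` are illustrative names, the child-to-parent dict is built here rather than taken from call.ParentAst(), and the group names simply mirror the prose above.

```python
import ast


def build_child_to_parent(tree: ast.AST) -> dict:
    """Map every AST node to its direct parent (illustrative helper)."""
    return {child: node for node in ast.walk(tree)
            for child in ast.iter_child_nodes(node)}


def classify(nodes, parents):
    """Partition API-use nodes by the syntactic slot they occupy."""
    groups = {"normal": [], "decorator": [], "base": [], "handler": [], "annotation": []}
    for node in nodes:
        parent = parents.get(node)
        if (isinstance(parent, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
                and any(node is d for d in parent.decorator_list)):
            groups["decorator"].append(node)
        elif isinstance(parent, ast.ClassDef) and any(node is b for b in parent.bases):
            groups["base"].append(node)
        elif isinstance(parent, ast.ExceptHandler) and parent.type is node:
            groups["handler"].append(node)
        elif isinstance(parent, ast.arg) and parent.annotation is node:
            groups["annotation"].append(node)
        else:
            groups["normal"].append(node)
    return groups


src = (
    "@deco\n"
    "def f(a: deco):\n"
    "    try:\n"
    "        deco()\n"
    "    except deco:\n"
    "        pass\n"
    "class C(deco):\n"
    "    pass\n"
)
tree = ast.parse(src)
parents = build_child_to_parent(tree)
uses = [n for n in ast.walk(tree) if isinstance(n, ast.Name) and n.id == "deco"]
groups = classify(uses, parents)
print({k: len(v) for k, v in groups.items()})
# {'normal': 1, 'decorator': 1, 'base': 1, 'handler': 1, 'annotation': 1}
```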

Special cases handled during matching:

  • ast.ExceptHandler nodes are matched at the handler or Try level and stored by their .type attribute.

  • ast.Name nodes matched to ast.arg are replaced by the argument’s type annotation.

  • ast.With / ast.AsyncWith nodes are matched as whole context-manager statements.

  • Decorator nodes that match an ast.ClassDef or ast.FunctionDef have their bodies merged rather than replaced.
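Two of the special cases above (argument annotations and exception handlers) reduce to picking a sub-node of the matched new node. The sketch below is hypothetical: `unwrap_match` is not part of pig, and it ignores the With/AsyncWith and decorator cases.

```python
import ast


def unwrap_match(old: ast.AST, new: ast.AST) -> ast.AST:
    """Hypothetical helper: pick the part of a matched new node that replaces `old`."""
    if isinstance(old, ast.Name) and isinstance(new, ast.arg):
        # A Name matched to an argument is replaced by the argument's annotation.
        return new.annotation
    if isinstance(new, ast.ExceptHandler):
        # Exception handlers are keyed by their exception-type expression.
        return new.type
    return new


fn = ast.parse("def g(x: torch.Tensor):\n    pass").body[0]
old_name = ast.Name(id="ndarray", ctx=ast.Load())
print(ast.unparse(unwrap_match(old_name, fn.args.args[0])))  # torch.Tensor

try_stmt = ast.parse("try:\n    pass\nexcept ValueError:\n    pass").body[0]
print(ast.unparse(unwrap_match(old_name, try_stmt.handlers[0])))  # ValueError
```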

Parameters:
  • OldApi (str) – The old API name whose call sites are to be migrated.

  • OCNs – A mapping from API name to the list of AST nodes where that API is used, as produced by call.Preparation.

  • ParentN (dict) – The parent mapping of the new code as produced by call.ParentAst().

  • ParentO (dict) – The parent mapping of the old code as produced by call.ParentAst().

  • codeo (ast.AST) – The AST of the original (old API) code.

  • coden (ast.AST) – The AST of the new (LLM-generated) code.

  • libo (str) – The name of the original library.

  • libn (str) – The name of the new library.

  • history (dict[str, dict]) – A mutable state dict tracking processed nodes and accumulated changes across multiple SketchMaker calls.

  • FuncDefs – The set of function names defined in the old code, excluded from variable dependency resolution.

  • UnAssignedVarsO – A mapping from scope name to the set of variables unassigned in the old code’s scope.

  • CENs (set) – The set of built-in and context-defined names to exclude from dependency resolution.

  • OldTree1 – The GumTree representation of the old code, used by matching.var_divide() for sub-expression matching.

  • ParentO1 – An alternative parent mapping of the old code used for deeper ancestor lookups.

  • codeo_str (str) – The unparsed source string of the old code.

  • coden_str (str) – The unparsed source string of the new code.

  • apis – The API map of the new library, passed to fix_import.Importfind() and matching.filter_stmt().

  • b_imports (bool) – If True, resolve unassigned variables to import statements using full path validation. Defaults to True.

  • b_varmap (bool) – If True, apply variable rename mappings to the migrated nodes. Defaults to True.

  • b_surround (bool) – If True, insert surround nodes from the old code to satisfy unresolved variable references. Defaults to True.

  • b_postprocess (bool) – If True, run the post-processing step to apply rename mappings and clean up deleted nodes. Defaults to True.

  • gumtree (bool) – If True, use the custom GumTree matcher (ours.jar); otherwise use the default matcher. Defaults to True.

Returns:

The migrated AST with all matched old API nodes replaced by their new equivalents, surround nodes inserted, and imports resolved.

Return type:

ast.AST

synthesis

class pig.synth.synthesis.AsyncFD(NCF, check0, check1)
class pig.synth.synthesis.FindSurFCs(nv)
class pig.synth.synthesis.ImportDeleter(libo)

Remove all import statements that reference a given library.

Walks the AST and strips any ast.Import or ast.ImportFrom node whose module or alias name contains libo. A special case handles 'ruamel.yaml', whose top-level name is itself dotted.

Parameters:

libo (str) – The library name to remove from import statements.
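A minimal NodeTransformer with the same purpose is sketched below. It deliberately differs from the description above: instead of substring matching plus a 'ruamel.yaml' special case, the hypothetical `_refers_to` helper matches libo as a dotted prefix, which covers dotted names and leaves look-alike packages such as numpyro alone. Like the original, it removes the whole import node, so `import os, numpy` would also lose `os`.

```python
import ast


def _refers_to(name: str, libo: str) -> bool:
    # Assumption of this sketch: dotted-prefix match instead of substring match.
    return name == libo or name.startswith(libo + ".")


class ImportDeleterSketch(ast.NodeTransformer):
    """Hypothetical re-creation: drop import statements that reference libo."""

    def __init__(self, libo: str):
        self.libo = libo

    def visit_Import(self, node):
        if any(_refers_to(a.name, self.libo) for a in node.names):
            return None  # returning None deletes the statement
        return node

    def visit_ImportFrom(self, node):
        if node.module and _refers_to(node.module, self.libo):
            return None
        return node


tree = ast.parse("import numpy as np\nfrom numpy.linalg import norm\nimport os\nimport numpyro\n")
ImportDeleterSketch("numpy").visit(tree)
print(ast.unparse(tree))
# import os
# import numpyro
```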

class pig.synth.synthesis.NameBool(name, ctx, depth, usedvars=None)
class pig.synth.synthesis.SynthDel(ONs, UnAssVars, UnUseVars, history=None, replace=False, replacenode=None, usedvars=None, dec=False)
class pig.synth.synthesis.SynthImport(NCImports)

Prepend a set of new import statements to the module body.

Visits the ast.Module node and inserts all nodes in NCImports at the beginning of node.body.

Parameters:

NCImports (set[Union[ast.Import, ast.Module, ast.ImportFrom]]) – The set of import nodes to prepend.
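The prepend step can be sketched as a NodeTransformer that overrides visit_Module. This is an illustration, not the class itself: `SynthImportSketch` is a hypothetical name, it assumes NCImports holds only ast.Import and ast.ImportFrom nodes, and the sort by source text is an addition of this sketch to make the order of a set reproducible.

```python
import ast


class SynthImportSketch(ast.NodeTransformer):
    """Hypothetical re-creation: prepend import nodes to the module body."""

    def __init__(self, nc_imports):
        self.nc_imports = nc_imports

    def visit_Module(self, node):
        # A set has no stable order, so sort by source text for reproducible output.
        new_imports = sorted(self.nc_imports, key=ast.unparse)
        node.body[:0] = new_imports
        return node


tree = ast.parse("model = nn.Linear(4, 2)\n")
imports = {ast.parse("import torch").body[0],
           ast.parse("from torch import nn").body[0]}
SynthImportSketch(imports).visit(tree)
print(ast.unparse(tree))
# from torch import nn
# import torch
# model = nn.Linear(4, 2)
```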

class pig.synth.synthesis.SynthSame(OCNP, NCNP, history, ParentO, HAS_CB=False, HAS_DEC=False)
class pig.synth.synthesis.TrimRoot(targets, exception)

Remove a specific set of statement nodes from the AST.

Walks the AST and returns None for any node found in targets, effectively deleting it. ast.With and ast.AsyncWith nodes that match exception are preserved regardless of whether they appear in targets.

Intended for cleaning up With/AsyncWith target statements that have been absorbed into a new context-manager node.

Parameters:
  • targets (list[ast.stmt]) – The list of statement nodes to remove.

  • exception (ast.stmt) – A single node that must never be removed, even if it appears in targets (typically the new With/AsyncWith node that absorbed the targets).
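The removal rule can be sketched by overriding NodeTransformer.visit so that every visited node is checked by identity against targets. `TrimRootSketch` is a hypothetical name; the example models a `with` statement that has absorbed a preceding assignment.

```python
import ast


class TrimRootSketch(ast.NodeTransformer):
    """Hypothetical re-creation: drop the nodes in `targets`, except `exception`."""

    def __init__(self, targets, exception):
        self.targets = targets
        self.exception = exception

    def visit(self, node):
        # Compare by identity: the targets are specific node objects.
        if node is not self.exception and any(node is t for t in self.targets):
            return None  # None removes the node from its parent's statement list
        return super().visit(node)


tree = ast.parse("fh = open(p)\nwith open(p) as fh:\n    data = fh.read()\n")
assign, with_stmt = tree.body[0], tree.body[1]
TrimRootSketch([assign, with_stmt], exception=with_stmt).visit(tree)
print(ast.unparse(tree))
# with open(p) as fh:
#     data = fh.read()
```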

class pig.synth.synthesis.UnusedVars(libo=None, name='module')
class pig.synth.synthesis.VarExtractor(name='module', check=False)

Collect all variable names referenced within an AST, organised by scope.

Walks the AST and records every ast.Name identifier into a per-scope dict keyed by the enclosing function or class name ('module' for top-level code). Import names are collected separately in self.imports.

Parameters:
  • name (str) – The initial scope name. Defaults to 'module'.

  • check (bool) – If True, self.<attr> assignments are tracked as separate qualified names (e.g. 'self.foo').

Variables:
  • vars (dict[str, set[str]]) – Mapping from scope name to the set of variable names referenced in that scope.

  • imports (set[str]) – Set of all imported names and aliases.
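A simplified visitor along these lines is sketched below. It is hypothetical (`VarExtractorSketch` is not pig's class) and simplifies scoping: decorators and default values of a function are attributed to the function's own scope, and with check=True only `self.<attr>` store targets are recorded as qualified names.

```python
import ast
from collections import defaultdict


class VarExtractorSketch(ast.NodeVisitor):
    """Hypothetical re-creation: collect ast.Name ids per enclosing def/class."""

    def __init__(self, name="module", check=False):
        self.scope = name
        self.check = check
        self.vars = defaultdict(set)  # scope name -> referenced names
        self.imports = set()          # imported names and aliases

    def _enter_scope(self, node):
        outer, self.scope = self.scope, node.name
        self.generic_visit(node)
        self.scope = outer

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _enter_scope

    def visit_Name(self, node):
        self.vars[self.scope].add(node.id)

    def visit_Attribute(self, node):
        # With check=True, record `self.attr = ...` targets as qualified names.
        if (self.check and isinstance(node.ctx, ast.Store)
                and isinstance(node.value, ast.Name) and node.value.id == "self"):
            self.vars[self.scope].add(f"self.{node.attr}")
        self.generic_visit(node)

    def visit_Import(self, node):
        self.imports.update(a.asname or a.name for a in node.names)

    visit_ImportFrom = visit_Import


src = (
    "import numpy as np\n"
    "x = np.zeros(3)\n"
    "class Model:\n"
    "    def fit(self):\n"
    "        self.w = x\n"
)
extractor = VarExtractorSketch(check=True)
extractor.visit(ast.parse(src))
print(dict(extractor.vars))
print(extractor.imports)
```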

pig.synth.synthesis.stmt_to_dec(key, val, h, ParentO, funcdefs)

Merge decorator lists from a new function node into the matching function in h.

If the new function name val.name already exists in funcdefs, the decorators of val are appended to the decorator list of the matching function node in h. Otherwise, the enclosing function or class of key in the old code is located via call.FindFCParent(), and that node's decorator list is updated instead.

Parameters:
  • key (ast.AST) – The old API call node used to locate the enclosing function or class when val.name is not in funcdefs.

  • val (Union[ast.FunctionDef, ast.AsyncFunctionDef]) – The new function node whose decorator list is to be merged.

  • h (ast.AST) – The current working AST, walked to find the target function.

  • ParentO (dict) – The parent mapping of the old code as produced by call.ParentAst().

  • funcdefs (set) – The set of function names already defined in the working AST.

Returns:

The modified working AST with the decorator list updated.

Return type:

ast.AST
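The first branch of this merge (val.name already in funcdefs) can be sketched with the standard ast module. `merge_decorators` is a hypothetical helper; it assumes the first same-named function found by ast.walk is the intended target and does not model the call.FindFCParent() fallback.

```python
import ast


def merge_decorators(h: ast.AST, val: ast.FunctionDef) -> ast.AST:
    """Hypothetical sketch: append val's decorators to the same-named function in h."""
    for node in ast.walk(h):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == val.name:
            node.decorator_list.extend(val.decorator_list)
            break  # assumption: only the first match is updated
    return h


h = ast.parse("def train(model):\n    return model")
val = ast.parse("@torch.no_grad()\ndef train(model):\n    pass").body[0]
print(ast.unparse(merge_decorators(h, val)))
# @torch.no_grad()
# def train(model):
#     return model
```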