API Reference

This page provides detailed documentation for all public classes and functions in skfeaturellm.

Core Classes

Main module for LLM-powered feature engineering.

class skfeaturellm.feature_engineer.LLMFeatureEngineer(problem_type: str, model_name: str = 'gpt-4', target_col: str | None = None, max_features: int | None = None, feature_prefix: str = 'llm_feat_', verbose: int = 0, **kwargs)[source]

Bases: BaseEstimator, TransformerMixin

A scikit-learn compatible transformer that uses LLMs for feature engineering.

Parameters:

problem_type (str) – Machine learning problem type (classification or regression)
model_name (str, default="gpt-4") – Name of the model to use
target_col (Optional[str]) – Name of the target column for supervised feature engineering
max_features (Optional[int]) – Maximum number of features to generate
feature_prefix (str) – Prefix to add to generated feature names
verbose (int, default=0) – Verbosity level for fit_selective(). 0 = silent, 1 = one line per round, 2 = include selected feature names.
**kwargs – Additional keyword arguments for the LLMInterface

evaluate_features(X: DataFrame, y: Series, is_transformed: bool = False) → FeatureEvaluationResult[source]

Evaluate the quality of generated features.

Parameters:

X (pd.DataFrame) – Input features
y (pd.Series) – Target variable
is_transformed (bool) – Whether the features have already been transformed

Returns:

Result object containing the evaluation metrics

Return type:

FeatureEvaluationResult

fit(X: DataFrame, y: Series | None = None, feature_descriptions: List[Dict[str, Any]] | None = None, target_description: str | None = None) → LLMFeatureEngineer[source]

Generate feature engineering ideas using LLM and store the transformations.

Parameters:

X (pd.DataFrame) – Input features
y (Optional[pd.Series]) – Target variable used to compute dataset statistics for the prompt
feature_descriptions (Optional[List[Dict[str, Any]]]) – List of feature descriptions
target_description (Optional[str]) – Description of the target variable

Returns:

self – The fitted transformer

Return type:

LLMFeatureEngineer

fit_selective(X: DataFrame, y: Series, selector: SelectorMixin, n_rounds: int = 3, eval_set: tuple[DataFrame, Series] | None = None, feature_descriptions: List[Dict[str, Any]] | None = None, target_description: str | None = None) → LLMFeatureEngineer[source]

Iteratively generate and select features using an LLM and a feature selector.

In each round the LLM proposes new features, the selector is fitted on the generated features (using eval_set if provided, otherwise training data), and the selection results are fed back to the LLM as context for the next round. Only the features that survive selection across all rounds are kept.

Parameters:

X (pd.DataFrame) – Training features. Transformations are always fitted on this data.
y (pd.Series) – Training target.
selector (SelectorMixin) – An initialised scikit-learn–compatible selector (e.g. SelectKBest(k=5), SelectFromModel(RandomForestClassifier())).
n_rounds (int, default=3) – Number of generate→select→feedback rounds.
eval_set (tuple of (pd.DataFrame, pd.Series), optional) – Validation data (X_val, y_val). When provided the selector is fitted on the validation features so that selection reflects generalisation, not training performance.
feature_descriptions (list of dict, optional) – Descriptions for input features. Auto-detected from X if omitted.
target_description (str, optional) – Description of the target variable passed to the LLM.

Returns:

self – The fitted transformer. Call transform() to apply the selected features and to_transformer() to export them for production.

Return type:

LLMFeatureEngineer
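The generate→select→feedback loop described for fit_selective() can be sketched in plain Python. This is only an illustration of the control flow under stated assumptions, not the library's implementation: propose() stands in for the LLM call, select_top_k() stands in for a fitted selector, the scores are made up, and letting survivors from earlier rounds compete with new proposals is one possible reading of "survive selection across all rounds".

```python
from typing import Dict, List, Optional


def propose(round_idx: int, feedback: Optional[Dict[str, List[str]]]) -> Dict[str, float]:
    """Hypothetical stand-in for the LLM: candidate names with made-up scores."""
    candidates = [
        {"ratio_ab": 0.9, "sum_ab": 0.2, "log_a": 0.6},
        {"sqrt_b": 0.7, "a_squared": 0.1},
        {"diff_ab": 0.8, "abs_a": 0.3},
    ]
    return candidates[round_idx % len(candidates)]


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Hypothetical stand-in for a selector: keep the k best-scoring names."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def fit_selective_sketch(n_rounds: int = 3, k: int = 2):
    kept: Dict[str, float] = {}
    feedback: Optional[Dict[str, List[str]]] = None
    for round_idx in range(n_rounds):
        # Assumption: earlier survivors compete with the new proposals.
        pool = {**kept, **propose(round_idx, feedback)}
        selected = select_top_k(pool, k)
        rejected = [name for name in pool if name not in selected]
        kept = {name: pool[name] for name in selected}
        # The selection outcome becomes context for the next proposal round.
        feedback = {"selected": selected, "rejected": rejected}
    return sorted(kept), feedback


final_features, last_feedback = fit_selective_sketch(n_rounds=3, k=2)
print(final_features)
print(last_feedback)
```

In the real method the scoring comes from the selector passed in by the caller, fitted on eval_set when one is provided.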

set_fit_request(*, feature_descriptions: bool | None | str = '$UNCHANGED$', target_description: bool | None | str = '$UNCHANGED$') → LLMFeatureEngineer

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:

feature_descriptions (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for feature_descriptions parameter in fit.
target_description (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for target_description parameter in fit.

Returns:

self – The updated object.

Return type:

object

to_transformer(features: List[str] | None = None) → FeatureEngineeringTransformer[source]

Create a FeatureEngineeringTransformer from the successfully generated features.

Parameters:: features (list of str, optional) – Names of features to include. Accepts names with or without the feature_prefix. If None, all successfully generated features are included.
Returns:: Unfitted transformer ready to be used in a Pipeline.
Return type:: FeatureEngineeringTransformer
Raises:: ValueError – If fit() has not been called yet.
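Accepting names "with or without the feature_prefix" amounts to normalising each requested name before lookup. A minimal sketch of that idea follows; resolve_feature_names() is a hypothetical helper, not part of skfeaturellm, and raising ValueError for an unknown name is an assumption made for the example.

```python
from typing import List, Optional


def resolve_feature_names(
    requested: Optional[List[str]],
    available: List[str],
    prefix: str = "llm_feat_",
) -> List[str]:
    """Map requested names (prefixed or not) onto the prefixed names."""
    if requested is None:
        # None means "include every successfully generated feature".
        return list(available)
    resolved = []
    for name in requested:
        full_name = name if name.startswith(prefix) else prefix + name
        if full_name not in available:
            # Assumption: unknown names are rejected rather than ignored.
            raise ValueError(f"Unknown feature: {name!r}")
        resolved.append(full_name)
    return resolved


available = ["llm_feat_log_income", "llm_feat_income_per_person"]
selected = resolve_feature_names(["log_income", "llm_feat_income_per_person"], available)
print(selected)
```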

transform(X: DataFrame) → DataFrame[source]

Apply the generated feature transformations to new data.

Parameters:: X (pd.DataFrame) – Input features
Returns:: Input dataframe with the generated features
Return type:: pd.DataFrame

Feature Engineering Transformer

FeatureEngineeringTransformer: a scikit-learn-compatible transformer backed by a fixed set of LLM-generated (or manually configured) transformations.

Designed for the production phase after exploration with LLMFeatureEngineer:

>>> ideas = engineer.fit(X_train, y_train).generated_features
>>> transformer = engineer.to_transformer()
>>> pipeline = Pipeline([("features", transformer), ("model", XGBClassifier())])
>>> pipeline.fit(X_train, y_train)

class skfeaturellm.feature_engineering_transformer.FeatureEngineeringTransformer(transformations: List[Dict[str, Any]] | None = None, feature_prefix: str = 'llm_feat_', raise_on_error: bool = False)[source]

Bases: BaseEstimator, TransformerMixin

Scikit-learn-compatible transformer that applies a fixed set of transformations.

Unlike LLMFeatureEngineer (which calls an LLM during fit), FeatureEngineeringTransformer is fully deterministic — it receives transformation configs at construction time and simply fits/applies them. This makes it safe to use inside Pipeline, GridSearchCV, cross_val_score, and joblib.

Parameters:

transformations (list of dict) – List of transformation config dicts, each with at minimum a "type" key. Same format accepted by TransformationPipeline.from_dict().
feature_prefix (str, default="llm_feat_") – Prefix applied to generated feature names.
raise_on_error (bool, default=False) – If True, raise on transformation errors. If False, skip with a warning.

executor_

Fitted executor (available after fit()).

Type:: TransformationPipeline

feature_names_in_

Column names seen during fit().

Type:: list of str

Examples

>>> transformer = FeatureEngineeringTransformer(
...     transformations=[{"type": "log", "feature_name": "log_income", "columns": ["income"]}]
... )
>>> transformer.fit(X_train).transform(X_test)

fit(X: DataFrame, y: Series | None = None) → FeatureEngineeringTransformer[source]

Build and fit the transformation executor.

Parameters:

X (pd.DataFrame) – Training data.
y (pd.Series, optional) – Ignored; present for sklearn API compatibility.

Return type:

self

get_feature_names_out(input_features: List[str] | None = None) → ndarray[source]

Return feature names for the output of transform().

Parameters:: input_features (list of str, optional) – Ignored; original feature names come from feature_names_in_.
Return type:: np.ndarray of str

classmethod load(path: str | Path) → FeatureEngineeringTransformer[source]

Load a FeatureEngineeringTransformer from a JSON file produced by save().

Parameters:: path (str or Path) – Source file path.
Returns:: An unfitted transformer; call fit() before transforming.
Return type:: FeatureEngineeringTransformer

save(path: str | Path) → None[source]

Save transformer configuration to a JSON file.

Only the constructor parameters are saved (not the fitted state). Call fit() again after loading to restore the fitted executor.

Parameters:: path (str or Path) – Destination file path.
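Saving only the constructor parameters means a loaded object carries the same configuration but none of the fitted state. The sketch below illustrates that round trip with a hypothetical ConfigOnlyTransformer class; it is a stand-in written for this example, and the JSON layout is an assumption rather than the file format the library writes.

```python
import json
import tempfile
from pathlib import Path


class ConfigOnlyTransformer:
    """Hypothetical stand-in, not the library class."""

    def __init__(self, transformations=None, feature_prefix="llm_feat_", raise_on_error=False):
        self.transformations = transformations or []
        self.feature_prefix = feature_prefix
        self.raise_on_error = raise_on_error

    def fit(self, rows):
        # Fitted state lives in a trailing-underscore attribute and is never saved.
        self.fitted_ = True
        return self

    def save(self, path):
        params = {
            "transformations": self.transformations,
            "feature_prefix": self.feature_prefix,
            "raise_on_error": self.raise_on_error,
        }
        Path(path).write_text(json.dumps(params, indent=2))

    @classmethod
    def load(cls, path):
        # Rebuild from constructor parameters only; the result is unfitted.
        return cls(**json.loads(Path(path).read_text()))


original = ConfigOnlyTransformer(
    transformations=[{"type": "log", "feature_name": "log_income", "columns": ["income"]}]
).fit(rows=[])

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "transformer.json"
    original.save(config_path)
    restored = ConfigOnlyTransformer.load(config_path)

print(restored.transformations == original.transformations)
print(hasattr(restored, "fitted_"))
```

The restored object has to be fitted again before use, which matches the note above about calling fit() after loading.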

transform(X: DataFrame) → DataFrame[source]

Apply all fitted transformations.

Parameters:: X (pd.DataFrame) – Data to transform.
Returns:: Copy of X with new feature columns appended.
Return type:: pd.DataFrame

LLM Interface

Module for handling interactions with Language Models.

class skfeaturellm.llm_interface.LLMInterface(model_name: str = 'gpt-4o', **kwargs)[source]

Bases: object

Interface for interacting with Language Models for feature engineering.

Parameters:

model_name (str, default="gpt-4o") – Name of the model to use
**kwargs – Additional keyword arguments passed to init_chat_model (e.g., temperature, max_tokens, api_key, etc.)

generate_engineered_features(feature_descriptions: List[FeatureDescription], target_description: str | None = None, max_features: int | None = None, problem_type: ProblemType | None = None, dataset_statistics: str | None = None) → FeatureEngineeringIdeas[source]

Generate feature engineering ideas.

Parameters:

feature_descriptions (List[FeatureDescription]) – Descriptions for input features
target_description (Optional[str]) – Description of the target variable and task
max_features (Optional[int]) – Maximum number of features to generate
problem_type (Optional[ProblemType]) – Machine learning problem type (classification or regression)
dataset_statistics (Optional[str]) – Pre-formatted dataset statistics string

Returns:

Generated feature engineering ideas

Return type:

FeatureEngineeringIdeas

generate_engineered_features_iterative(prompt_context: Dict, conversation_history: List[BaseMessage], feedback_context: Dict | None = None) → Tuple[FeatureEngineeringIdeas, List[BaseMessage]][source]

Generate feature engineering ideas in an iterative conversation.

Parameters:

prompt_context (Dict) – Prompt context dict
conversation_history (List[BaseMessage]) – Accumulated conversation messages. Empty on the first round.
feedback_context (Optional[Dict]) – Feedback dict with keys selected_features_table, rejected_features_table, and max_features. Required for rounds after the first.

Returns:

The generated ideas and the updated conversation history (input messages + AI response appended).

Return type:

Tuple[FeatureEngineeringIdeas, List[BaseMessage]]

generate_prompt_context(feature_descriptions: List[Dict[str, str]], target_description: str | None = None, max_features: int | None = None, problem_type: ProblemType | None = None, dataset_statistics: str | None = None) → str[source]

Generate the prompt for the LLM.

Parameters:

feature_descriptions (List[Dict[str, str]]) – List of dictionaries containing feature descriptions
target_description (Optional[str]) – Description of the target variable and task
max_features (Optional[int]) – Maximum number of features to generate
problem_type (Optional[ProblemType]) – Machine learning problem type (classification or regression)
dataset_statistics (Optional[str]) – Pre-formatted dataset statistics string from _format_dataset_statistics

Returns:

Formatted prompt

Return type:

str

Schemas

Pydantic models for data validation and serialization.

class skfeaturellm.schemas.FeatureDescription(*, name: str, type: str, description: str)[source]

Bases: BaseModel

Schema for describing a single feature in the dataset.

name

The name of the feature

Type:: str

type

The data type of the feature (e.g., ‘int’, ‘float’, ‘str’, ‘datetime’)

Type:: str

description

A description of what the feature represents

Type:: str

description: str

format() → str[source]

Format the feature description in a human-readable way.

Returns:: Formatted feature description in the format: “name (type): description”
Return type:: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str

type: str

class skfeaturellm.schemas.FeatureDescriptions(*, features: List[FeatureDescription])[source]

Bases: BaseModel

Schema for a collection of feature descriptions.

features

List of feature descriptions

Type:: List[skfeaturellm.schemas.FeatureDescription]

features: List[FeatureDescription]

format() → str[source]

Format all feature descriptions in a human-readable way.

Returns:: Formatted feature descriptions, one per line
Return type:: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class skfeaturellm.schemas.FeatureEngineeringIdea(*, type: str, feature_name: str, description: str, columns: List[str], parameters: TransformationParameters | None = None)[source]

Bases: BaseModel

Schema for a feature engineering idea generated by the LLM.

This schema is designed to map directly to the transformation executor, ensuring that LLM output can be reliably executed.

Supports both binary operations (add, sub, mul, div) and unary operations (log, sqrt, abs, etc.).

type

The transformation type

Type:: str

feature_name

Name for the resulting feature

Type:: str

description

Explanation of what the feature represents and why it’s useful

Type:: str

columns

List of column names required for the transformation

Type:: List[str]

parameters

Optional dictionary of additional parameters (e.g., constants)

Type:: skfeaturellm.schemas.TransformationParameters | None

Examples

Unary operation (log):

>>> FeatureEngineeringIdea(
...     type="log",
...     feature_name="log_income",
...     description="Log of income to reduce skewness",
...     columns=["income"]
... )

Binary operation (division of two columns):

>>> FeatureEngineeringIdea(
...     type="div",
...     feature_name="income_per_person",
...     description="Average income per household member",
...     columns=["total_income", "household_size"]
... )

Binary operation (multiply column by constant):

>>> FeatureEngineeringIdea(
...     type="mul",
...     feature_name="income_doubled",
...     description="Income multiplied by 2 for scaling",
...     columns=["income"],
...     parameters={"constant": 2.0}
... )

columns: List[str]

description: str

feature_name: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

parameters: TransformationParameters | None

to_executor_dict() → dict[source]

Convert to a dictionary format compatible with TransformationPipeline.

Returns:: Dictionary with type, feature_name, columns, and optional parameters
Return type:: dict

type: str

validate_operands() → FeatureEngineeringIdea[source]: Validate operands based on transformation type.

class skfeaturellm.schemas.FeatureEngineeringIdeas(*, ideas: List[FeatureEngineeringIdea])[source]

Bases: BaseModel

Schema for a list of feature engineering ideas generated by the LLM.

This is the top-level schema used with LangChain’s with_structured_output().

ideas: List[FeatureEngineeringIdea]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_executor_config() → dict[source]

Convert to a configuration dict compatible with TransformationPipeline.from_dict().

Returns:: Dictionary with ‘transformations’ key containing list of transformation configs
Return type:: dict
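The documented return shapes of to_executor_dict() and to_executor_config() can be pictured with a small stand-in. IdeaSketch and to_executor_config_sketch() below are hypothetical names used only for this example. Dropping the description field follows the documented key list, while omitting the parameters key when it is None is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class IdeaSketch:
    """Hypothetical stand-in for FeatureEngineeringIdea."""

    type: str
    feature_name: str
    description: str
    columns: List[str]
    parameters: Optional[Dict[str, Any]] = None

    def to_executor_dict(self) -> Dict[str, Any]:
        out: Dict[str, Any] = {
            "type": self.type,
            "feature_name": self.feature_name,
            "columns": list(self.columns),
        }
        # Assumption: the key is left out entirely when no parameters are set.
        if self.parameters is not None:
            out["parameters"] = dict(self.parameters)
        return out


def to_executor_config_sketch(ideas: List[IdeaSketch]) -> Dict[str, Any]:
    """Wrap the per-idea dicts under a single 'transformations' key."""
    return {"transformations": [idea.to_executor_dict() for idea in ideas]}


config = to_executor_config_sketch([
    IdeaSketch("log", "log_income", "Log of income to reduce skewness", ["income"]),
    IdeaSketch("mul", "income_doubled", "Income multiplied by 2", ["income"], {"constant": 2.0}),
])
print(config)
```

A dict of this shape is what TransformationPipeline.from_dict() is documented to accept.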

class skfeaturellm.schemas.TransformationParameters(*, constant: float | None = None, power: float | None = None, n_bins: int | None = None, bin_edges: List[float] | None = None)[source]

Bases: BaseModel

Parameters for transformations.

Used for structured output compatibility. Explicitly defines the allowed fields: constant for binary ops, power for the pow op, and n_bins for the bin op. The signature also accepts bin_edges, which is not described here.

bin_edges: List[float] | None

constant: float | None

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_bins: int | None

power: float | None

Feature Evaluation

class skfeaturellm.feature_evaluation.FeatureEvaluationResult(metrics_df: DataFrame, primary_metric: str | None = None, X: DataFrame | None = None, y: Series | None = None, problem_type: ProblemType | None = None)[source]

Bases: object

Class for storing and presenting feature evaluation results.

property summary: DataFrame: Returns the metrics DataFrame sorted by the primary metric descending.

to_dict() → Dict[str, Any][source]: Convert results to dictionary.

to_html(path: str) → None[source]

Save a self-contained HTML report to disk.

Parameters:: path (str) – File path where the HTML report will be saved.

class skfeaturellm.feature_evaluation.FeatureEvaluator(problem_type: ProblemType)[source]

Bases: object

Class for evaluating the quality of generated features.

evaluate(X: DataFrame, y: Series, features: List[str]) → FeatureEvaluationResult[source]

Evaluate features using various metrics.

Parameters:

X (pd.DataFrame) – Input features
y (pd.Series) – Target variable
features (List[str]) – List of features to evaluate

Returns:

Result object containing the evaluation metrics

Return type:

FeatureEvaluationResult

plot_distributions(X: DataFrame, y: Series, features: List[str]) → Dict[str, Figure][source]

Plot feature vs target for each feature.

Parameters:

X (pd.DataFrame) – Input features
y (pd.Series) – Target variable
features (List[str]) – List of features to plot

Returns:

Dictionary mapping feature names to their figures

Return type:

Dict[str, Figure]

Transformations

The Feature Transformation DSL: structured, validated transformations for feature engineering.

This subpackage provides a structured, validated, and secure way to represent and execute feature transformations.

class skfeaturellm.transformations.AbsTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: UnaryTransformation

Absolute value transformation: abs(column).

Examples

>>> t = AbsTransformation("abs_diff", columns=["difference"])

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str

class skfeaturellm.transformations.AddTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: BinaryArithmeticTransformation

Addition transformation: left + right.

Examples

>>> t = AddTransformation("total", columns=["a", "b"])
>>> t = AddTransformation("plus_ten", columns=["a"], parameters={"constant": 10.0})

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str

class skfeaturellm.transformations.BaseTransformation[source]

Bases: ABC

Abstract base class for all feature transformations.

Subclasses must implement:

transform(): Apply the transformation to a DataFrame (replaces execute())
get_required_columns(): Return columns needed for the transformation
feature_name property: Name of the output feature
get_prompt_description(): Return description for LLM prompts

The fit/transform pattern mirrors scikit-learn conventions:

fit(df): learn any stateful parameters from training data; stateless transforms inherit the default no-op implementation.
transform(df): apply the transformation using fitted state.
fit_transform(df): convenience method combining fit + transform.
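The difference between stateless and stateful transformations in this fit/transform pattern can be sketched without pandas. Everything below is a hypothetical stand-in: a dict of lists replaces the DataFrame, and Doubled and Centered are invented classes, not transformations shipped by the library.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Table = Dict[str, List[float]]  # stand-in for a DataFrame


class SketchTransformation(ABC):
    """Hypothetical base class mirroring the fit/transform pattern."""

    def fit(self, table: Table) -> "SketchTransformation":
        # Default: nothing to learn, so stateless subclasses inherit this.
        return self

    @abstractmethod
    def transform(self, table: Table) -> List[float]:
        ...

    def fit_transform(self, table: Table) -> List[float]:
        return self.fit(table).transform(table)


class Doubled(SketchTransformation):
    """Stateless: the output depends only on the input rows."""

    def __init__(self, column: str):
        self.column = column

    def transform(self, table: Table) -> List[float]:
        return [2 * value for value in table[self.column]]


class Centered(SketchTransformation):
    """Stateful: fit() learns the training mean, transform() reuses it."""

    def __init__(self, column: str):
        self.column = column

    def fit(self, table: Table) -> "Centered":
        values = table[self.column]
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, table: Table) -> List[float]:
        return [value - self.mean_ for value in table[self.column]]


train = {"a": [1.0, 2.0, 3.0]}
test = {"a": [10.0]}

centered_test = Centered("a").fit(train).transform(test)
doubled_train = Doubled("a").fit_transform(train)
print(centered_test, doubled_train)
```

Keeping learned values out of transform() is what lets the same fitted object be applied to validation or production data without leaking statistics from it.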

abstract property feature_name: str: Name of the resulting feature.

fit(df: DataFrame) → BaseTransformation[source]

Fit the transformation to training data.

The default implementation validates required columns and returns self. Stateful subclasses should override this to learn parameters from the training data.

Parameters:: df (pd.DataFrame) – The training DataFrame
Returns:: self
Return type:: BaseTransformation

fit_transform(df: DataFrame) → Series[source]

Fit and transform in a single step.

Parameters:: df (pd.DataFrame) – The input DataFrame
Returns:: The resulting feature values
Return type:: pd.Series

abstract classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str

abstract get_required_columns() → Set[str][source]

Return the set of column names required by this transformation.

Returns:: Set of required column names
Return type:: Set[str]

abstract transform(df: DataFrame) → Series[source]

Apply the transformation to a DataFrame.

Parameters:: df (pd.DataFrame) – The input DataFrame
Returns:: The resulting feature values with name set to feature_name
Return type:: pd.Series
Raises:: TransformationError – If the transformation fails

validate_columns(df: DataFrame) → None[source]

Validate that all required columns exist in the DataFrame.

Parameters:: df (pd.DataFrame) – The input DataFrame
Raises:: ColumnNotFoundError – If any required column is missing

class skfeaturellm.transformations.BinaryArithmeticTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: BaseTransformation

Base class for binary arithmetic transformations.

Supports operations between two columns or between a column and a constant.

Parameters:

feature_name (str) – Name for the resulting feature
columns (List[str]) – List of column names (1 or 2 columns)
parameters (Optional[Dict[str, Any]]) – Optional parameters dict with ‘constant’ key for column-constant operations

property feature_name: str: Name of the resulting feature.

get_required_columns() → Set[str][source]

Return the set of column names required by this transformation.

Returns:: Set of required column names
Return type:: Set[str]

transform(df: DataFrame) → Series[source]: Apply the transformation.
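The two operand modes of a binary transformation (column with column, or column with a constant) come down to a small dispatch on the columns list and the parameters dict. The sketch below shows one way to write that dispatch. apply_binary() is a hypothetical helper operating on a dict of lists, and the error raised for an invalid combination is an assumption.

```python
import operator
from typing import Any, Callable, Dict, List, Optional

Table = Dict[str, List[float]]  # stand-in for a DataFrame


def apply_binary(
    op: Callable[[float, float], float],
    table: Table,
    columns: List[str],
    parameters: Optional[Dict[str, Any]] = None,
) -> List[float]:
    """Apply op to two columns, or to one column and a constant."""
    if len(columns) == 2:
        left, right = table[columns[0]], table[columns[1]]
        return [op(l, r) for l, r in zip(left, right)]
    if len(columns) == 1 and parameters and "constant" in parameters:
        constant = parameters["constant"]
        return [op(value, constant) for value in table[columns[0]]]
    # Assumption: any other combination is treated as a configuration error.
    raise ValueError("Expected two columns, or one column plus a 'constant' parameter")


table = {"a": [1.0, 2.0], "b": [10.0, 20.0]}
total = apply_binary(operator.add, table, ["a", "b"])
plus_ten = apply_binary(operator.add, table, ["a"], {"constant": 10.0})
at_least_zero = apply_binary(max, {"a": [-1.0, 3.0]}, ["a"], {"constant": 0.0})
print(total, plus_ten, at_least_zero)
```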

exception skfeaturellm.transformations.ColumnNotFoundError[source]

Bases: TransformationError

Raised when a required column is not found in the DataFrame.

class skfeaturellm.transformations.DivTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: BinaryArithmeticTransformation

Division transformation: left / right.

Raises DivisionByZeroError if division by zero is detected.

Examples

>>> t = DivTransformation("ratio", columns=["a", "b"])
>>> t = DivTransformation("halved", columns=["a"], parameters={"constant": 2.0})

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str
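Raising a dedicated error instead of letting infinities or NaNs flow into a feature column can be sketched as a pre-check on the divisor. The code below is an illustration only: DivisionByZeroSketchError and safe_divide() are hypothetical names, and how the library actually detects zeros is not documented on this page.

```python
from typing import List


class DivisionByZeroSketchError(ArithmeticError):
    """Hypothetical stand-in for the library's DivisionByZeroError."""


def safe_divide(left: List[float], right: List[float]) -> List[float]:
    """Divide element-wise, refusing to proceed if any divisor is zero."""
    zero_rows = [index for index, divisor in enumerate(right) if divisor == 0]
    if zero_rows:
        raise DivisionByZeroSketchError(f"Division by zero at row(s) {zero_rows}")
    return [numerator / divisor for numerator, divisor in zip(left, right)]


ratios = safe_divide([1.0, 4.0], [2.0, 8.0])
print(ratios)

try:
    safe_divide([1.0, 4.0], [2.0, 0.0])
except DivisionByZeroSketchError as exc:
    print(f"rejected: {exc}")
```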

exception skfeaturellm.transformations.DivisionByZeroError[source]

Bases: TransformationError

Raised when a division by zero is detected.

class skfeaturellm.transformations.ExpTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: UnaryTransformation

Exponential transformation: exp(column).

Examples

>>> t = ExpTransformation("exp_log_price", columns=["log_price"])

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str

exception skfeaturellm.transformations.InvalidValueError[source]

Bases: TransformationError

Raised when a transformation encounters invalid values (e.g., log of negative).

class skfeaturellm.transformations.Log1pTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: UnaryTransformation

Log(1+x) transformation: log(1 + column).

Useful for data with zeros. Raises InvalidValueError if any values are < 0.

Examples

>>> t = Log1pTransformation("log1p_count", columns=["count"])

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str
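Why log(1 + x) is "useful for data with zeros" is easy to see with the standard library: math.log(0) is a domain error, while math.log1p(0) is exactly 0. The sketch below also mirrors the documented rule that negative inputs are rejected. log1p_checked() is a hypothetical helper, and it raises the built-in ValueError in place of the library's InvalidValueError.

```python
import math
from typing import List


def log1p_checked(values: List[float]) -> List[float]:
    """Compute log(1 + x) per value, rejecting negatives up front."""
    if any(value < 0 for value in values):
        # Stand-in for InvalidValueError in this sketch.
        raise ValueError("log1p sketch rejects negative values")
    return [math.log1p(value) for value in values]


counts = [0.0, 1.0, 9.0]
log1p_counts = log1p_checked(counts)
print(log1p_counts)

try:
    math.log(0.0)
    plain_log_ok = True
except ValueError:
    # The plain logarithm is undefined at zero.
    plain_log_ok = False
print(plain_log_ok)
```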

class skfeaturellm.transformations.LogTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: UnaryTransformation

Natural logarithm transformation: log(column).

Raises InvalidValueError if any values are <= 0.

Examples

>>> t = LogTransformation("log_income", columns=["income"])

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str

class skfeaturellm.transformations.MaxTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: BinaryArithmeticTransformation

Element-wise maximum transformation: max(left, right).

Examples

>>> t = MaxTransformation("max_ab", columns=["a", "b"])
>>> t = MaxTransformation("at_least_zero", columns=["a"], parameters={"constant": 0.0})

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str

class skfeaturellm.transformations.MinTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: BinaryArithmeticTransformation

Element-wise minimum transformation: min(left, right).

Examples

>>> t = MinTransformation("min_ab", columns=["a", "b"])
>>> t = MinTransformation("at_most_100", columns=["a"], parameters={"constant": 100.0})

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str

class skfeaturellm.transformations.MulTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: BinaryArithmeticTransformation

Multiplication transformation: left * right.

Examples

>>> t = MulTransformation("product", columns=["a", "b"])
>>> t = MulTransformation("doubled", columns=["a"], parameters={"constant": 2.0})

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str

class skfeaturellm.transformations.PowTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: UnaryTransformation

Power transformation: column ** power.

Raises InvalidValueError for invalid operations (e.g., negative base with fractional exponent).

Examples

>>> t = PowTransformation("age_squared", columns=["age"], parameters={"power": 2})
>>> t = PowTransformation("sqrt_area", columns=["area"], parameters={"power": 0.5})
>>> t = PowTransformation("inverse_distance", columns=["distance"], parameters={"power": -1})

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str
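The "negative base with fractional exponent" case is worth checking explicitly, because plain Python returns a complex number for it rather than failing. The following sketch validates inputs before exponentiating. pow_checked() is a hypothetical helper, ValueError stands in for InvalidValueError, and rejecting a zero base with a negative power is an additional assumption not stated in the text above.

```python
from typing import List


def pow_checked(values: List[float], power: float) -> List[float]:
    """Raise each value to `power`, rejecting combinations with no real result."""
    is_integer_power = float(power).is_integer()
    for value in values:
        if value < 0 and not is_integer_power:
            raise ValueError(f"Negative base {value} with fractional power {power}")
        if value == 0 and power < 0:
            # Assumption: 0 ** negative is treated as invalid as well.
            raise ValueError(f"Zero base with negative power {power}")
    return [float(value) ** power for value in values]


age_squared = pow_checked([3.0], 2)
sqrt_area = pow_checked([4.0], 0.5)
inverse_distance = pow_checked([4.0], -1)
print(age_squared, sqrt_area, inverse_distance)
```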

class skfeaturellm.transformations.SqrtTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: UnaryTransformation

Square root transformation: sqrt(column).

Raises InvalidValueError if any values are < 0.

Examples

>>> t = SqrtTransformation("sqrt_area", columns=["area"])

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str

class skfeaturellm.transformations.SubTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: BinaryArithmeticTransformation

Subtraction transformation: left - right.

Examples

>>> t = SubTransformation("difference", columns=["a", "b"])
>>> t = SubTransformation("minus_ten", columns=["a"], parameters={"constant": 10.0})

classmethod get_prompt_description() → str[source]

Return a description of this transformation for use in LLM prompts.

Returns:: Human-readable description of what this transformation does
Return type:: str

exception skfeaturellm.transformations.TransformationError[source]

Bases: Exception

Base exception for transformation errors.

exception skfeaturellm.transformations.TransformationParseError[source]

Bases: TransformationError

Raised when parsing a transformation definition fails.

class skfeaturellm.transformations.TransformationPipeline(transformations: List[BaseTransformation] | None = None, raise_on_error: bool = True)[source]

Bases: object

Executes a set of transformations against a DataFrame.

The executor can be initialized with transformations directly, or loaded from JSON/YAML configuration files.

Parameters:

transformations (List[BaseTransformation], optional) – List of transformation objects to execute
raise_on_error (bool, default=True) – If True, raise exceptions on transformation errors. If False, skip failed transformations with a warning.

Examples

Direct instantiation:

>>> from skfeaturellm.transformations import AddTransformation, DivTransformation
>>> executor = TransformationPipeline(transformations=[
...     DivTransformation("ratio", columns=["a", "b"]),
...     AddTransformation("sum", columns=["a", "b"]),
... ])
>>> result_df = executor.fit(df).transform(df)

From JSON file:

>>> executor = TransformationPipeline.from_json("transformations.json")
>>> result_df = executor.fit(df).transform(df)

From dict (e.g., LLM output):

>>> config = {"transformations": [{"type": "add", "feature_name": "sum", ...}]}
>>> executor = TransformationPipeline.from_dict(config)
>>> result_df = executor.fit(df).transform(df)
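The effect of raise_on_error can be pictured as a try/except around each step: either the first failure propagates, or the failing feature is skipped with a warning and the rest still run. The sketch below shows that behaviour with a hypothetical run_all() function over a dict of lists. It is not the executor's code, and the warning text is invented.

```python
import warnings
from typing import Callable, Dict, List, Tuple

Table = Dict[str, List[float]]  # stand-in for a DataFrame
Step = Tuple[str, Callable[[Table], List[float]]]


def run_all(steps: List[Step], table: Table, raise_on_error: bool = True) -> Table:
    """Apply each step to a copy of the table, raising or skipping on failure."""
    out = dict(table)
    for feature_name, func in steps:
        try:
            out[feature_name] = func(table)
        except Exception as exc:
            if raise_on_error:
                raise
            warnings.warn(f"Skipping {feature_name!r}: {exc}")
    return out


steps: List[Step] = [
    ("sum", lambda t: [a + b for a, b in zip(t["a"], t["b"])]),
    ("ratio", lambda t: [a / b for a, b in zip(t["a"], t["b"])]),  # fails on b == 0
]
table = {"a": [1.0, 2.0], "b": [1.0, 0.0]}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = run_all(steps, table, raise_on_error=False)

print(sorted(result))
print(len(caught))
```

Note that the defaults differ between the two classes on this page: TransformationPipeline defaults to raise_on_error=True, while FeatureEngineeringTransformer defaults to False.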

fit(df: DataFrame) → TransformationPipeline[source]

Fit all transformations to training data.

Parameters:: df (pd.DataFrame) – The training DataFrame
Returns:: self
Return type:: TransformationPipeline

classmethod from_dict(config: Dict[str, Any], raise_on_error: bool = True) → TransformationPipeline[source]

Create an executor from a dictionary configuration.

Parameters:

config (Dict[str, Any]) – Configuration dict with a “transformations” key containing a list of transformation definitions
raise_on_error (bool, default=True) – If True, raise exceptions on transformation errors

Returns:

Configured executor instance

Return type:

TransformationPipeline

Raises:

TransformationParseError – If the configuration is invalid

classmethod from_json(path: str | Path, raise_on_error: bool = True) → TransformationPipeline[source]

Create an executor from a JSON configuration file.

Parameters:

path (Union[str, Path]) – Path to the JSON configuration file
raise_on_error (bool, default=True) – If True, raise exceptions on transformation errors

Returns:

Configured executor instance

Return type:

TransformationPipeline

classmethod from_yaml(path: str | Path, raise_on_error: bool = True) → TransformationPipeline[source]

Create an executor from a YAML configuration file.

Parameters:

path (Union[str, Path]) – Path to the YAML configuration file
raise_on_error (bool, default=True) – If True, raise exceptions on transformation errors

Returns:

Configured executor instance

Return type:

TransformationPipeline

Raises:

ImportError – If PyYAML is not installed

get_required_columns(transformations: List[BaseTransformation] | None = None) → Set[str][source]

Get all column names required by transformations.

Parameters:: transformations (List[BaseTransformation], optional) – List of transformations to analyze. If not provided, uses self.transformations.
Returns:: Set of required column names
Return type:: Set[str]

transform(df: DataFrame) → DataFrame[source]

Apply all fitted transformations and return a DataFrame with new features.

Parameters:: df (pd.DataFrame) – The input DataFrame
Returns:: A copy of the input DataFrame with new feature columns added
Return type:: pd.DataFrame

class skfeaturellm.transformations.UnaryTransformation(feature_name: str, columns: List[str], parameters: Dict[str, Any] | None = None)[source]

Bases: BaseTransformation

Base class for unary transformations (single column operations).

Parameters:

feature_name (str) – Name for the resulting feature
columns (List[str]) – List with exactly one column name
parameters (Optional[Dict[str, Any]]) – Optional parameters (not used for basic unary operations)

property feature_name: str: Name of the resulting feature.

get_required_columns() → Set[str][source]

Return the set of column names required by this transformation.

Returns:: Set of required column names
Return type:: Set[str]

transform(df: DataFrame) → Series[source]: Apply the transformation.

skfeaturellm.transformations.get_all_operation_types() → Set[str][source]

Get all registered operation type names.

Returns:: Set of all operation names
Return type:: Set[str]

skfeaturellm.transformations.get_binary_operation_types() → Set[str][source]

Get the set of registered binary operation type names.

Returns:: Set of binary operation names (e.g., {"add", "sub", "mul", "div"})
Return type:: Set[str]

skfeaturellm.transformations.get_registered_transformations() → Dict[str, Type[BaseTransformation]][source]: Return a copy of the transformation registry.

skfeaturellm.transformations.get_transformation_types_for_prompt() → str[source]

Generate documentation of available transformation types for LLM prompts.

This function dynamically generates the transformation types section by querying the registry and calling get_prompt_description() on each registered transformation class.

Returns:: Formatted documentation string listing all available transformations
Return type:: str

skfeaturellm.transformations.get_unary_operation_types() → Set[str][source]

Get the set of registered unary operation type names.

Returns:: Set of unary operation names (e.g., {"log", "sqrt", "abs"})
Return type:: Set[str]

skfeaturellm.transformations.register_transformation(name: str)[source]

Decorator to register a transformation class with a type name.

Parameters:: name (str) – The type name used in JSON/YAML configs (e.g., “add”, “div”)
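A registry decorator of this kind, together with the lookup helpers listed above, usually follows a common Python pattern: a module-level dict filled at class-definition time. The sketch below shows that pattern under stated assumptions. All names (_REGISTRY, register, AddSketch, LogSketch, registered, types_for_prompt) are hypothetical, the description strings are invented, and rejecting duplicate names is an assumption rather than documented behaviour.

```python
from typing import Dict


_REGISTRY: Dict[str, type] = {}


def register(name: str):
    """Class decorator that records the class under a type name."""

    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            # Assumption: duplicate type names are rejected.
            raise ValueError(f"Transformation type {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls

    return decorator


@register("add")
class AddSketch:
    @classmethod
    def get_prompt_description(cls) -> str:
        return "add: sum of two columns, or of a column and a constant"


@register("log")
class LogSketch:
    @classmethod
    def get_prompt_description(cls) -> str:
        return "log: natural logarithm of a single column"


def registered() -> Dict[str, type]:
    """Return a copy so callers cannot mutate the registry."""
    return dict(_REGISTRY)


def types_for_prompt() -> str:
    """Build prompt documentation by asking each registered class."""
    return "\n".join(
        f"- {cls.get_prompt_description()}" for _, cls in sorted(_REGISTRY.items())
    )


print(sorted(registered()))
print(types_for_prompt())
```

Because the prompt text is generated from the registry, a newly registered transformation would show up in the LLM prompt without any other change.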