Architecture

How the reasoner turns an FDL knowledge base into an answer. The pipeline has four stages: parse -> knowledge base -> MILP encoding -> solver dispatch. This page is a conceptual map of the fuzzydl package; for the public how-to see Usage.

Pipeline overview

 .fdl source
     |  DLParser / DLParserFast
     v
 KnowledgeBase  --(axioms, assertions, TBox)
     |  kb.solve_kb()      preprocess once
     v
 tableau expansion  --(lazy unfolding of concepts/roles into constraints)
     |  via MILPHelper
     v
 MILP model  (Variable / Term / Expression / Inequation)
     |  query.solve(kb) -> MILPHelper.optimize(objective)
     v
 solver backend  (Gurobi | MIP | PuLP: CBC/GLPK/HiGHS/CPLEX)
     |
     v
 Solution

1. Parsing

DLParser (pyparsing) or DLParserFast (hand-rolled tokenizer + recursive-descent; see Optional: Compile the Fast Parser in Installation and Configuration) reads the source and populates two class-level fields shared by both implementations:

DLParser.kb — a KnowledgeBase being filled with concepts, axioms, and assertions.
DLParser.queries_list — the parsed queries.

get_kb(path) returns (kb, queries).

2. KnowledgeBase

fuzzy_dl_owl2.fuzzydl.knowledge_base.KnowledgeBase is the central object (the single largest module). It stores the ABox (assertions about individuals) and TBox (concept/role axioms) and owns a MILPHelper instance (self.milp).

kb.solve_kb() performs one-time preprocessing before any query runs:

defaults the semantics to Łukasiewicz if none was set;
computes the description-logic language of the KB;
interns symbolic strings to integers for speed;
resolves role axioms — inverse, role-inclusion, reflexive, functional;
preprocesses the TBox and computes the blocking strategy;
marks the KB loaded (KB_LOADED = True).

After this, each query reasons against the prepared KB.

3. Tableau to MILP encoding

Reasoning is tableau-style but the branching is replaced by constraints: each logical construct is lazily unfolded into rows of a Mixed-Integer Linear Program rather than expanded into an explicit completion tree. The unfolding rules live on KnowledgeBase (e.g. solve_gci, solve_lukasiewicz_gci, rule_domain_lazy_unfolding, …) and emit constraints through the MILP layer.

The MILP class hierarchy

The model is built from four classes in fuzzy_dl_owl2.fuzzydl.milp:

Class	Role
`Variable`	A decision variable. Its `VariableType` is one of `BINARY`, `CONTINUOUS`, `INTEGER`, `SEMI_CONTINUOUS`. Degrees, role memberships and nominals each become a variable.
`Term`	A coefficient times a `Variable` (a `coefficient * Variable` product).
`Expression`	A constant plus a sum of `Term`s — a linear expression.
`Inequation`	An `Expression` compared against zero with an `InequalityType` (`GREATER_THAN`, `LESS_THAN`, `EQUAL`).

MILPHelper

MILPHelper is the only sanctioned way to build the model. Key entry points:

get_variable(*args) — the single allocator for degree / role / nominal variables. It is heavily overloaded (by assertion, relation, individual+concept, created individual, …) so callers ask for the variable that represents a given domain entity and get a stable handle back.
add_new_constraint(expr, op, degree) — adds a constraint. Constraints are normalised so the right-hand side is zero: the helper shifts the expression by the Degree before storing the Inequation.
optimize(objective) — solves the current model for an objective Expression and returns a Solution.

The fuzzy-operator encodings (the t-norm / t-conorm / implication rows for Łukasiewicz and Zadeh) are produced only by the dedicated solver classes (LukasiewiczSolver, ZadehSolver); other code does not inline these.

4. Solver dispatch

MILPHelper.optimize selects the backend from ConfigReader.MILP_PROVIDER:

`MILP_PROVIDER`	Method	Backend
`GUROBI`	`solve_gurobi`	Gurobi
`MIP`	`solve_mip`	Python-MIP
`PULP`, `PULP_GLPK`, `PULP_HIGHS`, `PULP_CPLEX`	`solve_pulp`	PuLP (CBC / GLPK / HiGHS / CPLEX)

An unrecognised provider raises ValueError. The chosen backend also fixes the numerical limit \(k_{\infty}\) (MAXVAL) — higher limits accumulate floating-point error, so each solver gets a tuned value (see the MILP Solver Constraints table in Grammatics). Install/config per backend is in Installation and Configuration.

5. Answering a query

A query holds the objective it wants optimised. query.solve(kb):

clones the prepared MILP model (so each query is independent);
applies any query-specific assertions (e.g. solve_assertions);
calls cloned.optimize(self.obj_expr) to get the optimum;
wraps the outcome in a Solution.

If the KB is inconsistent, the Solution reports is_consistent_kb() == False and every query trivially answers 1.0.

Native acceleration

The whole pipeline runs as pure Python, but the hot paths are also shipped as compiled extensions. When the package is built from source (pip install . / poetry build), the build.py Poetry hook produces native .so (.pyd on Windows) modules in place; Python automatically prefers a compiled module over its .py twin when both are present, so importing is unchanged. If the build step is skipped, every module falls back to pure Python — same results, lower speed. See Optional: Compile the Fast Parser in Installation and Configuration for the toolchain.

Cython-compiled modules

build.py cythonizes the modules that dominate runtime — knowledge_base, general_concept_inclusion, primitive_concept_definition, graph/digraph, util/constants, and everything under concept/, individual/, query/, milp/, and parser/ (except the pure-data tokens.py, generate.py, tokenizer_handler.py, and the legacy dl_parser.py). They are compiled with aggressive directives (boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, nonecheck=False, infer_types=True) and -O3. PEP 484 annotation enforcement is off (annotation_typing=False) so Optional/Union arguments still accept None at the C level. The extensions are built in place next to their .py sources, so the package stays importable either way.

C tokenizer (re2c / flex)

The fast parser’s tokenizer has a C scanner core, generated from a single master table (generate.py) into one of two equivalent backends:

lexer_re2c.re compiled by re2c (preferred), or
lexer_flex.l compiled by flex (fallback).

The scanner exposes fdl_count_tokens / fdl_tokenize, which fill three int32 spans per token (kind, start, length) and build no Python objects — the expensive per-token tuple construction is kept out of the C pass. Two extensions wrap it inside parser/tokenizer/:

_fdl_lexer — a CFFI wrapper exposing the raw int32-array API used by tokens.py.
_fdl_tuples — a Cython module whose tokenize_tuples(bytes) runs both scanner passes and emits the (kind, value, lower, offset) 4-tuples the recursive-descent parser consumes directly, replacing the Python tuple-building loop. (COMMA is dropped; a trailing EOF sentinel is appended.)

At runtime the tokenizer picks the fastest backend available, in order:

Cython _fdl_tuples — tuples built entirely in C (fastest).
CFFI _fdl_lexer — C scan into int32 arrays, tuples assembled in Python.
Pure-Python regex (PythonRegexTokenizer) — the master regex; always available, used when neither extension was built.

_fdl_ok reports whether the C backend is importable; large files are streamed form-by-form (FdlFileTokenizer) so peak token memory stays bounded regardless of backend.

Performance notes

get_kb / main disable Python’s cyclic garbage collector during the run: KB construction builds a large acyclic object graph with nothing to reclaim mid-parse, so the collector would only scan the growing graph repeatedly. It is restored afterwards.
Building the concept_individual_list blocking index appends in ascending individual-id order, so membership uses an in-place sorted insert rather than rebuilding the set per addition.