Architecture
How the reasoner turns an FDL knowledge base into an answer. The pipeline has
four stages: parse -> knowledge base -> MILP encoding ->
solver dispatch. This page is a conceptual map of the fuzzydl package; for
the public how-to see Usage.
Pipeline overview
.fdl source
| DLParser / DLParserFast
v
KnowledgeBase --(axioms, assertions, TBox)
| kb.solve_kb() preprocess once
v
tableau expansion --(lazy unfolding of concepts/roles into constraints)
| via MILPHelper
v
MILP model (Variable / Term / Expression / Inequation)
| query.solve(kb) -> MILPHelper.optimize(objective)
v
solver backend (Gurobi | MIP | PuLP: CBC/GLPK/HiGHS/CPLEX)
|
v
Solution
1. Parsing
DLParser (pyparsing) or DLParserFast (hand-rolled tokenizer +
recursive-descent; see Optional: Compile the Fast Parser in
Installation and Configuration) reads the source and
populates two class-level fields shared by both implementations:
DLParser.kb— aKnowledgeBasebeing filled with concepts, axioms, and assertions.DLParser.queries_list— the parsed queries.
get_kb(path) returns (kb, queries).
2. KnowledgeBase
fuzzy_dl_owl2.fuzzydl.knowledge_base.KnowledgeBase is the central object (the
single largest module). It stores the ABox (assertions about individuals) and
TBox (concept/role axioms) and owns a MILPHelper instance (self.milp).
kb.solve_kb() performs one-time preprocessing before any query runs:
defaults the semantics to Łukasiewicz if none was set;
computes the description-logic language of the KB;
interns symbolic strings to integers for speed;
resolves role axioms — inverse, role-inclusion, reflexive, functional;
preprocesses the TBox and computes the blocking strategy;
marks the KB loaded (
KB_LOADED = True).
After this, each query reasons against the prepared KB.
3. Tableau to MILP encoding
Reasoning is tableau-style but the branching is replaced by constraints:
each logical construct is lazily unfolded into rows of a Mixed-Integer Linear
Program rather than expanded into an explicit completion tree. The unfolding
rules live on KnowledgeBase (e.g. solve_gci, solve_lukasiewicz_gci,
rule_domain_lazy_unfolding, …) and emit constraints through the MILP layer.
The MILP class hierarchy
The model is built from four classes in fuzzy_dl_owl2.fuzzydl.milp:
Class |
Role |
|---|---|
|
A decision variable. Its |
|
A coefficient times a |
|
A constant plus a sum of |
|
An |
MILPHelper
MILPHelper is the only sanctioned way to build the model. Key entry points:
get_variable(*args)— the single allocator for degree / role / nominal variables. It is heavily overloaded (by assertion, relation, individual+concept, created individual, …) so callers ask for the variable that represents a given domain entity and get a stable handle back.add_new_constraint(expr, op, degree)— adds a constraint. Constraints are normalised so the right-hand side is zero: the helper shifts the expression by theDegreebefore storing theInequation.optimize(objective)— solves the current model for an objectiveExpressionand returns aSolution.
The fuzzy-operator encodings (the t-norm / t-conorm / implication rows for
Łukasiewicz and Zadeh) are produced only by the dedicated solver classes
(LukasiewiczSolver, ZadehSolver); other code does not inline these.
4. Solver dispatch
MILPHelper.optimize selects the backend from ConfigReader.MILP_PROVIDER:
|
Method |
Backend |
|---|---|---|
|
|
Gurobi |
|
|
Python-MIP |
|
|
PuLP (CBC / GLPK / HiGHS / CPLEX) |
An unrecognised provider raises ValueError. The chosen backend also fixes the
numerical limit \(k_{\infty}\) (MAXVAL) — higher limits accumulate floating-point error,
so each solver gets a tuned value (see the MILP Solver Constraints table in
Grammatics). Install/config per backend is in
Installation and Configuration.
5. Answering a query
A query holds the objective it wants optimised. query.solve(kb):
clones the prepared MILP model (so each query is independent);
applies any query-specific assertions (e.g.
solve_assertions);calls
cloned.optimize(self.obj_expr)to get the optimum;wraps the outcome in a
Solution.
If the KB is inconsistent, the Solution reports is_consistent_kb() == False
and every query trivially answers 1.0.
Native acceleration
The whole pipeline runs as pure Python, but the hot paths are also shipped as
compiled extensions. When the package is built from source (pip install . /
poetry build), the build.py Poetry hook produces native .so (.pyd on
Windows) modules in place; Python automatically prefers a compiled module over
its .py twin when both are present, so importing is unchanged. If the build
step is skipped, every module falls back to pure Python — same results, lower
speed. See Optional: Compile the Fast Parser in
Installation and Configuration for the toolchain.
Cython-compiled modules
build.py cythonizes the modules that dominate runtime — knowledge_base,
general_concept_inclusion, primitive_concept_definition, graph/digraph,
util/constants, and everything under concept/, individual/, query/,
milp/, and parser/ (except the pure-data tokens.py, generate.py,
tokenizer_handler.py, and the legacy dl_parser.py). They are compiled with
aggressive directives (boundscheck=False, wraparound=False,
initializedcheck=False, cdivision=True, nonecheck=False, infer_types=True)
and -O3. PEP 484 annotation enforcement is off (annotation_typing=False)
so Optional/Union arguments still accept None at the C level. The
extensions are built in place next to their .py sources, so the package stays
importable either way.
C tokenizer (re2c / flex)
The fast parser’s tokenizer has a C scanner core, generated from a single master
table (generate.py) into one of two equivalent backends:
The scanner exposes fdl_count_tokens / fdl_tokenize, which fill three
int32 spans per token (kind, start, length) and build no Python objects —
the expensive per-token tuple construction is kept out of the C pass. Two
extensions wrap it inside parser/tokenizer/:
_fdl_lexer— a CFFI wrapper exposing the raw int32-array API used bytokens.py._fdl_tuples— a Cython module whosetokenize_tuples(bytes)runs both scanner passes and emits the(kind, value, lower, offset)4-tuples the recursive-descent parser consumes directly, replacing the Python tuple-building loop. (COMMAis dropped; a trailing EOF sentinel is appended.)
At runtime the tokenizer picks the fastest backend available, in order:
Cython
_fdl_tuples— tuples built entirely in C (fastest).CFFI
_fdl_lexer— C scan into int32 arrays, tuples assembled in Python.Pure-Python regex (
PythonRegexTokenizer) — the master regex; always available, used when neither extension was built.
_fdl_ok reports whether the C backend is importable; large files are streamed
form-by-form (FdlFileTokenizer) so peak token memory stays bounded regardless
of backend.
Performance notes
get_kb/maindisable Python’s cyclic garbage collector during the run: KB construction builds a large acyclic object graph with nothing to reclaim mid-parse, so the collector would only scan the growing graph repeatedly. It is restored afterwards.Building the
concept_individual_listblocking index appends in ascending individual-id order, so membership uses an in-place sorted insert rather than rebuilding the set per addition.
See also
Fuzzy Concepts — the concept objects that get unfolded.
Queries and Solutions — the query/result objects.
Modifiers and Degrees — how degrees enter the MILP.
Grammatics — FDL syntax and the per-solver \(k_{\infty}\) table.