Architecture

How the reasoner turns an FDL knowledge base into an answer. The pipeline has four stages: parse -> knowledge base -> MILP encoding -> solver dispatch. This page is a conceptual map of the fuzzydl package; for the public how-to see Usage.

Pipeline overview

 .fdl source
     |  DLParser / DLParserFast
     v
 KnowledgeBase  --(axioms, assertions, TBox)
     |  kb.solve_kb()      preprocess once
     v
 tableau expansion  --(lazy unfolding of concepts/roles into constraints)
     |  via MILPHelper
     v
 MILP model  (Variable / Term / Expression / Inequation)
     |  query.solve(kb) -> MILPHelper.optimize(objective)
     v
 solver backend  (Gurobi | MIP | PuLP: CBC/GLPK/HiGHS/CPLEX)
     |
     v
 Solution

1. Parsing

DLParser (pyparsing) or DLParserFast (hand-rolled tokenizer + recursive-descent; see Optional: Compile the Fast Parser in Installation and Configuration) reads the source and populates two class-level fields shared by both implementations:

  • DLParser.kb — a KnowledgeBase being filled with concepts, axioms, and assertions.

  • DLParser.queries_list — the parsed queries.

get_kb(path) returns (kb, queries).

2. KnowledgeBase

fuzzy_dl_owl2.fuzzydl.knowledge_base.KnowledgeBase is the central object (the single largest module). It stores the ABox (assertions about individuals) and TBox (concept/role axioms) and owns a MILPHelper instance (self.milp).

kb.solve_kb() performs one-time preprocessing before any query runs:

  • defaults the semantics to Łukasiewicz if none was set;

  • computes the description-logic language of the KB;

  • interns symbolic strings to integers for speed;

  • resolves role axioms — inverse, role-inclusion, reflexive, functional;

  • preprocesses the TBox and computes the blocking strategy;

  • marks the KB loaded (KB_LOADED = True).

After this, each query reasons against the prepared KB.

3. Tableau to MILP encoding

Reasoning is tableau-style but the branching is replaced by constraints: each logical construct is lazily unfolded into rows of a Mixed-Integer Linear Program rather than expanded into an explicit completion tree. The unfolding rules live on KnowledgeBase (e.g. solve_gci, solve_lukasiewicz_gci, rule_domain_lazy_unfolding, …) and emit constraints through the MILP layer.

The MILP class hierarchy

The model is built from four classes in fuzzy_dl_owl2.fuzzydl.milp:

Class

Role

Variable

A decision variable. Its VariableType is one of BINARY, CONTINUOUS, INTEGER, SEMI_CONTINUOUS. Degrees, role memberships and nominals each become a variable.

Term

A coefficient times a Variable (a coefficient * Variable product).

Expression

A constant plus a sum of Terms — a linear expression.

Inequation

An Expression compared against zero with an InequalityType (GREATER_THAN, LESS_THAN, EQUAL).

MILPHelper

MILPHelper is the only sanctioned way to build the model. Key entry points:

  • get_variable(*args) — the single allocator for degree / role / nominal variables. It is heavily overloaded (by assertion, relation, individual+concept, created individual, …) so callers ask for the variable that represents a given domain entity and get a stable handle back.

  • add_new_constraint(expr, op, degree) — adds a constraint. Constraints are normalised so the right-hand side is zero: the helper shifts the expression by the Degree before storing the Inequation.

  • optimize(objective) — solves the current model for an objective Expression and returns a Solution.

The fuzzy-operator encodings (the t-norm / t-conorm / implication rows for Łukasiewicz and Zadeh) are produced only by the dedicated solver classes (LukasiewiczSolver, ZadehSolver); other code does not inline these.

4. Solver dispatch

MILPHelper.optimize selects the backend from ConfigReader.MILP_PROVIDER:

MILP_PROVIDER

Method

Backend

GUROBI

solve_gurobi

Gurobi

MIP

solve_mip

Python-MIP

PULP, PULP_GLPK, PULP_HIGHS, PULP_CPLEX

solve_pulp

PuLP (CBC / GLPK / HiGHS / CPLEX)

An unrecognised provider raises ValueError. The chosen backend also fixes the numerical limit \(k_{\infty}\) (MAXVAL) — higher limits accumulate floating-point error, so each solver gets a tuned value (see the MILP Solver Constraints table in Grammatics). Install/config per backend is in Installation and Configuration.

5. Answering a query

A query holds the objective it wants optimised. query.solve(kb):

  1. clones the prepared MILP model (so each query is independent);

  2. applies any query-specific assertions (e.g. solve_assertions);

  3. calls cloned.optimize(self.obj_expr) to get the optimum;

  4. wraps the outcome in a Solution.

If the KB is inconsistent, the Solution reports is_consistent_kb() == False and every query trivially answers 1.0.

Native acceleration

The whole pipeline runs as pure Python, but the hot paths are also shipped as compiled extensions. When the package is built from source (pip install . / poetry build), the build.py Poetry hook produces native .so (.pyd on Windows) modules in place; Python automatically prefers a compiled module over its .py twin when both are present, so importing is unchanged. If the build step is skipped, every module falls back to pure Python — same results, lower speed. See Optional: Compile the Fast Parser in Installation and Configuration for the toolchain.

Cython-compiled modules

build.py cythonizes the modules that dominate runtime — knowledge_base, general_concept_inclusion, primitive_concept_definition, graph/digraph, util/constants, and everything under concept/, individual/, query/, milp/, and parser/ (except the pure-data tokens.py, generate.py, tokenizer_handler.py, and the legacy dl_parser.py). They are compiled with aggressive directives (boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, nonecheck=False, infer_types=True) and -O3. PEP 484 annotation enforcement is off (annotation_typing=False) so Optional/Union arguments still accept None at the C level. The extensions are built in place next to their .py sources, so the package stays importable either way.

C tokenizer (re2c / flex)

The fast parser’s tokenizer has a C scanner core, generated from a single master table (generate.py) into one of two equivalent backends:

  • lexer_re2c.re compiled by re2c (preferred), or

  • lexer_flex.l compiled by flex (fallback).

The scanner exposes fdl_count_tokens / fdl_tokenize, which fill three int32 spans per token (kind, start, length) and build no Python objects — the expensive per-token tuple construction is kept out of the C pass. Two extensions wrap it inside parser/tokenizer/:

  • _fdl_lexer — a CFFI wrapper exposing the raw int32-array API used by tokens.py.

  • _fdl_tuples — a Cython module whose tokenize_tuples(bytes) runs both scanner passes and emits the (kind, value, lower, offset) 4-tuples the recursive-descent parser consumes directly, replacing the Python tuple-building loop. (COMMA is dropped; a trailing EOF sentinel is appended.)

At runtime the tokenizer picks the fastest backend available, in order:

  1. Cython _fdl_tuples — tuples built entirely in C (fastest).

  2. CFFI _fdl_lexer — C scan into int32 arrays, tuples assembled in Python.

  3. Pure-Python regex (PythonRegexTokenizer) — the master regex; always available, used when neither extension was built.

_fdl_ok reports whether the C backend is importable; large files are streamed form-by-form (FdlFileTokenizer) so peak token memory stays bounded regardless of backend.

Performance notes

  • get_kb / main disable Python’s cyclic garbage collector during the run: KB construction builds a large acyclic object graph with nothing to reclaim mid-parse, so the collector would only scan the growing graph repeatedly. It is restored afterwards.

  • Building the concept_individual_list blocking index appends in ascending individual-id order, so membership uses an in-place sorted insert rather than rebuilding the set per addition.

See also