FSM SOLID analysis and recommendations¶

This document reviews the finite state machine (FSM) design in TA.Utils.Core.StateMachine with respect to the SOLID principles, and proposes pragmatic improvements. It assumes familiarity with Finite State Machine as implemented in this repository.

Summary - The design is clean and pragmatic: small, composable abstractions, with a clear split between "state" and "machine" and sensible use of observability. - The main areas to tighten are: concurrency safety and visibility across threads, transition error handling (especially exceptions from a state’s RunAsync), resource disposal, and API shape.

Strengths (per SOLID) - Single Responsibility - IState encapsulates a state’s lifecycle: enter, run, exit, with a clear cancellation boundary. - IFiniteStateMachine\<TState\> focuses on orchestration and observation (start/stop/transition + observability). - FiniteStateMachine\<TState\> centralises transition sequencing: cancel current, wait, exit, enter, run next. - Open–Closed - Virtual methods permit extension via subclassing without modifying core code. - Logging depends on ILog and remains pluggable. - Liskov Substitution - The IState contract is straightforward; implementations can substitute so long as they honour cancellation and lifecycle hooks. - Interface Segregation - IState and IFiniteStateMachine\<TState\> are small and coherent. - Dependency Inversion - Depends on abstractions (IState, ILog, IObservable) rather than concrete frameworks.

Weaknesses and risks - Scope creep in the core class (SRP pressure) - FiniteStateMachine\<TState\> handles orchestration, concurrency/cancellation, signalling, event publication and logging. This increases the reasons for the class to change. - Limited extension points (OCP pressure) - No injection point for custom scheduling, transition guards, timeout policies, or exception handling strategies. Subclassing risks invariant breakage. - Invariants are not formalised (LSP risk) - Transition semantics are not strictly documented; overrides could violate cancellation/ordering guarantees. A public ManualResetEvent leaks an implementation detail a subclass might change. - Interface details - DisplayName on IState is UI‑oriented; some domains may not wish to couple display concerns into the core interface. - API leakage and synchronisation primitive - Exposing ManualResetEvent couples consumers to a particular synchronisation model; an awaitable API would be a better abstraction. - Concurrency/thread safety - No explicit locking around CurrentState/currentStateTask/stateCancellation. Concurrent TransitionTo calls can interleave. - Subject<T> emissions occur on background threads; consumers may incorrectly assume a context. - Error handling of state failures - Only OperationCanceledException is caught when waiting for the prior state; unexpected exceptions from RunAsync could become unobserved or skip completion signalling. - Resource disposal - The previous CancellationTokenSource is replaced but never disposed; repeated transitions can leak handles. - Logging responsibility - Logging inside the core class conflicts with the repository preference for a logging decorator. - Test synchronisation ergonomics - Exposing ManualResetEvent invites blocking APIs; async/await would be more idiomatic and composable.

Recommendations 1) Serialise transitions and protect shared state - Use a private lock or a serial queue to ensure TransitionTo is processed one at a time and to protect CurrentState, currentStateTask, stateCancellation and the stopped flag.

2) Dispose the previous CancellationTokenSource - Swap, cancel, and dispose the old CTS to avoid leaking OS handles.

// inside CancelCurrentState()
var old = Interlocked.Exchange(ref stateCancellation, new CancellationTokenSource());
try { old.Cancel(); } catch (ObjectDisposedException) { }
old.Dispose();

3) Catch and log all exceptions from state tasks and waits - Ensure the transition path cannot fault and skip completion signalling; log unexpected exceptions at Error and proceed.

try
{
    currentStateTask?.Wait();
}
catch (OperationCanceledException ex)
{
    log.Warn().Exception(ex).Message("State cancelled while awaiting completion").Write();
}
catch (Exception ex)
{
    log.Error().Exception(ex).Message("Unhandled exception from state RunAsync: {message}", ex.Message).Write();
}

4) Replace ManualResetEvent with an awaitable API - Add TransitionAsync(next, ct) that completes when the new state is fully activated (after OnEnter and RunAsync is scheduled). Retain the event internally for tests if required.

public Task TransitionAsync(TState next, CancellationToken ct = default)
{
    var tcs = new TaskCompletionSource();
    TransitionTo(next);
    Task.Run(() =>
    {
        StateChanged.WaitOne();
        tcs.TrySetResult();
    }, ct);
    return tcs.Task;
}

5) Introduce extension points (favour composition) - ITransitionScheduler – how transitions are queued/executed. - IExceptionPolicy – how to deal with exceptions from OnExit/RunAsync (log, rethrow, transition to fallback). - ITransitionGuard – optional policy to allow/deny a transition.

6) Seal the core class and add a logging decorator - Seal FiniteStateMachine\<TState\> to protect invariants. - Move logging into a FiniteStateMachineLoggingDecorator\<TState\> that wraps IFiniteStateMachine\<TState\> and logs around calls.

7) Provide async‑first APIs - Add StartAsync, TransitionAsync, StopAsync (with optional timeouts). Keep synchronous counterparts for convenience, but prefer async in usage and examples.

8) Document or parameterise the observer context - Clearly document that ObservableStates emits on background threads, or accept an IScheduler to marshal notifications.

9) Optional: make state publication replayable - Provide a way for late subscribers to receive the current state immediately (e.g., cache last activation and emit it on subscription).

10) Consider decoupling display concerns - Make DisplayName optional or move UI‑oriented concerns to a separate optional interface.

11) Add transition timeouts and fallbacks - Ensure a misbehaving state that ignores cancellation cannot block shutdown indefinitely; log at Error and move to a safe state after a timeout.

12) Extend the test suite for race and failure modes - Add specs for rapid sequential transitions, Stop during a transition, exceptions thrown by RunAsync, and OnExit throwing. Assert that invariants hold and notifications always complete.

Expected benefits - Stronger invariants and simpler reasoning about concurrency. - Cleaner API surface (async/await rather than synchronisation primitives). - Alignment with repository preferences (sealed by default; logging via decorator). - Improved testability and resilience, with pluggable policies. - Fewer leaks and unobserved exceptions; more graceful behaviour under failure.