System Design

FenLiu's architecture is built around a clean separation of concerns with async I/O, type-safe models, and a modular API design.

High-Level Architecture

┌─────────────────────────────────────────────────────┐
│          Web Browser (Jinja2 Templates)             │
└──────────────────┬──────────────────────────────────┘
                   │ HTTP
┌──────────────────▼──────────────────────────────────┐
│   PyView Application (Starlette-based LiveView)     │
│  ┌────────────────────────────────────────────────┐ │
│  │ REST API Layer (/api/v1/*)                     │ │
│  │ ├─ Streams (hashtag management)                │ │
│  │ ├─ Posts (fetch, review, filter)               │ │
│  │ ├─ Review (approval workflow)                  │ │
│  │ ├─ Reblog Controls (export filters)            │ │
│  │ └─ Curated Queue (export API)                  │ │
│  └────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────┐ │
│  │ Services Layer (Business Logic)                │ │
│  │ ├─ Spam Scoring (rule-based + ML ready)        │ │
│  │ ├─ Fediverse Client (ActivityPub fetching)     │ │
│  │ ├─ Export Eligibility (reblog filters)         │ │
│  │ └─ API Key Management                          │ │
│  └────────────────────────────────────────────────┘ │
└──────────────────┬──────────────────────────────────┘
                   │ SQL
┌──────────────────▼──────────────────────────────────┐
│   SQLAlchemy ORM + Alembic Migrations               │
│   ├─ Post (content from Fediverse)                  │
│   ├─ HashtagStream (monitored hashtags)             │
│   ├─ ReviewFeedback (approval history)              │
│   ├─ BlockedUser & BlockedHashtag (filters)         │
│   ├─ AppSetting (configuration)                     │
│   ├─ QueueStats (aggregate deletion counters)       │
│   └─ ErrorHistory (audit trail for error posts)     │
└──────────────────┬──────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────┐
│          SQLite Database (fenliu.db)                │
└─────────────────────────────────────────────────────┘

Core Concepts

Hashtag Streams

A stream monitors a specific hashtag on a specific Fediverse instance. When you fetch, FenLiu retrieves recent posts with that hashtag and stores them in the database.
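
As a sketch, a fetch against a Mastodon-compatible instance can use the public tag-timeline endpoint; the function names here are illustrative, not FenLiu's actual client:

```python
# Sketch only. The URL follows the standard Mastodon tag-timeline endpoint;
# FenLiu's real Fediverse client is not shown here.
def tag_timeline_url(instance: str, hashtag: str, limit: int = 40) -> str:
    """Build the public Mastodon tag-timeline URL for a stream."""
    return f"https://{instance}/api/v1/timelines/tag/{hashtag}?limit={limit}"

async def fetch_tagged_posts(instance: str, hashtag: str) -> list[dict]:
    """Fetch recent posts carrying the hashtag from one instance."""
    import httpx  # imported lazily so the URL helper has no dependencies
    async with httpx.AsyncClient(timeout=10.0) as client:
        response = await client.get(tag_timeline_url(instance, hashtag))
        response.raise_for_status()
        return response.json()  # a list of status objects
```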

Spam Scoring

Each post is automatically scored 0-100 using rule-based detection:

  • Link count
  • Mention abuse
  • Repetitive content
  • Hashtag abuse
  • All-caps text
  • Suspicious patterns

Higher scores indicate likely spam.
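
A minimal sketch of rule-based scoring along these lines; the weights and thresholds here are illustrative, not FenLiu's actual rules:

```python
import re

def score_post(content: str, mention_count: int, hashtag_count: int) -> int:
    """Rule-based spam score, clamped to 0-100 (higher = more spammy).

    Illustrative weights; the real scorer's rules and weights may differ.
    """
    score = 0
    score += 15 * len(re.findall(r"https?://", content))   # link count
    score += 10 * max(0, mention_count - 2)                # mention abuse
    score += 5 * max(0, hashtag_count - 5)                 # hashtag abuse
    letters = [c for c in content if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.7:
        score += 20                                        # all-caps text
    words = content.lower().split()
    if words and len(set(words)) / len(words) < 0.5:
        score += 20                                        # repetitive content
    return min(score, 100)
```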

Manual Review

You manually approve or reject posts. Approved posts enter the export queue; rejected posts are excluded. Your decisions are recorded for future ML training.

Export Queue (Curated Queue)

Approved posts are held in a queue with states: pending → reserved → delivered (or error). External tools (like Zhongli bot) fetch posts via API, process them, and acknowledge with ack/nack/error callbacks.
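
The lifecycle can be sketched as a small transition table (inferred from this page: nack returns a reserved post to pending, and error posts can be manually requeued):

```python
# Hypothetical sketch of the queue state machine described above.
VALID_TRANSITIONS = {
    "pending": {"reserved"},
    "reserved": {"delivered", "pending", "error"},  # ack / nack / error
    "delivered": set(),
    "error": {"pending"},                           # manual requeue
}

def transition(status: str, new_status: str) -> str:
    """Apply one state change, rejecting anything not in the table."""
    if new_status not in VALID_TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status
```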

Reblog Controls

Export filters applied after approval:

  • Blocked users: author is in blocklist
  • Blocked hashtags: post contains a blocklisted hashtag
  • Attachments-only: post must have media
  • Auto-reject: automatically reject matching posts before review

Data Models

Post

Represents a post fetched from Fediverse.

Key fields:

  • post_id: ActivityPub URI (unique identifier)
  • url: Human-readable web URL
  • content: Post text
  • author_username: Account that posted
  • instance: Fediverse instance source
  • spam_score: Automatic 0-100 score
  • manual_spam_score: Your adjusted score (if any)
  • approved: True if you approved it
  • reviewed: True if you made a decision
  • created_at: When the post was published on the Fediverse
  • reviewed_at: When you approved/rejected it
  • queue_status: pending/reserved/delivered/error
  • error_reason: Why export failed (if applicable)

HashtagStream

Represents a hashtag you're monitoring.

Key fields:

  • hashtag: The hashtag name
  • instance: Mastodon instance
  • active: Whether fetching is enabled
  • posts: Relationship to posts from this stream

ReviewFeedback

Records your approval/rejection decisions for ML training. Includes a snapshot of post features captured at review time so training data survives post deletion.

Key fields:

  • post_id: Which post (cascade-deleted with the post)
  • decision: "approved", "rejected", or "score_adjusted"
  • manual_score: Score you chose
  • created_at: When you reviewed
  • Snapshot fields: content_snippet, spam_score_at_review, hashtag_count, hashtags_snapshot, attachment_count, has_video, boosts, likes, replies, author_is_bot, instance, stream_id

BlockedUser & BlockedHashtag

Export filters.

BlockedUser:

  • account_identifier: @user@instance.social format or pattern
  • pattern_type: exact, suffix, prefix, or contains
  • notes: Reason for blocking
  • created_at: When added to blocklist
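
A sketch of how pattern_type matching might work; this helper is illustrative, not FenLiu's implementation:

```python
def matches_blocked_user(account: str, pattern: str, pattern_type: str) -> bool:
    """Check an account like '@user@instance.social' against a blocklist
    entry, using the pattern_type semantics documented above."""
    account = account.lower()
    pattern = pattern.lower()
    return {
        "exact": account == pattern,
        "suffix": account.endswith(pattern),
        "prefix": account.startswith(pattern),
        "contains": pattern in account,
    }[pattern_type]
```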

BlockedHashtag:

  • hashtag: Hashtag to exclude (lowercase, without #)
  • notes: Reason for blocking
  • created_at: When added to blocklist

API Design

All endpoints follow REST semantics:

  • GET - retrieve data
  • POST - create or action
  • PATCH - partial update
  • DELETE - remove

Base URL: /api/v1

Resource Patterns

Streams: /hashtags

  • GET /hashtags - list
  • POST /hashtags - create
  • GET /hashtags/{id} - detail
  • PATCH /hashtags/{id} - update
  • DELETE /hashtags/{id} - delete
  • POST /hashtags/{id}/fetch - fetch posts

Posts: /posts

  • GET /posts - list with filtering
  • GET /posts/{id} - detail
  • PATCH /posts/{id} - update (approve/reject/adjust score)
  • POST /posts/batch-score - batch scoring

Curated Queue: /curated

  • GET /curated/next - get next post (reserves it)
  • POST /curated/{id}/ack - confirm export
  • POST /curated/{id}/nack - return to queue
  • POST /curated/{id}/error - mark permanently failed
  • POST /curated/{id}/requeue - manually requeue
  • POST /curated/cleanup - delete delivered posts older than N days
  • POST /curated/trim-pending - trim excess pending posts

Technology Stack

Web Framework: PyView (Starlette-based LiveView)

  • Real-time capable
  • Minimal JavaScript required
  • Server-side rendering

Database: SQLAlchemy ORM + SQLite

  • Type-safe queries
  • Relationship management
  • Eager loading for performance

Migrations: Alembic

  • Schema versioning
  • Automatic on startup
  • Reproducible across environments

Async Runtime: asyncio

  • Async/await throughout
  • Non-blocking I/O
  • Sync-only for SQLite (acceptable)

Type Safety: Pydantic + Type Hints

  • Comprehensive type annotations
  • Request/response validation
  • IDE support

Testing: pytest

  • 389 tests
  • Coverage across all major functionality
  • Regression test suite

Key Design Patterns

Database Session Management

All database access uses context managers:

with get_db() as db:
    # Do database operations
    db.commit()
# Session automatically closed
Guarantees cleanup even on exceptions.
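
A minimal sketch of how such a context manager can be built with contextlib and SQLAlchemy's sessionmaker; the actual wiring in FenLiu may differ, and an in-memory SQLite URL is used here for illustration:

```python
from contextlib import contextmanager
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite://")  # in-memory here; FenLiu uses sqlite:///fenliu.db
SessionLocal = sessionmaker(bind=engine)

@contextmanager
def get_db():
    """Yield a session and guarantee it is closed, even on exceptions."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
```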

Async I/O

HTTP calls to Fediverse use async:

async with httpx.AsyncClient() as client:
    response = await client.get(url)
Prevents blocking on network I/O.

Service Layer

Business logic separated from API:

API Handler → Service → Database
Services handle spam scoring, Fediverse fetching, export eligibility checks.

Export Eligibility

Filters applied consistently whether exporting or auto-rejecting:

result = check_reblog_filters(post, db)
if result.eligible:
    ...  # export or approve the post
else:
    ...  # skip, recording result.reason

Performance Optimizations

Eager Loading: Relationships pre-loaded to prevent N+1 queries

stmt = select(Post).options(
    selectinload(Post.stream),
    selectinload(Post.review_feedback),
)
posts = db.scalars(stmt).all()

SQL Aggregates: Use database count() instead of loading all rows

count = db.scalar(select(func.count(Post.id)))

Index Strategy: Indexes on frequently filtered columns (spam_score, created_at, queue_status).

Security

  • API Key Authentication: all protected endpoints require the X-API-Key header
  • Session Management: proper cleanup prevents resource leaks
  • Input Validation: Pydantic schemas validate all requests
  • Type Safety: type hints catch many errors at development time

Deployment

Development:

fenliu --reload --debug
Auto-reloading, debug logging to file.

Production:

fenliu --host 0.0.0.0 --port 8000
Can be proxied behind nginx/Cloudflare for HTTPS and rate limiting.

Queue Lifecycle Management

Delivered posts accumulate over time. FenLiu manages their lifecycle to keep the database lean while preserving all-time statistics.

Cleanup (Delivered Posts)

POST /api/v1/curated/cleanup?retention_days=7

Deletes delivered posts older than retention_days. Before deletion, the count is added to QueueStats.total_deleted_delivered so all-time stats remain correct. Runs daily via APScheduler and is also available on demand.
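
Illustratively, the retention rule and stat preservation can be sketched in plain Python (the real cleanup issues a SQL DELETE and updates the QueueStats row):

```python
from datetime import datetime, timedelta, timezone

def cleanup_delivered(posts: list[dict], retention_days: int,
                      stats: dict, now: datetime) -> list[dict]:
    """Drop delivered posts older than the cutoff, preserving the count."""
    cutoff = now - timedelta(days=retention_days)
    keep, deleted = [], 0
    for p in posts:
        if p["queue_status"] == "delivered" and p["delivered_at"] < cutoff:
            deleted += 1                              # drop the row...
        else:
            keep.append(p)
    stats["total_deleted_delivered"] += deleted       # ...but keep the count
    return keep
```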

Trim Pending

POST /api/v1/curated/trim-pending?lookback_days=3

Trims the pending queue when it exceeds twice the recent daily consumption rate. Uses weighted random selection — older posts, posts with fewer likes, and posts from prolific authors are more likely to be trimmed. Removed count is added to QueueStats.total_deleted_pending.
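
A sketch of weighted trimming under these criteria; the weight formula is illustrative, not FenLiu's actual one:

```python
import random

def trim_weight(age_days: float, likes: int, author_post_count: int) -> float:
    """Higher weight = more likely to be trimmed: older, less-liked posts
    from prolific authors score highest (illustrative formula)."""
    return (1.0 + age_days) * (1.0 + author_post_count) / (1.0 + likes)

def pick_posts_to_trim(posts: list[dict], n: int, rng: random.Random) -> list[dict]:
    """Weighted random selection without replacement."""
    pool = list(posts)
    chosen = []
    while pool and len(chosen) < n:
        weights = [trim_weight(p["age_days"], p["likes"], p["author_posts"])
                   for p in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        chosen.append(pick)
    return chosen
```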

Error History

When error posts are deleted (via cleanup), their error_reason is saved to ErrorHistory before deletion. This preserves the "Most Frequent Error" metric across the full lifetime of the system.

Stats Preservation

  • QueueStats — singleton row with cumulative deletion counts
  • ErrorHistory — append-only log of deleted error reasons
  • Queue Preview and Statistics pages combine active + historical counts for all-time figures

Future Extensibility

  • Machine Learning: ReviewFeedback model ready for training data
  • Database: easy to migrate from SQLite to PostgreSQL
  • Export Formats: queue API is tool-agnostic (not tied to a specific consumer)
  • Custom Rules: framework ready for user-defined spam scoring rules