# System Design
FenLiu's architecture is built around a clean separation of concerns with async I/O, type-safe models, and a modular API design.
## High-Level Architecture
```
┌─────────────────────────────────────────────────────┐
│ Web Browser (Jinja2 Templates) │
└──────────────────┬──────────────────────────────────┘
│ HTTP
┌──────────────────▼──────────────────────────────────┐
│ PyView Application (Starlette-based LiveView) │
│ ┌────────────────────────────────────────────────┐ │
│ │ REST API Layer (/api/v1/*) │ │
│ │ ├─ Streams (hashtag management) │ │
│ │ ├─ Posts (fetch, review, filter) │ │
│ │ ├─ Review (approval workflow) │ │
│ │ ├─ Reblog Controls (export filters) │ │
│ │ └─ Curated Queue (export API) │ │
│ └────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────┐ │
│ │ Services Layer (Business Logic) │ │
│ │ ├─ Spam Scoring (rule-based + ML ready) │ │
│ │ ├─ Fediverse Client (ActivityPub fetching) │ │
│ │ ├─ Export Eligibility (reblog filters) │ │
│ │ └─ API Key Management │ │
│ └────────────────────────────────────────────────┘ │
└──────────────────┬──────────────────────────────────┘
│ SQL
┌──────────────────▼──────────────────────────────────┐
│ SQLAlchemy ORM + Alembic Migrations │
│ ├─ Post (content from Fediverse) │
│ ├─ HashtagStream (monitored hashtags) │
│ ├─ ReviewFeedback (approval history) │
│ ├─ BlockedUser & BlockedHashtag (filters) │
│ ├─ AppSetting (configuration) │
│ ├─ QueueStats (aggregate deletion counters) │
│ └─ ErrorHistory (audit trail for error posts) │
└──────────────────┬──────────────────────────────────┘
│
┌──────────────────▼──────────────────────────────────┐
│ SQLite Database (fenliu.db) │
└──────────────────────────────────────────────────────┘
```
## Core Concepts

### Hashtag Streams
A stream monitors a specific hashtag on a specific Fediverse instance. When you fetch, FenLiu retrieves recent posts with that hashtag and stores them in the database.
### Spam Scoring

Each post is automatically scored 0-100 using rule-based detection:

- Link count
- Mention abuse
- Repetitive content
- Hashtag abuse
- All-caps text
- Suspicious patterns
Higher scores indicate likely spam.
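These rules can be combined into a simple additive scorer. The sketch below is illustrative only — the weights, thresholds, and function name are assumptions, not FenLiu's actual values:

```python
import re

def score_post(content: str, mention_count: int, hashtag_count: int) -> int:
    """Rule-based spam score in 0-100. Weights here are illustrative."""
    score = 0
    # Link count: each http(s) link adds weight
    score += 15 * len(re.findall(r"https?://", content))
    # Mention abuse: many @-mentions is a spam signal
    if mention_count > 3:
        score += 20
    # Hashtag abuse: hashtag-stuffed posts score higher
    if hashtag_count > 5:
        score += 20
    # All-caps text: mostly-uppercase body text
    letters = [c for c in content if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.7:
        score += 15
    # Repetitive content: low ratio of unique words
    words = content.lower().split()
    if len(words) >= 5 and len(set(words)) / len(words) < 0.5:
        score += 20
    return min(score, 100)
```

An additive design like this keeps each rule independently tunable, which is also what makes the model "ML ready": the per-rule signals can later become features.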
### Manual Review
You manually approve or reject posts. Approved posts enter the export queue; rejected posts are excluded. Your decisions are recorded for future ML training.
### Export Queue (Curated Queue)
Approved posts are held in a queue with states: pending → reserved → delivered (or error). External tools (like Zhongli bot) fetch posts via API, process them, and acknowledge with ack/nack/error callbacks.
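The lifecycle can be summarized as a transition table. This is a sketch: exactly which states the manual requeue endpoint accepts is an assumption here.

```python
# Allowed queue_status transitions, per the lifecycle described above.
TRANSITIONS: dict[str, set[str]] = {
    "pending": {"reserved"},                        # GET /curated/next reserves a post
    "reserved": {"delivered", "pending", "error"},  # ack / nack / error callbacks
    "delivered": {"pending"},                       # manual requeue (assumed)
    "error": {"pending"},                           # manual requeue (assumed)
}

def can_transition(current: str, target: str) -> bool:
    """True if the queue allows moving a post from current to target."""
    return target in TRANSITIONS.get(current, set())
```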
### Reblog Controls

Export filters applied after approval:

- Blocked users: author is in the blocklist
- Blocked hashtags: post contains a blocklisted hashtag
- Attachments-only: post must have media
- Auto-reject: automatically reject matching posts before review
## Data Models

### Post
Represents a post fetched from Fediverse.
Key fields:
- post_id: ActivityPub URI (unique identifier)
- url: Human-readable web URL
- content: Post text
- author_username: Account that posted
- instance: Fediverse instance source
- spam_score: Automatic 0-100 score
- manual_spam_score: Your adjusted score (if any)
- approved: True if you approved it
- reviewed: True if you made a decision
- created_at: When post was published on Fediverse
- reviewed_at: When you approved/rejected it
- queue_status: pending/reserved/delivered/error
- error_reason: Why export failed (if applicable)
### HashtagStream
Represents a hashtag you're monitoring.
Key fields:
- hashtag: The hashtag name
- instance: Mastodon instance
- active: Whether fetching is enabled
- posts: Relationship to posts from this stream
### ReviewFeedback
Records your approval/rejection decisions for ML training. Includes a snapshot of post features captured at review time so training data survives post deletion.
Key fields:
- post_id: Which post (cascade-deleted with the post)
- decision: "approved", "rejected", or "score_adjusted"
- manual_score: Score you chose
- created_at: When you reviewed
- Snapshot fields: content_snippet, spam_score_at_review, hashtag_count, hashtags_snapshot, attachment_count, has_video, boosts, likes, replies, author_is_bot, instance, stream_id
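At training time, a snapshot row could be turned into a (features, label) pair. The helper below is hypothetical — only the snapshot field names come from the list above; the vectorization and label encoding are assumptions:

```python
def feedback_to_example(row: dict) -> tuple[list[float], int]:
    """Map a ReviewFeedback snapshot to (features, label) for a classifier.
    Label encoding (assumed): 1 = rejected (spam-like), 0 = approved."""
    features = [
        float(row["spam_score_at_review"]),
        float(row["hashtag_count"]),
        float(row["attachment_count"]),
        float(row["has_video"]),
        float(row["boosts"]),
        float(row["likes"]),
        float(row["replies"]),
        float(row["author_is_bot"]),
    ]
    label = 1 if row["decision"] == "rejected" else 0
    return features, label
```

Because the snapshot is captured at review time, these examples remain valid even after the underlying post is deleted.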
### BlockedUser & BlockedHashtag
Export filters.
BlockedUser:
- account_identifier: @user@instance.social format or pattern
- pattern_type: exact, suffix, prefix, or contains
- notes: Reason for blocking
- created_at: When added to blocklist
BlockedHashtag:
- hashtag: Hashtag to exclude (lowercase, without #)
- notes: Reason for blocking
- created_at: When added to blocklist
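The four pattern_type modes could be matched like this (a sketch; the lowercase normalization is an assumption):

```python
def matches_blocklist(account: str, pattern: str, pattern_type: str) -> bool:
    """Match '@user@instance.social'-style identifiers against a blocklist
    entry using the four pattern types: exact, prefix, suffix, contains."""
    account = account.lower()
    pattern = pattern.lower()
    if pattern_type == "exact":
        return account == pattern
    if pattern_type == "prefix":
        return account.startswith(pattern)
    if pattern_type == "suffix":
        return account.endswith(pattern)
    if pattern_type == "contains":
        return pattern in account
    raise ValueError(f"unknown pattern_type: {pattern_type}")
```

Suffix patterns like `@spam.example` are what make blocking an entire instance possible with one entry.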
## API Design
All endpoints follow REST semantics:
- GET - retrieve data
- POST - create or action
- PATCH - partial update
- DELETE - remove
Base URL: /api/v1
### Resource Patterns
Streams: /hashtags
- GET /hashtags - list
- POST /hashtags - create
- GET /hashtags/{id} - detail
- PATCH /hashtags/{id} - update
- DELETE /hashtags/{id} - delete
- POST /hashtags/{id}/fetch - fetch posts
Posts: /posts
- GET /posts - list with filtering
- GET /posts/{id} - detail
- PATCH /posts/{id} - update (approve/reject/adjust score)
- POST /posts/batch-score - batch scoring
Curated Queue: /curated
- GET /curated/next - get next post (reserves it)
- POST /curated/{id}/ack - confirm export
- POST /curated/{id}/nack - return to queue
- POST /curated/{id}/error - mark permanently failed
- POST /curated/{id}/requeue - manually requeue
- POST /curated/cleanup - delete delivered posts older than N days
- POST /curated/trim-pending - trim excess pending posts
## Technology Stack

Web Framework: PyView (Starlette-based LiveView)

- Real-time capable
- Minimal JavaScript required
- Server-side rendering

Database: SQLAlchemy ORM + SQLite

- Type-safe queries
- Relationship management
- Eager loading for performance

Migrations: Alembic

- Schema versioning
- Automatic on startup
- Reproducible across environments

Async Runtime: asyncio

- Async/await throughout
- Non-blocking I/O
- Sync-only for SQLite (acceptable)

Type Safety: Pydantic + Type Hints

- Comprehensive type annotations
- Request/response validation
- IDE support

Testing: pytest

- 389 tests
- Coverage across all major functionality
- Regression test suite
## Key Design Patterns

### Database Session Management
All database access uses context managers:
```python
with get_db() as db:
    # Do database operations
    db.commit()
# Session automatically closed
```
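`get_db` itself can be a `contextlib.contextmanager` wrapper around a session factory. The sketch below uses assumed names and an in-memory SQLite URL for illustration (FenLiu's actual engine points at fenliu.db):

```python
from contextlib import contextmanager

from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session, sessionmaker

# Hypothetical session factory; the real setup may differ.
engine = create_engine("sqlite://")
SessionLocal = sessionmaker(bind=engine)

@contextmanager
def get_db():
    """Yield a Session, roll back on error, and always close it."""
    db: Session = SessionLocal()
    try:
        yield db
    except Exception:
        db.rollback()
        raise
    finally:
        db.close()
```

The `finally` clause is what guarantees the "session automatically closed" behavior even when an operation raises.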
### Async I/O
HTTP calls to Fediverse use async:
```python
async with httpx.AsyncClient() as client:
    response = await client.get(url)
```
### Service Layer
Business logic separated from API:
API Handler → Service → Database
### Export Eligibility
Filters applied consistently whether exporting or auto-rejecting:
```python
result = check_reblog_filters(post, db)
if result.eligible:
    ...  # export or approve
else:
    ...  # skip, with result carrying the reason
```
## Performance Optimizations

Eager Loading: Relationships are pre-loaded to prevent N+1 queries:

```python
posts = select(Post).options(
    selectinload(Post.stream),
    selectinload(Post.review_feedback),
)
```
SQL Aggregates: Use the database's count() instead of loading all rows:

```python
count = db.scalar(select(func.count(Post.id)))
```
Index Strategy: Indexes on frequently filtered columns (spam_score, created_at, queue_status).
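Declared in SQLAlchemy Core style, that index strategy might look like this. FenLiu defines its indexes on the ORM models; the composite index here is an illustrative assumption, not documented behavior:

```python
from sqlalchemy import (Column, DateTime, Index, Integer, MetaData, String,
                        Table, create_engine, inspect)

metadata = MetaData()

# Core-style sketch of the frequently filtered columns.
posts = Table(
    "posts", metadata,
    Column("id", Integer, primary_key=True),
    Column("spam_score", Integer),
    Column("created_at", DateTime),
    Column("queue_status", String),
)

Index("ix_posts_spam_score", posts.c.spam_score)
Index("ix_posts_created_at", posts.c.created_at)
# Composite index serving a "pending posts, oldest first" query (assumed).
Index("ix_posts_queue_status_created", posts.c.queue_status, posts.c.created_at)
```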
## Security

- API Key Authentication: all protected endpoints require the X-API-Key header
- Session Management: proper cleanup prevents resource leaks
- Input Validation: Pydantic schemas validate all requests
- Type Safety: type hints catch many errors at development time
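A minimal check of the X-API-Key header might look like this. This is a sketch: the helper name is hypothetical, and FenLiu's actual key storage and middleware wiring are not shown:

```python
import hmac

def is_authorized(headers: dict, expected_key: str) -> bool:
    """Compare the X-API-Key header against the stored key in constant time,
    which avoids leaking key prefixes through timing differences."""
    provided = headers.get("X-API-Key", "")
    return hmac.compare_digest(provided.encode(), expected_key.encode())
```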
## Deployment

Development:

```shell
fenliu --reload --debug
```

Production:

```shell
fenliu --host 0.0.0.0 --port 8000
```
## Queue Lifecycle Management
Delivered posts accumulate over time. FenLiu manages their lifecycle to keep the database lean while preserving all-time statistics.
### Cleanup (Delivered Posts)

```
POST /api/v1/curated/cleanup?retention_days=7
```
Deletes delivered posts older than retention_days. Before deletion, the count is added to QueueStats.total_deleted_delivered so all-time stats remain correct. Runs daily via APScheduler and is also available on demand.
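In outline, the cleanup pass works like this. The sketch below operates on plain dicts rather than the ORM, and the delivered_at field name is an assumption (only the retention rule and the QueueStats counter come from this document):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def cleanup_delivered(posts: list[dict], stats: dict, retention_days: int = 7,
                      now: Optional[datetime] = None) -> list[dict]:
    """Drop delivered posts older than retention_days and fold the count
    into stats['total_deleted_delivered'] so all-time figures stay correct."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    keep, deleted = [], 0
    for p in posts:
        if p["queue_status"] == "delivered" and p["delivered_at"] < cutoff:
            deleted += 1
        else:
            keep.append(p)
    stats["total_deleted_delivered"] = stats.get("total_deleted_delivered", 0) + deleted
    return keep
```

The key invariant: the counter is incremented before rows disappear, so statistics never undercount.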
### Trim Pending

```
POST /api/v1/curated/trim-pending?lookback_days=3
```
Trims the pending queue when it exceeds twice the recent daily consumption rate. Uses weighted random selection — older posts, posts with fewer likes, and posts from prolific authors are more likely to be trimmed. Removed count is added to QueueStats.total_deleted_pending.
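The weighted selection could be sketched as follows. Only the described biases (older posts, fewer likes, prolific authors) come from the description above; the weight formula itself is an assumption:

```python
import random
from collections import Counter
from typing import Optional

def trim_weights(posts: list[dict]) -> list[float]:
    """Higher weight = more likely to be trimmed."""
    by_author = Counter(p["author"] for p in posts)
    max_age = max(p["age_days"] for p in posts) or 1
    weights = []
    for p in posts:
        age_w = p["age_days"] / max_age       # older -> larger
        like_w = 1.0 / (1.0 + p["likes"])     # fewer likes -> larger
        author_w = by_author[p["author"]]     # prolific author -> larger
        weights.append(age_w + like_w + author_w)
    return weights

def pick_posts_to_trim(posts: list[dict], excess: int,
                       seed: Optional[int] = None) -> list[dict]:
    """Weighted random selection of `excess` posts to remove, without
    replacement; weights are recomputed as the pool shrinks."""
    rng = random.Random(seed)
    pool = list(posts)
    chosen: list[dict] = []
    for _ in range(min(excess, len(pool))):
        victim = rng.choices(pool, weights=trim_weights(pool), k=1)[0]
        pool.remove(victim)
        chosen.append(victim)
    return chosen
```

Randomness (rather than a strict sort) keeps the trimmed set diverse while still biasing toward the least valuable posts.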
### Error History
When error posts are deleted (via cleanup), their error_reason is saved to ErrorHistory before deletion. This preserves the "Most Frequent Error" metric across the full lifetime of the system.
### Stats Preservation

- QueueStats: singleton row with cumulative deletion counts
- ErrorHistory: append-only log of deleted error reasons
- Queue Preview and Statistics pages combine active + historical counts for all-time figures
## Future Extensibility

- Machine Learning: ReviewFeedback model ready for training data
- Database: easy to migrate from SQLite to PostgreSQL
- Export Formats: queue API is tool-agnostic (not tied to a specific consumer)
- Custom Rules: framework ready for user-defined spam scoring rules