Skip to content

Database Models

FenLiu uses SQLAlchemy ORM with these core models.

Post

Represents a post fetched from the Fediverse.

Fields: - id (int): Primary key - post_id (str): ActivityPub URI (unique) - url (str, optional): Human-readable web URL - content (str): Post text (may include HTML) - author_username (str, optional): Account username - author_display_name (str, optional): Display name - author_url (str, optional): Author profile URL - author_is_bot (bool): Whether the account is a bot - instance (str): Fediverse instance the post came from - hashtags (JSON list, optional): Hashtags in the post - media_attachments (JSON list, optional): Images/videos - boosts (int): Number of boosts/reblogs - likes (int): Number of favourites/likes - replies (int): Number of replies - created_at (datetime, optional): Post creation time on Fediverse - fetched_at (datetime): When FenLiu fetched the post - processed (bool): Whether spam scoring has run - spam_score (int): Automatic 0-100 score - reviewed (bool): Has been reviewed - reviewed_at (datetime, optional): When reviewed - reviewer_notes (str, optional): Notes from reviewer - manual_spam_score (int, optional): User-adjusted score (overrides automatic) - approved (bool, optional): True = approved, False = rejected, None = not reviewed - curated_exported (bool): Whether exported to curated queue - export_date (datetime, optional): When first exported - training_data (bool): Whether used as ML training data - queue_status (str, optional): pending / reserved / delivered / error - reserved_at (datetime, optional): When reserved for delivery - delivered_at (datetime, optional): When successfully delivered (ack'd) - error_reason (str, optional): Export failure reason - errored_at (datetime, optional): When the error occurred

Relationships: - stream: Parent HashtagStream - review_feedback: Related ReviewFeedback records

HashtagStream

Configuration for monitoring a hashtag.

Fields: - id (int): Primary key - hashtag (str): Hashtag name (without #, unique) - instance (str): Mastodon instance to fetch from - active (bool): Monitoring enabled - last_check (datetime, optional): Last successful fetch time - created_at (datetime): Creation time - updated_at (datetime, optional): Last modification time - enable_scheduling (bool): Whether auto-fetch is enabled (default: true) - fetch_interval_minutes (int): Minutes between scheduled fetches (default: 60) - next_scheduled_fetch (datetime, optional): When the next auto-fetch will run

Relationships: - posts: Related Post records (cascade delete)

ReviewFeedback

Records user review decisions for ML training. Snapshot fields capture post features at review time so training data survives the queue cleanup job that deletes old posts.

Fields: - id (int): Primary key - post_id (int, FK): Foreign key to Post (cascade-deleted when Post is deleted) - decision (str): "approved", "rejected", or "score_adjusted" - manual_score (int, optional): Score set during review - reviewer_notes (str, optional): Notes - created_at (datetime): When the decision was made

Snapshot fields (captured at review time, survive post deletion): - content_snippet (str, optional): First 500 chars of stripped post content - spam_score_at_review (int, optional): Heuristic spam score at time of review - hashtag_count (int, optional): Number of hashtags - hashtags_snapshot (JSON, optional): List of hashtags - attachment_count (int, optional): Number of media attachments - has_video (bool, optional): Whether any attachment is video - boosts (int, optional): Boost count at review time - likes (int, optional): Like count at review time - replies (int, optional): Reply count at review time - author_is_bot (bool, optional): Whether author is flagged as a bot - instance (str, optional): Fediverse instance the post came from - stream_id (int, optional): Which hashtag stream the post belongs to

Relationships: - post: Parent Post

BlockedUser

Blocklist of accounts to exclude from export. Supports pattern matching.

Fields: - id (int): Primary key - account_identifier (str): Account or pattern (e.g. @user@instance.social, *.bsky.app) - pattern_type (str): Match mode — exact, suffix, prefix, or contains - notes (str, optional): Reason for blocking - created_at (datetime): When added

BlockedHashtag

Blocklist of hashtags to exclude from export.

Fields: - id (int): Primary key - hashtag (str): Hashtag name (lowercase, without #, unique) - notes (str, optional): Reason for blocking - created_at (datetime): When added

AppSetting

Configuration storage (key-value pairs, JSON-encoded values).

Fields: - key (str): Setting name (primary key) - value (str): JSON-encoded value - updated_at (datetime): Last update time

Common Keys: - attachments_only: "true" or "false" - auto_reject_blocked: "true" or "false" - auto_reject_bots: "true" or "false"

QueueStats

Singleton row tracking aggregate counts for posts that have been deleted (to preserve all-time statistics).

Fields: - id (int): Primary key (always 1) - total_deleted_delivered (int): Delivered posts removed by cleanup - total_deleted_reserved (int): Reserved posts removed by cleanup - total_deleted_pending (int): Pending posts removed by trim

Used by the Queue Preview and Statistics pages to show all-time delivered/pending counts.

ErrorHistory

Audit trail of deleted error posts. Preserves the error reason after cleanup so "Most Frequent Error" statistics remain accurate over time.

Fields: - id (int): Primary key - error_reason (str, optional): The error reason recorded before deletion - deleted_at (datetime): When the error post was deleted

Entity Relationships

HashtagStream
  └── Post (one-to-many, cascade delete)
      └── ReviewFeedback (one-to-many)

BlockedUser (standalone)
BlockedHashtag (standalone)
AppSetting (standalone key-value)
QueueStats (singleton — id always 1)
ErrorHistory (append-only audit log)

Queue State Machine

Posts transition through states:

pending → reserved → delivered (ack)
               ↓
            pending  (nack or 5-min timeout)

pending → reserved → error (permanent failure)
                          ↓
                       pending  (manual requeue)

Spam Score Fields

Two fields track spam scoring:

  • spam_score: Automatic score (0-100)
  • manual_spam_score: Your override (optional)

When a manual score exists it is used for export decisions.

Database Schema

Created via SQLAlchemy and Alembic migrations. See alembic/versions/ for schema history.

Current schema version automatically applied on startup via alembic upgrade head.

Indexing

Indexes created for performance:

  • Post.post_id (unique)
  • Post.queue_status
  • Post.delivered_at
  • HashtagStream.hashtag (unique)

Next Steps