[Feature] Automated stream creation #3

Open
opened 2025-03-09 10:18:16 +00:00 by CJ_Clippy · 0 comments

This is a key part of Futureporn's success. To gather statistics on archive status, we need to know about every stream that has ever happened, which requires a combination of automation and crowdsourcing.

This is the plan for the automation side. Each component performs a task that maximizes data ingestion into the db.

Crawler component

  • [x] For each vtuber we know about, crawl their social media posts
  • [x] Act on posts that contain CB/Fansly/OF links
  • [x] Create stream (Oban task)
  • [x] X.com (strictly X only for now; Bsky and other social media can be added in V3 or V4)
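The "act on posts which contain CB/Fansly/OF links" step can be sketched as a simple link matcher. This is an illustrative Python sketch, not the actual Elixir/Oban crawler; the pattern set is an assumption and the real crawler would need to handle shortlinks and mirror domains too.

```python
import re

# Assumed platform URL patterns; illustrative only, the real crawler
# likely needs more variants (shortlinks, tracking params, mirrors).
PLATFORM_PATTERNS = {
    "chaturbate": re.compile(r"https?://(?:\w+\.)?chaturbate\.com/\S+", re.I),
    "fansly": re.compile(r"https?://(?:\w+\.)?fansly\.com/\S+", re.I),
    "onlyfans": re.compile(r"https?://(?:\w+\.)?onlyfans\.com/\S+", re.I),
}

def find_platform_links(post_body: str) -> dict[str, list[str]]:
    """Return platform -> list of matching links found in a post body."""
    hits = {}
    for platform, pattern in PLATFORM_PATTERNS.items():
        matches = pattern.findall(post_body)
        if matches:
            hits[platform] = matches
    return hits

# A post that should trigger stream creation:
post = "going live now!! https://chaturbate.com/example_room come say hi"
print(find_platform_links(post))
# {'chaturbate': ['https://chaturbate.com/example_room']}
```

A post with no platform link returns an empty dict, so the crawler can skip it without enqueueing anything.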

API Component (/streams/new)

  • [x] Accepts an X post URL
  • [x] Fails when the X post URL is a duplicate
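The duplicate check only works if trivially different forms of the same URL compare equal. A minimal Python sketch of the idea, assuming canonicalization rules (twitter.com vs x.com, dropped query params) and using an in-memory set where the real endpoint would hit the db:

```python
from urllib.parse import urlsplit

def canonical_x_url(url: str) -> str:
    """Normalize an X post URL so trivially different forms compare
    equal (host case, www prefix, twitter.com vs x.com, query params)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    if host == "twitter.com":
        host = "x.com"
    return f"https://{host}{parts.path.rstrip('/')}"

# In-memory stand-in for the streams table; the real check would be
# a unique constraint / db lookup.
seen: set[str] = set()

def create_stream(x_post_url: str) -> bool:
    """Return True if accepted, False if the URL is a duplicate."""
    key = canonical_x_url(x_post_url)
    if key in seen:
        return False
    seen.add(key)
    return True

assert create_stream("https://x.com/someone/status/123")
assert not create_stream("https://twitter.com/someone/status/123?s=20")
```

Canonicalizing before the lookup means resubmitting the same post via a twitter.com link or with tracking params still fails as a duplicate.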

Parser components (common)

  • [x] Parses the X post body, extracting:
    • [ ] Title (LLM parser, maybe)
    • [x] X post URL
    • [x] UTC date
    • [x] Lewdtuber (reference to our db)
  • [ ] Categorization and acceptance
    • [ ] Ignore socials-reminder tweets (linktrees)
    • [ ] Ignore SFW stream announcements
    • [ ] Ignore retweets
    • [ ] Ignore links to vods
    • [ ] Ignore misc. tweets
    • [ ] Ask a human if unsure
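The categorization rules above amount to a small rule chain with a human fallback. A minimal Python sketch under assumed post fields (`body`, optional `is_retweet`); the rules here are crude placeholders, not the real heuristics:

```python
import re

# Ordered ignore rules; each is (reason, predicate on a post dict).
# These predicates are illustrative stand-ins for the real checks.
IGNORE_RULES = [
    ("retweet", lambda p: p["body"].startswith("RT @") or p.get("is_retweet", False)),
    ("linktree", lambda p: "linktr.ee" in p["body"].lower()),
    ("vod_link", lambda p: bool(re.search(r"\bvods?\b", p["body"], re.I))),
]

def categorize(post: dict) -> str:
    """Return 'accept', 'ignore:<reason>', or 'needs_human'."""
    for reason, matches in IGNORE_RULES:
        if matches(post):
            return f"ignore:{reason}"
    # A live platform link is the acceptance signal.
    if re.search(r"chaturbate\.com|fansly\.com|onlyfans\.com", post["body"], re.I):
        return "accept"
    # "Ask a human if unsure" -- everything else goes to review.
    return "needs_human"

print(categorize({"body": "RT @someone going live"}))          # ignore:retweet
print(categorize({"body": "live on https://fansly.com/x"}))    # accept
```

The explicit `needs_human` outcome keeps ambiguous tweets out of the db without silently dropping them, which matches the crowdsourcing side of the plan.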

Scrape & Cache

We need an x_posts database schema so we can cache posts in the db. The db is the source of truth, rather than Nitter. This lets Nitter be ephemeral rather than a point of failure (Nitter data loss should not cause problems).

  • [ ] Nitter
    • [ ] user accounts
    • [ ] proxies
  • [x] rss.app
  • [x] XPost database type
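One possible shape for the x_posts cache, sketched with SQLite for illustration; the real schema would presumably be an Ecto migration, and every column name here is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE x_posts (
    id          INTEGER PRIMARY KEY,
    url         TEXT NOT NULL UNIQUE,  -- canonical X post URL
    body        TEXT NOT NULL,         -- cached post text (db is source of truth)
    posted_at   TEXT NOT NULL,         -- UTC timestamp
    vtuber_id   INTEGER,               -- assumed reference to a vtubers table
    fetched_via TEXT                   -- e.g. 'nitter', 'rss.app'
)
""")

conn.execute(
    "INSERT INTO x_posts (url, body, posted_at, fetched_via) VALUES (?, ?, ?, ?)",
    ("https://x.com/someone/status/123", "going live!",
     "2025-03-09T10:00:00Z", "nitter"),
)

# The UNIQUE constraint rejects re-caching the same post, so a Nitter
# re-scrape after data loss is an idempotent no-op.
try:
    conn.execute(
        "INSERT INTO x_posts (url, body, posted_at) VALUES (?, ?, ?)",
        ("https://x.com/someone/status/123", "going live!",
         "2025-03-09T10:00:00Z"),
    )
except sqlite3.IntegrityError:
    print("duplicate rejected")
```

Keeping the cached body in the db is what makes Nitter disposable: a lost Nitter instance can be rebuilt and re-pointed without touching the archive.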
CJ_Clippy added the enhancement label 2025-03-09 10:18:16 +00:00
CJ_Clippy changed title from Automated stream creation to [Feature] Automated stream creation 2025-03-09 10:21:35 +00:00
CJ_Clippy added this to the 2.0 milestone 2025-03-09 10:23:36 +00:00
CJ_Clippy pinned this 2025-03-09 11:31:46 +00:00
Reference: futureporn/fp#3