news_at_12 module¶
- setup_logging(log_file, error_log_file, log_max_bytes, log_backup_count)[source]¶
Configure the root logger with file and console handlers.
Sets up three handlers:
A rotating file handler writing INFO and above to log_file.
A rotating file handler writing ERROR and above to error_log_file.
A stream handler writing INFO and above to the terminal.
Both file handlers rotate at log_max_bytes and keep log_backup_count backup copies so logs never grow unbounded.
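The handler layout described above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code; the format string and the decision to configure the root logger in place are assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler

def setup_logging(log_file, error_log_file, log_max_bytes=1_000_000, log_backup_count=5):
    root = logging.getLogger()
    root.setLevel(logging.INFO)

    # Rotating file handler: INFO and above.
    info_handler = RotatingFileHandler(log_file, maxBytes=log_max_bytes,
                                       backupCount=log_backup_count)
    info_handler.setLevel(logging.INFO)

    # Rotating file handler: ERROR and above.
    error_handler = RotatingFileHandler(error_log_file, maxBytes=log_max_bytes,
                                        backupCount=log_backup_count)
    error_handler.setLevel(logging.ERROR)

    # Stream handler: INFO and above to the terminal.
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (info_handler, error_handler, console):
        handler.setFormatter(fmt)
        root.addHandler(handler)
```

Because rotation is handled by RotatingFileHandler, each file is capped at log_max_bytes and old copies beyond log_backup_count are discarded automatically.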
- url_hash(url)[source]¶
Return a stable SHA-256 hex digest for a URL.
Used as a unique key in the database to deduplicate headlines without storing or comparing full URL strings on every insert.
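The function is a thin wrapper over hashlib; a sketch consistent with the description:

```python
import hashlib

def url_hash(url):
    # Stable SHA-256 hex digest of the UTF-8 encoded URL, usable as a
    # fixed-length unique key for deduplication.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()
```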
- parse_date(entry)[source]¶
Extract a publication date from a feed entry and return it as ISO-8601.
Tries published_parsed first, then falls back to updated_parsed. Both attributes are time-tuples supplied by feedparser.
- Parameters:
entry – A feedparser entry object.
- Returns:
An ISO-8601 datetime string (e.g. '2026-04-10T12:00:00'), or None if no parseable date attribute is found.
- Return type:
str or None
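The fallback chain above can be sketched as follows. feedparser's time-tuples are 9-element struct_time-style tuples, so the first six fields feed directly into datetime:

```python
from datetime import datetime

def parse_date(entry):
    # Prefer published_parsed, fall back to updated_parsed; both are
    # struct_time-style 9-tuples when present.
    for attr in ("published_parsed", "updated_parsed"):
        t = getattr(entry, attr, None)
        if t:
            return datetime(*t[:6]).isoformat()
    return None
```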
- load_config(filename)[source]¶
Load and validate configuration from a TOML file.
Reads the file at filename, checks that the required [settings] and [[feeds]] sections exist, and filters out any feeds whose enabled key is set to false.
- Parameters:
filename (str) – Path to the TOML configuration file.
- Returns:
A dict with two keys on success:
'settings' (dict): The [settings] table from the TOML file.
'feeds' (list[dict]): Only the feeds where enabled is true (or omitted, which defaults to true).
Returns None if the file is missing, contains invalid TOML, or is missing required sections.
- Return type:
dict or None
- get_db(db_file)[source]¶
Open (or create) the SQLite database and ensure the schema exists.
Enables WAL journal mode for better concurrent read performance and creates the feeds, headlines, and runs tables along with their indexes if they do not already exist.
- Parameters:
db_file (str) – Path to the SQLite database file. The file is created if it does not exist.
- Returns:
An open database connection with row_factory set to sqlite3.Row for dict-style column access.
- Return type:
sqlite3.Connection
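The connection setup can be sketched as below. The table and column names here are assumptions reconstructed from this page's descriptions, not the module's real schema:

```python
import sqlite3

# Hypothetical schema matching the tables named above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feeds (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL,
    title TEXT,
    site_link TEXT,
    last_fetched TEXT
);
CREATE TABLE IF NOT EXISTS headlines (
    id INTEGER PRIMARY KEY,
    feed_id INTEGER REFERENCES feeds(id),
    url_hash TEXT UNIQUE NOT NULL,
    title TEXT,
    url TEXT,
    published TEXT,
    summary TEXT,
    times_seen INTEGER DEFAULT 1
);
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY,
    started_at TEXT,
    finished_at TEXT,
    elapsed_sec REAL,
    feeds_fetched INTEGER,
    feeds_failed INTEGER,
    articles_total INTEGER,
    articles_new INTEGER
);
CREATE INDEX IF NOT EXISTS idx_headlines_feed ON headlines(feed_id);
"""

def get_db(db_file):
    conn = sqlite3.connect(db_file)
    conn.row_factory = sqlite3.Row           # dict-style column access
    conn.execute("PRAGMA journal_mode=WAL")  # better concurrent reads
    conn.executescript(SCHEMA)               # idempotent: IF NOT EXISTS
    return conn
```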
- log_run_summary(conn, started_at, finished_at, elapsed_sec, feeds_fetched, feeds_failed, articles_total, articles_new)[source]¶
Write a single row to the runs table summarising a completed run.
Uses its own explicit commit so it is not part of any feed transaction.
- Parameters:
conn (sqlite3.Connection) – An open database connection.
started_at (str) – ISO-8601 timestamp when the run began.
finished_at (str) – ISO-8601 timestamp when the run completed.
elapsed_sec (float) – Total wall-clock time for the run in seconds.
feeds_fetched (int) – Number of feeds successfully fetched.
feeds_failed (int) – Number of feeds that failed to fetch.
articles_total (int) – Total number of articles processed.
articles_new (int) – Number of articles that were new this run.
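The write itself is a single INSERT followed by an explicit commit; a sketch (column names assumed to mirror the parameter names):

```python
def log_run_summary(conn, started_at, finished_at, elapsed_sec,
                    feeds_fetched, feeds_failed, articles_total, articles_new):
    conn.execute(
        "INSERT INTO runs (started_at, finished_at, elapsed_sec, feeds_fetched,"
        " feeds_failed, articles_total, articles_new)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (started_at, finished_at, elapsed_sec, feeds_fetched,
         feeds_failed, articles_total, articles_new),
    )
    conn.commit()  # explicit commit: independent of any feed transaction
```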
- upsert_feed(conn, url, title, site_link)[source]¶
Insert a feed row if it does not exist, or update its metadata if it does.
Uses an ON CONFLICT clause to update title and last_fetched when the URL already exists. The caller is responsible for committing the surrounding transaction.
- Parameters:
conn (sqlite3.Connection) – An open database connection.
url (str) – The RSS feed URL (used as the unique key).
title (str) – The feed’s display title.
site_link (str) – The feed’s associated website URL.
- Returns:
The integer primary key (id) of the feed row.
- Return type:
int
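The ON CONFLICT pattern described above can be sketched like this (SQLite 3.24+ syntax; the feeds columns are assumptions consistent with this page). Note there is no commit here, matching the caller-commits contract:

```python
from datetime import datetime

def upsert_feed(conn, url, title, site_link):
    # ON CONFLICT keeps the row's id stable while refreshing its metadata.
    conn.execute(
        """
        INSERT INTO feeds (url, title, site_link, last_fetched)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            title = excluded.title,
            last_fetched = excluded.last_fetched
        """,
        (url, title, site_link, datetime.now().isoformat()),
    )
    return conn.execute("SELECT id FROM feeds WHERE url = ?", (url,)).fetchone()[0]
```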
- upsert_headline(conn, feed_id, title, url, published, summary)[source]¶
Insert a headline if it is new, or bump its seen count if it already exists.
Keyed by a SHA-256 hash of the article URL so deduplication is fast and does not rely on string comparisons. The caller is responsible for committing the surrounding transaction.
- Parameters:
conn (sqlite3.Connection) – An open database connection.
feed_id (int) – The primary key of the parent feed row.
title (str) – The article headline.
url (str) – The article URL (hashed for deduplication).
published (str or None) – ISO-8601 publication date, or None.
summary (str) – A plain-text article summary (HTML already stripped).
- Returns:
A two-element tuple containing:
A dict of the headline row as it exists in the database after the upsert.
True if the headline was newly inserted, False if it already existed and was updated.
- Return type:
tuple[dict, bool]
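A sketch of the insert-or-bump logic, keyed on the SHA-256 URL hash; the headlines columns (including a times_seen counter) are assumptions consistent with this page:

```python
import hashlib

def upsert_headline(conn, feed_id, title, url, published, summary):
    h = hashlib.sha256(url.encode("utf-8")).hexdigest()
    existing = conn.execute(
        "SELECT id FROM headlines WHERE url_hash = ?", (h,)
    ).fetchone()
    if existing is None:
        conn.execute(
            "INSERT INTO headlines (feed_id, url_hash, title, url, published, summary)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (feed_id, h, title, url, published, summary),
        )
        is_new = True
    else:
        # Already known: bump the seen counter instead of inserting.
        conn.execute(
            "UPDATE headlines SET times_seen = times_seen + 1 WHERE url_hash = ?",
            (h,),
        )
        is_new = False
    row = dict(conn.execute(
        "SELECT * FROM headlines WHERE url_hash = ?", (h,)
    ).fetchone())
    return row, is_new
```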
- fetch_feed(feed_url, summary_limit=300)[source]¶
Fetch and parse a single RSS feed. No database access.
Pure network function — safe to call from multiple threads simultaneously. Strips HTML from titles and summaries, truncates summaries to summary_limit characters at a word boundary, and normalises dates to ISO-8601 strings.
- Parameters:
feed_url (str) – The RSS feed URL to fetch and parse.
summary_limit (int) – Maximum characters per article summary. Defaults to 300.
- Returns:
A dict containing raw feed metadata and parsed entries on success:
{
    'feed_url': str,
    'feed_title': str,
    'feed_link': str,
    'raw_entries': list[dict],  # title, url, published, summary
}
Returns None if the feed could not be fetched or parsed.
- Return type:
dict or None
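The fetch itself relies on feedparser, but the text cleanup it performs (HTML stripping and word-boundary truncation) can be sketched with the standard library. These helper names are illustrative, not the module's actual internals:

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    # Collects only text nodes, discarding all tags.
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(text):
    """Drop markup, keep text content, collapse whitespace."""
    p = _TextExtractor()
    p.feed(text)
    return re.sub(r"\s+", " ", "".join(p.parts)).strip()

def truncate_at_word(text, limit):
    """Cut to at most `limit` characters, breaking at a word boundary."""
    if len(text) <= limit:
        return text
    return text[:limit].rsplit(" ", 1)[0] + "…"
```

Truncating at a word boundary avoids summaries that end mid-word, at the cost of occasionally being a few characters shorter than summary_limit.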
- store_feed(conn, raw)[source]¶
Write the output of fetch_feed to the database.
Called sequentially — one feed at a time — so SQLite is never touched by more than one thread at once. Uses a single with conn transaction per feed so all writes are committed in one disk flush and any failure rolls back the entire feed atomically.
- Parameters:
conn (sqlite3.Connection) – An open database connection.
raw (dict) – The dict returned by fetch_feed().
- Returns:
A fully resolved feed dict ready for HTML/JSON rendering:
{
    'feed_title': str,
    'feed_url': str,
    'feed_link': str,
    'new_count': int,
    'entries': list[dict],
}
- Return type:
dict
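The per-feed transaction shape can be sketched as below. This is a simplified stand-in (schema and dedup-by-URL are assumptions): the key point is the single `with conn:` block, which commits once on success and rolls the whole feed back on any exception:

```python
def store_feed(conn, raw):
    new_count = 0
    # Single transaction: one disk flush per feed, atomic rollback on error.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO feeds (url, title) VALUES (?, ?)",
            (raw["feed_url"], raw["feed_title"]),
        )
        feed_id = conn.execute(
            "SELECT id FROM feeds WHERE url = ?", (raw["feed_url"],)
        ).fetchone()[0]
        for e in raw["raw_entries"]:
            cur = conn.execute(
                "INSERT OR IGNORE INTO headlines (feed_id, url, title)"
                " VALUES (?, ?, ?)",
                (feed_id, e["url"], e["title"]),
            )
            if cur.rowcount == 1:  # actually inserted, i.e. new this run
                new_count += 1
    return {
        "feed_title": raw["feed_title"],
        "feed_url": raw["feed_url"],
        "feed_link": raw.get("feed_link", ""),
        "new_count": new_count,
        "entries": raw["raw_entries"],
    }
```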
- async fetch_all(feed_urls, conn, max_workers=10, summary_limit=300)[source]¶
Fetch all feeds concurrently, then store results sequentially.
Runs all fetch_feed() calls in a thread pool simultaneously, then calls store_feed() for each result one at a time on the main thread to keep SQLite writes safe.
- Parameters:
feed_urls (list[str]) – The RSS feed URLs to fetch.
conn (sqlite3.Connection) – An open database connection passed through to store_feed().
max_workers (int) – Maximum number of concurrent fetch threads. Defaults to 10.
summary_limit (int) – Maximum characters per article summary, passed through to fetch_feed(). Defaults to 300.
- Returns:
A list of resolved feed dicts as returned by store_feed(), one per successfully fetched feed. Failed feeds are silently omitted.
- Return type:
list[dict]
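The fetch-concurrently-store-sequentially shape can be sketched generically. For the sketch, fetch_feed and store_feed are replaced by injected callables (fetch_one, store_one) so the pattern stands alone; the real function takes a connection instead:

```python
import asyncio

async def fetch_all(feed_urls, fetch_one, store_one, max_workers=10):
    loop = asyncio.get_running_loop()
    sem = asyncio.Semaphore(max_workers)

    async def fetch(url):
        async with sem:  # cap concurrent fetch threads at max_workers
            return await loop.run_in_executor(None, fetch_one, url)

    # All fetches run simultaneously in the default thread pool.
    raws = await asyncio.gather(*(fetch(u) for u in feed_urls))
    # Store one at a time on the event-loop thread: SQLite sees one writer.
    return [store_one(raw) for raw in raws if raw is not None]
```

Filtering out None results is what makes failed feeds disappear silently from the returned list.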
- export_json(all_feeds, filename)[source]¶
Write a clean JSON snapshot of all feeds and articles to disk.
The output is structured for easy LLM ingestion, including only human-readable fields (no internal database IDs or hashes).
- Parameters:
all_feeds (list[dict]) – The list of resolved feed dicts returned by fetch_all().
filename (str) – Path to the output JSON file. Created or overwritten.
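A sketch of the export, keeping only human-readable fields as described (the exact output keys here are assumptions):

```python
import json

def export_json(all_feeds, filename):
    # No database IDs or hashes: only fields an LLM or human would read.
    snapshot = [
        {
            "feed": feed["feed_title"],
            "link": feed["feed_link"],
            "articles": [
                {
                    "title": e["title"],
                    "url": e["url"],
                    "published": e.get("published"),
                    "summary": e.get("summary", ""),
                }
                for e in feed["entries"]
            ],
        }
        for feed in all_feeds
    ]
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(snapshot, f, ensure_ascii=False, indent=2)
```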
- build_html(all_feeds, elapsed_seconds, db_file='headlines.db')[source]¶
Render all feeds and their headlines to a self-contained HTML string.
Produces a styled, responsive HTML page with per-feed cards, NEW/repeat badges, clickable article links, and a summary header. No external dependencies — all CSS is inlined.
- Parameters:
all_feeds (list[dict]) – The list of resolved feed dicts returned by fetch_all().
elapsed_seconds (float) – Total fetch duration, displayed in the page header.
db_file (str) – Path to the database file, displayed in the footer. Defaults to 'headlines.db'.
- Returns:
A complete HTML document as a string.
- Return type:
str
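A heavily reduced sketch of the rendering loop, showing the feed-card structure and the escaping every field needs; the real page's CSS, badges, and footer are omitted:

```python
import html

def build_html(all_feeds, elapsed_seconds):
    cards = []
    for feed in all_feeds:
        # Every user-controlled string is escaped before it reaches the page.
        items = "".join(
            f'<li><a href="{html.escape(e["url"], quote=True)}">'
            f'{html.escape(e["title"])}</a></li>'
            for e in feed["entries"]
        )
        cards.append(
            f'<section><h2>{html.escape(feed["feed_title"])}'
            f' ({feed["new_count"]} new)</h2><ul>{items}</ul></section>'
        )
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        "<title>news_at_12</title></head><body>"
        f"<p>Fetched in {elapsed_seconds:.1f}s</p>"
        + "".join(cards)
        + "</body></html>"
    )
```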
- save_html(all_feeds, filename, elapsed_seconds, db_file, auto_open_browser=True)[source]¶
Write the rendered HTML to disk and optionally open it in the browser.
- Parameters:
all_feeds (list[dict]) – The list of resolved feed dicts returned by fetch_all().
filename (str) – Path to the output HTML file. Created or overwritten.
elapsed_seconds (float) – Total fetch duration passed through to build_html() for display in the page header.
db_file (str) – Database file path passed through to build_html() for display in the page footer.
auto_open_browser (bool) – If True, opens the saved file in the default web browser after writing. Defaults to True.
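The write-then-open step can be sketched as below. For the sketch the finished HTML string is passed in directly rather than rendered via build_html, which is a simplifying assumption:

```python
import webbrowser
from pathlib import Path

def save_html(html_doc, filename, auto_open_browser=True):
    path = Path(filename)
    path.write_text(html_doc, encoding="utf-8")
    if auto_open_browser:
        # file:// URI works cross-platform with the default browser.
        webbrowser.open(path.resolve().as_uri())
```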