news_at_12 module

setup_logging(log_file, error_log_file, log_max_bytes, log_backup_count)[source]

Configure the root logger with file and console handlers.

Sets up three handlers:

  • A rotating file handler writing INFO and above to log_file.

  • A rotating file handler writing ERROR and above to error_log_file.

  • A stream handler writing INFO and above to the terminal.

Both file handlers rotate at log_max_bytes and keep log_backup_count backup copies so logs never grow unbounded.

Parameters:
  • log_file (str) – Path to the main log file.

  • error_log_file (str) – Path to the error-only log file.

  • log_max_bytes (int) – Maximum size in bytes before a log file rotates.

  • log_backup_count (int) – Number of rotated backup files to keep.
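The handler layout described above can be sketched as follows. This is a hypothetical re-implementation for illustration only; the formatter string and handler order are assumptions, not taken from the module.

```python
import logging
from logging.handlers import RotatingFileHandler

def setup_logging_sketch(log_file, error_log_file, log_max_bytes, log_backup_count):
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")  # assumed format

    # Main log: INFO and above, rotated so it never grows unbounded.
    main = RotatingFileHandler(log_file, maxBytes=log_max_bytes,
                               backupCount=log_backup_count)
    main.setLevel(logging.INFO)

    # Error log: ERROR and above only.
    errors = RotatingFileHandler(error_log_file, maxBytes=log_max_bytes,
                                 backupCount=log_backup_count)
    errors.setLevel(logging.ERROR)

    # Console: mirrors the main log level.
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)

    for handler in (main, errors, console):
        handler.setFormatter(fmt)
        root.addHandler(handler)
```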

strip_html(text)[source]

Remove HTML tags from a string.

Parameters:

text (str) – Raw text that may contain HTML markup.

Returns:

The input string with all HTML tags removed and whitespace stripped. Returns an empty string if text is None or empty.

Return type:

str
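A minimal sketch of the described behaviour. The regex approach is an assumption; the module may use a real HTML parser instead.

```python
import re

def strip_html_sketch(text):
    # None or empty input yields an empty string, per the docstring.
    if not text:
        return ""
    # Drop anything that looks like a tag, then trim whitespace.
    return re.sub(r"<[^>]+>", "", text).strip()
```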

url_hash(url)[source]

Return a stable SHA-256 hex digest for a URL.

Used as a unique key in the database to deduplicate headlines without storing or comparing full URL strings on every insert.

Parameters:

url (str) – The article URL to hash.

Returns:

A 64-character lowercase hexadecimal SHA-256 digest.

Return type:

str
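Given the documented contract (a 64-character lowercase SHA-256 hex digest), the function is likely equivalent to:

```python
import hashlib

def url_hash_sketch(url):
    # SHA-256 of the UTF-8 bytes; hexdigest() is always 64 lowercase hex chars.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()
```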

parse_date(entry)[source]

Extract a publication date from a feed entry and return it as ISO-8601.

Tries published_parsed first, then falls back to updated_parsed. Both attributes are time-tuples supplied by feedparser.

Parameters:

entry – A feedparser entry object.

Returns:

An ISO-8601 datetime string (e.g. '2026-04-10T12:00:00'), or None if no parseable date attribute is found.

Return type:

str or None
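The fallback chain can be sketched like this, assuming the entry exposes feedparser's time-tuple attributes:

```python
from datetime import datetime

def parse_date_sketch(entry):
    # feedparser supplies dates as time.struct_time tuples; the first six
    # fields are year, month, day, hour, minute, second.
    for attr in ("published_parsed", "updated_parsed"):
        parsed = getattr(entry, attr, None)
        if parsed:
            return datetime(*parsed[:6]).isoformat()
    return None
```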

pretty_date(iso)[source]

Format an ISO-8601 datetime string for human-readable display.

Parameters:

iso (str or None) – An ISO-8601 datetime string, or None.

Returns:

A formatted string such as 'April 10, 2026  12:00'. Returns 'Date unknown' if iso is falsy, or the original string unchanged if it cannot be parsed.

Return type:

str
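A plausible implementation, with the strftime pattern inferred from the example output 'April 10, 2026  12:00' (the exact pattern is an assumption):

```python
from datetime import datetime

def pretty_date_sketch(iso):
    if not iso:
        return "Date unknown"
    try:
        parsed = datetime.fromisoformat(iso)
    except ValueError:
        return iso  # unparseable: return the original string unchanged
    return parsed.strftime("%B %d, %Y  %H:%M")
```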

load_config(filename)[source]

Load and validate configuration from a TOML file.

Reads the file at filename, checks that the required [settings] and [[feeds]] sections exist, and filters out any feeds whose enabled key is set to false.

Parameters:

filename (str) – Path to the TOML configuration file.

Returns:

A dict with two keys on success:

  • 'settings' (dict): The [settings] table from the TOML file.

  • 'feeds' (list[dict]): Only the feeds where enabled is true (or omitted, which defaults to true).

Returns None if the file is missing, contains invalid TOML, or is missing required sections.

Return type:

dict or None

get_db(db_file)[source]

Open (or create) the SQLite database and ensure the schema exists.

Enables WAL journal mode for better concurrent read performance and creates the feeds, headlines, and runs tables along with their indexes if they do not already exist.

Parameters:

db_file (str) – Path to the SQLite database file. The file is created if it does not exist.

Returns:

An open database connection with row_factory set to sqlite3.Row for dict-style column access.

Return type:

sqlite3.Connection
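A minimal sketch of the connection setup. The schema shown is illustrative only (just one of the three tables, with assumed column names):

```python
import sqlite3

def get_db_sketch(db_file):
    conn = sqlite3.connect(db_file)
    conn.row_factory = sqlite3.Row        # dict-style column access
    conn.execute("PRAGMA journal_mode=WAL")  # better concurrent reads
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feeds (
            id           INTEGER PRIMARY KEY,
            url          TEXT UNIQUE NOT NULL,
            title        TEXT,
            last_fetched TEXT
        )
        """
    )
    return conn
```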

log_run_summary(conn, started_at, finished_at, elapsed_sec, feeds_fetched, feeds_failed, articles_total, articles_new)[source]

Write a single row to the runs table summarising a completed run.

Uses its own explicit commit so it is not part of any feed transaction.

Parameters:
  • conn (sqlite3.Connection) – An open database connection.

  • started_at (str) – ISO-8601 timestamp when the run began.

  • finished_at (str) – ISO-8601 timestamp when the run completed.

  • elapsed_sec (float) – Total wall-clock time for the run in seconds.

  • feeds_fetched (int) – Number of feeds successfully fetched.

  • feeds_failed (int) – Number of feeds that failed to fetch.

  • articles_total (int) – Total number of articles processed.

  • articles_new (int) – Number of articles that were new this run.

upsert_feed(conn, url, title, site_link)[source]

Insert a feed row if it does not exist, or update its metadata if it does.

Uses an ON CONFLICT clause to update title and last_fetched when the URL already exists. The caller is responsible for committing the surrounding transaction.

Parameters:
  • conn (sqlite3.Connection) – An open database connection.

  • url (str) – The RSS feed URL (used as the unique key).

  • title (str) – The feed’s display title.

  • site_link (str) – The feed’s associated website URL.

Returns:

The integer primary key (id) of the feed row.

Return type:

int
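The ON CONFLICT pattern described above looks roughly like this. Column names beyond url and title (site_link, last_fetched) are assumptions:

```python
import sqlite3
from datetime import datetime

def upsert_feed_sketch(conn, url, title, site_link):
    conn.execute(
        """
        INSERT INTO feeds (url, title, site_link, last_fetched)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            title        = excluded.title,
            last_fetched = excluded.last_fetched
        """,
        (url, title, site_link, datetime.now().isoformat()),
    )
    # Note: no commit here; the caller owns the surrounding transaction.
    return conn.execute(
        "SELECT id FROM feeds WHERE url = ?", (url,)
    ).fetchone()[0]
```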

upsert_headline(conn, feed_id, title, url, published, summary)[source]

Insert a headline if it is new, or bump its seen count if it already exists.

Keyed by a SHA-256 hash of the article URL so deduplication is fast and does not rely on string comparisons. The caller is responsible for committing the surrounding transaction.

Parameters:
  • conn (sqlite3.Connection) – An open database connection.

  • feed_id (int) – The primary key of the parent feed row.

  • title (str) – The article headline.

  • url (str) – The article URL (hashed for deduplication).

  • published (str or None) – ISO-8601 publication date, or None.

  • summary (str) – A plain-text article summary (HTML already stripped).

Returns:

A two-element tuple containing:

  • A dict of the headline row as it exists in the database after the upsert.

  • True if the headline was newly inserted, False if it already existed and was updated.

Return type:

tuple[dict, bool]
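The insert-or-bump logic can be sketched as below. The url_hash and times_seen column names are assumptions; the sketch assumes conn.row_factory is sqlite3.Row, as get_db() provides.

```python
import hashlib
import sqlite3

def upsert_headline_sketch(conn, feed_id, title, url, published, summary):
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    conn.execute(
        """
        INSERT INTO headlines
            (feed_id, url_hash, title, url, published, summary, times_seen)
        VALUES (?, ?, ?, ?, ?, ?, 1)
        ON CONFLICT(url_hash) DO UPDATE SET times_seen = times_seen + 1
        """,
        (feed_id, key, title, url, published, summary),
    )
    row = dict(conn.execute(
        "SELECT * FROM headlines WHERE url_hash = ?", (key,)
    ).fetchone())
    # A row seen exactly once was just inserted for the first time.
    return row, row["times_seen"] == 1
```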

fetch_feed(feed_url, summary_limit=300)[source]

Fetch and parse a single RSS feed. No database access.

Pure network function — safe to call from multiple threads simultaneously. Strips HTML from titles and summaries, truncates summaries to summary_limit characters at a word boundary, and normalises dates to ISO-8601 strings.

Parameters:
  • feed_url (str) – The RSS feed URL to fetch.

  • summary_limit (int) – Maximum number of characters to keep per article summary. Defaults to 300.

Returns:

A dict containing raw feed metadata and parsed entries on success:

{
    'feed_url':    str,
    'feed_title':  str,
    'feed_link':   str,
    'raw_entries': list[dict],  # title, url, published, summary
}

Returns None if the feed could not be fetched or parsed.

Return type:

dict or None
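The word-boundary truncation mentioned above might look like this hypothetical helper (the trailing ellipsis marker is an assumption):

```python
def truncate_at_word_sketch(text, limit=300):
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back up to the last space so a word is never split mid-way.
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]
    return cut + "…"
```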

store_feed(conn, raw)[source]

Write the output of fetch_feed to the database.

Called sequentially — one feed at a time — so SQLite is never touched by more than one thread at once. Uses a single with conn transaction per feed so all writes are committed in one disk flush and any failure rolls back the entire feed atomically.

Parameters:
  • conn (sqlite3.Connection) – An open database connection.

  • raw (dict) – The dict returned by fetch_feed() for a single feed.

Returns:

A fully resolved feed dict ready for HTML/JSON rendering:

{
    'feed_title': str,
    'feed_url':   str,
    'feed_link':  str,
    'new_count':  int,
    'entries':    list[dict],
}

Return type:

dict

async fetch_all(feed_urls, conn, max_workers=10, summary_limit=300)[source]

Fetch all feeds concurrently, then store results sequentially.

Runs all fetch_feed() calls in a thread pool simultaneously, then calls store_feed() for each result one at a time on the main thread to keep SQLite writes safe.

Parameters:
  • feed_urls (list[str]) – List of RSS feed URLs to fetch.

  • conn (sqlite3.Connection) – An open database connection passed through to store_feed().

  • max_workers (int) – Maximum number of concurrent fetch threads. Defaults to 10.

  • summary_limit (int) – Maximum characters per article summary, passed through to fetch_feed(). Defaults to 300.

Returns:

A list of resolved feed dicts as returned by store_feed(), one per successfully fetched feed. Failed feeds are silently omitted.

Return type:

list[dict]
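The fetch-concurrently, store-sequentially pattern can be sketched as below. Here fetch_one and store_one stand in for fetch_feed()/store_feed() (which take extra arguments in the real module), and the semaphore-plus-asyncio.to_thread approach is one possible realisation of the documented thread-pool behaviour, not necessarily the module's own.

```python
import asyncio

async def fetch_all_sketch(feed_urls, fetch_one, store_one, max_workers=10):
    sem = asyncio.Semaphore(max_workers)

    async def bounded(url):
        async with sem:
            # Run the blocking fetch off the event loop in a worker thread.
            return await asyncio.to_thread(fetch_one, url)

    raws = await asyncio.gather(*(bounded(url) for url in feed_urls))
    # Store sequentially on this thread so SQLite sees a single writer;
    # failed fetches (None) are silently dropped.
    return [store_one(raw) for raw in raws if raw is not None]
```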

export_json(all_feeds, filename)[source]

Write a clean JSON snapshot of all feeds and articles to disk.

The output is structured for easy LLM ingestion, including only human-readable fields (no internal database IDs or hashes).

Parameters:
  • all_feeds (list[dict]) – The list of resolved feed dicts returned by fetch_all().

  • filename (str) – Path to the output JSON file. Created or overwritten.

build_html(all_feeds, elapsed_seconds, db_file='headlines.db')[source]

Render all feeds and their headlines to a self-contained HTML string.

Produces a styled, responsive HTML page with per-feed cards, NEW/repeat badges, clickable article links, and a summary header. No external dependencies — all CSS is inlined.

Parameters:
  • all_feeds (list[dict]) – The list of resolved feed dicts returned by fetch_all().

  • elapsed_seconds (float) – Total fetch duration, displayed in the page header.

  • db_file (str) – Path to the database file, displayed in the footer. Defaults to 'headlines.db'.

Returns:

A complete HTML document as a string.

Return type:

str

save_html(all_feeds, filename, elapsed_seconds, db_file, auto_open_browser=True)[source]

Write the rendered HTML to disk and optionally open it in the browser.

Parameters:
  • all_feeds (list[dict]) – The list of resolved feed dicts returned by fetch_all().

  • filename (str) – Path to the output HTML file. Created or overwritten.

  • elapsed_seconds (float) – Total fetch duration passed through to build_html() for display in the page header.

  • db_file (str) – Database file path passed through to build_html() for display in the page footer.

  • auto_open_browser (bool) – If True, opens the saved file in the default web browser after writing. Defaults to True.

main()[source]

Entry point for running the aggregator as a standalone script.

Loads configuration from config.toml, sets up logging, connects to the database, fetches all enabled feeds concurrently, stores results, logs the run summary, and writes HTML and JSON output files.