How does HVTracker rank AI agents?

HVTracker ranks AI agents using an evidence-weighted HVTrust score computed from public, verifiable signals: supply-chain integrity (OSSF Scorecard, build provenance, signed commits), transparency (license, documentation), maintenance activity, and community adoption. The score is scaled by an evidence-confidence factor so agents with more checkable signals rank higher.

What is the HVTrust score?

HVTrust is a 0-100 composite trust score that measures how verifiably trustworthy an open-source AI agent is. Unlike popularity metrics like GitHub stars, HVTrust is based on concrete security and quality signals: OSSF Scorecard results, build provenance attestation, commit signing, license permissiveness, maintenance freshness, and adoption breadth.

What signals does HVTracker use?

HVTracker evaluates seven signal categories: Activity (commit recency, release frequency), Adoption (GitHub stars, forks, dependents), Transparency (license, README quality), Safety (OSSF Scorecard score), Identity (signed commits, verified publishers), Provenance (SLSA build attestation), and Evidence Quality (how many signals are actually checkable for a given agent).

How often is HVTracker data updated?

HVTracker data is refreshed daily via an automated pipeline that fetches the latest signals from GitHub, OSSF Scorecard, and other public sources. The leaderboard, agent profiles, and all comparison pages update with each refresh cycle.

Methodology

v3.1 · Last updated 2026-06-04

What HVTracker Measures

HVTracker measures open-source AI agent projects using public, independently checkable signals. The goal is to show project health, maintenance momentum, community adoption, and basic supply-chain trust without relying on vendor claims or self-reported submissions.

The leaderboard currently tracks GitHub activity, package downloads, Hacker News discussion, category ranking, rank movement, and trust/provenance signals where they are available in public APIs.

Health Score And Trust Score

The leaderboard ranks by the HVTrust score, not by popularity. HVTrust is a 0–100 composite designed so that the rank and the evidence grade tell the same story. It is built on five principles:

Trust is not just popularity — but adoption matters. Widely used software receives more scrutiny, bug reports, and security review. Adoption is log-scaled and capped, but carries meaningful weight (20%).
Harder-to-fake signals weigh more. Supply-chain integrity (provenance, signed commits, OSSF Scorecard) carries the highest weight.
Missing evidence lowers trust. The raw score is multiplied by a confidence factor based on how many independent signal types are available. An agent we know little about cannot reach the top tier.
Known-bad states are penalised and gated. Staleness subtracts points that adoption cannot offset; deprecated or unverified listings are capped below the verified tier.
Proprietary tools are scored honestly. A lower score for a proprietary tool reflects fewer public supply-chain artifacts, not a security finding. The leaderboard includes a license filter to compare like with like.
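
As a rough illustration of the "log-scaled and capped" adoption principle, the sketch below maps raw counts onto a 20-point budget. The saturation point of 1,000,000 and the choice to add stars and downloads together are assumptions made for this example, not HVTracker's published constants.

```python
import math


def adoption_points(stars: int, weekly_downloads: int,
                    max_points: float = 20.0,
                    saturation: int = 1_000_000) -> float:
    """Log-scaled, capped adoption sub-score (illustrative only)."""
    # Combine raw counts; negative inputs are treated as zero.
    raw = max(stars, 0) + max(weekly_downloads, 0)
    # log10(1 + raw) grows slowly: each 10x increase adds a similar step.
    scaled = math.log10(1 + raw) / math.log10(1 + saturation)
    # Cap at 1.0 so adoption can never exceed its share of the total.
    return round(max_points * min(scaled, 1.0), 2)
```

Under these assumptions a project with about 1,000 stars already earns roughly half of the adoption budget, while going from one million to one billion adds nothing, which is the intended effect of capping.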

HVTrust = gate( confidence × [ Safety(25) + Identity(18) + Transparency(17) + Maintenance(20) + Adoption(20) ] − penalties )

Safety / Integrity · 25 — OSSF Scorecard, build provenance, and signed-commit ratio. The hardest signals to fake.
Identity / Provenance · 18 — verified listing status and build provenance binding the package to its source.
Transparency · 17 — a declared license and OSSF transparency checks.
Maintenance · 20 — recency of the last push and log-scaled recent commit activity (halved when commit data is low-confidence).
Adoption · 20 — log-scaled, capped stars and package downloads (npm, PyPI, crates.io, Docker Hub, VS Code Marketplace).
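
The formula and component weights above can be sketched as a small function. This is a minimal reading of the published formula, not the production implementation: the `fractions` input (each component's achieved share between 0 and 1), the `penalties` value, and the `cap` used for gating are hypothetical parameters, since the exact penalty sizes and cap thresholds are not stated here.

```python
from typing import Dict, Optional

# Maximum points per component, taken from the formula above.
WEIGHTS = {
    "safety": 25,
    "identity": 18,
    "transparency": 17,
    "maintenance": 20,
    "adoption": 20,
}


def hvtrust(fractions: Dict[str, float], confidence: float,
            penalties: float = 0.0, cap: Optional[float] = None) -> float:
    """gate(confidence * sum(component points) - penalties), kept in 0..100."""
    raw = 0.0
    for name, weight in WEIGHTS.items():
        share = min(max(fractions.get(name, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        raw += weight * share
    score = confidence * raw - penalties
    score = min(max(score, 0.0), 100.0)
    if cap is not None:
        # Gate: e.g. a deprecated or unverified listing (cap value is hypothetical).
        score = min(score, cap)
    return round(score, 1)
```

Because the penalty is subtracted after the confidence multiplier, a stale project loses the same number of points regardless of how much evidence is available.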

Confidence = present ÷ applicable signal types (floored at 0.4) and is shown alongside each score. Signals that cannot apply to an agent (for example, package downloads for a project that ships no package) are excluded rather than counted as missing — "not applicable" is not the same as "unverified".
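
A minimal sketch of the confidence rule just described, assuming each signal type is recorded as a value, `None` when it is missing, or a `NOT_APPLICABLE` sentinel when it cannot apply. The sentinel and the dictionary layout are illustrative choices, not HVTracker's data model.

```python
NOT_APPLICABLE = object()  # hypothetical marker for signals that cannot apply


def confidence_factor(signals: dict, floor: float = 0.4) -> float:
    """present / applicable signal types, floored at `floor`."""
    # Drop signals that cannot apply; they are neither present nor missing.
    applicable = [v for v in signals.values() if v is not NOT_APPLICABLE]
    if not applicable:
        return floor
    present = sum(1 for v in applicable if v is not None)
    return max(floor, present / len(applicable))
```

For example, an agent with a Scorecard result and a license but no provenance, and no package to download, has two of three applicable signals present, giving a confidence of about 0.67 rather than 0.5.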

Evidence Grade is derived from the trust score band: A ≥ 80, B ≥ 65, C ≥ 50, D < 50. It summarizes the overall trust level at a glance.
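
The grade bands translate directly into a lookup. The function name is illustrative; the thresholds are the ones listed above.

```python
def evidence_grade(score: float) -> str:
    """Map a 0-100 HVTrust score to its evidence grade band."""
    if score >= 80:
        return "A"
    if score >= 65:
        return "B"
    if score >= 50:
        return "C"
    return "D"
```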

License Type classifies each project as Open, Source-available, Proprietary, or Unlicensed. Proprietary and source-available tools are scored on the same scale but are flagged so users can filter and compare like with like.

Displayed Signals

GitHub repository data: Stars, forks, last push date, recent commits, language, description, and open issue count come from the GitHub REST API. GitHub's commit activity endpoint can return stale or delayed results. When possible, HVTracker falls back to recent commit counts and flags low-confidence commit cells with a question mark.
Package downloads: Weekly downloads are fetched from npm and PyPI for projects that have package names configured in agents.json. If a project has both package ecosystems configured, the values are summed and labeled by source. Downloads are install events, not unique users. They can include CI, mirrors, bots, and automated environments.
Hacker News mentions: HN mentions count matching stories from the last 30 days using the Algolia Hacker News API and curated search terms. Generic project names can create false positives or false negatives, so not every project has an HN query configured.
Rank movement: Rank deltas compare the current run with the most recent prior daily snapshot in output/history. The biggest-movers views are pinned to the most recent completed daily snapshot so they do not drift during intra-day batch refreshes.
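
Three of the displayed signals above lend themselves to short sketches: summing downloads across ecosystems, building a 30-day story query for the Hacker News Search API, and computing rank deltas between two snapshots. All function names and the snapshot layout (a mapping from agent to rank) are assumptions for illustration. The query uses the public `hn.algolia.com` search endpoint with its `tags` and `numericFilters` parameters, and how HVTracker actually phrases its curated terms is not shown here.

```python
import time
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

HN_SEARCH_URL = "https://hn.algolia.com/api/v1/search"


def combined_weekly_downloads(by_source: Dict[str, Optional[int]]) -> Tuple[int, str]:
    """Sum weekly downloads and label which ecosystems contributed."""
    present = {src: n for src, n in by_source.items() if n is not None}
    label = " + ".join(sorted(present)) or "none"
    return sum(present.values()), label


def hn_stories_url(term: str, days: int = 30, now: Optional[int] = None) -> str:
    """Build a search URL for stories newer than `days` days matching `term`."""
    now = int(time.time()) if now is None else now
    cutoff = now - days * 86_400  # seconds per day
    params = {
        "query": term,
        "tags": "story",
        "numericFilters": f"created_at_i>{cutoff}",
    }
    return f"{HN_SEARCH_URL}?{urlencode(params)}"


def rank_deltas(previous: Dict[str, int],
                current: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Positive delta = moved up; None = agent absent from the prior snapshot."""
    return {
        agent: (previous[agent] - rank if agent in previous else None)
        for agent, rank in current.items()
    }
```

Returning `None` for newly listed agents keeps them out of the biggest-movers views instead of treating a first appearance as a large jump.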

Trust And Provenance Signals

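HVTracker also displays each supply-chain signal on its own, alongside the composite score. Since v3.0 these signals feed the Safety / Integrity and Identity / Provenance components of HVTrust, so they do affect rank. The individual readouts below help readers judge release hygiene and verifiability directly.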

npm provenance: For npm packages, HVTracker checks whether the latest published version exposes provenance attestations in the npm registry's dist.attestations field.
PyPI provenance: For PyPI packages, HVTracker checks whether latest-release files expose PEP 740 provenance metadata through PyPI's Simple API JSON response.
OSSF Scorecard: Where available through deps.dev, HVTracker displays the OpenSSF Scorecard overall score and individual checks. Scorecard coverage is not guaranteed for every repository.
Signed commit ratio: HVTracker samples recent commits and reports the percentage that GitHub marks as verified through GPG, SSH, S/MIME, or GitHub's own signing flow. A verified signature confirms that GitHub considers the commit signed; it does not prove code quality, maintainer intent, or release safety.
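
The checks above reduce to small predicates over JSON that has already been fetched. The sketch below assumes the caller has parsed the npm version metadata, the list of file entries for the latest PyPI release from the Simple API JSON response, and a sample of commits from GitHub's commit listing. The function names are hypothetical, and how HVTracker selects the sample or the latest-release files is not specified here.

```python
from typing import List, Optional, Tuple


def has_npm_provenance(version_doc: dict) -> bool:
    """True if the version metadata carries a non-empty dist.attestations field."""
    dist = version_doc.get("dist") or {}
    return bool(dist.get("attestations"))


def pep740_coverage(files: List[dict]) -> Tuple[int, int]:
    """Count release files whose entry has a non-empty PEP 740 `provenance` key."""
    with_provenance = sum(1 for f in files if f.get("provenance"))
    return with_provenance, len(files)


def signed_commit_ratio(commits: List[dict]) -> Optional[float]:
    """Percentage of sampled commits that GitHub marks as verified."""
    if not commits:
        return None  # no sample, so no ratio to report
    verified = sum(
        1 for c in commits
        if ((c.get("commit") or {}).get("verification") or {}).get("verified") is True
    )
    return round(100.0 * verified / len(commits), 1)
```

Returning `None` for an empty sample, instead of 0, keeps "no data" distinct from "no signed commits", in line with the distinction between unavailable and unverified signals.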

How Often Data Updates

The GitHub Actions workflow runs six staggered batches per day, one every 4 hours. Each batch refreshes roughly one-sixth of the tracked agents, so any given agent's signals update approximately once per day. Each successful run regenerates the leaderboard, agent pages, public JSON endpoints, feed.json, sitemap.xml, and a dated history snapshot.
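
One way to split agents into six stable batches is to hash each agent's identifier. This is only a sketch of the idea: the hashing scheme, the function names, and the midnight UTC starting hour are assumptions, and the actual workflow may partition agents differently.

```python
import hashlib
from typing import List

BATCHES_PER_DAY = 6
HOURS_BETWEEN_BATCHES = 4


def batch_for(agent_id: str) -> int:
    """Deterministically assign an agent to one of the daily batches."""
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    # The first 4 bytes are plenty to spread agents roughly evenly.
    return int.from_bytes(digest[:4], "big") % BATCHES_PER_DAY


def batch_start_hours(first_hour: int = 0) -> List[int]:
    """Hour of day at which each batch starts, given the first batch's hour."""
    return [(first_hour + i * HOURS_BETWEEN_BATCHES) % 24
            for i in range(BATCHES_PER_DAY)]
```

A deterministic assignment means an agent is always refreshed in the same slot, so its signals are about 24 hours old at most when the next refresh lands.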

Known Limitations

Stars are imperfect. Stars can reflect popularity, hype, age, or marketing, not necessarily production quality.
Commit counts are noisy. A high commit count can mean active development, churn, imports, generated files, or repository maintenance work.
Downloads are not users. Package download numbers can include automation and duplicate installs.
HN mentions are approximate. Curated search terms reduce noise but cannot perfectly capture discussion.
Trust signals are partial. Missing provenance or Scorecard data can mean the signal is unavailable, not necessarily that a project is unsafe.
No qualitative review yet. HVTracker does not currently score documentation quality, API stability, model/provider compatibility, benchmark performance, or real-world adoption.
No formal SLSA level. HVTracker displays observable provenance and Scorecard signals but does not claim an authoritative SLSA build level.

Corrections And Project Submissions

To suggest a correction, submit a missing package name, propose a category change, or request a new project, open a GitHub issue or pull request. Include the project repository, the preferred display name, the category you believe fits best, and any npm or PyPI package names that should be tracked.

New projects should be open-source AI agent projects or closely related infrastructure. Categories are curated manually to keep the leaderboard useful and comparable.

Versioning

Methodology changes are versioned explicitly. Every revision is recorded in the changelog below. Raw data snapshots are preserved on each build so past leaderboard states remain auditable — see the historical snapshots in the repository.

Changelog

v3.1 (current) — HVTrust now uses the live v3.1 weighting: Safety/Integrity 25, Identity/Provenance 18, Transparency 17, Maintenance 20, and Adoption 20. Adoption includes stars and package downloads, while supply-chain and provenance signals remain the strongest verifiability inputs.
v3.0 — The leaderboard began ranking by HVTrust instead of the popularity-based health score. HVTrust is gated and confidence-scaled: missing evidence lowers the score via a confidence multiplier, staleness incurs a penalty, and deprecated or unverified listings are capped below the verified tier.
v2.0 — Added supply chain trust signals: npm provenance, PyPI attestations (PEP 740), OSSF Scorecard (via deps.dev), signed commit ratio. These are displayed independently, not folded into the composite score.
v1.1 — Added npm, PyPI, and Hacker News data sources. Daily historical snapshots now archived.
v1.0 (May 2026) — Initial methodology. GitHub-only signals: stars, freshness, activity, community (forks).