# Technical Documentation: Inferred Price Calculation Logic

## 1. Overview

This document details the system used to calculate the **List at** (Peak), **Trough**, and **1-Year Average** prices for a given product (ASIN). The entire system is predicated on the concept of an "inferred sale," which is a historical moment where data strongly suggests a transaction occurred.

The primary logic is housed in:
- `keepa_deals/stable_calculations.py`
- `keepa_deals/new_analytics.py`

The process follows three main stages:
1.  **Inferring Sale Events** (Finding the data points).
2.  **Sanitization** (Removing noise).
3.  **Calculation & AI Validation** (Determining the final prices).

------

## ⚠️ Critical Warning: The Dangers of Fallback Data

**Do NOT attempt to "fill in the blanks" with unverified data.**

A critical lesson learned in January 2026 is that fallback mechanisms—attempts to provide a price when the primary logic finds none—are extremely dangerous. They often result in the system confidently presenting garbage data, which then triggers downstream rejections (like the AI Reasonableness Check) or, worse, leads users to make bad buying decisions.

**Specific Failure Case: The "High Velocity" Fallback (Deprecated)**
The system previously contained an *unsafe* fallback logic:
> *If no sales are inferred, but `monthlySold > 20`, use the `Used - 90 days avg` price.*
This caused massive deal rejection rates because it often grabbed stale, high-priced Used listings for books that only sold as New. This logic has been **REMOVED**.

**The "Safe Fallback" Compromise and Subsequent Removal (Mar 2026)**
To address data sparsity without sacrificing safety, we briefly introduced a **Validated Fallback** (the "Silver Standard"). It attempted to use the **Minimum** of `stats.avg90` and `stats.avg365` for **Standard Used** conditions only when inferred sales < 3.

**REMOVAL REASONING:**
The user subsequently observed that this fallback logic—while safely preventing astronomical profits via `min()`—still essentially relied on *listing prices* rather than *true inferred sale prices*. This tactic, originally designed to increase the volume of deals found, compromised the core promise of only providing "true deals."

**Current Principle:** Fallbacks to listing averages are **strictly prohibited**. We now ONLY rely on inferred sale prices (derived from offer drops correlating with rank drops) to calculate profits. Sparse inferred sales (1-2 events) are still permitted as they represent true historical sales, but Keepa stats averages are not.

------

## 2. Stage 1: Inferring Sale Events

This is the foundational step handled by `infer_sale_events(product)` in `keepa_deals/stable_calculations.py`.

### a. The "Sale" Trigger and Confirmation
A sale is inferred by correlating two distinct events within a **240-hour** (10-day) window over the last two years:

1.  **The Trigger:** A drop in the offer count for either **New** or **Used** listings.
    -   Source: `csv[11]` (New Count) and `csv[12]` (Used Count).
    -   Mechanism: `diff()` detects negative changes.
2.  **The Confirmation:** A drop in the product's **Sales Rank** (`csv[3]`).
    -   Rationale: A rank drop indicates Amazon registered a sale.
    -   Window: If a rank drop occurs within 240 hours *after* an offer count drop, it is flagged as a confirmed sale.

### b. Price Association
When a sale is confirmed, the system associates a price with it using `pandas.merge_asof`, which selects the listing price from the history (`new_price_history` or `used_price_history`) nearest in time to the inferred sale.
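The trigger/confirmation correlation and the price association can be sketched as follows. This is a minimal illustration, not the actual code in `stable_calculations.py`: the function name, the Series-based inputs, and the `direction="nearest"` choice are assumptions.

```python
import pandas as pd

CORRELATION_WINDOW = pd.Timedelta(hours=240)  # the 10-day trigger->confirmation window

def infer_sale_events_sketch(offer_counts: pd.Series,
                             rank_history: pd.Series,
                             price_history: pd.Series) -> pd.DataFrame:
    """All inputs are time-indexed Series (offer count, sales rank, listing price)."""
    # The Trigger: a negative change in the offer count (a listing disappeared).
    triggers = offer_counts.index[offer_counts.diff() < 0]

    # The Confirmation: a drop (improvement) in sales rank.
    confirmations = rank_history.index[rank_history.diff() < 0]

    # Keep only triggers followed by a rank drop within 240 hours.
    confirmed = [t for t in triggers
                 if ((confirmations > t) & (confirmations <= t + CORRELATION_WINDOW)).any()]

    sales = pd.DataFrame({"sale_time": pd.DatetimeIndex(confirmed)})
    prices = pd.DataFrame({"sale_time": price_history.index,
                           "price": price_history.values})

    # Price association: nearest listing price relative to each sale time.
    return pd.merge_asof(sales.sort_values("sale_time"),
                         prices.sort_values("sale_time"),
                         on="sale_time", direction="nearest")
```

In the real pipeline this runs once per condition (New and Used) against the corresponding `csv` indices.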

------

## 2.5 Stage 1.5: XAI Rescue Mechanism ("Hidden Sales")

**Introduced:** Feb 2026 (`xai_sales_inference.py`)

If the algorithmic approach (Stage 1) finds **0 confirmed sales** or detects **no offer drops at all** (which should be impossible for an item that has sold, unless the seller holds stock depth > 1 so a sale never reduces the offer count), the system triggers a "Rescue" attempt.

1.  **Context Assembly:** The system constructs a markdown table representing ~365 days of history, aligning Rank, Price, and Offer Count time-series data.
2.  **AI Analysis:** This table is sent to **xAI (Grok)** with a specific prompt to identify "Hidden Sales"—instances where Sales Rank improved (dropped) significantly without a corresponding drop in Offer Count (implying the seller had multiple units).
3.  **Integration:** Sales identified by the AI are injected back into the pipeline as valid "Inferred Sales," allowing the deal to proceed to analysis instead of being rejected.
4.  **Safety:** To preserve tokens, this rescue is skipped if the item's current Sales Rank is > 2,000,000 ("Dead Inventory").
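Steps 1 and 4 (context assembly and the dead-inventory guard) can be sketched as below. The function name, the daily resampling grid, and the exact table layout are illustrative assumptions, not the actual `xai_sales_inference.py` implementation.

```python
from typing import Optional
import pandas as pd

DEAD_INVENTORY_RANK = 2_000_000  # skip the AI rescue for "Dead Inventory"

def build_rescue_context(rank: pd.Series, price: pd.Series,
                         offers: pd.Series, current_rank: int) -> Optional[str]:
    """Assemble the markdown table sent to the LLM, or None if skipped."""
    if current_rank > DEAD_INVENTORY_RANK:
        return None  # not worth spending tokens on

    # Align the three time series on a daily grid, forward-filling gaps,
    # and keep roughly the last year of history.
    frame = (pd.concat({"rank": rank, "price": price, "offers": offers}, axis=1)
               .resample("D").last().ffill().tail(365))

    lines = ["| date | rank | price | offers |",
             "| --- | --- | --- | --- |"]
    for ts, row in frame.iterrows():
        lines.append(f"| {ts.date()} | {row['rank']:.0f} "
                     f"| {row['price']:.2f} | {row['offers']:.0f} |")
    return "\n".join(lines)
```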

------

## 3. Stage 2: Data Sanitization

After collecting raw events, the data is sanitized to remove statistical outliers.

### Symmetrical Outlier Rejection
To prevent anomalous prices (e.g., penny books or repricer errors) from skewing the results, the system:
1.  Calculates **Q1** (25th percentile) and **Q3** (75th percentile) of all inferred prices.
2.  Calculates the **IQR** (interquartile range), `Q3 - Q1`.
3.  Removes any sale price outside the range `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]`.
4.  **Result:** A list of "sane" sale events.
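The filter fits in a few lines. This is a sketch using the stdlib `statistics` module; the real implementation presumably operates on the pandas structures from Stage 1, and the function name and small-sample guard are assumptions.

```python
import statistics

def sanitize_prices(prices, k: float = 1.5):
    """Symmetrical IQR outlier rejection over inferred sale prices."""
    if len(prices) < 4:
        return list(prices)  # too few points for a meaningful quartile estimate
    q1, _, q3 = statistics.quantiles(prices, n=4)  # 25th / 50th / 75th percentiles
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [p for p in prices if lo <= p <= hi]
```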

------

## 4. Stage 3: Price Calculation

### A. The "List at" Price (Peak Season)
This determines the recommended listing price.

1.  **Seasonality Identification:** Groups sane sales by month. Identifies the **Peak Month** (highest median price).
2.  **Price Determination:**
    -   **Primary:** Calculates the **Mode** (most frequent price) during the Peak Month.
    -   **Fallback 1:** If no distinct mode exists, uses the **Median**.
    -   **Rescue (Sparse Sales):** If Inferred Sales < **3** (insufficient data), the system uses the **Median** of any available inferred sales (1-2 events) because they still represent *true* sales.
    -   *(Note: The previous "Keepa Stats Fallback" to listing averages was entirely removed in March 2026 to guarantee all profits are based on true sales.)*
3.  **Amazon Ceiling Logic:**
    -   To ensure competitiveness, the "List at" price is capped at **90%** of the lowest Amazon "New" price.
    -   Comparator: `Min(Amazon Current, Amazon 180-day Avg, Amazon 365-day Avg)`.
    -   If `List at > Ceiling`, it is reduced to the Ceiling value.
4.  **AI Reasonableness Check:**
    -   **Primary Check:** For standard inferred prices, the calculated price is sent to **xAI (Grok)** along with the book's title, category, **Binding**, **Page Count**, **Image URL**, and **Rank**.
    -   **Prompt Context:** The prompt explicitly instructs the AI that for seasonal items (especially Textbooks), a Peak Season price can validly be **200-400% higher** than the 3-Year Average to prevent false positive rejections.
    -   **Fallback Exception (Feb 2026):** If the price source is **"Inferred Sales (Sparse)"**, the AI Reasonableness Check is conditionally **SKIPPED** to prevent false rejections.
        -   **Suspiciously High Markup Check (Mar 2026):** If the calculated price (from *any* source, not just fallbacks) is **> 300% (3x)** of the current Used price, the deal is flagged as "Suspiciously High". The AI Reasonableness Check is **FORCED** (not skipped) to prevent accepting inflated prices caused by sparse data or market manipulation.
        -   **Hard Ceiling Safety (Mar 2026):** To prevent astronomical fake profits (e.g., a $4,000 "List At" price), any calculated list price exceeding **$1,500** is automatically and immediately rejected without even querying the AI.
        -   *Safety:* The AI prompt explicitly instructs the LLM that any used book price over $500 requires intense scrutiny, and prices over $1,000 are almost always unreasonable.
    -   If the AI rejects a price (either a standard one or a forced fallback check), the deal is invalidated (and subsequently persisted as incomplete data).
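Putting section A together, the price determination, sparse rescue, Amazon ceiling, and hard ceiling reduce to the sketch below. The AI reasonableness check is omitted, and the function name, input shapes, and dictionary keys are assumptions for illustration.

```python
import statistics
from collections import defaultdict
from typing import Optional

HARD_CEILING = 1500.0   # "List at" prices above this are rejected outright
AMAZON_CAP_PCT = 0.90   # cap at 90% of the lowest Amazon "New" price

def list_at_price(sane_sales, amazon_prices) -> Optional[float]:
    """sane_sales: list of (month, price) tuples; amazon_prices: dict of
    Amazon New prices, e.g. {"current": ..., "avg180": ..., "avg365": ...}."""
    if not sane_sales:
        return None

    if len(sane_sales) < 3:
        # Sparse rescue: median of the 1-2 true inferred sales.
        price = statistics.median(p for _, p in sane_sales)
    else:
        # Group by calendar month; the peak month has the highest median price.
        by_month = defaultdict(list)
        for month, p in sane_sales:
            by_month[month].append(p)
        peak = max(by_month.values(), key=statistics.median)
        modes = statistics.multimode(peak)
        # Primary: a distinct mode; Fallback 1: the median.
        price = modes[0] if len(modes) == 1 else statistics.median(peak)

    # Amazon ceiling: 90% of Min(current, 180-day avg, 365-day avg).
    known = [v for v in amazon_prices.values() if v]
    if known:
        price = min(price, AMAZON_CAP_PCT * min(known))

    # Hard ceiling safety (applied before any AI query).
    if price > HARD_CEILING:
        return None
    return round(price, 2)
```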

### B. 1-Year Average (`1yr. Avg.`)
Used for the "Percent Down" and "Trend" calculations.

1.  Filters the sane sales list to include only those from the **last 365 days**.
2.  Calculates the **Mean** of these prices.
3.  **Threshold:** Requires at least **1** inferred sale.
4.  **Fallback:** If 0 inferred sales are found, the system attempts to use **`stats.avg365`** (Used). If that also fails, it returns `None` and the deal is persisted as **incomplete data** (filtered from the UI).
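A minimal sketch of this calculation; the function name and input shapes are assumptions:

```python
import statistics
from datetime import datetime, timedelta, timezone
from typing import Optional

def one_year_average(sane_sales, stats_avg365: Optional[float]) -> Optional[float]:
    """sane_sales: list of (timestamp, price) tuples after sanitization;
    stats_avg365: Keepa's Used 365-day average, if any."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=365)
    recent = [price for ts, price in sane_sales if ts >= cutoff]
    if recent:                 # threshold: at least 1 inferred sale
        return round(statistics.mean(recent), 2)
    if stats_avg365:           # fallback to the Keepa stats average
        return round(stats_avg365, 2)
    return None                # persisted as incomplete data
```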

------

## Key Evolution & "Hard-Won" Lessons

1.  **Mean vs Median:** We switched from Median to **Mean** for the 1-Year Average to better reflect the true market value across all transactions, after outlier removal proved effective.
2.  **Mode for Peak:** We use **Mode** for the "List at" price because arbitrage sellers often target a specific "standard" market price that occurs frequently, rather than an average of fluctuations.
3.  **Strict Validation with Persistence:** The AI check and the "Missing List at" exclusion are the primary filters. If the system cannot confidently determine a safe listing price, it **persists the deal as incomplete** (for potential future recovery) but filters it from the user dashboard to maintain a clean experience.
4.  **240-Hour Window:** Expanding the correlation window from 168h to 240h significantly improved capture rates for "Near Miss" sales events where rank reporting lagged behind offer drops.

------

## 5. Verification Case Study: The "Missing Data" Investigation (Feb 2026)

In February 2026, users reported that several deals appeared on the dashboard with missing data (e.g., `1yr Avg: -`) or negative profit, despite Keepa data seemingly being available. An in-depth investigation was conducted to determine if the *calculation logic* was flawed.

### Methodology
A diagnostic script (`tests/trace_1yr_avg.py`) was created to trace the exact execution of the logic on ASIN `1455616133`, one of the reported "missing data" items.

### Findings
1.  **Raw History:** The script found 1286 rank history points and 100 offer count points, confirming that valid data was available.
2.  **Inference Logic:**
    *   Detected **46** raw offer drops.
    *   Successfully correlated **28** of them with a Rank Drop within the 240-hour window.
3.  **Sanitization:** 8 outliers were removed using the IQR method.
4.  **Result:**
    *   **Sales in Last 365 Days:** 28 confirmed sales.
    *   **Calculated 1yr Avg:** **$54.38**.

### Conclusion
The calculation logic is **sound**. The data *does* exist, and the algorithm *can* find it. The reason these deals appeared broken on the dashboard was **Data Ingestion Stagnation** (deals getting stuck in a "lightweight update" loop that never re-fetched the full history needed for the calculation), not a flaw in the math itself.

### Resolution
We implemented a **"Zombie Data Defense"** strategy in the `Smart Ingestor`. The system now detects these "Zombie" deals (missing data) and forces a full re-fetch (heavy update) to attempt to repair them. Additionally, deals that truly lack data or have zero profit are now **persisted** (filtered from the UI) to allow for lightweight updates, rather than being rejected and entering an infinite re-fetch loop.
