AI Invoice Classification: How BillyBox Automatically Filters Real Invoices From Noise
March 2026
Your email inbox doesn't just contain invoices. It contains logos, marketing banners, shipping notifications, terms of service PDFs, promotional flyers, and dozens of other attachments that look like they could be invoices — but aren't. When you connect your email to an invoice management tool, all of these get pulled in. The result? A review queue full of noise that defeats the purpose of automation.
This is exactly what happened to one of our earliest users. They connected their email and found hundreds of images and logos mixed in with their real invoices. Their feedback was blunt: "What's the point if I have to go one by one and remove them?" They were right. So we built AI classification to fix it.
The Problem: Email Attachments Are Messy
Most businesses receive 50-200+ emails with attachments per month. Only a fraction of those attachments are actual invoices. The rest includes:
- Company logos and banners — embedded images in email signatures and marketing emails
- Shipping labels and tracking PDFs — logistics documents that aren't invoices
- Terms of service updates — legal documents attached as PDFs
- Marketing materials — product catalogs, promotional flyers, event invitations
- Receipts for free services — $0.00 "invoices" from free-tier tools
- Duplicate attachments — the same PDF forwarded or replied to multiple times
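The duplicate case is the one item on this list that needs no content understanding at all. Here is a minimal sketch of exact-duplicate detection. It assumes duplicates are byte-identical files. `dedupe_attachments` and the sample filenames are hypothetical, not BillyBox's code:

```python
import hashlib


def dedupe_attachments(attachments):
    """Keep the first copy of each distinct file body.

    `attachments` is a list of (filename, raw_bytes) pairs. Two attachments
    are treated as duplicates when their bytes have the same SHA-256 digest,
    whatever their filenames.
    """
    seen = set()
    unique = []
    for filename, data in attachments:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # identical bytes already kept, e.g. a forwarded copy
        seen.add(digest)
        unique.append((filename, data))
    return unique


inbox = [
    ("invoice-0423.pdf", b"%PDF-1.7 ... invoice 0423"),
    ("Fwd_invoice-0423.pdf", b"%PDF-1.7 ... invoice 0423"),  # same bytes, new name
    ("invoice-0424.pdf", b"%PDF-1.7 ... invoice 0424"),
]
print([name for name, _ in dedupe_attachments(inbox)])
# ['invoice-0423.pdf', 'invoice-0424.pdf']
```

Hashing only catches exact copies. A PDF re-exported with a new timestamp in its metadata produces different bytes and would slip through.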
Rule-based filtering catches some of this — you can filter by file type, size, or sender domain. But logos are PNGs just like scanned invoices. Marketing PDFs have the same file extension as real invoices. Rules alone can't tell the difference because the difference is in the content, not the format.
How BillyBox Uses AI to Classify Attachments
BillyBox now runs every attachment through an AI classification pipeline before it reaches your review queue. The system uses a dual-layer approach:
Layer 1: Rule-Based Pre-Filtering
Before AI even runs, BillyBox applies deterministic rules: known invoice domains (50+ vendor patterns), file type checks, size thresholds, and email metadata analysis. Attachments from known invoice senders like Stripe, AWS, or Hetzner skip AI entirely and go straight to your queue. This keeps things fast and cheap.
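Layer 1 can be expressed as a small, deterministic function. The sketch below illustrates the idea under stated assumptions and is not BillyBox's rule set. The domain list, MIME types, and 10 KB image threshold are hypothetical placeholders. The `keep` / `ignore` / `needs_ai` outcomes are names chosen for this example:

```python
from dataclasses import dataclass

# Illustrative values only; the real rule set (50+ vendor patterns) is not shown here.
KNOWN_INVOICE_DOMAINS = {"stripe.com", "amazonaws.com", "hetzner.com"}
ALLOWED_MIME_TYPES = {"application/pdf", "image/png", "image/jpeg"}
MIN_IMAGE_BYTES = 10_000  # assumed: smaller images are usually signature logos


@dataclass
class Attachment:
    sender: str      # e.g. "billing@stripe.com"
    mime_type: str
    size_bytes: int


def prefilter(att: Attachment) -> str:
    """Return 'keep', 'ignore', or 'needs_ai' using deterministic rules only."""
    if att.mime_type not in ALLOWED_MIME_TYPES:
        return "ignore"  # e.g. calendar invites or archives
    if att.mime_type.startswith("image/") and att.size_bytes < MIN_IMAGE_BYTES:
        return "ignore"  # tiny image: usually a logo or banner
    domain = att.sender.rsplit("@", 1)[-1].lower()
    # Match the vendor domain itself or any subdomain, e.g. mail.hetzner.com
    if any(domain == d or domain.endswith("." + d) for d in KNOWN_INVOICE_DOMAINS):
        return "keep"  # known invoice sender: skip the AI call entirely
    return "needs_ai"


print(prefilter(Attachment("invoice+statements@stripe.com", "application/pdf", 48_000)))  # keep
print(prefilter(Attachment("news@shop.example", "image/png", 3_200)))                     # ignore
print(prefilter(Attachment("billing@vendor.example", "application/pdf", 85_000)))         # needs_ai
```

Ordering matters in this sketch. The cheap type and size checks run first, so even a known sender's 3 KB logo is dropped before the domain match can wave it through.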
Layer 2: AI Classification Gate
For attachments that don't match known patterns, BillyBox sends the extracted text and metadata to an AI model that determines whether the document is a real invoice, a receipt, a statement of account, or something else entirely (a logo, a marketing PDF, a notification). Non-invoice documents are automatically marked as "ignored" so they never clutter your review queue.
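The routing around the model is simple, even though the model itself is not. In this sketch `classify` is a hypothetical callable that stands in for whatever model BillyBox actually uses. The four label strings mirror the categories described above (invoice, receipt, statement of account, other), but their exact names are assumptions:

```python
from typing import Callable, Dict

# Labels the model is asked to choose from (names assumed for this sketch)
LABELS = ("invoice", "receipt", "statement", "other")

Classifier = Callable[[str, Dict[str, str]], str]


def ai_gate(text: str, metadata: Dict[str, str], classify: Classifier) -> str:
    """Route a document that Layer 1 could not decide.

    `classify` stands in for the AI model call: it receives the extracted
    text plus metadata and is expected to return one of LABELS.
    """
    label = classify(text, metadata)
    # Only an explicit "other" is hidden; any unexpected output stays visible
    return "ignored" if label == "other" else "queued"


doc_text = "Invoice INV-1042\nConsulting, 4 h x EUR 90.00\nTotal: EUR 360.00"
meta = {"sender": "billing@vendor.example", "filename": "INV-1042.pdf"}

# Stand-in classifiers for illustration; a real deployment would call a model here
print(ai_gate(doc_text, meta, lambda t, m: "invoice"))  # queued
print(ai_gate(doc_text, meta, lambda t, m: "other"))    # ignored
```

Treating anything other than an explicit `other` as `queued` means that a malformed or unexpected model response fails safe: the document stays in the queue instead of disappearing.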
The AI doesn't just check if a document mentions money. It understands context: a promotional email saying "Save $50 on your next order" is not an invoice. A PDF with a company logo, an invoice number, line items, and a total — that's an invoice. The model evaluates the full document structure, not just keywords.
What Gets Filtered Out
In testing with real user inboxes, AI classification typically filters out 40-70% of the attachments that would previously have cluttered the review queue.
What Gets Kept
The AI is intentionally conservative about filtering. When in doubt, it keeps the document in your queue rather than risk hiding a real invoice.
The philosophy is simple: it's better to review one extra document than to miss a real invoice. AI handles the obvious noise; you make the final call on everything else.
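One common way to encode "when in doubt, keep it" is an asymmetric confidence threshold. A document is hidden only when the model is confident it is *not* an invoice. This is a hedged sketch: the `Classification` shape and the 0.9 threshold are hypothetical values chosen for illustration, not BillyBox's settings:

```python
from dataclasses import dataclass


@dataclass
class Classification:
    is_invoice_like: bool  # invoice, receipt, or statement of account
    confidence: float      # model's confidence in is_invoice_like, 0.0 to 1.0


# Hypothetical threshold; higher values hide fewer documents
IGNORE_THRESHOLD = 0.9


def final_status(result: Classification) -> str:
    """Hide a document only when the model is confident it is NOT an invoice."""
    if not result.is_invoice_like and result.confidence >= IGNORE_THRESHOLD:
        return "ignored"
    return "needs_review"  # invoices and every uncertain case stay visible


print(final_status(Classification(False, 0.97)))  # ignored
print(final_status(Classification(False, 0.60)))  # needs_review
print(final_status(Classification(True, 0.55)))   # needs_review
```

The asymmetry reflects the cost of each mistake. A wrongly hidden invoice can go unpaid or be missed at bookkeeping time, while a wrongly kept logo costs one swipe.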
Why Not Just Use OCR or Simple Keyword Matching?
Keyword matching ("does it contain the word invoice?") fails because marketing emails routinely contain words like "invoice", "payment", and "receipt" without being actual invoices. A newsletter saying "View your invoice" with a link is not an invoice — it's a notification. A promotional PDF titled "Invoice template" is not your invoice.
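The failure mode is easy to reproduce. The hypothetical `naive_is_invoice` below implements the keyword rule described above. It runs against three short texts modeled on this article's own examples:

```python
def naive_is_invoice(text: str) -> bool:
    """The keyword rule under discussion: any money word counts as an invoice."""
    keywords = ("invoice", "payment", "receipt")
    lowered = text.lower()
    return any(word in lowered for word in keywords)


samples = {
    "newsletter": "Your monthly update. View your invoice online: https://example.com/billing",
    "template": "Invoice template: free download for small businesses",
    "real invoice": "Invoice No. 2026-0142\nHosting plan x1  EUR 40.00\nTotal due: EUR 40.00",
}
for name, text in samples.items():
    print(f"{name}: {naive_is_invoice(text)}")
# newsletter: True
# template: True
# real invoice: True
```

All three come back `True`. The rule cannot separate the notification and the template from the real invoice, because the signal lies in the document's structure and intent rather than its vocabulary.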
OCR (Optical Character Recognition) solves a different problem — converting images to text. It doesn't help you decide whether that text represents an invoice or a logo. You need understanding of document structure and intent, which is what AI provides.
Privacy and Cost
AI classification only processes document metadata and extracted text, not the original PDF files. Text extraction runs first on our own servers, and only the extracted content is sent to the AI model for classification. Your original documents stay within BillyBox's EU-hosted infrastructure.
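The data-flow boundary described above can be made explicit in code: the payload handed to the classifier contains only text and metadata. This is a minimal sketch that assumes a JSON request body. `build_classification_payload`, `extract_text`, and the 8,000-character cap are hypothetical names and values, not BillyBox's actual interface:

```python
import json
from typing import Callable, Dict

MAX_TEXT_CHARS = 8_000  # hypothetical cap on how much text is sent


def build_classification_payload(
    raw_file: bytes,
    metadata: Dict[str, str],
    extract_text: Callable[[bytes], str],
) -> str:
    """Turn an attachment into the JSON body sent to the classifier."""
    # Extraction runs on our own infrastructure; raw_file goes no further than this call
    text = extract_text(raw_file)
    # Only extracted text and metadata are serialized; the original bytes are not
    return json.dumps({"text": text[:MAX_TEXT_CHARS], "metadata": metadata})


def fake_extract(data: bytes) -> str:
    """Stand-in extractor for illustration; real PDFs need a PDF text-extraction library."""
    return data.decode("utf-8", errors="ignore")


body = build_classification_payload(
    b"Invoice 2026-0142 Total EUR 40.00",
    {"sender": "billing@vendor.example", "filename": "2026-0142.pdf"},
    fake_extract,
)
print(sorted(json.loads(body)))  # ['metadata', 'text']
```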
Each classification costs a fraction of a cent, which is why this feature is included in all plans, including the free tier. There's no per-page or per-document charge for AI classification.
The Result: A Clean Review Queue
Before AI classification, connecting an email account with a busy inbox meant reviewing hundreds of irrelevant attachments. Now you see mostly real invoices and receipts, because the noise is filtered out before it reaches you. Reviewing the queue takes minutes instead of the better part of an hour.
Combined with swipe-to-classify on mobile and keyboard shortcuts on desktop, the entire flow from email to accountant-ready export is now fast enough to do on a coffee break.
Try It Free
BillyBox's AI classification is included on all plans, including free. Connect your email, fetch a month, and see the difference — a clean queue of real invoices, not a wall of logos and marketing PDFs.