🕸️ WebSense: Web Scraper for Structured Data

WebSense is a modular, AI-powered web scraper that turns raw webpages into structured data with minimal code. Instead of relying on brittle CSS selectors or XPath rules, it uses LLMs to interpret page content semantically, which makes scrapers more resilient to layout changes and easier to maintain. (GitHub)

🧠 Key Highlights

  • Semantic Extraction: Uses large language models to interpret and extract meaningful data.
  • Simple API: Extract structured results in just a few lines of Python.
  • Flexible: Supports JSON schemas or example-based inference.
  • Modular Pipeline: Clear fetch → clean → parse stages for customization.
  • CLI Included: Run quick extractions directly from the command line.
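
To make the fetch → clean → parse idea concrete, here's a rough standard-library sketch of how three swappable stages can be wired together. This is not WebSense's actual code: `fetch`, `clean`, and `run_pipeline` are hypothetical names, and the `parse` argument is just a stand-in for whatever LLM call does the real extraction.

```python
from html.parser import HTMLParser
from typing import Any, Callable
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Fetch stage: download the raw HTML for a URL."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean(html: str) -> str:
    """Clean stage: reduce HTML to plain text, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def run_pipeline(
    url: str,
    parse: Callable[[str], Any],          # stand-in for the LLM-backed parser
    fetch_fn: Callable[[str], str] = fetch,
    clean_fn: Callable[[str], str] = clean,
) -> Any:
    """Run fetch → clean → parse, with every stage replaceable."""
    return parse(clean_fn(fetch_fn(url)))
```

Because each stage is a plain function, you could swap in a headless browser for fetching or a stricter cleaner without touching the other two.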

:rocket: Quick Example

from websense import Scraper

scraper = Scraper()
data = scraper.scrape(
    "https://github.com/atasoglu/websense",
    example={"project_name": "string", "description": "string", "stars": 0}
)
print(data)

…all without hand-crafting selectors!
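
If you're curious how an example dict like the one above relates to a JSON schema, here's a small sketch that derives one from the Python types of the example values. `infer_schema` is a hypothetical helper written for illustration; WebSense's own example-based inference is LLM-driven and may work quite differently.

```python
from typing import Any


def infer_schema(example: Any) -> dict:
    """Build a JSON Schema fragment from the Python types in an example value."""
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(example, bool):
        return {"type": "boolean"}
    if isinstance(example, int):
        return {"type": "integer"}
    if isinstance(example, float):
        return {"type": "number"}
    if isinstance(example, str):
        return {"type": "string"}
    if example is None:
        return {"type": "null"}
    if isinstance(example, list):
        # Use the first element as the item template; empty lists stay open.
        return {"type": "array", "items": infer_schema(example[0]) if example else {}}
    if isinstance(example, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(value) for key, value in example.items()},
            "required": list(example.keys()),
        }
    raise TypeError(f"Unsupported example type: {type(example).__name__}")


schema = infer_schema({"project_name": "string", "description": "string", "stars": 0})
print(schema)
```

Here the two string fields map to `"type": "string"` and `stars` maps to `"type": "integer"`, so either form can describe the same output shape.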
