Meta Ads Project - AI-Powered Ad Analytics Pipeline

November 12, 2024
Meta Ads Project is an AI-powered pipeline for collecting and analyzing Meta Ads data, focused on the skincare and beauty industry. It combines natural language processing (NLP), FAISS vector search, and a Streamlit dashboard to surface insights into ad content, effectiveness, and trends.
  • Automated Ad Collection: Scrapes Meta Ads Library with proxy support.
  • AI-Powered Analysis: Uses OpenAI (GPT-4) and Anthropic (Claude) for in-depth ad content analysis.
  • Vector Search Engine: Implements FAISS for similarity-based ad retrieval.
  • Real-Time Dashboard: Interactive Streamlit UI for ad visualization and keyword-based searches.
  • Efficient Data Storage: Uses MongoDB for structured ad storage and quick retrieval.
  • Keyword-Based Targeting: Allows users to configure keyword-based ad tracking with CSV files.
  • Media Processing: Compresses and processes images and videos for efficient storage and retrieval.
  • Python: Core language for pipeline processing and AI integrations.
  • Streamlit: Interactive UI for exploring and visualizing ad insights.
  • MongoDB: NoSQL database for structured ad storage.
  • FAISS: Vector search for similarity-based ad recommendations.
  • OpenAI & Anthropic APIs: AI-powered ad content analysis.
Building this project required overcoming several technical challenges:
  • Handling Large Data Volumes: Optimized MongoDB indexing and FAISS embeddings to ensure quick searches.
  • Proxy Management: Made proxy handling more robust so that ad scraping stays stable.
  • Real-Time Data Processing: Balanced speed and accuracy in AI-driven content analysis.
These challenges provided deep insights into AI-driven data processing, full-stack development, and efficient search implementations.
The Meta Ads Project is a fully automated AI-powered pipeline that collects, processes, and analyzes Meta (Facebook) ad data, specifically for the skincare and beauty industry. This system is designed for large-scale ad data ingestion, AI-based enrichment, and advanced vector search for high-relevance ad recommendations.
  • Utilizes requests and BeautifulSoup with dynamic proxy handling to avoid IP bans.
  • Implements async scraping (via aiohttp) to efficiently gather ad data in parallel.
  • Filters ads based on predefined industry-specific keywords (extracted dynamically using NLP techniques) to focus on relevant data.
  • Extracts structured metadata like ad copy, engagement metrics, image/video URLs, and advertiser details.
  • Uses LangChain's document loaders to process text-heavy ad descriptions efficiently.
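The scraping and filtering steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `fetch` is a hypothetical coroutine standing in for an aiohttp request routed through a proxy, and the keyword list and `max_concurrency` value are assumptions.

```python
import asyncio
import re
from typing import Awaitable, Callable, Iterable, List, Optional

# Hypothetical signature: takes a URL, returns the page body.
Fetcher = Callable[[str], Awaitable[str]]


def build_keyword_pattern(keywords: Iterable[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive, whole-word pattern from a keyword list."""
    cleaned = [re.escape(k.strip()) for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one keyword is required")
    return re.compile(r"\b(?:" + "|".join(cleaned) + r")\b", re.IGNORECASE)


def is_relevant(ad_text: str, pattern: "re.Pattern[str]") -> bool:
    """Return True when the ad text mentions any tracked keyword."""
    return pattern.search(ad_text) is not None


async def fetch_all(
    urls: Iterable[str], fetch: Fetcher, max_concurrency: int = 5
) -> List[Optional[str]]:
    """Fetch URLs in parallel, capping in-flight requests with a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> Optional[str]:
        async with semaphore:
            try:
                return await fetch(url)
            except Exception:
                return None  # a failed request should not abort the batch

    # gather() returns results in the same order as the input URLs
    return list(await asyncio.gather(*(bounded(u) for u in urls)))
```

In a real run, `fetch` would wrap the HTTP client and pick a proxy per request. The semaphore keeps the number of simultaneous requests bounded, which is one common way to reduce the risk of IP bans.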
  • Each ad is stored as a document in MongoDB with fields:
    {
      "ad_id": "12345",
      "company": "Brand XYZ",
      "text": "Introducing our new skincare formula...",
      "image_url": "https://fbcdn.com/ad_image.jpg",
      "video_url": "https://fbcdn.com/ad_video.mp4",
      "engagement": { "likes": 1200, "shares": 340, "comments": 450 },
      "timestamp": "2024-03-10T12:30:00Z"
    }
    
  • Uses TTL indexes (to expire stale ads) and compound indexes (to speed up multi-field queries).
  • Ad descriptions are vectorized and stored as embeddings.
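The indexing strategy above might look roughly like the following. It is a sketch under assumptions: the index fields and the 180-day retention period are illustrative, and `collection` is any object exposing pymongo's `create_index(keys, **options)` method. Two MongoDB behaviours matter here: a TTL index must be a single-field index, and it only expires documents whose indexed field holds a date value, so the ISO string shown in the sample document is converted to a `datetime` before insertion.

```python
from datetime import datetime

# Same numeric values as pymongo.ASCENDING / pymongo.DESCENDING
ASCENDING, DESCENDING = 1, -1


def prepare_ad_document(raw: dict) -> dict:
    """Copy an ad and convert its ISO-8601 timestamp string to a datetime."""
    doc = dict(raw)
    ts = doc["timestamp"]
    if isinstance(ts, str):
        # fromisoformat() on older Pythons does not accept a trailing "Z"
        doc["timestamp"] = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return doc


def ensure_indexes(collection, ttl_days: int = 180) -> None:
    """Create the lookup, compound, and TTL indexes (field choice is assumed)."""
    # Unique lookup by ad ID
    collection.create_index([("ad_id", ASCENDING)], unique=True)
    # Compound index: ads per company, most-liked first
    collection.create_index(
        [("company", ASCENDING), ("engagement.likes", DESCENDING)]
    )
    # TTL index: documents expire ttl_days after their timestamp
    collection.create_index(
        [("timestamp", ASCENDING)], expireAfterSeconds=ttl_days * 24 * 3600
    )
```

With pymongo this would be called as `ensure_indexes(client["ads_db"]["ads"])`, where the database and collection names are placeholders.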
  • Identifies the parent company of an ad and fetches additional details.
  • Uses GPT-4 or Claude API to generate a company summary, ensuring consistency in brand information.
  • Applies zero-shot classification (via OpenAI API) to categorize ad intent:
    • Brand Awareness
    • Product Promotion
    • Discount/Offer Campaigns
    • User-Generated Content (UGC) Promotions
  • Runs sentiment analysis with TextBlob or VADER to gauge audience reaction.
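Zero-shot intent classification mostly comes down to a constrained prompt plus strict validation of the model's reply. The sketch below shows that part only. The prompt wording is an assumption, and `complete` is a hypothetical callable that sends a prompt to whichever LLM is in use (GPT-4 or Claude) and returns its text reply.

```python
from typing import Callable, Optional

INTENT_LABELS = [
    "Brand Awareness",
    "Product Promotion",
    "Discount/Offer Campaigns",
    "User-Generated Content (UGC) Promotions",
]


def build_intent_prompt(ad_text: str) -> str:
    """Build a prompt that asks for exactly one of the known intent labels."""
    options = "\n".join(f"- {label}" for label in INTENT_LABELS)
    return (
        "Classify the intent of the following ad into exactly one of these "
        f"categories:\n{options}\n\nAd text:\n{ad_text}\n\n"
        "Answer with the category name only."
    )


def parse_intent(reply: str) -> Optional[str]:
    """Map a free-text reply to a known label; None if ambiguous or unknown."""
    normalized = reply.strip().lower()
    matches = [label for label in INTENT_LABELS if label.lower() in normalized]
    return matches[0] if len(matches) == 1 else None


def classify_intent(ad_text: str, complete: Callable[[str], str]) -> Optional[str]:
    """Classify one ad using the injected (hypothetical) LLM callable."""
    return parse_intent(complete(build_intent_prompt(ad_text)))
```

Returning `None` for replies that match no label, or more than one, lets the caller retry or flag the ad for review instead of storing a guessed category.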
  • OpenCV + PIL for basic media preprocessing.
  • FFmpeg for handling video compression and frame extraction.
  • CLIP (Contrastive Language–Image Pretraining) for multimodal ad analysis, linking text captions to image content.
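The FFmpeg steps can be driven from Python by building argument lists and passing them to `subprocess.run`. The source does not state which settings the project uses, so the codec, CRF value, frame rate, and output naming below are assumptions chosen for illustration.

```python
from pathlib import Path
from typing import List


def compress_video_cmd(src: str, dst: str, crf: int = 28) -> List[str]:
    """Build an FFmpeg command that re-encodes a video with H.264.

    For libx264, CRF ranges from 0 to 51; higher values give smaller files
    at lower quality.
    """
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51")
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264", "-crf", str(crf), "-preset", "medium",
        "-c:a", "aac",
        dst,
    ]


def extract_frames_cmd(src: str, out_dir: str, fps: float = 1.0) -> List[str]:
    """Build an FFmpeg command that saves `fps` frames per second as JPEGs."""
    pattern = str(Path(out_dir) / "frame_%04d.jpg")
    return ["ffmpeg", "-y", "-i", src, "-vf", f"fps={fps}", pattern]
```

Running a command is then `subprocess.run(compress_video_cmd("ad.mp4", "ad_small.mp4"), check=True)`, with the file names being placeholders. Passing a list rather than a shell string avoids quoting problems with unusual file names.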
  • Text embeddings are generated via OpenAI’s text-embedding-ada-002 model.
  • Image embeddings via CLIP to ensure multimodal searchability.
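  • Each ad’s text and media content are converted into 1280-dimensional embeddings. (text-embedding-ada-002 on its own returns 1536-dimensional vectors, so this figure presumably refers to a reduced or combined representation.)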
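The source does not say how the text and image vectors are combined into a single embedding. One common option, shown here purely as an assumption, is to L2-normalize each vector and concatenate them with a weight that controls how much each modality contributes. The function names and the default weight are illustrative.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse_embeddings(
    text_vec: Sequence[float],
    image_vec: Sequence[float],
    text_weight: float = 0.5,
) -> List[float]:
    """Concatenate normalized text and image vectors into one unit vector.

    Scaling the halves by sqrt(w) and sqrt(1 - w) keeps the result at unit
    length, so inner-product and L2 rankings agree downstream.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    text_part = [math.sqrt(text_weight) * x for x in l2_normalize(text_vec)]
    image_part = [math.sqrt(1.0 - text_weight) * x for x in l2_normalize(image_vec)]
    return text_part + image_part
```

The fused vector's length is the sum of the two input lengths, so the dimension passed to the vector index has to match that sum.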
  • FAISS (Facebook AI Similarity Search) is used for efficient nearest-neighbor search.
  • Ad embeddings are indexed with an IVF (inverted file) index combined with an HNSW (Hierarchical Navigable Small World) graph for fast lookup.
  • Approximate nearest-neighbor (ANN) search keeps retrieval fast even with millions of ads, at the cost of occasionally missing an exact match.
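import faiss
import numpy as np

# Load the stored index from disk
index = faiss.read_index("ads_faiss.index")

# `model` is assumed to be the embedding client that produced the indexed vectors
query_embedding = model.embed("best anti-aging serum for dry skin")

# FAISS expects a 2-D float32 array of shape (n_queries, dim)
query = np.asarray([query_embedding], dtype="float32")
distances, ids = index.search(query, 5)  # top 5 matches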
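To make the output of `index.search` concrete, here is the exact search that an ANN index approximates, written in plain Python. It is a reference sketch for small test sets, not a replacement for FAISS. Like FAISS L2 indexes, it reports squared Euclidean distances. The `recall_at_k` helper is an assumed way to check how many true neighbors an approximate index recovers.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(
    vectors: Sequence[Sequence[float]], query: Sequence[float], k: int = 5
) -> Tuple[List[float], List[int]]:
    """Return (distances, ids) of the k vectors closest to the query."""
    scored = ((squared_l2(vec, query), i) for i, vec in enumerate(vectors))
    top = heapq.nsmallest(k, scored)  # sorted by distance, then by id
    return [dist for dist, _ in top], [i for _, i in top]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the true nearest neighbors that the ANN result contains."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

Comparing `ids` from the FAISS index against `exact_search` on a sample of queries gives a quick recall estimate when tuning index parameters.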
  • Users can search by keyword, brand, or ad type.
  • Dynamic charts (Plotly/Matplotlib) visualize ad trends over time.
  • Supports semantic search (e.g. "Show me trending skincare ads") using FAISS over OpenAI embeddings.
  • Filters ads by sentiment, engagement, or region.
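Behind the dashboard widgets, filtering reduces to a plain function over ad documents. The sketch below assumes each ad carries the `engagement` sub-document shown earlier plus hypothetical `sentiment` and `region` fields, which the sample document does not include.

```python
from typing import Iterable, List, Optional


def total_engagement(ad: dict) -> int:
    """Sum likes, shares, and comments, treating missing values as zero."""
    engagement = ad.get("engagement", {})
    return (
        engagement.get("likes", 0)
        + engagement.get("shares", 0)
        + engagement.get("comments", 0)
    )


def filter_ads(
    ads: Iterable[dict],
    *,
    min_engagement: int = 0,
    sentiment: Optional[str] = None,
    region: Optional[str] = None,
) -> List[dict]:
    """Keep ads that pass every active filter, most engaging first."""
    kept = []
    for ad in ads:
        if total_engagement(ad) < min_engagement:
            continue
        if sentiment is not None and ad.get("sentiment") != sentiment:
            continue
        if region is not None and ad.get("region") != region:
            continue
        kept.append(ad)
    return sorted(kept, key=total_engagement, reverse=True)
```

In a Streamlit app, the keyword arguments would typically come from sidebar widgets such as a slider for `min_engagement` and select boxes for `sentiment` and `region`.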
The Meta Ads Project is an end-to-end AI-powered ad analytics solution that integrates scraping, AI enrichment, vector search, and real-time analytics into a single pipeline. By combining NLP, multimodal embeddings, and FAISS-based retrieval, it provides insight into Meta Ads content and trends while remaining efficient at scale.