Data Annotation for E-Commerce and Retail AI: Use Cases and Best Practices

From visual search to fake review detection, AI in retail runs on labeled data. Here is how Southeast Asian e-commerce teams are building their training datasets.

9 min read | By the DataX Power team

Why retail AI has a data annotation problem

E-commerce platforms in Southeast Asia process hundreds of millions of product listings. The AI systems that power search relevance, recommendation engines, visual search, counterfeit detection, and review moderation all depend on training data that reflects the actual product catalog – diverse, multilingual, inconsistently photographed, and continuously changing.

Unlike healthcare AI (where medical imaging datasets have clear structure and regulatory precedent) or autonomous vehicle AI (where sensor data formats are relatively standardized), retail AI faces an annotation challenge that is as much about cultural context and market knowledge as technical labeling precision.

A bounding box annotator who does not know the difference between a Vietnamese áo dài and a formal Western dress cannot correctly label a fashion attribute dataset for a Southeast Asian platform. This is why region-specific annotation teams are not just cheaper for APAC e-commerce work – they are more accurate.

Use case 1: product image annotation for catalog AI

Product catalog annotation is the highest-volume annotation task in retail AI. Every new listing on a major platform requires some combination of: category classification, attribute extraction, image quality scoring, and background removal or segmentation for presentation standardization.

The scale is significant: Shopee processes approximately 200 million active listings across six markets, and tens of thousands of new listings are submitted every day. Catalog AI systems need a continuous supply of fresh training data to stay accurate as the catalog expands and changes.

  • Category classification: multi-level taxonomy labeling (e.g., Electronics > Mobile Phones > Smartphones > Android). Accuracy at level 3+ requires annotators familiar with the actual product category.
  • Attribute tagging: extracting structured attributes from unstructured listings – color, material, brand, size, condition – for faceted search and recommendation features.
  • Image quality scoring: scoring product photos on dimensions including background cleanliness, lighting quality, subject visibility, and compliance with platform image policies.
  • Semantic segmentation for background removal: pixel-level segmentation of product from background, enabling AI-powered background standardization at scale.
  • Duplicate and variant grouping: identifying product variants (same item in different colors/sizes) and genuinely duplicate listings for deduplication and grouping models.
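
The tasks above can be captured in a single annotation record per listing. The following is a minimal sketch, not a platform schema: the `TAXONOMY` tree, the `ListingAnnotation` fields, and the `is_valid_path` helper are all hypothetical names chosen for illustration, and the only taxonomy branch taken from this article is Electronics > Mobile Phones > Smartphones > Android.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy tree: each key is a category, each value holds its children.
# The "iOS" leaf is an illustrative assumption, not a category named in the article.
TAXONOMY = {
    "Electronics": {
        "Mobile Phones": {
            "Smartphones": {"Android": {}, "iOS": {}},
        },
    },
}


@dataclass
class ListingAnnotation:
    """One annotator's labels for one listing (illustrative fields only)."""
    listing_id: str
    category_path: list[str]
    attributes: dict[str, str] = field(default_factory=dict)    # e.g. color, material
    image_quality: dict[str, int] = field(default_factory=dict)  # assumed 1-5 per dimension


def is_valid_path(path, taxonomy=TAXONOMY):
    """Return True if every level of the path exists under its parent."""
    node = taxonomy
    for name in path:
        if name not in node:
            return False
        node = node[name]
    return bool(path)  # an empty path is not a valid label


label = ListingAnnotation(
    listing_id="listing-001",
    category_path=["Electronics", "Mobile Phones", "Smartphones", "Android"],
    attributes={"color": "black", "condition": "new"},
    image_quality={"background": 4, "lighting": 5},
)
path_ok = is_valid_path(label.category_path)
```

Rejecting paths that skip a level (for example Electronics > Smartphones) at submission time is one way to catch taxonomy errors before they reach the training set.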

Use case 2: visual search and product similarity

Visual search – the ability to photograph a product and find matching or similar items in a catalog – is one of the highest-value AI features for fashion, home decor, and electronics categories. Training visual search models requires image-level similarity annotation rather than object-level labeling.

Similarity annotation is more subjective than object detection and requires annotators who understand the end user's intent. A user photographing a brown leather sofa wants to find similar sofas, not all brown leather items. This requires annotators to evaluate products through a shopper's lens, not a technical one.

  • Triplet annotation: for each anchor image, annotators identify positive matches (similar enough to be a valid search result) and negative examples (superficially similar but not the same product category or style intent).
  • Style similarity vs. exact match: annotators must distinguish between "visually identical product" and "same style/aesthetic" – two different task types that train different model objectives.
  • Cross-category confusion: some visual searches span category boundaries (a belt that also appears to be a bag handle). Annotation guidelines must address these deliberately.
  • Culture-specific style knowledge: for fashion and home decor in Southeast Asian markets, annotation teams with local market knowledge produce significantly more accurate similarity judgments.
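
To make the triplet format concrete, here is a minimal sketch of how one annotator's judgments for an anchor image might be expanded into training triplets. The `Triplet` class, the `build_triplets` helper, the task names, and the image IDs are all hypothetical; the point is only that exact-match and style-similarity labels are kept as separate task types.

```python
from dataclasses import dataclass
from itertools import product

# Assumed task names, mirroring the two task types described in the list above.
TASKS = {"exact_match", "style_similarity"}


@dataclass(frozen=True)
class Triplet:
    anchor: str
    positive: str
    negative: str
    task: str


def build_triplets(anchor, positives, negatives, task):
    """Pair every positive with every negative for a single anchor image."""
    if task not in TASKS:
        raise ValueError(f"unknown task type: {task}")
    overlap = set(positives) & set(negatives)
    if overlap:
        # The same image cannot be both a valid and an invalid search result.
        raise ValueError(f"images labeled both positive and negative: {sorted(overlap)}")
    return [Triplet(anchor, p, n, task) for p, n in product(positives, negatives)]


# A shopper photographs a brown leather sofa: other sofas are positives,
# a brown leather bag is a superficially similar negative.
triplets = build_triplets(
    anchor="sofa_001",
    positives=["sofa_002", "sofa_003"],
    negatives=["bag_104"],
    task="style_similarity",
)
```

Keeping the task type on every triplet lets the two objectives be trained or evaluated separately even when the images come from the same annotation batch.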

Use case 3: review and content moderation

Customer reviews contain the richest training signal for recommendation, search relevance, and trust systems – and also the highest fraud risk. Annotation for review AI involves multiple parallel tasks: sentiment classification, topic extraction, fake review detection, and policy violation classification.

Review annotation in APAC markets is complicated by multilingual content. A single platform may receive reviews in Vietnamese, Thai, Bahasa Indonesia, Tagalog, Mandarin, and English, and a single review often mixes two or more of them. This requires multilingual annotation teams or language-specific annotation queues, not just machine-translation pre-processing.

  • Sentiment annotation: document-level and aspect-level sentiment (e.g., "packaging good but product poor") with fine-grained labels beyond simple positive/negative/neutral.
  • Fake review classification: training data for fake review detection requires both confirmed fake examples (from verified enforcement actions) and confirmed genuine examples, with careful data handling for legal reasons.
  • Policy violation detection: categorizing reviews that violate platform policies (spam, irrelevant content, personal attacks, misinformation) into specific violation types for enforcement model training.
  • Helpfulness ranking: annotating which reviews are most useful to prospective buyers, for ranking model training.
  • Multilingual labeling: all annotation tasks above require language-matched annotators, not machine-translated text presented to English-only teams.
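
A language-specific queue setup could be sketched as follows. This assumes each review already carries a `languages` list (from an upstream detector or manual triage, neither of which is shown here); the `route_reviews` function, the queue naming scheme, and the sample reviews are hypothetical.

```python
from collections import defaultdict


def route_reviews(reviews):
    """Group review IDs into annotation queues keyed by language.

    Single-language reviews go to that language's queue. Mixed-language
    reviews get their own queue so they can be assigned to annotators
    who read every language involved.
    """
    queues = defaultdict(list)
    for review in reviews:
        langs = sorted(set(review["languages"]))
        if not langs:
            key = "triage"  # language unknown: needs a human look first
        elif len(langs) == 1:
            key = langs[0]
        else:
            key = "mixed:" + "+".join(langs)
        queues[key].append(review["id"])
    return dict(queues)


# Made-up sample data for illustration.
sample = [
    {"id": "r1", "languages": ["vi"]},
    {"id": "r2", "languages": ["th"]},
    {"id": "r3", "languages": ["vi", "en"]},
    {"id": "r4", "languages": []},
    {"id": "r5", "languages": ["en", "vi"]},
]
queues = route_reviews(sample)
```

Sorting the language codes before building the key means "vi + en" and "en + vi" reviews land in the same mixed queue.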

Use case 4: fraud and counterfeit detection

Counterfeit product detection is an annotation-intensive problem for every major APAC marketplace. The challenge is that counterfeit products are designed to look like genuine ones – the visual differences are subtle and require annotators with genuine product knowledge to identify.

Training data for counterfeit detection models involves annotating pairs of genuine and counterfeit product images with labels identifying the specific visual cues that distinguish them: logo placement, stitching quality, packaging typography, material texture. This is a task where domain expertise (brand knowledge, product authentication experience) directly determines annotation quality.

  • Binary genuine/counterfeit classification: high-volume, relatively fast per item, but requires continuous model feedback to avoid annotator drift.
  • Tamper detection for packaging: identifying digitally manipulated product images (price tags removed, logo replaced, certification stamps added) requires annotators trained on specific manipulation artifacts.
  • IP violation listing detection: identifying listings that use brand imagery without authorization, which is a different task from counterfeit product detection.
  • Cross-listing pattern annotation: tagging listing attributes (price, description patterns, seller history signals) for models that detect fraudulent seller behavior rather than individual listing fraud.
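
One common way to watch for annotator drift on a binary genuine/counterfeit task is to seed the queue with items whose correct label is already known and track each annotator's rolling accuracy on them. This is a swapped-in illustration rather than a method described in this article; the `DriftMonitor` class, the window size, and the threshold are assumptions.

```python
from collections import deque


class DriftMonitor:
    """Rolling accuracy of one annotator on pre-verified (gold) items."""

    def __init__(self, window=50, threshold=0.9):
        self.results = deque(maxlen=window)  # keeps only the most recent checks
        self.threshold = threshold

    def record(self, annotator_label, gold_label):
        self.results.append(annotator_label == gold_label)

    def accuracy(self):
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def drifting(self):
        """Flag only once the window is full and accuracy is below threshold."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.threshold


monitor = DriftMonitor(window=4, threshold=0.75)
for given, gold in [
    ("genuine", "genuine"),
    ("counterfeit", "counterfeit"),
    ("genuine", "genuine"),
    ("genuine", "counterfeit"),
]:
    monitor.record(given, gold)
before = (monitor.accuracy(), monitor.drifting())

monitor.record("genuine", "counterfeit")  # a second miss pushes out the oldest hit
after = (monitor.accuracy(), monitor.drifting())
```

A flagged annotator would typically be routed back to guideline review or recalibration rather than removed from the queue outright.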

Use case 5: demand forecasting and price optimization training data

A less obvious but growing annotation use case is labeling structured data for demand forecasting and dynamic pricing models. These models require human judgment on seasonal event significance, promotional discount categorization, and product lifecycle stage labeling that automated systems cannot reliably produce.

  • Promotional event annotation: labeling sales periods by type (flash sale, seasonal, mega campaign like 11.11) with region-specific significance scores.
  • Product lifecycle tagging: annotating products as new launch, peak season, end-of-season, or clearance, which feeds temporal demand models.
  • Competitive price context: for price optimization models, annotating which competitor price changes represent genuine market signals vs. noise.
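
The promotional and lifecycle labels above map naturally onto small enumerations. The sketch below is illustrative only: the class names, the 0 to 1 significance scale, the region codes, and the example scores are assumptions, not values from any platform.

```python
from dataclasses import dataclass
from enum import Enum


class PromoType(Enum):
    FLASH_SALE = "flash_sale"
    SEASONAL = "seasonal"
    MEGA_CAMPAIGN = "mega_campaign"


class LifecycleStage(Enum):
    NEW_LAUNCH = "new_launch"
    PEAK_SEASON = "peak_season"
    END_OF_SEASON = "end_of_season"
    CLEARANCE = "clearance"


@dataclass
class PromoEventLabel:
    name: str
    promo_type: PromoType
    significance: dict[str, float]  # region code -> score on an assumed 0-1 scale

    def __post_init__(self):
        for region, score in self.significance.items():
            if not 0.0 <= score <= 1.0:
                raise ValueError(f"significance for {region} out of range: {score}")


# Made-up scores: the same campaign can matter more in one market than another.
event = PromoEventLabel(
    name="11.11",
    promo_type=PromoType.MEGA_CAMPAIGN,
    significance={"VN": 0.9, "TH": 0.8},
)
stage = LifecycleStage("clearance")
```

Using enumerations instead of free-text labels prevents near-duplicate values such as "flash sale" and "flash_sale" from splitting one category into two in the training data.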

Building your retail annotation program: what works in APAC

E-commerce annotation programs in Southeast Asia differ from their Western-market equivalents in three ways. The product catalog is more diverse (more categories, more local product types), the multilingual requirement is more complex (six or more languages across the region), and the volume is extreme (top platforms carry hundreds of millions of listings).

Annotation programs that work for this context share several characteristics: dedicated annotation teams per language cluster rather than pooled multilingual teams; pilot runs per category before full-scale production (apparel annotation requires different expertise than electronics); and staged quality review that catches cultural context errors before they scale.

The annotation partner you choose for APAC retail work needs to demonstrate two things: capacity to scale to peak demand (double or triple normal volume during campaign periods like 11.11 or Tết) and market knowledge depth sufficient to label culturally specific content accurately. These are harder to evaluate than standard quality metrics but matter more in practice.
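
The peak-capacity requirement reduces to simple arithmetic. The sketch below uses made-up numbers: the baseline volume and per-annotator throughput are assumptions for illustration, and `annotators_needed` is a hypothetical helper.

```python
import math


def annotators_needed(daily_items, items_per_annotator_per_day, peak_multiplier=1.0):
    """Headcount required to clear one day's volume at a given peak multiplier."""
    if items_per_annotator_per_day <= 0:
        raise ValueError("throughput must be positive")
    return math.ceil(daily_items * peak_multiplier / items_per_annotator_per_day)


# Illustrative figures only: 20,000 items per day, 800 items per annotator per day.
baseline = annotators_needed(20_000, 800)
double_peak = annotators_needed(20_000, 800, peak_multiplier=2.0)
triple_peak = annotators_needed(20_000, 800, peak_multiplier=3.0)
```

Under these assumed figures, a team sized for normal volume would need to double or triple its headcount for the campaign window, which is the kind of surge a partner should be able to demonstrate before the campaign starts.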
