Classify support complaints into 4 priority levels using AI. Trained across 8 domains with sentiment-aware feature engineering.
Processing Pipeline
How it works
The v2 pipeline fuses 10,000-dim TF-IDF features with 20 hand-engineered signals, capturing slang, sentiment, and domain cues that bag-of-words alone misses.
01
Raw text input
Accept any plain-text complaint string. No encoding requirements, no preprocessing on the caller side.
predict_complaint_v2("Server is down, urgent!")
02
Slang normalization
50+ slang tokens are mapped to canonical phrases before any other processing, preserving semantic signal that TF-IDF would silently drop.
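As a rough illustration of this step, the sketch below maps slang tokens to canonical phrases with a dictionary lookup. The function name `normalize_slang` and the four map entries are hypothetical; the source does not list the pipeline's actual 50+ mappings.

```python
import re

# Hypothetical slang map; the real pipeline's 50+ entries are not shown in the source.
SLANG_MAP = {
    "asap": "as soon as possible",
    "wtf": "what the hell",
    "smh": "shaking my head",
    "idk": "i do not know",
}

_TOKEN_RE = re.compile(r"[A-Za-z']+")


def normalize_slang(text: str) -> str:
    """Replace known slang tokens with a canonical phrase; leave everything else untouched."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Lookup is case-insensitive; unknown words are returned unchanged.
        return SLANG_MAP.get(word.lower(), word)

    return _TOKEN_RE.sub(swap, text)


print(normalize_slang("Fix this ASAP!"))  # Fix this as soon as possible!
```

Running the replacement before cleaning and vectorization means the expanded words can match vocabulary the vectorizer already knows, whereas an unseen slang token would contribute nothing.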
03
Text cleaning
Text is lowercased, URLs and emails are stripped, punctuation and digits are removed, and tokens shorter than 2 characters are dropped. Stopwords are removed using a built-in list, so NLTK is not required.
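The cleaning rules above can be sketched with the standard library alone. The name `clean_text`, the regular expressions, and the short stopword set are assumptions for illustration; the pipeline's built-in stopword list is not shown in the source.

```python
import re

# Tiny illustrative stopword set; the pipeline's built-in list is not shown in the source.
STOPWORDS = {"the", "is", "a", "an", "and", "or", "to", "of", "at", "in", "on", "it", "my", "me"}

_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_EMAIL_RE = re.compile(r"\S+@\S+")
_NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Lowercase, strip URLs/emails, drop punctuation and digits, then filter tokens."""
    text = text.lower()
    text = _URL_RE.sub(" ", text)        # remove URLs before punctuation is stripped
    text = _EMAIL_RE.sub(" ", text)      # remove email addresses
    text = _NON_ALPHA_RE.sub(" ", text)  # remove punctuation and digits
    tokens = [t for t in text.split() if len(t) >= 2 and t not in STOPWORDS]
    return " ".join(tokens)


raw = "The server is DOWN!! Ticket #4521, see https://status.example.com or mail ops@example.com"
print(clean_text(raw))  # server down ticket see mail
```

URLs and emails are removed first because stripping punctuation earlier would break them into ordinary-looking word fragments.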
04
TF-IDF vectorization
Cleaned text is transformed into a 10,000-dim sparse vector using a pre-fitted vectorizer. Unigrams and bigrams capture phrases like "system down".
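A minimal sketch of this step with scikit-learn's `TfidfVectorizer` might look like the following. The three-sentence corpus is made up, and the settings are inferred from the description (unigrams plus bigrams, capped at 10,000 features) rather than taken from the actual fitted vectorizer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the real training data (not shown in the source).
corpus = [
    "system down cannot login",
    "refund not received after two weeks",
    "system down again please fix",
]

# ngram_range=(1, 2) keeps unigrams and bigrams; max_features caps the vocabulary size.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=10_000)
vectorizer.fit(corpus)  # in the pipeline this happens once, at training time

# At prediction time only transform() is called on the pre-fitted vectorizer.
X = vectorizer.transform(["system down since morning"])
vocab = vectorizer.vocabulary_

print("system down" in vocab)        # True: the bigram is its own feature
print(X[0, vocab["system down"]] > 0)
```

Note that `max_features` is an upper bound: the toy vocabulary here is far smaller than 10,000, and the full width is only reached when the training corpus contains at least that many distinct n-grams.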
05
Feature engineering
20 hand-crafted features are extracted in parallel and scaled with StandardScaler: urgency signals, sentiment intensity, domain flags, caps ratio, negation density, and more.
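To make a few of these signals concrete, here is a small sketch of four such features. The function name, the word lists, and the exact definitions are hypothetical; the source names the feature families but not how each is computed. The sketch assumes the features are read from the raw text, since cleaning would erase capitalization and punctuation.

```python
import re

# Hypothetical word lists; the source does not give the real vocabularies.
URGENCY_WORDS = {"urgent", "immediately", "asap", "emergency", "critical"}
NEGATION_WORDS = {"not", "no", "never", "cannot", "won't", "can't", "didn't"}


def extract_features(text: str) -> dict:
    """Compute a handful of illustrative dense features from raw complaint text."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    letters = [c for c in text if c.isalpha()]
    return {
        # Number of tokens that appear in the urgency word list
        "urgency_count": sum(t in URGENCY_WORDS for t in tokens),
        # Share of alphabetic characters that are uppercase
        "caps_ratio": sum(c.isupper() for c in letters) / max(len(letters), 1),
        # Negation words per token
        "negation_density": sum(t in NEGATION_WORDS for t in tokens) / max(len(tokens), 1),
        # Raw count of exclamation marks
        "exclamation_count": text.count("!"),
    }


features = extract_features("URGENT: it does not work!!")
print(features["urgency_count"], features["negation_density"], features["exclamation_count"])
```

In a full pipeline the resulting feature rows would then be passed through a `StandardScaler` fitted on the training set, so that counts and ratios end up on comparable scales.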
06
Feature fusion
Sparse TF-IDF matrix and dense engineered features are stacked into a 10,020-dim fused vector (10,000 TF-IDF dimensions plus 20 engineered features).
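One common way to do this kind of stacking is `scipy.sparse.hstack`, sketched below with placeholder matrices using the 10,000 TF-IDF columns and 20 engineered columns described earlier. Whether the pipeline uses this exact call is an assumption; the point is that the fused width is simply the sum of the two input widths.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

n_samples = 3

# Placeholder TF-IDF block: one nonzero weight at row 0, column 42.
tfidf = csr_matrix(([0.7], ([0], [42])), shape=(n_samples, 10_000))

# Placeholder engineered block: dense, already scaled, 20 columns.
engineered = np.zeros((n_samples, 20))
engineered[0, 0] = 1.5

# Convert the dense block to sparse and concatenate column-wise.
fused = hstack([tfidf, csr_matrix(engineered)], format="csr")

print(fused.shape)         # (3, 10020)
print(fused[0, 42])        # 0.7  (TF-IDF columns keep their positions)
print(fused[0, 10_000])    # 1.5  (engineered columns follow them)
```

Keeping the result sparse avoids materializing a mostly-zero dense matrix for every batch.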
07
Classification
Fused features go into the chosen model, logistic regression (LR) or SVM. Both are probability-calibrated with CalibratedClassifierCV, so confidence scores are meaningful probabilities rather than raw decision scores.
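The calibration idea can be sketched on synthetic data as follows. `LinearSVC`, the 3-fold setting, and the random four-class dataset are assumptions chosen for illustration; the source only says that the LR and SVM models are wrapped in `CalibratedClassifierCV`.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for fused features: 4 priority classes, 30 samples each, 5 features.
X = np.vstack([rng.normal(loc=3 * k, scale=1.0, size=(30, 5)) for k in range(4)])
y = np.repeat(np.arange(4), 30)

# LinearSVC has no predict_proba of its own; the wrapper fits a calibrator
# on held-out folds that maps decision scores to probabilities.
clf = CalibratedClassifierCV(LinearSVC(), cv=3)
clf.fit(X, y)

proba = clf.predict_proba(X[:2])
print(proba.shape)        # (2, 4): one probability per priority level
print(proba.sum(axis=1))  # each row sums to 1
```

The predicted priority is the class with the highest calibrated probability, and that probability can be reported as the confidence score.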