Renwei Meng

PetLingo AI: A Pet Behavior Understanding and Annotation Platform

An AI-native product built around pet vocalization understanding, multimodal behavior analysis, and data flywheel design, covering an internal annotation platform, a consumer-facing app, agent workflows, RAG pipelines, and monetization strategy.

Mar 1, 2025 – Ongoing
AI · Full Stack · Agent · RAG · Mobile · SaaS · Product Design
Next.js · TypeScript · React Native · FastAPI · Spring Boot · LangGraph · LlamaIndex · PostgreSQL · Redis · Pinecone


1. Project Overview

PetLingo AI is an AI-native product designed around pet vocalization understanding, multimodal behavior analysis, and continuous data iteration.
Rather than framing it as a simple “pet translator,” I structured it as a complete AI product composed of two major systems:

  1. An internal annotation and workforce management platform
  2. A consumer-facing pet language AI app

The first system is responsible for producing, validating, and managing high-quality training data.
The second system is responsible for serving real user scenarios, including audio upload, AI analysis, explanation generation, and user feedback collection.

Together, the two systems form a full AI product flywheel:

User-generated data → annotation platform → human labeling and manager auditing → model training and iteration → deployment to the app → continuous user usage and more data generation


2. Background

The starting point of this project was not to create an entertainment-style “pet sound translator,” but to explore a more meaningful question:

Can pet sounds, body movements, behavioral rhythms, and environmental context be modeled through multimodal AI and presented in a way that users can actually understand?

When taken seriously, this becomes a complex AI product problem rather than a simple front-end implementation challenge.
It requires solving at least three key problems:

1. Where does the training data come from?

Pet sound and behavior data is not naturally structured, and high-quality labeled datasets are extremely scarce.
That means a sustainable data production system must be built first.

2. How does the model keep improving?

Pet behavior understanding is not a standard NLP task.
It involves audio, vision, text, and contextual information, which requires multimodal modeling and a continuous feedback loop.

3. Why would users keep using the product?

If the AI output is only a one-time result like “your dog is happy,” the product will not become sticky.
The AI output must be transformed into a long-term, explainable, feedback-driven user experience.


3. What I Worked On

In this project, I mainly worked on two major parts:

  • Full-stack development of the annotation platform
  • Full-stack development of the consumer-facing AI app and AI workflow integration

1. Annotation Platform

I built an internal annotation system for employees to produce labels for pet audio clips, video snippets, and textual behavior descriptions.
This was not just a lightweight admin panel. It was designed as a real data production and operations system.

Its key capabilities included:

  • Task assignment and workflow management
  • Annotation workbench for employees
  • Automatic payroll calculation
  • Manager auditing and quality inspection
  • Workforce status management
  • Exception feedback and second-pass labeling
  • Label consistency control
  • Productivity and quality analytics

2. Consumer-facing AI App

I also contributed to the full-stack development of the pet language AI app, especially the connection between the front-end experience, backend services, and AI workflows.

The app was designed not to produce a naive anthropomorphic translation, but to give users a more useful and sustainable pet understanding experience.

Its major features included:

  • Pet audio upload and real-time analysis
  • Multimodal behavior recognition
  • AI-generated explanation
  • Pet growth profile
  • Historical behavior timeline
  • Personalized care suggestions
  • Subscription and premium analytics capability

4. Product Positioning

In framing this project as a portfolio piece, I positioned it as:

A data-driven AI platform built for pet behavior understanding.

It is not a single-point feature, but a complete product system consisting of:

  • a data platform
  • an AI platform
  • an inference and orchestration system
  • a mobile product
  • an operational feedback loop

Its target audiences include:

  • everyday pet owners
  • highly engaged pet community users
  • households that care about long-term pet health and behavior
  • pet training organizations
  • pet healthcare and insurance partners

5. Overall Architecture

1. System Layers

The system can be broken down into five layers.

Layer 1: Client Applications

This layer includes the mobile app for users and the web-based admin console for operators.

  • Consumer app: React Native / Expo
  • Admin console: Next.js + React + TypeScript
  • UI system: Tailwind CSS + shadcn/ui
  • State management: Zustand + TanStack Query

Layer 2: Business Service Layer

This layer handles users, billing, tasks, payroll, auditing, roles, and permissions.

  • FastAPI as the AI gateway and service aggregation layer
  • Spring Boot 3 for enterprise-style business modules such as payroll rules, permissions, and audit logs
  • Go for high-concurrency processing and middleware scheduling
  • GraphQL Gateway for unified multi-client data access

Layer 3: AI Orchestration and Inference Layer

This layer controls model invocation, agent orchestration, retrieval-augmented generation, and inference routing.

  • LangGraph for multi-step agent workflows
  • LlamaIndex for knowledge indexing and RAG pipelines
  • vLLM for high-throughput LLM inference
  • Triton Inference Server for unified model serving

Layer 4: Data Storage Layer

This layer manages structured data, caching, vector storage, and media object storage.

  • PostgreSQL for user data, pet profiles, behavioral logs, and billing records
  • MySQL for annotation platform operations data
  • Redis for cache, counters, task status, and rate limiting
  • Pinecone for vector search
  • S3 / R2 / OSS for audio, video, and attachments

Layer 5: Training and Feedback Loop

This layer brings user-generated data back into the annotation platform and supports iterative model improvement.

  • task generation
  • human labeling
  • manager audit
  • exception relabeling
  • dataset refresh
  • evaluation and redeployment
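As a rough sketch, the stages above can be modeled as an ordered pipeline that each sample moves through. The stage names come from the list; the `Sample` structure and `advance` function are illustrative assumptions, not the platform's actual implementation.

```python
from dataclasses import dataclass, field

# Stage names taken from the feedback loop above; the pipeline
# structure itself is an illustrative assumption.
STAGES = [
    "task_generation",
    "human_labeling",
    "manager_audit",
    "exception_relabeling",
    "dataset_refresh",
    "evaluation_and_redeployment",
]

@dataclass
class Sample:
    sample_id: str
    history: list = field(default_factory=list)  # stages already passed

def advance(sample: Sample) -> Sample:
    """Move a sample to the next stage of the feedback loop."""
    next_stage = STAGES[len(sample.history) % len(STAGES)]
    sample.history.append(next_stage)
    return sample

s = Sample("clip-001")
for _ in range(3):
    advance(s)
# s.history: ["task_generation", "human_labeling", "manager_audit"]
```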

6. Technology Stack

To make the project more complete and aligned with modern AI product engineering, I designed it with a forward-looking stack.

Frontend Stack

  • Next.js 15 for the admin console and content-driven web surfaces
  • React 19 for component-based modern front-end development
  • TypeScript for maintainability and API contract safety
  • Tailwind CSS for scalable design system implementation
  • shadcn/ui for high-quality composable UI primitives
  • Zustand for lightweight state management
  • TanStack Query for server-state fetching and caching
  • Framer Motion for polished interaction design
  • MDX / Contentlayer for blogs, experiment logs, and changelogs

Mobile Stack

  • React Native + Expo
  • TypeScript
  • Expo Router
  • NativeWind
  • React Hook Form
  • Zod

Backend Stack

  • FastAPI as the AI gateway
  • Spring Boot 3 + GraalVM for core business modules in the admin and operations layer
  • Go for middleware, concurrent task execution, and system-level services
  • Rust for audio preprocessing, safe computation, and performance-critical modules
  • GraphQL Gateway / tRPC for unified data orchestration
  • Celery / Temporal for asynchronous workflow execution

AI / LLM Stack

  • LangGraph for agent workflow orchestration
  • LlamaIndex for knowledge retrieval and RAG
  • Pinecone as the vector database
  • vLLM for high-throughput model inference
  • Whisper / Distil-Whisper for audio recognition and structure extraction
  • Qwen2.5 / Llama 3 / GPT-4o-class models for explanation generation and conversational intelligence
  • CLIP / BEATs / AST for multimodal representation learning
  • Triton Inference Server for unified model deployment

Data and Infrastructure

  • PostgreSQL
  • MySQL
  • Redis
  • Kafka / Redpanda
  • Docker
  • Kubernetes
  • GitHub Actions
  • OpenTelemetry
  • Grafana + Prometheus
  • Sentry
  • Vercel

7. Annotation Platform Design

This was the most infrastructure-oriented and operationally valuable part of the project.

1. Annotation Task Flow

The platform automatically generated different task types based on sample structure, such as:

  • pet vocal emotion classification
  • behavior intent labeling
  • contextual metadata completion
  • video motion tagging
  • second-pass review for ambiguous samples

Tasks were assigned dynamically to different worker groups according to role, proficiency, and historical QA scores.

2. Automatic Payroll Calculation

One of the most practical business modules I built was the payroll system.
Compensation was not simply based on task count. Instead, it was calculated through a weighted rule system including:

  • completed task volume
  • difficulty coefficient
  • task duration
  • QA pass rate
  • audit score
  • rework penalty
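A simplified version of such a weighted rule might look like the following; the coefficients, field names, and the rework-penalty shape are assumptions (and the duration term is omitted for brevity), since the real rules lived in the platform's configuration:

```python
# Hypothetical payroll rule: base pay scaled by difficulty and quality,
# minus a per-rework penalty. All coefficients are illustrative assumptions.
def batch_pay(task_count: int,
              base_rate: float,      # pay per task at normal difficulty
              difficulty: float,     # e.g. 1.0 = normal, 1.5 = hard
              qa_pass_rate: float,   # 0.0-1.0 over the pay period
              audit_score: float,    # 0.0-1.0 from manager audits
              rework_count: int,
              rework_penalty: float = 0.5) -> float:
    quality = (qa_pass_rate + audit_score) / 2
    gross = task_count * base_rate * difficulty * quality
    return round(max(gross - rework_penalty * rework_count, 0.0), 2)

# 100 tasks at 0.8/task, hard batch, strong quality, 3 reworks:
pay = batch_pay(100, 0.8, difficulty=1.2, qa_pass_rate=0.95,
                audit_score=0.9, rework_count=3)
# 100 * 0.8 * 1.2 * 0.925 - 1.5 = 87.3
```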

This made the system much closer to a real operational platform instead of a demo or academic tool.

3. Manager Audit System

Managers could randomly sample completed tasks or target specific workers and task types for focused inspection.
Once issues were found, the platform allowed them to:

  • reject tasks
  • request relabeling
  • adjust labels
  • record QA scores
  • trigger stricter permissions or retraining

4. Workforce Change Management

The platform supported employee onboarding, offboarding, freezing, team transfer, and deactivation.
When workforce changes happened, unfinished tasks could be reassigned automatically so that the data pipeline would not break.
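The reassignment rule can be sketched as a single pass over a worker's open tasks; the status values and dict shapes here are stand-ins for illustration, not the real schema:

```python
# Minimal sketch of automatic task reassignment on offboarding.
# Status values and task shapes are illustrative assumptions.
def reassign_on_offboard(tasks: list, leaving_id: str,
                         replacement_id: str) -> int:
    """Reassign a departing worker's unfinished tasks; return how many moved."""
    moved = 0
    for t in tasks:
        if t["assignee"] == leaving_id and t["status"] != "completed":
            t["assignee"] = replacement_id
            moved += 1
    return moved

tasks = [
    {"id": 1, "assignee": "w1", "status": "in_progress"},
    {"id": 2, "assignee": "w1", "status": "completed"},
    {"id": 3, "assignee": "w2", "status": "pending"},
]
moved = reassign_on_offboard(tasks, "w1", "w2")  # only task 1 moves
```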

5. Quality Control Loop

To improve data quality, the platform introduced multiple control mechanisms:

  • cross-annotator consistency comparison
  • automatic feedback for low-confidence samples
  • escalation of controversial samples to managers
  • backward updates to annotation rules based on re-review results
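The first two mechanisms can be illustrated with a toy pairwise agreement check; real platforms often use chance-corrected metrics such as Cohen's kappa, and the 0.5 escalation threshold here is an assumed value:

```python
from itertools import combinations

# Toy cross-annotator consistency check: pairwise agreement per sample.
def agreement_rate(labels: list) -> float:
    """Fraction of annotator pairs that assigned the same label."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

def needs_escalation(labels: list, threshold: float = 0.5) -> bool:
    """Flag controversial samples for manager review when agreement is low."""
    return agreement_rate(labels) < threshold

needs_escalation(["happy", "happy", "alert"])  # agreement 1/3 -> True
needs_escalation(["happy", "happy", "happy"])  # agreement 1.0 -> False
```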

The core idea behind this system was to turn data production into an operational, scalable, and continuously optimizable infrastructure.


8. Consumer-facing AI App Design

Beyond technical implementation, I focused on one central question:

How can AI become a product experience that users genuinely want to return to?

1. Main User Flow

A typical usage flow looked like this:

  1. The user records or uploads a pet sound or short video
  2. The system performs segmentation, feature extraction, and multimodal analysis
  3. The AI produces emotion probabilities, possible behavioral intent, and contextual interpretation
  4. The system combines the result with the pet’s historical profile
  5. The user can provide feedback such as “accurate” or “not accurate”
  6. Feedback is fed back into the data system for further optimization

2. Core Product Modules

Audio and Video Intake

Supports real-time recording, media import, and short-form clip processing.

AI Explanation Screen

Instead of showing a simplistic sentence like “your cat is saying it is hungry,” the app presents a more trustworthy format:

  • current emotional probability
  • possible behavioral intent
  • likely environmental triggers
  • suggested actions for the owner
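The four elements above suggest a structured response payload rather than a single sentence; the dataclass below mirrors those bullets, but its field names and example values are illustrative, not the app's actual schema:

```python
from dataclasses import dataclass

# Hypothetical response shape for the explanation screen.
@dataclass
class Explanation:
    emotion_probs: dict        # e.g. {"content": 0.62, "anxious": 0.21}
    likely_intents: list       # ranked behavioral intents
    possible_triggers: list    # environmental context hypotheses
    suggested_actions: list    # concrete steps for the owner

    def top_emotion(self) -> str:
        """Return the most probable current emotion."""
        return max(self.emotion_probs, key=self.emotion_probs.get)

result = Explanation(
    emotion_probs={"content": 0.62, "anxious": 0.21, "playful": 0.17},
    likely_intents=["seeking attention"],
    possible_triggers=["owner returned home"],
    suggested_actions=["offer brief play or affection"],
)
result.top_emotion()  # "content"
```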

Pet Growth Profile

Each pet has a long-term behavioral record that forms a timeline and an individual profile.

Personalized Suggestion System

The app combines history, current context, and behavioral signals to produce practical care suggestions.

Premium Membership Layer

Premium users can unlock:

  • long-term behavior trend analytics
  • weekly and monthly reports
  • refined pet profiling
  • abnormality alerts
  • deeper AI explanations

9. AI Workflow Design

One of the most interesting parts of the project was introducing an Agent + RAG architecture to improve explanation quality.

1. Why not just let one model generate the answer?

Because pet behavior understanding is not a plain classification problem.
If a single model simply outputs a sentence, several issues appear immediately:

  • unstable quality
  • weak explainability
  • high hallucination risk
  • poor integration with user history

So I designed a more complete AI pipeline.

2. Agent Workflow

A typical workflow was:

  1. receive user-uploaded audio or video
  2. perform segmentation and multimodal feature extraction
  3. call classification models to generate candidate labels
  4. retrieve relevant pet behavior knowledge and historical examples
  5. enrich the context with the individual pet profile
  6. generate natural language explanation through an LLM
  7. output suggested actions and risk warnings
  8. collect user feedback
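The workflow above can be sketched as a chain of step functions over a shared state. The real system used LangGraph for orchestration; this plain-Python version, its step names, and its stub outputs are illustrative only:

```python
# Each step reads and extends a shared state dict, mirroring steps 2-6
# of the workflow above. Outputs are hard-coded stand-ins.
def segment(state):  state["segments"] = ["bark_0-2s"]; return state
def classify(state): state["labels"] = [("excited", 0.7)]; return state
def retrieve(state): state["knowledge"] = ["greeting barks are short bursts"]; return state
def enrich(state):   state["profile"] = {"pet": state["pet_id"]}; return state
def explain(state):  state["explanation"] = "Likely an excited greeting."; return state

PIPELINE = [segment, classify, retrieve, enrich, explain]

def run(audio_ref: str, pet_id: str) -> dict:
    state = {"audio": audio_ref, "pet_id": pet_id}
    for step in PIPELINE:
        state = step(state)
    return state

out = run("s3://clips/001.wav", "pet-42")
# out["explanation"]: "Likely an excited greeting."
```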

3. Why RAG mattered

RAG was not used because it was trendy.
It solved two very practical product problems:

  • reducing hallucinations
  • improving consistency and professionalism of explanations

I segmented pet behavior knowledge, example cases, prior Q&A, and pet-specific memory into retrievable chunks managed in a vector database.
That allowed the LLM to generate answers with grounded context instead of free-form guessing.
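The retrieval step can be sketched with cosine similarity over pre-chunked knowledge. In the real pipeline embeddings came from a model and vectors lived in Pinecone; the hand-made three-dimensional vectors and chunk texts below are stand-in data:

```python
import math

# Toy retrieval over pre-chunked knowledge using cosine similarity.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

CHUNKS = [
    ("Short repeated barks often signal excitement.", [0.9, 0.1, 0.0]),
    ("Low growls can indicate discomfort or warning.", [0.1, 0.9, 0.1]),
    ("Purring usually accompanies relaxation in cats.", [0.0, 0.2, 0.9]),
]

def retrieve(query_vec, k=1):
    """Return the k chunk texts most similar to the query vector."""
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

retrieve([0.8, 0.2, 0.0])  # -> ["Short repeated barks often signal excitement."]
```

The retrieved chunks are then placed into the LLM prompt as grounded context, which is what keeps the generated explanation tied to known behavior patterns rather than free-form guessing.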


10. Monetization Strategy

If developed into a real product, the monetization path would be quite clear.

1. To-C Model

For everyday pet owners, the product could follow a subscription model:

  • Free tier: basic upload and standard analysis
  • Premium tier: deeper interpretation, trend analysis, and AI-generated reports
  • Family tier: multi-pet management, shared household access, long-term health archives

2. To-B Model

There is also strong potential in industry-facing scenarios:

  • pet training organizations: behavior tagging and training support analytics
  • pet clinics: abnormal behavior trend support
  • pet insurance providers: health and anomaly monitoring signals
  • smart hardware brands: integration with collars, feeders, and cameras

3. Core Moat

The real moat of this project would not just be the app itself, but:

  • scarce multimodal pet behavior data
  • an operational annotation platform
  • a real user feedback flywheel
  • long-term per-pet memory and profiles
  • an AI workflow that can continuously improve

That means the project has the potential to evolve from a feature product into a data-driven platform.


11. Key Challenges

1. Pet behavior semantics are inherently ambiguous

Pets do not express meanings in the same deterministic way as human language, so system output should emphasize probabilistic interpretation rather than absolute translation.

2. Annotation consistency is hard

Different annotators may interpret the same pet sound differently, so platform rules, auditing, and feedback loops are essential.

3. AI output must be productized

The result should not only be technically plausible, but also understandable, trustworthy, and actionable for real users.

4. Practical usefulness matters more than technical flashiness

Users do not care how many models are used behind the scenes.
They care whether the result is stable, helpful, and worth coming back for.


12. What I Would Improve Next

If I continued developing this project further, I would focus on three directions.

Product Layer

  • multi-pet household collaboration
  • community content and short-video sharing
  • better structured user feedback collection

AI Layer

  • stronger multimodal foundation models
  • preference optimization and personalization
  • long-term memory agents for each pet

Business Layer

  • hardware collaboration packages
  • behavior analytics APIs for B2B clients
  • a combined business model of subscription + data services + industry partnerships

13. Final Reflection

What makes PetLingo AI meaningful to me is not just that I built a feature, but that I tried to shape it into a complete AI product prototype.

In this project, I worked on two highly valuable layers:

  • the internal annotation platform, which addressed the most important problem in AI products: reliable data production
  • the consumer-facing app and AI workflow, which connected real user scenarios with model capability delivery

From a project perspective, it demonstrates:

  • full-stack engineering ability
  • AI product design thinking
  • data flywheel thinking
  • business system design
  • monetization awareness

That is exactly why I believe this project works well in a portfolio or technical blog:
it is complete, narrative-rich, and representative of both engineering and product capability.