Subject: Ansh Sharma

Code: P-33

Log Date: --.--.----

Observer: Self

Professional Persona

Subject demonstrates an unusual tendency to over-think solutions to problems. Specializes in data engineering and ML systems, with a particular interest in how infrastructure decisions affect the people actually using the software, which sometimes leads down questionable rabbit holes. Has a knack for chasing them, which explains the architectural depth and the lack of a sleep schedule. Targeting roles in data engineering, ML, and AI/LLM systems.

Current Status

Experimental Overview (Projects)

Academia

Data Annotation and Collection Project

Contributed to an academic project on code-mixed Hinglish text classification. Work involved multi-platform data collection across Reddit, Twitter, Instagram, Facebook, YouTube, and Telegram, each with its own access constraints and workarounds.

Also handled corpus annotation and proposed a confidence-threshold-based annotation pipeline to reduce the manual annotation bottleneck on a ~25k-record dataset.
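A minimal sketch of how confidence-threshold routing can work, assuming a classifier that emits per-class probabilities for each record. The function name `route_by_confidence`, the 0.9 threshold, and the placeholder labels `class_a` and `class_b` are all hypothetical, chosen for illustration. They are not details of the actual project pipeline.

```python
from typing import Dict, List, Tuple


def route_by_confidence(
    predictions: List[Dict[str, float]],
    threshold: float = 0.9,  # assumed cutoff; would be tuned on a held-out set
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Split records into an auto-labeled set and a manual-review queue.

    Each entry in `predictions` maps a class label to its predicted
    probability for one record.
    """
    auto_labeled: List[Tuple[int, str]] = []
    needs_review: List[int] = []
    for index, probs in enumerate(predictions):
        # Take the most probable class and its probability.
        label, confidence = max(probs.items(), key=lambda item: item[1])
        if confidence >= threshold:
            auto_labeled.append((index, label))
        else:
            # Low-confidence records go to human annotators.
            needs_review.append(index)
    return auto_labeled, needs_review


# Hypothetical model outputs for three records.
sample = [
    {"class_a": 0.97, "class_b": 0.03},
    {"class_a": 0.55, "class_b": 0.45},
    {"class_a": 0.08, "class_b": 0.92},
]
auto, review = route_by_confidence(sample, threshold=0.9)
```

Under this scheme only the records the model is unsure about reach a human annotator, so the manual workload shrinks as the threshold is lowered, at the cost of more auto-label errors.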

Technical Core

  • Languages: Python, R, SQL, NoSQL
  • Databases: PostgreSQL, MySQL, MongoDB, Supabase
  • Data: Pandas, NumPy, Apache Arrow, Apache Parquet, DuckDB
  • AI / ML: Scikit-learn, NLTK
  • LLM Tooling: LangChain, Hugging Face Transformers, Ollama, RAG, Vector Embeddings
  • Infrastructure: Docker, Redis, Apache Superset, Flask, Git