AI Engineer
City Detect
Remote · Full time
Build and deploy multi-modal AI systems that extend City Detect's computer vision platform with generative and language-driven capabilities.
Description
We're seeking an AI Engineer with deep experience in transformers, generative models, and vision-language models (VLMs) to push City Detect's products beyond traditional object detection. You'll fine-tune, deploy, and maintain multi-modal models that combine visual and language understanding to deliver intelligent, scalable solutions across heterogeneous real-world environments.
What You'll Do
- Fine-tune and deploy vision-language models (VLMs) and large language models for production use cases
- Design and maintain end-to-end pipelines for multi-modal model training, evaluation, and inference in Python
- Develop prompt engineering strategies, RAG architectures, and other techniques to maximize model performance
- Evaluate model outputs systematically and build feedback loops for continuous improvement
- Quantize large transformer models to reduce memory footprint and inference latency
- Stay current with rapid advances in transformer architectures, fine-tuning methods, and multi-modal research
Requirements
- 3+ years of professional experience working with transformer-based architectures
- 2+ years of hands-on experience fine-tuning and deploying multi-modal models (VLMs)
- Strong experience with LLMs, including fine-tuning, inference optimization, and production deployment
- 2+ years of Python experience in model development, training, and deployment
- Experience with deep learning frameworks such as PyTorch or TensorFlow
- Solid understanding of attention mechanisms, tokenization, transfer learning, and generative model fundamentals
- Proven experience taking models from experimentation through production-ready deployment
Nice to Have
- SQL proficiency for querying detection results, labeling metrics, or model performance data
- Experience with roadside or infrastructure object detection (signs, signals, debris, pavement markings)
- Background in GovTech, public sector, or smart city projects
- Experience in automated driving, ADAS, or autonomous vehicle perception systems
- Familiarity with model-assisted labeling, active learning, or human-in-the-loop workflows
- Experience with edge deployment or model optimization (TensorRT, ONNX, quantization)
Benefits
- Eligible for company equity incentive plan
- Fully remote position
- Unlimited PTO
- Health, vision, and dental insurance
- $100 monthly wellness stipend
- Bi-annual team retreat
- Professional development opportunities
What to Expect in Our Hiring Process
Our hiring process is designed to be thoughtful, efficient, and human. Candidates typically move through a short series of interviews over 2–3 weeks, starting with a 30-minute phone screening, followed by one or two technical conversations and a final interview with our CEO.
We focus on cultural alignment, real-world technical understanding, and career goals, not coding puzzles or LeetCode-style tests. You’ll hear back within 24 hours after each stage whenever possible. If an offer is extended, the role begins with a 30-day trial period during which you’ll take ownership of a meaningful project and receive clear, ongoing feedback to ensure mutual fit.
Due to regulatory and operational requirements, we are currently only considering candidates based in the United States.
Salary
$100,000 - $135,000 per year