Ebenezer Tarubinga — AI/ML Engineer

01 — Building

Projects

The systems I've actually shipped at government scale — plus open-source tools and research side-projects.

💧 Production

AI-DMS — Water-Utility Intelligence

A production RAG system for a city's water utility — hybrid retrieval + reranking over 2.7M+ meter readings, tuned to 98.8% accuracy on our eval set. It reasons over a TypeDB knowledge graph that's flagged thousands of leaks and contamination risks.

FastAPI · TypeDB · RAG · React 19

Gractor · production
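
A minimal sketch of the hybrid-retrieval idea behind AI-DMS, not the production code. It assumes the keyword and vector rankings are merged with reciprocal rank fusion (RRF), a common choice that the description above does not confirm. The function name, the constant k=60, and the meter IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) in every list that contains it,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical meter-reading IDs returned by two retrievers.
keyword_hits = ["m-102", "m-007", "m-315"]  # e.g. from a keyword index
vector_hits = ["m-007", "m-315", "m-990"]   # e.g. from an embedding index

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

In a pipeline like this, a reranker would typically rescore only the top of the fused list.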

🚰 Production

OMS — Anyang Smart-Water Agent

A tool-calling agent for the Anyang water grid (26,000+ meters). I rebuilt a sprawling LangGraph stack as a far simpler system: a third less code, ~12× faster, and 100% on a red-team eval harness. It accesses the live system read-only, so it can't break it.

Flask · TypeDB · Agents · Eval Harness

Gractor · production

🏙️ Production

Spain Smart-Mobility 3D Digital Twin

A live 3D digital twin of a 215 km motorway (Zaragoza–Barcelona): a ₩5.1B Korea–Spain program and Grand Prize winner at the 2024 City Design Awards. I took it from a demo to a certifiable system on PostgreSQL/PostGIS, streaming real-time traffic onto a Three.js map.

FastAPI · React 19 · Three.js · PostGIS

Code

🛰️ Production

River CMS — Smart-City Edge Platform

Edge AI for a city of 460,000: YOLOv5 runs on-device on smart-pole cameras, with MQTT and serial control of the hardware. It's offline-first: it survives cloud outages and reboots any device that goes quiet.

Python · YOLOv5 · OpenVINO · MQTT

Gractor · production
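
The "reboots a device if it goes quiet" behaviour can be pictured as a heartbeat watchdog. This is a hedged sketch rather than the River CMS implementation: the class name, the 30-second timeout, and the device IDs are hypothetical, and the actual reboot (over MQTT or serial) is left to the caller.

```python
import time


class QuietDeviceWatchdog:
    """Track the last heartbeat per device and report the ones that went quiet."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the logic can be tested offline
        self.last_seen = {}     # device_id -> time of the last heartbeat

    def heartbeat(self, device_id):
        """Record that a device just reported in."""
        self.last_seen[device_id] = self.clock()

    def quiet_devices(self):
        """Return IDs of devices silent for longer than the timeout."""
        now = self.clock()
        return sorted(
            device_id
            for device_id, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Simulated clock, so the example runs without waiting.
fake_now = [0.0]
watchdog = QuietDeviceWatchdog(timeout_s=30.0, clock=lambda: fake_now[0])

watchdog.heartbeat("pole-01")   # last seen at t = 0 s
fake_now[0] = 20.0
watchdog.heartbeat("pole-02")   # last seen at t = 20 s
fake_now[0] = 45.0

# pole-01 has been silent for 45 s, pole-02 for only 25 s.
print(watchdog.quiet_devices())
```

Keeping the decision logic separate from the reboot command means it can run entirely on the device, which fits an offline-first design.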

📄 Open Source

hwpkit

Read, fill & edit Korean HWP (Hancom Office) documents in Python — text extraction for LLM/RAG, programmatic form-filling, and corruption-free binary rewrite.

Python · OLE/CFB · RAG

Code Docs

🎯 Research

Backdoor Attacks on CLIP (BadCLIP)

Stealthy backdoor triggers in CLIP-style models via dual-embedding alignment in the joint visual-textual representation space.

Adversarial ML · Multimodal

Code Slides

🧬 Research

Semantic-Aware Multi-Label Adversarial Attacks

Targeted perturbations for multi-label classifiers that exploit semantic label co-occurrence dependencies.

Adversarial ML · Multi-Label

Code Slides

🧩 Research

Semi-Supervised Segmentation Baselines

Systematic evaluation of weak-to-strong consistency (UniMatch) and self-training (ST++) — the foundation for CW-BASS and FARCLUSS.

SSL · Segmentation

Code Slides

🌀 Research

Monocular Depth Estimation (Depth Anything V2)

Depth estimation and point-cloud generation for autonomous driving, with an improved depth-to-pointcloud pipeline on KITTI.

3D Vision · Autonomous Driving

Code
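
For readers unfamiliar with the depth-to-pointcloud step, here is a minimal sketch of standard pinhole back-projection. It is not the improved pipeline from the project: it assumes the depth map is already metric, and the function name and toy intrinsics (fx, fy, cx, cy) are illustrative rather than KITTI calibration values.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-frame (x, y, z) points.

    depth is a list of rows; depth[v][u] is the depth at pixel (u, v).
    Pinhole model: x = (u - cx) * z / fx,  y = (v - cy) * z / fy.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels with no valid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map (metres); the zero marks an invalid pixel.
depth = [
    [2.0, 0.0],
    [4.0, 2.0],
]
points = depth_to_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(points)
```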

📏 Research

Effective Context Length of LLMs (STRING)

Why declared context length overstates the effective one — analyzing positional-encoding bias and the STRING shifting method across Llama, Mistral & Qwen.

NLP · Transformers

Code Slides

🌱 Full-Stack

AgriLet — Crop Disease Detection & Education

AI-powered disease identification with severity scoring, structured learning modules, community data contributions, and downloadable datasets.

React · TypeScript · Prisma

Code

🎓 Full-Stack

eLearn — E-Learning Platform

20+ subjects, 70+ lessons, progress tracking with study streaks, a learning-analytics dashboard, full auth, and a responsive UI.

React · Express · Prisma

Code

02 — Research

Publications

First-author work on learning dense predictions from very few labels. CW-BASS and FARCLUSS formerly ranked #3 and #2 globally in semi-supervised semantic segmentation (SSSS) with ResNet-101; PixCon currently ranks #2 on foundation-model features.

#2 · Global SSSS SOTA · Preprint · under review · 2026

PixCon: Clean-Positive Contrastive Learning for Foundation-Model Semi-Supervised Segmentation

Ebenezer Tarubinga · independent

A clean-positive pixel memory bank that admits only labeled pixels, keeping contrastive positives free of pseudo-label noise on DINOv2-scale features. Ranks #2 in semi-supervised segmentation, matching a strong UniMatch V2 baseline at lower cost.

arXiv Code Project

Former #2 · Global SSSS SOTA · Neural Networks · under review · 2026

FARCLUSS: Fuzzy Adaptive Rebalancing & Contrastive Uncertainty Learning for Semi-Supervised Segmentation

Ebenezer Tarubinga, Jenifer Kalafatovich, Seong-Whan Lee

Fuzzy top-K pseudo-labeling, entropy-normalized uncertainty weighting, per-batch class rebalancing, and prototype contrastive regularization. Formerly #2 globally on Cityscapes & Pascal VOC (78.8% / 78.2% mIoU, ResNet-101).

arXiv Code Project
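
One plausible reading of "entropy-normalized uncertainty weighting", sketched for illustration only. The exact formulation is in the paper; the form below (one minus entropy divided by its maximum, log C) and the function name are assumptions.

```python
import math


def entropy_weight(probs):
    """Return 1 - H(p) / log(C) for one pixel's class probabilities.

    The weight is 1.0 for a one-hot prediction and approaches 0.0 for a
    uniform one, so uncertain pixels contribute less to the loss.
    """
    num_classes = len(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(num_classes)


confident = entropy_weight([0.97, 0.01, 0.01, 0.01])
uncertain = entropy_weight([0.25, 0.25, 0.25, 0.25])
print(confident > uncertain)
```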

Former #3 · Global SSSS SOTA · IJCNN 2025 · IEEE · 2025

CW-BASS: Confidence-Weighted Boundary-Aware Learning for Semi-Supervised Semantic Segmentation

Ebenezer Tarubinga, Jenifer Kalafatovich, Seong-Whan Lee

Confidence-weighted cross-entropy with Sobel boundary regularization and bounded-sigmoid dynamic thresholding. Formerly #3 globally on Pascal VOC 2012 (77.15% mIoU) and top-10 on Papers With Code.

IEEE arXiv Code Project
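
To give a feel for the confidence-weighting idea, here is a minimal sketch. It assumes each pixel's cross-entropy is scaled by the model's top softmax probability, which may differ from the exact weighting in the paper. The function name and the two-pixel example are hypothetical, and the Sobel boundary term and dynamic threshold are omitted.

```python
import math


def confidence_weighted_ce(pixel_probs, pseudo_labels):
    """Mean cross-entropy over pixels, each scaled by its prediction confidence.

    pixel_probs: one list of class probabilities per pixel.
    pseudo_labels: one class index per pixel.
    """
    total = 0.0
    for probs, label in zip(pixel_probs, pseudo_labels):
        confidence = max(probs)           # top softmax probability
        ce = -math.log(probs[label])      # standard cross-entropy term
        total += confidence * ce
    return total / len(pseudo_labels)


# Two pixels pseudo-labelled as class 0: one confident, one uncertain.
pixel_probs = [[0.9, 0.1], [0.6, 0.4]]
pseudo_labels = [0, 0]

loss = confidence_weighted_ce(pixel_probs, pseudo_labels)
print(round(loss, 4))
```

The uncertain pixel's loss term is scaled by 0.6 instead of 1.0, so noisy pseudo-labels pull on the model less than they would under plain cross-entropy.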

03 — Path

Experience

AI/ML Engineer

Sept 2025 — present

Gractor · Smart-city AI · Seoul, Korea

The main engineer on four live government platforms. Most of my work is the RAG and agent systems that a city's water utility runs on: retrieval at 98.8% accuracy on our eval set over millions of live readings, and a tool-calling agent I rebuilt to a perfect eval score with a third less code. I also took a ₩5.1B Korea–Spain digital twin from a demo to a certifiable system and shipped YOLOv5 edge inference on smart-city poles.

Research Engineer (MSc)

Aug 2023 — Feb 2026

Korea University · Pattern Recognition & ML Lab

Advised by Prof. Seong-Whan Lee (IEEE Fellow). Three first-author segmentation papers ranked #2 and #3 globally on Cityscapes and Pascal VOC; ~10K LOC of PyTorch multi-GPU training infrastructure; Korean patent filed (autonomous-driving perception).

AI Software Engineer

Jan 2019 — Jan 2021

GliT (GLITEC) · EdTech

Built offline-first mobile learning products reaching 500+ students and 80,000+ learning sessions, and led STEM-education initiatives.

🎓 Education

MSc, Artificial Intelligence

Korea University · 2023 – 2026

Global Korea Scholarship (sole Zimbabwe awardee) · BK21 Research Fellowship · Advised by Prof. Seong-Whan Lee (IEEE Fellow).

🏆 Recognition

GINCON Global Committee 2025

Recognized at the Korean National Assembly

04 — Toolkit

Skills

AI / LLM

RAG (hybrid retrieval) · Agents & Tool-Calling · Model Context Protocol · Evaluation Harnesses · Guardrails · LangGraph · OpenAI · Anthropic Claude · HyperCLOVA X · Hugging Face

ML / CV

PyTorch · TensorFlow · OpenCV · scikit-learn · NumPy · Pandas · SciPy · YOLOv5 · OpenVINO · ONNX · CUDA · Weights & Biases

Backend & Data

FastAPI · Flask · Node.js · Express · SQLAlchemy · Prisma · TypeDB · PostgreSQL + PostGIS · MongoDB · Redis · OpenSearch · SQLite

Frontend

React 19 · TypeScript · Next.js · Three.js · deck.gl · MapLibre · ECharts · Tailwind · Radix UI · Framer Motion · Zustand · Vite

Infra / IoT

Docker · Kubernetes · AWS · Nginx · systemd · Prometheus · Grafana · Mosquitto MQTT · Modbus RTU · Vercel · Cloudflare · GitHub Actions

Languages

Python · TypeScript · JavaScript · C++ · C# · Java · SQL · Bash

05 — Credentials