CREATIVE TECHNOLOGY . AI . COMPUTER VISION . DEEP LEARNING
Skills
Amazon Web Services (AWS)
TensorFlow | Keras | MediaPipe
Python | REST APIs
Jupyter Notebook | Google Colab
OpenCV | Raspberry Pi
NumPy | Pandas | scikit-learn | Matplotlib
Tableau | Kibana | Elasticsearch
Alexa | ChatGPT | Meta AI | Gemini
ML Models
Convolutional Neural Nets (CNNs)
Transfer Learning
LSTM | RNN
Auto-Encoders | Style Transfer
Large Language Models (LLMs)
Generative AI | GANs
Summary
Hands-on experience building practical AI applications
Applying deep learning and machine learning algorithms to multimedia datasets containing audio, video, images, text, and dialogs
Leadership Roles: Sr Manager at Cognizant | Associate Director at Happiest Minds | Principal Data Scientist at MakeMyTrip | Project Lead at Persistent
Notable Clients: ESPN, Warner Music, Godrej
Key Areas: Computer Vision, Deep Neural Networks, Image Recognition, Video Understanding
Academics
University of Minnesota, Master's in Computer Science
Pune University, Bachelor's in Computer Engineering (Best Outgoing Student Award)
Research Intern at Sony – Tokyo, Japan | Summer Intern at Amazon.com – Seattle, WA | Summer Intern at University of Southern California (USC) | Research Staff at Singapore Management University – Singapore
Research Paper on Humor Analysis in FRIENDS Dialogs at EMNLP, Sydney
Job Functions
Mentoring & guiding junior team members to resolve issues and challenges faced during development and deployment
Preparing & presenting project proposals by outlining the scope of work, project timeline, budget, team structure, solution architecture and technology stack
Designing solution architecture by researching the latest coding platforms, tools & libraries to solve given use-cases efficiently & accurately
Working closely with business development, sales, and marketing teams to ensure projects are aligned with overall corporate strategy
MediaPipe: Extract key landmark points using MediaPipe's hand, face mesh, and body pose APIs. ML models are then built using scikit-learn and a neural network in Keras to
(a) capture facial expressions (happy, angry, scared, sad etc) using facial landmarks
(c) detect hand gestures
(d) detect activity in sports videos (cycling, skating, jogging, climbing, boxing etc) by analyzing the body posture of players across multiple video frames using a sequential RNN-based model
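The landmark-to-classifier step described above can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes the landmarks have already been extracted as (x, y, z) tuples (MediaPipe's hand model returns 21 per hand). The helper name `landmarks_to_features` and the normalization scheme (translate to the first landmark, scale by the farthest point) are assumptions chosen for illustration.

```python
import math


def landmarks_to_features(landmarks):
    """Flatten (x, y, z) landmarks into a translation- and scale-invariant vector.

    Hypothetical helper: the first landmark (the wrist in MediaPipe's hand
    model) becomes the origin, and every point is divided by the distance
    to the farthest landmark.
    """
    if not landmarks:
        raise ValueError("at least one landmark is required")
    ox, oy, oz = landmarks[0]
    shifted = [(x - ox, y - oy, z - oz) for x, y, z in landmarks]
    # Fall back to 1.0 when all points coincide, to avoid dividing by zero.
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in shifted) or 1.0
    return [c / scale for point in shifted for c in point]


# Synthetic stand-in for 21 hand landmarks (real values would come from MediaPipe).
hand = [(0.5 + 0.01 * i, 0.5 - 0.02 * i, 0.001 * i) for i in range(21)]
features = landmarks_to_features(hand)
```

Vectors of this kind could then be passed to a scikit-learn classifier or a Keras network, as the project description states.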
Transfer Learning: a generic transfer-learning-based ML model that can be easily applied to a variety of datasets to solve a number of image recognition tasks and use-cases, without changing a single line of code:
(a) Food Classification: classify food images by identifying type of dish (pizza, burger, noodles, sandwich etc)
(b) Landmark Detection: identify famous historic landmarks and architectures (Taj Mahal, Burj Al Arab, Hampi Stone Chariot, Jaipur’s Hawa Mahal etc) in photos
(c) Optical Character Recognition (OCR): recognize Indian language characters in handwritten text
(d) Face Recognition: identify famous Indian celebrities (politicians, Bollywood actors, sports figures)
(e) Cartoon Characters: detect popular cartoon characters like Mickey Mouse, Simpsons, Minions, Hello Kitty, Spider-Man, Batman etc printed on children's items such as backpacks, water bottles, pillows, and T-shirts
(f) Wildlife: identify animals (gorilla, elephant, lion, tiger, giraffe, deer) in wildlife documentary films
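A rough sketch of how one model definition might serve all of the datasets above without code changes. Everything here is an assumption for illustration rather than the original implementation: the folder-per-class dataset layout, the MobileNetV2 backbone (the description does not name one), and the hypothetical helper names `discover_classes` and `build_classifier`.

```python
from pathlib import Path


def discover_classes(data_dir):
    """Return sorted class names, one per sub-folder of data_dir."""
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())


def build_classifier(num_classes, image_size=(224, 224)):
    """Frozen ImageNet backbone plus a new softmax head sized to the dataset."""
    # Imported lazily so class discovery works without TensorFlow installed.
    import tensorflow as tf

    base = tf.keras.applications.MobileNetV2(
        input_shape=image_size + (3,), include_top=False, weights="imagenet"
    )
    base.trainable = False  # reuse pretrained features; train only the head
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Under these assumptions, switching from food images to landmarks or cartoon characters only means pointing `discover_classes` at a different folder; the number of output classes follows from the data.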
Auto-Encoder: encoder-decoder model to colorize B&W images
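One plausible way to prepare training data for such a colorization model, sketched under assumptions: each color image is converted to grayscale to form the model input, and the original color image serves as the target. The BT.601 luma weights and the helper names below are illustrative choices, not details taken from the project.

```python
def to_grayscale(pixel):
    """Convert one (R, G, B) pixel (0-255) to a luma value using BT.601 weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def make_training_pair(rgb_image):
    """Return (grayscale_input, color_target) for a nested-list RGB image."""
    gray = [[to_grayscale(px) for px in row] for row in rgb_image]
    return gray, rgb_image


# Tiny 2x2 example image: red, green, blue, and white pixels.
image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
gray_input, color_target = make_training_pair(image)
```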
Computational Music: Audio classification model to
(a) detect musical instruments played in the audio (violin, guitar, flute, piano etc)
(b) detect musical notes (C, C#, G, A#, F etc) played on piano
(c) extract lyrics (words) from songs using voice recognition
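To illustrate the label space for note detection, the sketch below maps a known fundamental frequency to its nearest note name. This is a simpler, formula-based technique than the audio classification model described above and is shown only as an illustration; it assumes twelve-tone equal temperament with A4 tuned to 440 Hz, and the function name `frequency_to_note` is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Map a frequency to the nearest equal-tempered note name, e.g. 'A#4'."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note 69 is A4; each semitone is a factor of 2 ** (1 / 12).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"
```

For example, `frequency_to_note(440.0)` gives `"A4"` and `frequency_to_note(261.63)` gives `"C4"`.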