Nischit Kumar
Aspiring Machine Learning Researcher
Passionate about building scalable and intelligent systems.
BITS Pilani - Goa Campus
B.E. (Hons) Electronics and Communication
M.Sc. Economics
Goa, India
Building the future of intelligent systems
I'm currently exploring Reinforcement Learning, ML for Systems, and Language Models. I'm fascinated by RL algorithms, their applications, and designing efficient systems within hardware constraints.
I believe the next frontier of AI isn't in larger models, but in interdisciplinary systems that manage resources intelligently. My goal is to contribute to accessible and sustainable AI.
Beyond academics, I enjoy reading about tech and sports, listening to music, playing cricket and basketball, and staying curious.
Research Interests
Skills & Expertise
Core Areas
Systems
Languages
Frameworks
Tools
University Coursework
Linear Algebra
Probability & Statistics
Differential Equations
Control Theory
Computer Programming
Digital Design
Operating Systems
Econometric Methods
Online Certifications
Stanford CS224R: Deep Reinforcement Learning
Stanford [YouTube]
Andrew Ng: Deep Learning Specialization
Coursera
Foundations of Machine Learning
Udemy
Computer Networks
YouTube
Research & Professional Journey
Research Assistant
Optimizing the Traveling Thief Problem (TTP) using Deep Reinforcement Learning (PPO and SAC) and Combinatorial Optimization techniques (POMO).
- Applying DRL algorithms (PPO, SAC) to combinatorial optimization
- Supervised by Dr. Abhay Sobhanan
Undergraduate Researcher
Research on privacy-preserving Federated Learning: implementing Homomorphic Encryption (HE) and Differential Privacy (DP) while optimizing the privacy-accuracy trade-off.
- Integrated RSPN pipeline in C++ and studied Mutable DB codebase
- Supervised by Dr. Arnab K. Paul
Undergraduate Researcher
Collaborated with the University of Cambridge to optimize an LLM-based generator for drug discovery using graph search algorithms.
- Collaboration with University of Cambridge researchers
- Supervised by Dr. Ashwin Srinivasan, Dr. Tirtharaj Dash, and Dr. Raviprasad Aduri
Building & learning in public
Paper implementations and hands-on explorations of ML concepts
Twin Delayed DDPG (TD3)
Implemented TD3 in PyTorch in the Hopper-v5 environment, addressing the systematic Q-value overestimation bias of DDPG with Clipped Double Q-Learning and Target Policy Smoothing.
Impact: 25-35% higher peak reward and substantially more stable learning dynamics compared to baseline DDPG.
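As a rough illustration of the two TD3 ingredients named above, here is a minimal, framework-free sketch of how the critic's bootstrap target can be computed. It is not the project's PyTorch code: the function name td3_target, the callables target_actor, target_q1 and target_q2, the scalar action, and the default hyperparameter values are all assumptions made for readability.

```python
import random


def clip(x, lo, hi):
    """Clamp a scalar to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """Bootstrap target for one transition (scalar action, for simplicity).

    target_actor(state) -> action and target_q*(state, action) -> value are
    hypothetical callables standing in for the target networks.
    """
    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise, then keep it inside the valid action range.
    noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, -max_action, max_action)

    # Clipped double Q-learning: take the smaller of the two target critics
    # to counteract overestimation.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))

    # No bootstrapping past a terminal state.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * q_next
```

Both critics would then be regressed toward this shared target, while the actor is updated less frequently than the critics (the "delayed" part of TD3, not shown here).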
Proximal Policy Optimization
Implemented PPO in PyTorch in the CartPole-v1 environment. Stabilized policy gradient updates by combining a clipped surrogate objective with an adaptive KL divergence penalty.
Impact: Stable learning with average episode reward around 9.5-9.7 over 300+ episodes using ε=0.2 clipping.
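For readers unfamiliar with the two PPO mechanisms mentioned above, the following is a minimal pure-Python sketch of the clipped surrogate loss and of an adaptive KL coefficient update. It is illustrative only and not the project's PyTorch code: the function names ppo_clip_loss and adapt_kl_coef, the list-based inputs, and the kl_target default are assumptions; only ε=0.2 comes from the description above.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
    """Negative clipped surrogate objective, averaged over samples.

    new_logp / old_logp are log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages are their estimates.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                       # pi_new / pi_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio)) # keep ratio near 1
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def adapt_kl_coef(beta, kl, kl_target=0.01):
    """Adjust the KL penalty coefficient after a policy update.

    Doubles beta when the measured KL overshoots the target by 1.5x and
    halves it when the KL undershoots by the same factor.
    """
    if kl > 1.5 * kl_target:
        return beta * 2.0
    if kl < kl_target / 1.5:
        return beta / 2.0
    return beta
```

In a penalty-based variant, the term beta * kl would be added to the loss, with beta refreshed by adapt_kl_coef after each round of updates.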
Let's connect
Open to collaborations, research opportunities, and interesting conversations about RL, LLMs, and beyond.
Prefer a quick chat? Feel free to reach out and I'll respond as soon as possible.