Security & Privacy • Program Analysis
Abdul Haddi Amjad is a Ph.D. candidate in Computer Science at Virginia Tech, advised by Dr. Muhammad Ali Gulzar, and a member of the ProperData research group. His research applies techniques from software debugging and testing to security and privacy problems on the web. He has twice received the GRDP and John Lee Pratt Graduate Fellowships from Virginia Tech's Department of Computer Science. His work earned the Distinguished Artifact Award at ACM CCS 2024, and an undergraduate student he mentored received the CCI People's Choice Award. He also co-authored the Privacy and JavaScript chapters of the Web Almanac 2024.
For more: CV
For more: Research and Teaching Statement
On the job market!
Uncovering the Usage and Privacy Risks of HTTP Custom Headers
Abdul Haddi Amjad, Umar Iqbal, Muhammad Ali Gulzar
Currently under submission
Unintended Privacy Risks of Using Assistive Technology in Web Applications
Abdul Haddi Amjad, Bless Jah, Muhammad Ali Gulzar
Currently under submission
How Accurately Do Large Language Models Understand Code?
Sabaat Haroon, Ahmad Khan, Ahmad Humayun, Waris Gill, Abdul Haddi Amjad, Ali R. Butt, Mohammad T. Khan, Muhammad Ali Gulzar
Currently under submission
Large Language Models (LLMs) are increasingly being used in post‑development tasks such as code repair and testing. A key factor in the successful completion of these tasks is a model's ability to deeply understand code. However, the extent to which LLMs truly understand code remains largely unevaluated. Quantifying code comprehension is challenging due to its abstract nature and the lack of a standardized metric. Before LLMs, comprehension was typically assessed through developer surveys, an approach that is not feasible for evaluating LLMs. Existing LLM benchmarks focus primarily on code generation, which differs fundamentally from code comprehension; moreover, fixed benchmarks quickly become obsolete and unreliable as they inevitably leak into training data. This paper presents the first large‑scale empirical investigation into the ability of LLMs to understand code. Inspired by mutation testing, we use an LLM's ability to find faults as a proxy for deep code understanding. We inject faults into real‑world programs and ask LLMs to localize them, then apply semantic‑preserving mutations to verify robustness. We evaluate nine LLMs on 600,010 debugging tasks across 670 Java and 637 Python programs and find that LLMs lose the ability to debug the same bug in 78% of programs when mutations are applied, indicating shallow understanding and reliance on non‑semantic features. We also find that LLMs understand code earlier in a program better than code later in it, suggesting that current tokenization and modeling approaches overlook program semantics.
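For illustration, the sketch below shows the kind of semantic‑preserving mutation the abstract describes: identifiers in a program containing an injected fault are renamed, so the behavior (and the bug) is unchanged while the surface features an LLM might latch onto are not. The names (BUGGY_SRC, RenameIdentifiers, mutate) and the off‑by‑one fault are hypothetical, not taken from the paper's actual evaluation harness.

```python
import ast

# Hypothetical example program with an injected fault.
BUGGY_SRC = """
def average(values):
    total = 0
    for v in values:
        total += v
    return total / (len(values) + 1)  # injected fault: off-by-one denominator
"""

class RenameIdentifiers(ast.NodeTransformer):
    """Semantic-preserving mutation: rename local identifiers to opaque names."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # rename names inside the function body first
        if node.name in self.mapping:
            node.name = self.mapping[node.name]
        return node

def mutate(src, mapping):
    """Parse the source, apply the renaming mutation, and unparse it back."""
    tree = ast.parse(src)
    tree = RenameIdentifiers(mapping).visit(tree)
    return ast.unparse(tree)

if __name__ == "__main__":
    renamed = mutate(BUGGY_SRC, {"average": "f", "values": "xs", "total": "acc", "v": "x"})
    print(renamed)
    # Both the original and mutated versions would then be given to an LLM with a
    # fault-localization prompt, and the answers compared for consistency.
```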
Accessibility Issues in Ad‑Driven Web Applications
Abdul Haddi Amjad, Muhammad Danish, Bless Jah, Muhammad Ali Gulzar
IEEE International Conference on Software Engineering (ICSE), 2025
Blocking Tracking JavaScript at the Function Granularity
Abdul Haddi Amjad, Shaoor Munir, Zubair Shafiq, Muhammad Ali Gulzar
ACM Conference on Computer and Communications Security (CCS), 2024
ACM CCS 2024 Distinguished Artifact Award
Blocking JavaScript without Breaking the Web: An Empirical Investigation
Abdul Haddi Amjad, Zubair Shafiq, Muhammad Ali Gulzar
Proceedings on Privacy Enhancing Technologies Symposium (PETS), 2023
TrackerSift: Untangling Mixed Tracking and Functional Web Resources
Abdul Haddi Amjad, Danial Saleem, Zubair Shafiq, Muhammad Ali Gulzar
Internet Measurement Conference (IMC), 2021