Abstract
Websites remain popular targets for Cross-Site Scripting (XSS) attacks. Although the prevalence of XSS attacks is on the rise, many developers do not have the cybersecurity expertise to secure their web applications against these attacks. Non-security experts are often unfamiliar with writing and understanding exploit code, making it difficult for them to perform web security tasks such as penetration testing and understanding the malicious intentions of an attacker who is targeting their web application. Automated Exploit Generation (AEG) is one solution for preemptively securing web applications against XSS attacks. Additionally, Natural Language Processing (NLP) can allow non-security experts to use natural language to generate exploit code and to use exploit code to generate natural language descriptions of an attacker's intentions.

This thesis presents HIJaX, a novel Natural Language-to-JavaScript generator prototype that combines NLP and AEG to translate bi-directionally between English and code. This allows HIJaX to generate XSS attack code from English sentences, as well as English sentences that explain the intentions of an attack from XSS attack code. HIJaX provides non-security experts in the Software Development Life Cycle with a tool that allows them to understand and write XSS attacks without needing substantial knowledge of cybersecurity. HIJaX utilizes CodeBERT, a state-of-the-art language model created by Microsoft for translating between natural language and programming code in real time. HIJaX trains on the malicious dataset, a curated collection of intent-snippet pairs where the intent is an English description of an XSS attack and the snippet is the XSS attack code. This thesis explores different methods for dataset creation, discusses experiments that measure the usability of HIJaX, and presents the results of a user study that examines whether non-security experts view HIJaX as a viable option for securing their web applications.