Abstract

AI-based code generators have transformed offensive security by translating natural language (NL) descriptions into executable exploits. However, the semantic variability and implicit assumptions in NL descriptions limit their robustness and usability in this domain. This study evaluates nine state-of-the-art deep learning models, including fine-tuned models and instruction-tuned LLMs, under varying contextual information conditions to assess their ability to handle ambiguity, leverage useful context, and filter out irrelevant information. Using a manually curated dataset of real-world shellcodes and rigorous evaluations, we find that fine-tuned encoder-decoder models excel when given related context, decoder-only models indirectly benefit from unrelated context to better comprehend the task at hand, and instruction-tuned LLMs struggle to use context effectively regardless of the prompting setting. These results underline the importance of optimized contextual strategies and task-specific fine-tuning for advancing AI-driven exploit generation in high-stakes software security applications.
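To make the contextual conditions concrete, the following is a minimal sketch of how prompts for an NL-to-shellcode generation task could be assembled with no context, related context, or unrelated context. It is illustrative only, not the authors' code: the function name, condition labels, and example intent/context strings are hypothetical.

# Illustrative sketch: composing prompts under three contextual conditions
# (no context, related context, unrelated context) for NL-to-shellcode generation.
# All identifiers and example strings are hypothetical.

from typing import Optional

def build_prompt(intent: str, context: Optional[str] = None) -> str:
    """Compose a generation prompt from an NL intent and optional context lines."""
    parts = []
    if context:
        parts.append(f"# Context (surrounding assembly lines):\n{context}")
    parts.append(f"# Intent: {intent}")
    parts.append("# Output: the corresponding x86 assembly instruction(s)")
    return "\n".join(parts)

CONDITIONS = {
    "no_context": None,
    "related_context": "xor eax, eax\npush eax",       # lines from the same exploit
    "unrelated_context": "mov ebx, 7\nadd ebx, ecx",    # lines from a different program
}

if __name__ == "__main__":
    intent = "push the string '/bin/sh' onto the stack"
    for name, ctx in CONDITIONS.items():
        print(f"=== {name} ===")
        print(build_prompt(intent, ctx))
        print()

In such a setup, the resulting prompt string would be passed to each evaluated model (fine-tuned encoder-decoder, decoder-only, or instruction-tuned LLM) and the generated assembly compared against the reference shellcode.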
