Research

My research interests lie in security and privacy and their intersection with machine learning. Over the years, I have had fruitful research collaborations with organizations such as Palo Alto Networks, the Indian Space Research Organization (ISRO), Facebook AI Research, and Princeton University.

Understanding Underground Review Economies

Advisor: Prof. Zubair Shafiq
University of California Davis

[Paper] [Code] [Slides]

While human factors in fraud have been studied by the HCI and security communities, most research has focused on either the victims' perspectives or prevention strategies rather than on fraudsters, their motivations, and their operating techniques. Additionally, the focus has been on a narrow set of problems: phishing, spam, and bullying. In this work, we seek to understand review fraud on e-commerce platforms through an HCI lens. Through surveys with real fraudsters (N=36 agents and N=38 reviewers), we uncover the sophisticated recruitment, execution, and reporting mechanisms that fraudsters use to scale their operations while resisting takedown attempts, including the use of AI tools like ChatGPT. We find that countermeasures that crack down on the communication channels through which these services operate are effective in combating incentivized reviews. This research sheds light on the complex landscape of incentivized reviews, providing insights into the mechanics of underground services and their resilience to removal efforts.

Malware Detection with BERT

Advisor: Prof. Dawn Song
University of California Berkeley | Palo Alto Networks

[Paper] [Code] [Slides]

The goal was to detect whether an application is malware based on the sequence of API calls it makes to the OS. We explored several supervised methods (logistic regression, deep neural networks, SVM, LSTM, and decision trees) as well as unsupervised models (clustering, autoencoders, DAGMM), but none of them produced reasonable results in a highly imbalanced setting (<5% malware). Finally, we developed a BERT-based technique that leverages cross-domain transfer learning and achieves an F-1 score of 0.92 with only 2% malware in the training data.
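
To illustrate why F-1 rather than accuracy is the natural metric at this level of imbalance, here is a minimal sketch in plain Python. It is not the project's code: the label vectors are made-up toy data, and the helper names (`precision_recall_f1`, `accuracy`) are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive (malware) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (assumption): 1000 samples, 2% malware (label 1), 98% benign (label 0).
y_true = [1] * 20 + [0] * 980

# A degenerate detector that labels every sample benign.
always_benign = [0] * 1000

# A detector that catches 18 of the 20 malware samples and raises 2 false alarms.
useful_detector = [1] * 18 + [0] * 2 + [1] * 2 + [0] * 978

acc_trivial = accuracy(y_true, always_benign)
f1_trivial = precision_recall_f1(y_true, always_benign)[2]
acc_useful = accuracy(y_true, useful_detector)
f1_useful = precision_recall_f1(y_true, useful_detector)[2]
```

On this toy data the always-benign detector reaches 98% accuracy with an F-1 of 0, while the second detector has an F-1 of about 0.9, which is why accuracy alone says little when malware is only 2% of the data.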

Unsupervised Malware Detection with Transformers 

Advisor: Dr. P. Krishnaiah
Indian Space Research Organization (ISRO)

(Paper and code cannot be released for security reasons)


We explored the use of transformers to detect malware in an unsupervised fashion, particularly for zero-day attacks. We assembled a novel malware dataset by sourcing files from different organizations and agencies, running them in sandbox environments, and collecting various behavioral features. We then designed and implemented a new pre-training algorithm that allows a model to be pre-trained on generic known malware and fine-tuned for classification on a specific family. We also developed a framework that combines malware detection with anomaly detection and enables unsupervised transformer-based malware detection. Our benchmarks show that the new algorithm outperforms traditional supervised machine learning models and detects zero-day attacks efficiently.


Background Check on Neural Nets

Advisor: Prof. Prateek Mittal
Princeton University

[Paper] [Code] [Slides]

We investigate to what extent the increasing performance of deep neural networks is affected by background features. In particular, we focus on background invariance (accuracy that is unaffected by switching background features) and background influence (the predictive power of the background features themselves when the foreground is masked). We perform experiments with 32 different neural networks, including state-of-the-art models. Our investigations reveal that increasing the expressive power of DNNs leads to a higher influence of background features while simultaneously increasing their ability to make correct predictions when background features are altered.
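
The two quantities can be sketched as simple accuracy measurements. The code below is an illustrative sketch only: the exact definitions used in the paper may differ, and the `(foreground, background, label)` sample format, the function names, and `toy_model` are all assumptions made for this example.

```python
def background_invariance(model, samples, alt_backgrounds):
    """Accuracy after each sample's background is swapped for an alternative one."""
    correct = 0
    total = 0
    for foreground, _original_background, label in samples:
        for alt in alt_backgrounds:
            correct += model(foreground, alt) == label
            total += 1
    return correct / total


def background_influence(model, samples, mask=None):
    """Accuracy when the foreground is masked and only the background is visible."""
    correct = sum(model(mask, background) == label
                  for _foreground, background, label in samples)
    return correct / len(samples)


def toy_model(foreground, background):
    """Hypothetical classifier: uses the object if visible, else guesses from context."""
    if foreground is not None:
        return foreground
    guesses = {"water": "boat", "grass": "cow", "sofa": "cat"}
    return guesses.get(background, "unknown")


# Toy samples (assumption): symbolic stand-ins for image foreground and background.
samples = [
    ("cat", "sofa", "cat"),
    ("boat", "water", "boat"),
    ("cat", "grass", "cat"),
    ("cow", "grass", "cow"),
]

invariance = background_invariance(toy_model, samples, ["water", "grass", "sofa"])
influence = background_influence(toy_model, samples)
```

Here the toy model scores 1.0 on invariance and 0.75 on influence, showing that the two measures are not mutually exclusive: a model can exploit background cues when nothing else is available and still be robust when the background changes.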

Lifelong Anomaly Detection

Advisor: Prof. Dawn Song
University of California Berkeley

[Paper] [Code] [Slides]

We developed a method called unlearning, which updates a model without retraining and improves its performance by maximizing the loss on false negatives. We proposed a new loss function that incorporates learning rate shrinkage and Elastic Weight Consolidation (EWC) so that the model parameters change according to false positive and false negative feedback. We validated the approach on three models (LSTM, regression, and autoencoders) to show that it is model agnostic. We reduced false positives by up to 77.3% and false negatives by up to 76.6%.
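
A rough sketch of how such an objective can be assembled is shown below. It is not the published loss function: it uses the standard quadratic EWC penalty, assumes that false positives are handled by minimizing the ordinary task loss, and omits learning rate shrinkage entirely. The function names and the example numbers are hypothetical.

```python
def ewc_penalty(params, anchor_params, fisher, strength):
    """Standard EWC penalty: (strength / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def unlearning_objective(task_loss, is_false_negative, params, anchor_params,
                         fisher, strength=1.0):
    """Objective to minimize for one reported mistake.

    False negative: maximize the task loss, i.e. minimize its negation.
    False positive (assumption): minimize the task loss as usual.
    The EWC term discourages drifting far from the previously trained weights.
    """
    signed_loss = -task_loss if is_false_negative else task_loss
    return signed_loss + ewc_penalty(params, anchor_params, fisher, strength)


# Illustrative numbers only.
params = [1.0, 2.0]          # current weights
anchor_params = [1.0, 1.5]   # weights before the update
fisher = [4.0, 2.0]          # per-weight importance estimates

penalty = ewc_penalty(params, anchor_params, fisher, strength=2.0)
fn_objective = unlearning_objective(0.3, True, params, anchor_params, fisher, 2.0)
fp_objective = unlearning_objective(0.3, False, params, anchor_params, fisher, 2.0)
```

The sign flip is what makes this "unlearning": for a missed anomaly the update pushes the loss up instead of down, while the penalty keeps weights that mattered for earlier data close to their previous values.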

AI-Generated News Detection

Advisor: Prof. Hany Farid and Dr. Sadia Afroz
University of California Berkeley | Facebook AI Research

[Paper] [Code] [Slides]

We leveraged language models to build and release an open-source dataset of 100k AI-generated news articles. We then explored a variety of semantic features (function words, readability) and textual features (word embeddings, contextual embeddings) for classification. We applied classifiers such as SVM, Random Forest, regression, and deep neural networks, as well as anomaly detection methods such as autoencoders, to identify AI-generated content. Our results show that AI-generated text can be detected with F-1 scores above 90%.
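
As a small illustration of the function-word and readability style of feature, the sketch below computes three simple statistics from raw text. It is an assumption-laden toy, not the project's feature extractor: the function-word list is deliberately tiny, and the feature names and the `stylometric_features` helper are hypothetical.

```python
import re

# Hypothetical, deliberately short function-word list; real lists are much longer.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "but", "to", "is", "was"}


def stylometric_features(text):
    """Return function-word ratio, average sentence length, and average word length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return {
            "function_word_ratio": 0.0,
            "avg_sentence_length": 0.0,
            "avg_word_length": 0.0,
        }
    return {
        "function_word_ratio": sum(w in FUNCTION_WORDS for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }


features = stylometric_features("The cat sat on the mat. It was happy.")
```

Feature dictionaries like this one can be turned into fixed-length vectors and passed to any of the classifiers listed above.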

openRedact

Advisor: Prof. Daniel Aranki
University of California Berkeley | Hasso Plattner Institute

We developed an open-source tool for automatic document redaction and contributed a dataset of 1k+ documents with annotated redactions. We designed a tiered redaction framework that lets users control how sensitive information must be before it is redacted. We trained machine learning models (logistic regression, KNN, MLP) with a variety of features (simple tokenization, BERT, embeddings) to automatically produce redaction annotations. Our best model achieved an F-1 score of 0.88.
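
The idea of tiered redaction can be sketched as follows. This is not openRedact's implementation: the tier numbers, category names, and regular expressions are all hypothetical stand-ins for the trained models described above.

```python
import re

# Hypothetical tiers: lower numbers mark more sensitive categories.
# The toy regexes stand in for learned redaction models.
TIERED_PATTERNS = [
    (1, "SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    (2, "EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
    (3, "DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
]


def redact(text, max_tier):
    """Redact every category whose tier is <= max_tier; a higher max_tier redacts more."""
    for tier, label, pattern in TIERED_PATTERNS:
        if tier <= max_tier:
            text = pattern.sub(f"[{label}]", text)
    return text


document = "Contact jane@example.org, SSN 123-45-6789, filed 2021-03-04."
strict_only = redact(document, max_tier=1)
everything = redact(document, max_tier=3)
```

With `max_tier=1` only the most sensitive category is replaced, while `max_tier=3` also removes the e-mail address and the date, so a single parameter controls how aggressively a document is redacted.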

Phishing Simulator

Advisor: Prof. Steve Weber
Collaboration with the Center for Long Term Cybersecurity

Phishing attacks are among the highest cyber risks to any organization. I developed a phishing sensitivity program to be used internally as well as with our partner non-profit organizations. I built on the tool GoPhish and extended it so that it can be installed on an AWS instance. I also developed comprehensive phishing templates, designed phishing campaigns, and conducted the accompanying awareness training sessions.