We developed an open-source tool for automatic document redaction and contributed a dataset of more than 1,000 documents with annotated redactions. We designed a tiered redaction framework that controls how sensitive information must be before it is redacted. We trained machine learning models (logistic regression, KNN, MLP) on a variety of features (simple tokenization, BERT, embeddings) to produce redaction annotations automatically. Our best model achieved an F1 score of 0.88.
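For readers unfamiliar with the metric, the sketch below shows how an F1 score can be computed for redaction labels. It assumes token-level binary labels, which the project description does not specify; the label names `REDACT` and `KEEP`, the function `f1_score`, and the toy sequences are all hypothetical and are not taken from our dataset.

```python
def f1_score(gold, predicted, positive="REDACT"):
    """Harmonic mean of precision and recall for the positive label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for six tokens (assumption, for illustration only).
gold = ["KEEP", "REDACT", "REDACT", "KEEP", "REDACT", "KEEP"]
predicted = ["KEEP", "REDACT", "KEEP", "KEEP", "REDACT", "REDACT"]

score = f1_score(gold, predicted)
print(round(score, 3))
```

Here two of the three predicted redactions are correct and two of the three true redactions are found, so precision and recall are both 2/3 and the F1 score is also 2/3.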
I worked with the Citizen Clinic, a division of the Center for Long-Term Cybersecurity (CLTC). We partnered with non-profits across the world working on sensitive issues such as free speech, reproductive rights, and land rights for indigenous peoples. These organizations were at risk of cyber attacks by highly capable threat actors, including nation states. We worked with four such organizations to assess their cybersecurity infrastructure, performed vulnerability analyses and threat assessments, and provided recommendations. We also designed device protocols and communication policies to minimize the risk of cyber attacks.
Phishing is one of the highest cyber risks to any organization. I developed a phishing sensitivity program for use internally and with our partner non-profit organizations. I built on the tool GoPhish and extended it so that it could be installed on an AWS instance. I also developed comprehensive phishing templates, designed phishing campaigns, and conducted the accompanying awareness trainings.
We analyzed the public OPTN organ transplant database for attribute and identity disclosure using metrics such as k-anonymity, l-diversity, and t-closeness. We conducted a qualitative and quantitative study which showed that fewer than 11% of people (out of 50 candidates) were aware of these privacy risks. Our analysis showed that children and women are especially vulnerable. Our experiments showed that generalization combined with differential privacy and randomized response preserved the privacy of vulnerable groups while maintaining the utility of the database (as measured by the Kidney Donor Risk Index).
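As a rough illustration of two of the metrics named above, the sketch below computes k-anonymity (the size of the smallest group of records sharing the same quasi-identifier values) and distinct l-diversity (the smallest number of distinct sensitive values within any such group). The column names, the helper functions `k_anonymity` and `l_diversity`, and the toy records are hypothetical; they are not drawn from the OPTN database and do not reproduce our analysis.

```python
from collections import Counter, defaultdict


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class over the quasi-identifier columns."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any equivalence class."""
    values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values[key].add(row[sensitive])
    return min(len(v) for v in values.values())


# Hypothetical toy records (assumption, for illustration only).
records = [
    {"age_band": "0-17", "sex": "F", "diagnosis": "A"},
    {"age_band": "0-17", "sex": "F", "diagnosis": "B"},
    {"age_band": "18-64", "sex": "M", "diagnosis": "A"},
    {"age_band": "18-64", "sex": "M", "diagnosis": "A"},
    {"age_band": "18-64", "sex": "M", "diagnosis": "C"},
]

k = k_anonymity(records, ["age_band", "sex"])
l = l_diversity(records, ["age_band", "sex"], "diagnosis")
print(k, l)
```

In this toy table the smallest group has two records and each group contains two distinct diagnoses, so the table is 2-anonymous and 2-diverse. Generalizing a quasi-identifier (for example, widening age bands) merges groups and can only raise or preserve k, which is the intuition behind using generalization to protect small, vulnerable groups.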