Experience
Applied Scientist
( June 2020 - Present )
I work in the Network Protection and Fraud Prevention team in Microsoft Ads. My work involves:
Conducting large-scale analysis (terabytes of data) of network and ads traffic to identify fraud, suspicious trends and malicious extensions.
Leveraging multiple data sources to build statistical and machine learning models that detect click fraud and click farms.
Designing and implementing graph neural networks (GNNs) to detect network fraud, click fraud and rewards fraud.
Deep Learning Researcher
( Sep 2019 - Jan 2020 )
I worked with Facebook AI Research for a few months, assisting their efforts to detect hoax news and bot-generated fake news on the platform. I was advised by Prof. Hany Farid from Berkeley and Dr. Ser-Nam Lim from Facebook.
I was mainly involved in two projects:
Fake News Detection: I used advanced language models to contribute an open-source dataset of 100k AI-generated news articles. I then explored a variety of semantic features (function words, readability) and textual features (word embeddings, contextual embeddings) for classification. I applied classifiers such as SVM, Random Forest, regression and deep neural networks, along with anomaly detection methods such as autoencoders, to identify AI-generated content. Our results show that AI-generated text can be detected with F1 scores above 90%.
Zero-Shot Learning for Misinformation: Misinformation is a constantly evolving space. We designed a novel embedding space to detect new classes of fake news, and applied topic modeling and out-of-distribution detection algorithms to identify novel fake news categories.
Data Scientist Intern
( May 2019 - Aug 2019 )
I worked in the IBM Chief Analytics Office, IBM's internal division for data science and analytics. I worked on a project called Clarity, a tool for data-driven product and people insights.
My work included:
Scraping text data from social media, analyst reports and online forums, and leveraging NLP to derive insights about IBM and competitor offerings.
Developing a dashboarding tool (Python + SQL + Tableau) for sentiment analysis, report summaries, computing IBM's share of voice in different markets, and identifying key persons to contact to maximize sales.
Building classifiers to predict a particular analyst's sentiment towards IBM products, as well as their influence on the market, so as to identify the analysts and authors who needed to be contacted to improve IBM's profile.