Data Is Plural Data Visualizations

Data visualizations based on public datasets featured in the Data Is Plural newsletter by Jeremy Singer-Vine.

Earnings Call Search Tool

Search prototype built on Weaviate that enables semantic and literal search over sentences from earnings conference calls.

Machine Learning Applications in Accounting

The central theme of my dissertation, entitled “Machine Learning Applications in Accounting”.

Custom Prodigy Annotation Recipes

Collection of custom Prodigy recipes for various text labeling tasks. I used these recipes across different research projects to annotate training data or iteratively devise regex patterns.

Model Card Analysis on the Hugging Face Hub

Selective evidence on AI transparency, based on analyses of the extent and depth of model card disclosures on the Hugging Face Hub. The results offer insight into the state of reporting practices in the field.

Fuzzy Name Matcher

Gradio app that performs fuzzy matching on entity names to merge financial datasets in the absence of unique keys. Supports Docker deployment.
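
To illustrate the general idea of merging on fuzzy-matched names, here is a minimal sketch using only Python's standard library (`difflib`). It is not the app's actual implementation: the suffix list, the `0.85` threshold, and the helper names `normalize`, `best_match`, and `fuzzy_merge` are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "plc"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name, candidates, threshold=0.85):
    """Return the most similar candidate, or None if below the threshold."""
    scored = [(similarity(name, c), c) for c in candidates]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None


def fuzzy_merge(left, right, key="name", threshold=0.85):
    """Inner-join two lists of dicts on fuzzy-matched values of `key`."""
    right_by_name = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = best_match(row[key], list(right_by_name), threshold)
        if match is not None:
            # Values from the left row win on conflicting keys.
            merged.append({**right_by_name[match], **row, "matched_name": match})
    return merged


left = [{"name": "Apple Inc.", "id": 1}]
right = [{"name": "APPLE INC", "ticker": "AAPL"}]
merged = fuzzy_merge(left, right)
```

In practice the threshold trades off false matches against missed matches, so it is worth inspecting borderline pairs by hand before trusting the merged data.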

The DreamBooth Technique

DreamBooth is a fine-tuning technique for large, pretrained text-to-image models (e.g., DALL-E 2, Imagen, Stable Diffusion). Based on a small reference set of training images of a given subject or object (henceforth, the concept), the DreamBooth technique learns a custom identifier for the concept and implants the concept embedding into the model’s output domain. This enables the model to synthesize high-quality images of the concept in different contexts and settings.

Ungreenwash

This project uses OpenAI’s LLMs and publicly available data, including ESG reports, SEC 10-K filings, and earnings call transcripts, to build an app that searches and summarizes these sources. The goal is to help users with ESG-related information needs invest responsibly.

SEC EDGAR Scraper

CLI tool for downloading various types of SEC filings from the EDGAR database.

CLIP-Guided Image Synthesis

A write-up summarizing what I learned from experimenting with CLIP-guided image synthesis. It covers VQGAN, CLIP, and inference-by-optimization, as well as various text-to-image and image-to-image experiments.

Call2Vec

Call2Vec is a fastText word embedding model intended for semantic search in transcripts of quarterly earnings conference calls.
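
The sketch below shows how word embeddings can support semantic search in general: each sentence is represented by the average of its word vectors, and sentences are ranked by cosine similarity to the query. It does not load Call2Vec or use the fastText library. The three-dimensional vectors in `TOY_VECTORS` are invented purely for illustration, and the function names are hypothetical. Unlike this lookup table, a real fastText model can also produce vectors for unseen words from character n-grams.

```python
import math

# Invented toy vectors for illustration only; not taken from Call2Vec.
TOY_VECTORS = {
    "revenue": [0.9, 0.1, 0.0],
    "sales": [0.8, 0.2, 0.0],
    "growth": [0.6, 0.4, 0.1],
    "lawsuit": [0.0, 0.1, 0.9],
    "litigation": [0.1, 0.0, 0.9],
}


def embed(text):
    """Average the vectors of all known words; None if no word is known."""
    vecs = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query, sentences, top_k=2):
    """Return up to top_k sentences, most similar to the query first."""
    q = embed(query)
    if q is None:
        return []
    scored = []
    for sentence in sentences:
        v = embed(sentence)
        if v is not None:
            scored.append((cosine(q, v), sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top_k]]


sentences = ["sales growth", "litigation"]
top_for_revenue = search("revenue", sentences, top_k=1)
top_for_lawsuit = search("lawsuit", sentences, top_k=1)
```

Note that neither query shares a literal word with the sentence it retrieves; the match comes entirely from vector similarity.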

Report Automation for Citizens for Europe

Assisted Citizens for Europe with their data challenges as part of a CorrelAid Data4Good project. We developed a workflow that allows for flexible generation of reports on discrimination and diversity within organizations.

Project