Data Is Plural Data Visualizations
Data visualizations based on public datasets referenced in the Data Is Plural Newsletter by Jeremy Singer-Vine
Search prototype based on Weaviate that enables semantic and literal search over earnings conference call sentences.
The central theme of my dissertation, entitled “Machine Learning Applications in Accounting”.
Collection of custom Prodigy recipes for various text labeling tasks. I used these recipes across different research projects to annotate training data or iteratively devise regex patterns.
Selective evidence on AI transparency, based on an analysis of the extent and depth of model card disclosures on the Hugging Face Hub. It offers insight into the state of reporting practices in the field.
Gradio app that performs fuzzy matching on entity names to merge financial datasets in the absence of unique keys. Supports Docker deployment.
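As a rough illustration of the idea behind fuzzy name matching, the sketch below normalizes company names and scores candidates with the standard library's `difflib.SequenceMatcher`. This is not the app's actual implementation: the function names `normalize` and `best_match`, the suffix list, and the 0.85 threshold are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "plc", "ag"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate, or (None, score)
    if no candidate reaches the (assumed) similarity threshold."""
    target = normalize(name)
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, target, normalize(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


# Usage: link a name from one dataset to the entity names of another.
right_names = ["Alphabet Inc", "Apple Inc", "Microsoft Corp"]
match, score = best_match("Apple, Inc.", right_names)
print(match, round(score, 2))
```

A threshold keeps weak matches from being merged silently. In practice the cutoff would need tuning against manually verified pairs.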
DreamBooth is a fine-tuning technique for large, pretrained text-to-image models (e.g., DALL-E 2, Imagen, Stable Diffusion). Based on a small reference set of training images of a given subject or object (henceforth, the concept), the technique learns a custom identifier for that concept and implants the concept embedding into the model’s output domain. This enables the model to synthesize high-quality images of the concept in different contexts and settings.
This project utilizes OpenAI’s LLMs and publicly available data, including ESG reports, SEC 10-K filings, and earnings call transcripts, to build an app that searches and summarizes these data to empower users with ESG-related information needs to invest responsibly.
CLI tool for downloading various types of SEC filings from the EDGAR database.
A write-up that summarizes what I learned from experimenting with CLIP-guided image synthesis. It covers VQGAN, CLIP, inference-by-optimization, and various text-to-image and image-to-image experiments.
Call2Vec is a fastText word embedding model intended for semantic search in transcripts of quarterly earnings conference calls.
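To show how a word embedding model can support semantic search, the sketch below ranks vocabulary terms by cosine similarity to a query term. The three-dimensional vectors in `TOY_VECTORS` are made up for illustration and are not taken from Call2Vec, and the helper names `cosine` and `most_similar` are hypothetical. A real model would supply vectors with many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(query, vectors, top_k=3):
    """Rank all other words by cosine similarity to the query word."""
    q = vectors[query]
    scored = [(word, cosine(q, vec)) for word, vec in vectors.items() if word != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up toy vectors (assumption): NOT actual Call2Vec embeddings.
TOY_VECTORS = {
    "revenue": [0.9, 0.1, 0.0],
    "sales": [0.8, 0.2, 0.1],
    "headwinds": [0.1, 0.9, 0.2],
    "lawsuit": [0.0, 0.2, 0.9],
}

for word, score in most_similar("revenue", TOY_VECTORS):
    print(f"{word}: {score:.3f}")
```

Terms that appear in similar contexts end up with similar vectors, so a query for one term surfaces related wording that a literal keyword search would miss.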
Assisted Citizens for Europe with their data challenges as part of a CorrelAid Data4Good project. We developed a workflow that allows for flexible generation of reports on discrimination and diversity within organizations.