Projects

A list of small data science and coding projects, TILs (Today I Learned), and ongoing research projects, with links to supplementary resources (e.g., code, models, and online appendices).

2024

Earnings Call Search Tool

Search prototype built on Weaviate that enables semantic and literal search over sentences from earnings conference calls.

February 1, 2024

HNSW graph, Layer 0


By Simon Schölzel in Project

GitHub

Custom Prodigy Annotation Recipes

Collection of custom Prodigy recipes for various text labeling tasks. I used these recipes across different research projects to annotate training data and to iteratively devise regex patterns.

January 16, 2024

localhost:8080


By Simon Schölzel in Project

GitHub

Model Card Analysis on the Hugging Face Hub

Selective evidence on AI transparency, based on an analysis of the extent and depth of model card disclosures on the Hugging Face Hub, with insights into the state of reporting practices in the field.

January 3, 2024

The Hugging Face Hub


By Simon Schölzel in Project

2023

Fuzzy Name Matcher

Gradio app that performs fuzzy matching on entity names to merge financial datasets in the absence of unique keys. Supports Docker deployment.
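
To illustrate the general idea of fuzzy name matching, here is a minimal sketch using only the Python standard library. It is not the app's actual implementation: the helper names (normalize, best_match), the list of legal suffixes, and the 0.85 threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "ltd", "llc", "plc", "ag", "se"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the most similar candidate name.

    The candidate is None when no score reaches the (assumed) threshold.
    """
    target = normalize(name)
    scored = [
        (SequenceMatcher(None, target, normalize(c)).ratio(), c)
        for c in candidates
    ]
    score, match = max(scored, default=(0.0, None))
    return (match, score) if score >= threshold else (None, score)


# Usage: link entity names from one dataset to another without a shared key.
left_names = ["Microsoft Corp.", "Alphabet Inc."]
right_names = ["MICROSOFT CORPORATION", "Alphabet, Inc", "Meta Platforms"]
links = {name: best_match(name, right_names)[0] for name in left_names}
```

The resulting links dictionary can then serve as a crosswalk for merging the two datasets. In practice, borderline matches close to the threshold would typically be reviewed manually.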

2022

The DreamBooth Technique

DreamBooth is a fine-tuning technique for large, pretrained text-to-image models (e.g., DALL-E 2, Imagen, Stable Diffusion). Based on a small reference set of training images of a given subject or object (henceforth, the concept), the DreamBooth technique learns a custom identifier for the concept and implants the concept embedding into the model’s output domain. This enables the model to synthesize very high-quality images of the concept in different contexts and settings.

December 30, 2022

Holbox, Mexico


By Simon Schölzel in Project