SCDB Trend Explorer
Built as part of an Introduction to Database Systems course, this project is a database and web app for analyzing trends in the Supreme Court Database (SCDB).
Description
As part of my Introduction to Database Systems course, I led a team of five developers in building a database system and a web application frontend for analyzing the Supreme Court Database (SCDB). The SCDB, hosted by Washington University Law, is a comprehensive collection of Supreme Court cases from the court's founding through 2023.
Our goal was to design a database system with at least 500,000 tuples and demonstrate at least five complex trend queries. That meant we needed a rich, interconnected dataset that would let us explore a variety of data combinations and analyze trends over time.
After some deliberation, we chose the SCDB because of its strong chronological structure, wealth of data points, and the many possibilities it offered for running meaningful trend queries. I cover some of them in later sections.
The project spanned the entire semester and was broken down into several major tasks. We needed to:
- Convert the SCDB CSV files into SQL insert commands
- Design a relational schema to support complex trend queries
- Write and test those queries
- Set up and use an API to connect the frontend to the database for querying and data retrieval
- Build a web frontend to visualize and interact with the trends, allowing users to adjust certain parameters
- Create a JavaScript tool to turn the database output into custom graphs
We divided these tasks among the five of us. I'll go into detail about my specific contributions in the next section. Despite some bumps along the way, we ended up with a fully functional database of over 800,000 records and a frontend that surfaced genuinely fascinating insights.
My Role
My main responsibilities were designing the relational schema for the database and converting the SCDB CSV files into SQL insert commands.
I started by designing the schema itself. We chose to focus on the “vote” CSVs from the SCDB, where each row represents a single justice's vote on a single case. This helped us easily reach the minimum tuple count and, more importantly, gave us the flexibility to run complex trend queries.
One key detail in the SCDB is its “ideology” value: 1 (Conservative) or 2 (Liberal) for each decision. Because the schema kept this value at the level of individual votes, our queries could aggregate it across justices, courts, and time.
Once the schema was in place, I wrote a Python script to parse each row of the CSVs and generate the corresponding insert commands. This was my first major challenge on the project.
The SCDB actually consists of two datasets: modern and legacy. While both use the same columns, the legacy data is much less consistent. I quickly realized I couldn't assume data would always be present, so I adapted the schema, adjusted which values were keys, and added composite keys where needed. After some iteration, the script was reliably parsing the data and outputting a complete SQL file with insert commands.
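To give a feel for the approach, here's a simplified sketch of that conversion loop. The column names, table name, and file names are placeholders rather than our exact schema, and the real script did far more validation before writing each value:

```python
import csv

def generate_inserts(csv_path, sql_path):
    """Walk an SCDB vote CSV row by row and write one insert command per vote.
    Simplified sketch: column names, table name, and file names are placeholders."""
    with open(csv_path, newline="") as csv_file, open(sql_path, "w") as sql_file:
        for row in csv.DictReader(csv_file):
            case_id = (row.get("caseId") or "").strip()
            justice_id = (row.get("justice") or "").strip()
            vote = (row.get("vote") or "").strip()
            # Legacy rows are often missing values, so never assume a field is present.
            if not case_id or not justice_id:
                continue
            # The real script cleaned and formatted every value before writing it.
            sql_file.write(
                f"INSERT INTO votes (case_id, justice_id, vote) "
                f"VALUES ('{case_id}', {justice_id}, {vote or 'NULL'});\n"
            )

generate_inserts("scdb_legacy_votes.csv", "inserts.sql")
```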
But that wasn't the end of it. Some insert commands still failed; my assumptions about data types didn't always hold up, and with a dataset this size, fixing these issues wasn't straightforward. If a mistake slipped through, I'd have to clear out the database and reload all 800,000 rows, which was a slow process. Changing the schema meant even more reloading.
This part of the project was all about trial and error: running each new batch of insert commands, tracking down errors, figuring out the fix, regenerating the script, and starting over.
Eventually, the script was producing a valid insert for every row in the CSVs. With the database complete, I moved on to help my teammates tackle their own parts of the project, which I discuss in the next section.
Deadline Adaptation
The night before our demo, we hit a serious roadblock with API integration. We could connect and ping the database, but the API simply refused to cooperate with the frontend. Most of my teammates had early classes, so by late night it was just me and one other teammate left to keep working.
With nothing scheduled the next day except the demo, I dug in for the long haul. We spent hours experimenting: tweaking security rules, trying different approaches, and fighting with CORS errors that kept blocking our requests. Around 3 AM, after a lot of dead ends, I finally got a successful call through to the API and was able to wire up a crucial frontend feature: a tuple count query.
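For illustration, here's a minimal sketch of the kind of endpoint we finally got working: a tuple-count query served over an API with CORS enabled. Flask, flask-cors, and the python-oracledb driver are stand-ins here rather than a record of our actual stack, and the connection details and table name are made up:

```python
# Hypothetical sketch: expose a tuple-count query over an API with CORS enabled
# so the browser-based frontend can call it. Stack, credentials, and table names
# are placeholders, not our real setup.
from flask import Flask, jsonify
from flask_cors import CORS
import oracledb

app = Flask(__name__)
CORS(app)  # send Access-Control-Allow-Origin headers so the browser stops blocking requests

@app.route("/api/tuple-count")
def tuple_count():
    with oracledb.connect(user="scdb", password="example", dsn="localhost/XEPDB1") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM votes")  # 'votes' is a placeholder table name
            (count,) = cur.fetchone()
    return jsonify({"tuples": count})
```

In the end, the fix that night mostly came down to getting the right CORS headers onto the responses so the browser would stop rejecting the frontend's requests.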
I don't think I've ever felt that level of project stress before. With almost no API experience going in, I had to learn fast, troubleshoot on the fly, and keep going despite being exhausted. It turned out to be one of the best crash courses I've had in picking up new tools and pushing through pressure.
Core Accomplishments
Python ETL
Most of my work on this project centered on data preparation, especially writing the Python script that converted SCDB CSV files into SQL insert commands.
My script handled two main tasks: generating all the insert commands and producing a file with the create table commands for our custom schema. As I worked, the script evolved; each time I learned more about the data, I updated the schema and the corresponding commands. This process led to a pretty complex implementation, with plenty of alter table commands to support compound and foreign keys as the project progressed.
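In rough terms, the generated schema file followed the pattern below. The table and column names are illustrative placeholders, not our exact schema, but the shape is the same: create table commands first, then alter table commands layering on the compound and foreign keys:

```python
# Illustrative sketch of the DDL the script wrote out; names and types are placeholders.
CREATE_TABLES = """
CREATE TABLE justices (
    justice_id   NUMBER,
    justice_name VARCHAR2(100)
);

CREATE TABLE cases (
    case_id   VARCHAR2(20),
    term      NUMBER,
    direction NUMBER  -- SCDB ideology value: 1 = conservative, 2 = liberal
);

CREATE TABLE votes (
    case_id    VARCHAR2(20),
    justice_id NUMBER,
    vote       NUMBER,
    direction  NUMBER
);
"""

# Keys were layered on afterwards with ALTER TABLE as the schema evolved,
# including the composite key that makes each (case, justice) vote unique.
ALTER_TABLES = """
ALTER TABLE justices ADD CONSTRAINT pk_justice PRIMARY KEY (justice_id);
ALTER TABLE cases    ADD CONSTRAINT pk_case PRIMARY KEY (case_id);
ALTER TABLE votes    ADD CONSTRAINT pk_vote PRIMARY KEY (case_id, justice_id);
ALTER TABLE votes    ADD CONSTRAINT fk_vote_case FOREIGN KEY (case_id) REFERENCES cases (case_id);
ALTER TABLE votes    ADD CONSTRAINT fk_vote_justice FOREIGN KEY (justice_id) REFERENCES justices (justice_id);
"""

with open("create_tables.sql", "w") as f:
    f.write(CREATE_TABLES)
    f.write(ALTER_TABLES)
```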
The insert commands required some creativity. Because the CSVs were vote-based, with each row representing a single justice's vote, there were many repeated values like case ID or docket ID. To avoid duplicate entries, I used sets to track which related values had already been inserted, so a new insert command was only generated the first time each one appeared. Some fields, like new justices or courts, needed additional logic to handle them specifically. Once everything was verified, the insert was added to the output, and the script continued down the rows.
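Conceptually, the deduplication worked like this sketch (again with placeholder names): one set per parent entity, with an insert emitted only the first time a value appears:

```python
# Sketch of the set-based deduplication; table and column names are placeholders.
seen_cases = set()
seen_justices = set()

def emit_row(row, out):
    case_id = row["caseId"]
    justice_id = row["justice"]
    justice_name = row["justiceName"].replace("'", "''")  # escape quotes for Oracle

    # Parent rows get exactly one insert, the first time they appear.
    if case_id not in seen_cases:
        seen_cases.add(case_id)
        out.write(f"INSERT INTO cases (case_id, term) "
                  f"VALUES ('{case_id}', {row['term'] or 'NULL'});\n")

    if justice_id not in seen_justices:
        seen_justices.add(justice_id)
        out.write(f"INSERT INTO justices (justice_id, justice_name) "
                  f"VALUES ({justice_id}, '{justice_name}');\n")

    # The vote itself is unique per (case, justice), so it is always emitted.
    out.write(f"INSERT INTO votes (case_id, justice_id, vote) "
              f"VALUES ('{case_id}', {justice_id}, {row['vote'] or 'NULL'});\n")
```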
I also had to do quite a bit of input validation, since the SCDB data wasn't always consistent or in a format Oracle would accept. I built helper functions for everything from date parsing to numeric formatting, as well as a general clean_field function for handling nulls, NaNs, empty strings, and any other oddities that showed up in the CSVs.
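The helpers looked roughly like the sketch below; the real versions handled more edge cases, and the date format shown is an assumption:

```python
import math
from datetime import datetime

def clean_field(value):
    """Normalize a raw CSV value: map nulls, NaNs, and empty strings to SQL NULL,
    and escape quotes so Oracle accepts the string literal."""
    if value is None:
        return "NULL"
    if isinstance(value, float) and math.isnan(value):
        return "NULL"
    text = str(value).strip()
    if text == "" or text.lower() == "nan":
        return "NULL"
    return "'" + text.replace("'", "''") + "'"

def clean_date(value, fmt="%m/%d/%Y"):
    """Convert an SCDB date string into an Oracle TO_DATE expression, or NULL."""
    try:
        parsed = datetime.strptime(str(value).strip(), fmt)
    except (TypeError, ValueError):
        return "NULL"
    return f"TO_DATE('{parsed:%Y-%m-%d}', 'YYYY-MM-DD')"

def clean_number(value):
    """Return a numeric literal, or NULL if the value is not a valid number."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return "NULL"
    if math.isnan(number):
        return "NULL"
    return str(int(number)) if number.is_integer() else str(number)
```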
I hadn't worked much in Python before this, but I knew it was great for file I/O, so I picked it as both the right tool and a learning opportunity. And learn it I did. The hands-on experience with Python's syntax and workflow made me much more confident in picking up unfamiliar languages, a skill that's carried over to projects like building this very website.
Trend Queries
Working on the trend queries was genuinely fascinating for me. As someone who's always been a bit of a politics nerd, seeing the Supreme Court's history unfold through the data, and being able to analyze it in new ways, made the hard work to get everything running feel especially rewarding.
One of the most interesting queries we built examined the ideological leaning of justices, based on their individual voting patterns. My biggest personal discovery was that, regardless of the court's makeup or the apparent ideological tilt of its members, the aggregate decisions almost always trend moderate, landing around 1.5 on the ideology scale (with 1 representing conservative and 2 representing liberal). I found it both fascinating and reassuring that, even when one side has a clear majority, the justices tend to settle near the middle rather than sticking rigidly to party lines.
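As an illustration, the core of that query boiled down to averaging the direction value per term. The table and column names below are placeholders, and the real query joined more of our schema and took parameters from the frontend:

```python
# Sketch of the ideological-leaning trend query; names are placeholders.
# direction uses the SCDB coding (1 = conservative, 2 = liberal), so an
# average near 1.5 means the votes for that term trend moderate overall.
IDEOLOGY_TREND_QUERY = """
SELECT c.term,
       ROUND(AVG(v.direction), 2) AS avg_ideology,
       COUNT(*) AS votes_counted
FROM votes v
JOIN cases c ON c.case_id = v.case_id
WHERE v.direction IN (1, 2)
GROUP BY c.term
ORDER BY c.term
"""
```

Results shaped like this are what we fed into our JavaScript graphing tool to draw the trend lines users could explore.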
This project also gave me some new perspective on the work I do. I genuinely enjoyed exploring what I'd built, even though it started as a tough class assignment. What stuck with me wasn't the grade, but the process of learning from my own tool. It shifted how I see my work; there's a different kind of satisfaction in building something you can actually use and learn from, beyond just finishing the task.
Challenges & Lessons
This project was a true team effort. On a full-stack application like this, every member brought something different to the table, and that diversity of skills was essential to our success. Even when I had to step in and help move things forward, I was always building on groundwork laid by my teammates. There was never a moment where anyone was completely on their own; we supported each other throughout.
The whole experience was a crash course in pragmatic engineering. I learned how to adapt quickly to unexpected technical challenges and became more comfortable diving into unfamiliar code or picking up new tools on the fly. Whether it was troubleshooting late-night API issues or reworking our database schema for the hundredth time, I learned to stay focused under pressure and find solutions, even when things got messy.
Just as important were the lessons in teamwork and collaboration. Balancing individual contributions with group deadlines, stepping up to help wherever needed, and communicating clearly were every bit as critical as any technical decision. Those habits of clear communication, adaptability, and willingness to pitch in are ones I've carried into every project since.