Act I
The Investigation
It began with a simple hypothesis: What if we could find bugs before users do?
A developer armed with a type-checking tool started scanning Python's most battle-tested libraries.
The target: operations on values that might be None.
The subjects were three titans of the Python ecosystem:
SQLAlchemy
11.4K stars. The database toolkit that powers Python's data layer
Apache Arrow
16.4K stars. The backbone of modern columnar data processing
scikit-learn
64.8K stars. Machine learning's gold standard library
The Verification Pipeline
Type Checker Analysis
Automated scan finds potential None-access violations
LLM Filtering
Language model triages obvious false positives
Manual Review
Human reviewer examines code context and intent
Reproduction Code
Write minimal code that triggers the actual error
Three bugs made it through. Three issues were filed. What happened next reveals something profound about how open-source communities really work.
Match the Response
Before we reveal the details, can you guess which project gave which response?
"Not a bug, use it correctly"
Issue closed, converted to discussion
"We're probably deleting this anyway"
Issue open, file slated for removal
"Let's fix it... wait, let's delete it"
PR submitted, then pivoted to deletion
Act II
The Three Responses
THE RESPONSE SPECTRUM
SQLAlchemy
"What are we doing here exactly?"
"What are we doing here exactly? We work with use cases and problems to be solved here. [...] if the pool is disposed, you should not be using connections from that pool anymore. This is expected behavior."
I pointed out an inconsistency: the same method had None-guards in some places but not others. If it was intentional, why the inconsistency?
"or that just could be old code that did this for no real reason"
THE BUG: INCONSISTENT NONE-GUARDS
# ✗ No None guard (Unsafe)
if self.server_version_info[0] not in list(range(8, 17)):  # 💥 TypeError
    util.warn("Unrecognized server version info...")
if self.server_version_info >= MS_2008_VERSION:
    self.supports_multivalues_insert = True
else:
    self.supports_multivalues_insert = False
if self.deprecate_large_types is None:
    self.deprecate_large_types = (
        self.server_version_info >= MS_2012_VERSION
    )

# ✓ Has None guard (Safe)
self._supports_offset_fetch = (
    self.server_version_info and self.server_version_info[0] >= 11
)

30 hours from issue to lock
Converted to discussion, labeled "expected behavior"
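Why the unguarded lines crash can be shown in isolation. What follows is a minimal sketch, assuming server_version_info is None whenever the server version was never detected. FakeDialect and the value given to MS_2008_VERSION are hypothetical stand-ins for illustration, not SQLAlchemy's own code.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the constant in the excerpt; the value is an assumption.
MS_2008_VERSION = (10,)


class FakeDialect:
    """Illustrative stand-in, not SQLAlchemy's dialect class."""

    def __init__(self, server_version_info: Optional[Tuple[int, ...]]) -> None:
        self.server_version_info = server_version_info

    def unguarded(self) -> bool:
        # Raises TypeError when server_version_info is None:
        # None cannot be compared with a tuple using >=.
        return self.server_version_info >= MS_2008_VERSION

    def guarded(self) -> bool:
        # Check for None first; treat an unknown version as "not supported".
        info = self.server_version_info
        return info is not None and info >= MS_2008_VERSION
```

Whether "unknown version" should mean "feature off" is a design decision for the maintainers; the sketch only shows that the guard turns a crash into an explicit choice.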
Apache Arrow
The Sound of Silence (Then Pragmatism)
"those jobs have been failing for several weeks now [...] Were you manually using this?"
My honest response revealed the nature of the investigation:
"I was simply auditing the CI scripts and found this issue using static analysis."
THE BUG: MISSING ENVIRONMENT FALLBACK
top_level = os.environ.get("ARROW_HOME")
# No check if top_level is None!
path = os.path.join(top_level, "subdir")  # 💥 TypeError
File slated for deletion
Issue #48855 • No follow-up as of Jan 25, 2026
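One way to harden this kind of lookup is to fail early with a readable message. The sketch below is not Arrow's code and not a proposed patch: subdir_path and its default parameter are hypothetical names, and raising RuntimeError is just one reasonable choice.

```python
import os
from typing import Optional


def subdir_path(env_var: str = "ARROW_HOME", default: Optional[str] = None) -> str:
    """Build <root>/subdir, failing with a clear message if the root is unknown."""
    top_level = os.environ.get(env_var, default)
    if top_level is None:
        # Fail early with an actionable message instead of a TypeError
        # raised from inside os.path.join.
        raise RuntimeError(f"{env_var} is not set and no default was given")
    return os.path.join(top_level, "subdir")
```

The error still stops the script, but it now names the missing variable instead of pointing at a path-joining call.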
scikit-learn
The Plot Twist
THE TIMELINE OF GOOD INTENTIONS
"This script has many other problems... I would rather just delete this file"
What started as a bug fix became an archaeological excavation.
The file had been importing sklearn.metrics.jaccard_similarity_score, a function that was deprecated in v0.21 (2019) and removed in v0.23 (2020).
The Timeline of Decay
Six years of broken benchmarks, silently failing
How many performance decisions were made based on broken benchmark data?
Act III
The Archaeology of Software Evolution
These aren't just bugs. They're fossils—artifacts of decisions made years ago, preserved in code that "worked" until it didn't.
🦴 SQLAlchemy: Legacy Cruft
Inconsistent None-guards suggest code evolved over time. Some paths were hardened, others forgotten. The maintainer's candid admission: "old code that did this for no real reason."
🦴 Apache Arrow: CI Assumptions
Environment variables that always existed in CI, until they didn't.
🦴 scikit-learn: The Long Tail
A benchmark script importing a function that no longer exists. Six years of silent failure. The long tail of API deprecation.
The Pattern They All Share
value = get_something_that_might_not_exist()
result = value.do_thing()  # What if value is None?
The assumption that kills: "This will always be set."
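The same pattern with type annotations shows what a checker has to work with. This is a minimal sketch: Thing and the present parameter are hypothetical, added only to make the example runnable.

```python
from typing import Optional


class Thing:
    def do_thing(self) -> str:
        return "done"


def get_something_that_might_not_exist(present: bool) -> Optional[Thing]:
    # Hypothetical lookup that returns None when nothing is found.
    return Thing() if present else None


def use_unchecked(present: bool) -> str:
    value = get_something_that_might_not_exist(present)
    # A type checker is expected to flag this line, since value may be None.
    # At runtime it raises AttributeError when value is None.
    return value.do_thing()


def use_checked(present: bool) -> str:
    value = get_something_that_might_not_exist(present)
    if value is None:
        return "missing"
    # After the None check, value is narrowed to Thing.
    return value.do_thing()
```

Note that a method call on None raises AttributeError, while subscripting, comparing, or path-joining None raises TypeError, as in the three bugs above.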
The Final Scoreboard
3 / 3
Projects where the "fix" wasn't fixing code
Reject • Delete • Delete
Check Your Own Code
These bugs hid in codebases with 92,600 combined stars. What's hiding in yours?
Run a type checker
$ ty check your_project/
Search for the patterns
Look for: os.environ.get() without fallbacks, Optional types used directly
Check your CI scripts
The code nobody maintains is the code that breaks silently
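The pattern search can be partly automated. Below is a rough sketch using Python's standard ast module to list os.environ.get() calls that pass no default. The function name is hypothetical, and the matcher only recognizes the literal spelling os.environ.get, so aliased imports slip through.

```python
import ast
from typing import List, Tuple


def find_env_get_without_default(source: str) -> List[Tuple[int, str]]:
    """Return (line, variable name) for os.environ.get(...) calls with no default."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the literal spelling os.environ.get; aliased imports are not handled.
        is_env_get = (
            isinstance(func, ast.Attribute)
            and func.attr == "get"
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "environ"
            and isinstance(func.value.value, ast.Name)
            and func.value.value.id == "os"
        )
        if is_env_get and len(node.args) == 1 and not node.keywords:
            arg = node.args[0]
            name = arg.value if isinstance(arg, ast.Constant) else "<dynamic>"
            hits.append((node.lineno, str(name)))
    return sorted(hits)
```

A hit is only a candidate: the call is harmless if the result is checked for None before use, so each one still needs the manual review step described earlier.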
In Python's None-handling underworld,
every TypeError tells a story.
This one told three.
SQLAlchemy
Closed in 30 hours
→ Discussion
Apache Arrow
Open for 11+ days
→ Pending deletion
scikit-learn
PR in 9 days, pivoted
→ File to be deleted
How many more TypeErrors
are hiding in plain sight in production code?