Week 44 - Last Week In Data
Welcome to the inaugural edition of Last Week In Data, the weekly newsletter for Database Times members. Given how new this is, it is unlocked for all, but please consider becoming a member or a limited edition life member (you get weekly deep dives, this newsletter, monthly white papers & score cards, a members only community, and a whole lot more).
It has been a big week in the world of databases, but I have also been collecting notes from the last few weeks, which is why this is also a little longer than expected. Consider it a big month, for this edition!
MariaDB Server turned 15 years old on 29 October; it debuted on 29 October 2009 with the release of MariaDB 5.1.38, as a beta. It was released with an impressive list of improvements, while still maintaining drop-in compatibility with MySQL 5.1. There have been plenty of changes, and from a technical standpoint, I recommend reading Monty’s retrospective. MariaDB plc also has a nice post about 15 reasons why developers and DBAs love it, though I’m surprised that MariaDB MaxScale made the list. The MariaDB Foundation also celebrated, and we see the CEO of the Foundation, Kaj Arnö, state, “MariaDB is the future of MySQL”. There was recently a gathering in India, and some pictures from the MariaDB booth show some of the vibrancy. Happy Birthday MariaDB!
This is not to say it isn’t worth reading Peter Zaitsev’s piece, How Can MySQL Catch Up with PostgreSQL’s Momentum?, in which he does mention MariaDB, though this warrants deeper discussion.
Anyway, on to vector search. MariaDB Vector has been around for a few months, in preview, and it is probably worth spending time with some of the documentation. InfoQ collected some quotes, and links, in: MariaDB Introduces Open-Source Vector Preview, Aiming to Become Default MySQL Option. MariaDB Foundation is also running a vector bounty program, awarding €1,500 for each completed project. It is worth noting that there is also a vectors public beta from PlanetScale, with some pretty inspiring documentation. While this isn’t open source, it does give the MySQL world options (like PostgreSQL has with pgvector, for example).
The PlanetScale website had a redesign, and it was the talk of the town on X, and BlueSky. It only took 2 people less than 2 weeks, and Holly Guevara has said, “Great reception overall and sign up conversions are slightly up so far.” You can’t say no to earned media.
Did you know that RocksDB powers ZippyDB, MyRocks, Laser, as well as WhatsApp’s MsgDB, and others at Meta? “Over 90+ million instances of RocksDB are running in production at Meta serving over 3 EB data” - pretty impressive stats, scroll down to find a job on the RocksDB team!
Releases
- PostgreSQL 17 was released on 26 September 2024, and on the same day, it made its way into the Amazon RDS Preview Environment. Just last week, PostgreSQL 17 landed in Google Cloud SQL. I highly recommend reading Claire Giordano’s slides: What’s in a Postgres major release? An analysis of contributions in the v17 timeframe presented at PGConf EU 2024.
- MySQL 8.4.3 was released on 15 October 2024, featuring InnoDB improvements: Performance Schema InnoDB lock tracking now reduces the impact of querying the data_locks and data_lock_waits tables, which were redesigned to avoid exclusive global mutex usage. The data structure used in tracking binary log transaction dependencies changed from Tree to ankerl::unordered_dense::map, which uses approximately 60% less space (for further reading: Comprehensive C++ Hashmap Benchmarks 2022). A regression in the InnoDB adaptive hash index that impacted JOIN performance was also fixed. Alongside it, MySQL 8.0.40 was also released, with similar InnoDB lock tracking improvements, and the fixes to the hash index. Both feature updates to the OpenSSL library, now at version 3.0.15.
- SQLite 3.47.0 was released on 21 October 2024. The update includes several notable performance optimizations, with the standout features being the introduction of Bloom filters for optimizing IN operator subqueries, improved query planning for complex star queries with many dimension tables, and smart subquery reuse. Other interesting additions include new median/percentile functions in the CLI, JavaScript/WASM improvements, and FTS5 enhancements with locale-aware tokenizers. The removal of long double in favor of Dekker’s algorithm for extended precision is also a significant architectural change.
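For readers who have not met Bloom filters before, here is a minimal Python sketch of the general idea behind using one in front of an IN lookup. This is not SQLite's implementation; the class BloomFilter, the function filter_in, and the sizes chosen (1024 bits, 3 hashes) are all hypothetical names and values picked for illustration. The point is only that a Bloom filter never gives false negatives, so a "definitely not present" answer lets you skip the more expensive exact lookup.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def filter_in(outer_rows, subquery_values):
    """Emulate `WHERE x IN (subquery)` with a Bloom filter pre-check (assumed design)."""
    exact = set(subquery_values)
    bloom = BloomFilter()
    for value in exact:
        bloom.add(value)

    matches, probes_skipped = [], 0
    for row in outer_rows:
        if not bloom.might_contain(row):
            probes_skipped += 1  # definitely absent: skip the exact lookup
            continue
        if row in exact:  # may be a false positive, so confirm exactly
            matches.append(row)
    return matches, probes_skipped


if __name__ == "__main__":
    matches, skipped = filter_in(range(100), [3, 7, 42])
    print(matches, skipped)
```

In this toy version the exact lookup is a cheap set probe, so nothing is really saved; the technique pays off when the exact check is costly (an index seek or a subquery re-evaluation, for instance).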
Link List
- P99Conf: How eBPF Could Make Faster Database Systems - using eBPF to reroute database operations around the operating system? Andy Pavlo introduced BPF-DB, an in-memory key-value store that is inserted into the OS via eBPF, removing the need for copying things into userspace, which makes operations faster. We expect to see some open source around this, but in the meantime, you can read the slides and watch the talk, The Next Chapter in the Sordid Love/Hate Relationship Between DBs and OSes, and read a paper related to this: On Embedding Database Management System Logic in Operating Systems via Restricted Programming Environments by Matthew Butrovich, a database engineer at Apple.
Events
Happening this week:
- Postgres Conference Seattle 2024 - there is a huge amount of technical content, can’t wait to look at the presentations, and videos.
Happening next week:
- November 12 2024 - Distributed SQL Summit - Postgres without Limits - in person at KubeCon, but live for free too
- November 13 2024 - “Be Thankful: Logical Replication traps you did not fall into” — Kacey Holston - free, from the San Francisco Bay Area PostgreSQL group, succeeding in online meetups.
- November 14 2024 - IMPACT: Data Observability Summit - data, and AI, in-person, and online.
Call For Papers/Presentations
- 2024 RocksDB EOY Meetup has the call for talks open.
- If you’re into open source, FOSDEM 2025 is a must attend, 1-2 February 2025, in Brussels, Belgium. There are quite a few developer rooms, but of interest to those in databases would be the Cloud Native Databases DevRoom [deadline: 1 December 2024], the Data Analytics DevRoom [deadline: 30 November 2024], PostgreSQL DevRoom [deadline: 29 November 2024 - note that there is also PGDay before], and the MySQL DevRoom [deadline: 1 December 2024 - note that there will also be the usual MySQL Belgian Days on January 30-31 before FOSDEM].
People On The Move In The Database Industry
- Patrick Galbraith, maintainer of DBD::mysql for Perl, and longtime MySQL hacker, is now at Altinity, working on ClickHouse as a Solutions Support Engineer & DBA. He joins Alkin Tezuysal, longtime MySQL expert, now Director of Services there too.
- Danica Fine, Developer Advocate, has departed Confluent.
- Andrew Hutchings has departed his role as Chief Contributions Officer at MariaDB Foundation, to become a Software Engineer at wolfSSL.
Work In Databases
- Salesforce is hiring for database internals in San Francisco, Seattle, and Toronto. If you are familiar with logging/recovery, transaction protocols, networking, replication, High Availability, and related fields, contact Sherman Lau, Senior Director, Database Engineering, Salesforce. There is speculation that this might be Project Sayonara related (i.e. replace Oracle with PostgreSQL).
- Altinity is hiring a Build & Release Engineer
- 37signals, the company behind HEY, Basecamp, and ONCE, and home of the vocal Ruby on Rails creator, David Heinemeier Hansson, is hiring a Site Reliability Engineer in the APAC timezone (UTC+5-UTC+13), with a salary range of USD165,000-209,458.
- Meta is hiring an experienced Technical Lead for the RocksDB team. More in the description by Danny Chin, Engineering Manager at Facebook.
Have a tip? Don’t hesitate to send it via email.