April 30, 2006
Limits in mysqldump?
Coming away from the MySQL Users Conference I have a short list of questions and/or ideas to research over the next few weeks.
The first is a question posed by an individual after the session on backups. She had been told that mysqldump works well for small installations, but that once you get above a gigabyte of data it can't handle the dump or restore process.
I've been using mysqlhotcopy for a few years now, but before that was using mysqldump on a few fairly large databases. At the time we switched over to mysqlhotcopy I think our system was around 20 gigs of data, and although the backup/restore process took a good chunk of time, it never failed because of problems with table or database size.
But since it's been a while I thought I'd do a rudimentary check using a few gigs of data. I grabbed the Sakila sample database and dumped a bunch of data into it, enough to grow it to just over 2 gigabytes (this is all on my MacBook Pro laptop).
At 2 gigs it takes 5m 16s to dump with mysqldump and 11m 49s to restore that database from the SQL statements in the dump file. So it's definitely doable when you get over a gig of data.
How far up can you go? I've used it successfully for dumps in the 20-30 gig range. It takes a while when you get up into these amounts of data, but it still does the job. It seems like when you get into this range with MyISAM tables you start looking at mysqlhotcopy as a faster alternative.
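Out of curiosity, those timings work out to the following rough throughput (a back-of-the-envelope calculation, treating "just over 2 gigabytes" as 2048 MB):

```python
def mb_per_sec(megabytes, minutes, seconds):
    """Average throughput for a dump or restore run."""
    return megabytes / (minutes * 60 + seconds)

size_mb = 2 * 1024  # just over 2 GB of data, per the test above

dump_rate = mb_per_sec(size_mb, 5, 16)      # 5m 16s to dump
restore_rate = mb_per_sec(size_mb, 11, 49)  # 11m 49s to restore

print(round(dump_rate, 1))     # 6.5 (MB/s)
print(round(restore_rate, 1))  # 2.9 (MB/s)
```

The restore being roughly half the speed of the dump matches the usual intuition: replaying INSERT statements and rebuilding indexes costs more than streaming table contents out.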
Anyone have evidence against this (or in favor)?
Posted by mike at 2:35 PM
April 27, 2006
MySQL Users Conference Winds Down
All of the closing-social ice cream is gone and the conference center has quieted down to just folks putting equipment away. The "MySQL" wireless network is the last remaining indicator of the weeklong activities here in the Santa Clara Convention Center.
(Oh dear, I'm starting to sound like a reporter doing an intro to a technology segment).
It's been a jam-packed 4 days here in Santa Clara at the MySQL 2006 Users Conference. So much has happened since arriving on Sunday morning that it seems more like a long, quickly moving blur.
For me this year was such a different experience from last year. The content was the same high quality as it has been in the past (this is my 3rd conference), but returning this year to familiar faces and having made connections with folks over the past year on planetmysql and other MySQL online gathering places has added a much larger social element to the conference. Also met a lot of new folks in sessions, at meals, during the evening BoF or social events. Even met some folks who had read the book and recognized the name on the tag.
Am so looking forward to the next year with MySQL. Not only have there been a number of exciting new things announced that MySQLers can look forward to, I've also got some big hopes for ramping up our system at work, using some of the things I've learned here to implement a solid architecture that will carry the system through the next few years.
Posted by mike at 8:55 PM
MySQL Movie Magic
Closing keynote of MySQL UC 2006 is Milton Ngan from Weta Digital talking about using MySQL in making movies. I've heard Milton speak at OSCON in the past, some impressive movie stuff they do with CG.
Milton starts with a review of the movies that have come out of Weta and talks about the number of shots and number of processors they used to render those shots. King Kong had over 2000 shots and required 4,500 processors. Milton is doing this presentation on a Mac.
Weta now has over a petabyte of data on tapes for storing shots, and 120 terabytes worth of disks connected with 10Gb ethernet. Milton goes through some photos of the facilities; one photo has a row of cabinets that holds ~2,500 processors. Milton then goes through a number of snips from King Kong showing camera footage with CG layovers.
What uses MySQL at Weta? Production management. Storing details about shots, who's working on what, and HR system. Also have a user database, access control, system monitoring, booking systems, polls, online crew gear store. Current count is 5 production servers, 10 replicas with ~100 databases.
Why do they use MySQL? Simple to set up, reliable, and scalable. To date they have not lost any data.
They have a system used for shot info. Tracks thousands of shots, tracks all cuts and edits. Also tracks plates and rolls, and shot assignments.
Posted by mike at 6:43 PM
Jim Starkey: Introducing Falcon (Storage Engine for MySQL)
Last session of the MySQL UC is Jim Starkey giving an introduction to the new Falcon storage engine. Jim is an icon in the database field; he created MVCC and the BLOB data type. There's a can of Falcon beer for anyone who asks a good question.
What is Falcon
- transactional MySQL storage engine
- based on Netfrastructure database engine
- engine has been in mission critical apps for more than 4 years
- extended and integrated into MySQL
Falcon is NOT
- an InnoDB clone
- a Firebird clone
- a standalone database management system
Jim's been at this for a long time; there have been some changes since he wrote his first database at DEC:
- Uni-processors to multi-core
- CPU performance from 1.7 to 42000 MIPS
- Address space from 32-bit to 64-bit
- Memory speed from 1 microsecond to 7 nanoseconds
- Memory from 2 MB to 4 GB
- Disk access has gone from 20 ms to 7 ms
- back then 100 users was big, now 100,000 users is a small application
- RDBMSs moved from decision support -> OLTP -> data mining -> Web
- DBAs are smarter and they are harder to find
- Application programmers' skill levels are down; they are using higher-level languages
- Expert design means appearance, not architecture
- Applications have much larger databases, more queries per human interaction, fewer rows per result set, more critical latency, much larger (and more important) blobs, and search without context.
What Jim has learned
- CPUs and memory are faster
- disks are much slower
- MVCC (which was invented by Jim) works
- Record versions on disk are problematic
- Web applications are better and are the future (attention spans are short, so you have to make good ones and are forced to keep making them better)
- People have more important things to do than tune databases - these days machines are powerful enough to tune themselves, and people shouldn't have to spend their time on it
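For a sense of why MVCC works so well, here's a toy sketch of the concept (this is not Falcon's implementation, just the general idea): each write appends a new version stamped with its transaction id, and a reader sees only versions from transactions at or before its snapshot, so readers never block writers.

```python
class MVCCStore:
    """Toy multi-version store: writes append versions, reads see a snapshot."""

    def __init__(self):
        self.versions = {}  # key -> list of (txn_id, value)
        self.next_txn = 1

    def begin(self):
        txn = self.next_txn
        self.next_txn += 1
        return txn

    def write(self, txn, key, value):
        # never overwrite; just add another version
        self.versions.setdefault(key, []).append((txn, value))

    def read(self, snapshot, key):
        # newest version written at or before the reader's snapshot
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot]
        return visible[-1] if visible else None

store = MVCCStore()
t1 = store.begin()
store.write(t1, "row", "old")
reader = store.begin()          # snapshot taken here
t3 = store.begin()
store.write(t3, "row", "new")   # a later write...
print(store.read(reader, "row"))  # old: ...is invisible to the older snapshot
```

The reader holding the earlier snapshot keeps seeing "old" no matter how many later versions pile up, which is exactly the property that lets reads run without locks.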
Falcon is designed for the next 20 years. Jim is comfortable saying that what he's learned over the past 20 years in databases has been put into Falcon.
Goals of Falcon
- Exploit large memory for more than just a bigger cache
- Use threads and processors for data migration
- design to eliminate tradeoffs, minimize tuning
- scale gracefully to very heavy loads
Architectural Overview
An incomplete in-memory database with backfill from disk, using two caches: the traditional LRU page cache for disk, and a larger row cache with age-group scavenging. Falcon is multi-version in memory and single-version on disk. All transaction state is in memory with automatic overflow to disk. Data and indexes are a single file plus log files. In the future Jim would like to create BLOB repositories where the data is stored off to the side. He's also hoping to provide multiple page spaces in the future.
Falcon uses B-tree indexes with prefix compression. There is no data except the key in the index.
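Prefix compression is easy to sketch (this shows the general technique, not Falcon's actual index format): because keys in a B-tree node are stored in sorted order, each entry needs only the count of characters it shares with the previous key, plus its own suffix.

```python
def compress(keys):
    """Encode sorted keys as (shared_prefix_len, suffix) pairs."""
    out, prev = [], ""
    for k in sorted(keys):
        n = 0
        while n < min(len(prev), len(k)) and prev[n] == k[n]:
            n += 1
        out.append((n, k[n:]))  # store only what differs from the previous key
        prev = k
    return out

def decompress(entries):
    """Rebuild the full sorted key list from the compressed entries."""
    keys, prev = [], ""
    for n, suffix in entries:
        k = prev[:n] + suffix
        keys.append(k)
        prev = k
    return keys

print(compress(["apple", "applet", "apply"]))
# [(0, 'apple'), (5, 't'), (4, 'y')]
```

With long keys that share prefixes (URLs, file paths, compound keys) the savings are dramatic, at the cost of having to walk entries in order to reconstruct a key.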
Uncommitted row data is staged in memory (and can overflow to a scratch file). Indexes are updated immediately. On commit, row data is copied to the serial log and written. Post commit, a dedicated thread copies row data from serial log pages to data pages. The page cache is periodically flushed to disk. BLOB data is scheduled for write at creation.
Data reliability is protected by "careful write", where writes are sequenced to the disk so it's always valid and consistent. There is a repair mechanism, but Jim's hope is that no one will ever have to use it.
Falcon has a do/redo/undo log in the serial log.
Jim's got a secret agenda of things he'd like to do in the database world, starting with MySQL:
- Jim wants to replace varchar with string; varchar is a throwback to punch-card technology
- Replace tiny, small, medium, and big integers with "number"
- Adopt a security model useful for the web
- Row-level security (filter sets)
- Teach the database world the merits of context-free search - why do you have to use a SELECT statement?
- Foreign keys - have them but not enforced
- Backups - Netfrastructure has two
- When can we see it - will get a beta in Q3 (from Robin)
- Filesystem storage - stored in one file broken into fixed-length pages from 4K to 32K
- Memory - everything except BLOBs comes into the cache
- Performance . . . 420 GB instance in Massachusetts that is pretty active and is only using 30% of the CPU
Posted by mike at 5:19 PM
Django: The Web Framework for Perfectionists with Deadlines
Afternoon presentation is Jacob Kaplan-Moss, one of the authors of the Django framework. I got to hang out with Jacob a bit in the speaker's lounge yesterday. Cool guy who's passionate about the framework and building maintainable applications. He's writing a Django book (with Adrian) for Apress, so it was fun to compare notes on experiences. Listening to Jacob makes me want to build a web application, even though I've got 100 other things on my plate.
Jacob starts with an introduction to the Lawrence, Kansas scene and newspaper (where he works). Much like the Snakes and Rubies presentation by Adrian Holovaty in Chicago last November. The Django story is pretty cool, and something that is entertaining to listen to.
Django is similar to MVC but is more like MTV (shows the MTV logo), or Model-View-Template.
The model is a simple snip of code that defines objects. The Django framework gives you an automatic interface for your databases. You can do pretty much everything from the model, including subqueries, but you can also drop into raw SQL mode. Data isn't fetched from the database until you need it. With two lines of code he gets an admin interface to the data.
A view is about what data is available, not what it looks like. Django makes URL design an intentional part of the application. People like to guess URLs and snip pieces off a URL to get to higher levels of information.
Templates are about what your data looks like. It's a lot like smarty or other templates. Template inheritance is a cool feature of the Django templating system. A good chunk of the functions that people use require no or very little coding.
Posted by mike at 4:31 PM
Measuring MySQL Server Performance for Sensor Data Stream Processing
Jacob Nikom (MIT Lincoln Lab and Boston MySQL Meetup regular) is presenting about performance when processing sensor data streams.
Where does the data come from? Multiple locations; 10 pictures of different kinds of satellite and telescope equipment. I get the sense that there is a ton of data being collected. Sensors send messages to publish/subscribe middleware, which applies algorithms and sends the data through a historian and down to MySQL.
The historian creates a thread for each subscription. The historian processes the message and creates an INSERT query. This is happening from many sensors, each operating at as many as 100 messages/second.
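To make the historian's job concrete, here's a hedged sketch of the kind of INSERT batching that helps at these message rates (the table name, columns, and message format are invented for illustration, not taken from the actual system):

```python
def batch_insert(messages):
    """Combine sensor messages into one multi-row INSERT statement.
    At ~100 messages/second per sensor, one round trip per batch is
    far cheaper than one round trip per message."""
    rows = ", ".join(
        "({sensor_id}, '{ts}', {value})".format(**m) for m in messages
    )
    return "INSERT INTO sensor_readings (sensor_id, ts, value) VALUES " + rows

msgs = [
    {"sensor_id": 7, "ts": "2006-04-25 10:00:00", "value": 3.5},
    {"sensor_id": 8, "ts": "2006-04-25 10:00:00", "value": 1.25},
]
print(batch_insert(msgs))
```

A real historian thread would of course use parameterized queries through a driver rather than string formatting; the point here is just the multi-row batching shape.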
When choosing a database they looked at benchmarks and tried to determine performance for their predicted databases. They wanted to know how MySQL would respond to their workload.
Jacob gets into some serious detail about performance on various versions of Linux, hardware, and MySQL. The slides are the better place to get this info. This is one of the most in-depth performance reviews I've ever seen, lots of variables and plots.
Of note is a comparison between 4.1 and 5.0. MySQL 4.1 outperforms 5.0 at all record lengths, and MySQL 4.0 is slower than 5.0 for both small and large records. So MySQL 4.1 gave the best performance for their application.
Posted by mike at 2:24 PM
MySQL UC Attendee Helps Save Life
This morning Jay mentioned in the MySQL UC keynote introduction that Beat Vontobel had helped save a woman's life earlier this week.
Without going into a lot of detail, the story is that Beat was outside on the sidewalk and witnessed a major accident out on the road. He dropped everything and ran to the car, where he checked pulse and breathing and talked to the driver as she regained consciousness while the emergency response was on the way. Once they arrived he helped by moving to the back seat of the car and holding the driver's neck straight so the emergency response person could perform checks without risking movement of the spine.
Man, that's remarkable.
Posted by mike at 2:23 PM
Mitch Kapor: Learning from the Wikipedia
Second keynote of Thursday morning (last day) at MySQL UC 2006 is Mitch Kapor talking about the Wikipedia. Mitch starts with a survey of the crowd. Most people in the crowd use the Wikipedia; 15% of the crowd edits it.
The typical response from a person who doesn't know about the Wikipedia is "The Wikipedia can't possibly work..."
but it does
The mainstream media is very skeptical. Anyone can edit any article at any time. This radical openness is more open than open source projects. There is no review and anyone can post. The radical openness provides opportunities that have never been tried before.
Ideas that are challenged by the Wikipedia:
- someone has to be in charge - even the sysadmins are volunteers and do not keep a schedule
- without the experts, how can you trust the information - the experts aren't always the best to answer questions (think of GWB and WMD, which were never found).
Mitch talks about a study that compared the Wikipedia and Britannica. The problems the study brought to light were fixed in the Wikipedia the next day, which is a testament to the culture and purpose of the Wikipedia. This is not the only case where something like this has happened; the Katrina hurricane coverage was astounding.
Is Wikipedia the next big thing?
In 1978 Mitch had an Apple II. You have to look beyond the technology to the individual. It's hard to remember how radical the idea of a personal computer was. People laughed at Mitch when he was at MIT looking at huge, room-size computers and mentioned the personal computer. Lotus was the Google of its time in 1982. In 1992 UUNET started making internet connections available, before the web. Mitch tried to get venture folks interested in a larger distribution of internet connections and couldn't even get a meeting with VCs; the idea was too radical.
Wikipedia is fundamentally a community project, not a technical project. It's a social arrangement. People are there because they want to be there.
Values of the community
- be nice - be respectful - make your opponent's case for him or her (find the good points in your opponent's arguments and include them along with your own)
- NPOV (neutral point of view) - a criterion for contributors
- Don't criticize, improve - don't complain, do what you can do to make it better
- Real-time peer review - a feed of all changes goes out and there are many volunteers that watch it
The challenge of alien invaders is something that takes effort to monitor. They found that the staffs of some politicians were coming in and putting spin on their articles. They were tracked down by IP address.
The core engine of the Wikipedia is going to become a problem if it doesn't improve. Wikipedia runs on MediaWiki.
Posted by mike at 12:19 PM
Mark Shuttleworth: Delivering on the Promise of Free Software
The first Thursday morning keynote at the MySQL Users Conference is Mark Shuttleworth. Mark has done a ton of stuff (Thawte, space flight), but he's here today to talk about Ubuntu Linux, which he founded. Ubuntu is a fairly universal term across several African tribes that translates to "human-ness": the way you look after other people best expresses who you are.
10 years ago MySQL leveled the playing field. Mark was able to get into development as a kid in a garage many years ago because of MySQL and other open source tools.
Mark says you never want to be trying to make or be the next X, you want to pioneer in new ways.
The new deal of free software has different economics. Software is free globally, provides free updates, and is supported by MANY commercial companies. There's a new culture with community governance that is open to participation on all levels. Participants get to see the development and decisions.
Ubuntu has set out to build a community-driven distribution that can grow and sustain itself through support and services.
What Ubuntu attempts to provide:
- freedom and flexibility
- no licensing fees
- superb commercial support
- standards compliance
- internationalization and customization
- collaboration to the core
Ubuntu attempts to keep drivers up to date, very important to desktop users of Linux.
To make it sustainable it's important to maintain relationships with upstreams (kernel, Apache, MySQL, Firefox, etc). They are also staying close to other distributions, making sure Ubuntu is using ideas from other distros. Mark suggests we're getting pretty good at internal collaboration on specific projects, but there needs to be a lot more cross-project collaboration. A person working on one project who discovers a bug in another should have a channel for free exchange about it.
Rather than looking at what people have done before that's good and trying to repeat it, we should step back and figure out what we can do that other people couldn't do.
Free software enables a different economic paradigm. Embrace it.
Posted by mike at 11:38 AM
April 26, 2006
One Tired Dude
Man, I'm wiped out. I have been operating on 5-6 hours of sleep a night since Sunday and it's starting to take its toll. I should probably resist the urge to stay up hanging out with folks (and then trying to get work done after that).
I suspect there will be no problems sleeping on the red-eye flight home to Boston tomorrow night. Just one more jam-packed day to go, if I can make it through tonight's MySQL UC quiz show without falling over. With Markus, Beat, Giuseppe, Peter Z., and others on the team I think we're in good shape. Markus did a great job getting folks signed up.
I must admit that it has been a lot of fun and worth the late nights.
Posted by mike at 9:47 PM
Conversation with Jim Starkey
As a member of the Boston MySQL meetup I recommended to Sheeri (group organizer) that we try to get Jim Starkey, the author of the MySQL Falcon storage engine (or more accurately, the database that will become Falcon) and a recent MySQL hire, to come speak at a future meetup. Why? He lives in New Hampshire, within a reasonable drive from Boston. Wouldn't that be cool, to have Jim Starkey giving a MySQL meetup session in Boston?
I finally bumped into Jim this morning at the MySQL UC and had a chance to talk with him. The initial conversation was about the meetup, which he graciously agreed to come to sometime this summer (July/August). Beyond that was a great conversation about where MySQL is now, where Falcon fits in, and some of Jim's ideas about a variety of things including DBAs, stored procedures, data types, and the SQL standard. He's got some refreshing views on database technology and is an enjoyable person to talk with. This interview from 2003 captures a bit of Jim's approach to database problems.
I echo what I've heard other folks say, it's good to have Jim working with MySQL.
So look for Jim Starkey coming to Boston this summer. I'll see what I can do about getting a better video setup going.
Posted by mike at 5:41 PM
MySQL Backup Presentation Slides
Glad to have the backup presentation completed here at MySQL UC 2006. It was a decent gathering, not packed but enough to make the room feel full. Jay and I tried to take a minimalist approach on the session material in order to keep a decent pace and have a good chunk of time at the end for Q/A.
I got some positive feedback on the session from a few folks, a few of whom were very appreciative of the pace and the extra time at the end for Q/A and comments.
Here are the slides in PDF format, probably no reason to put up the native presentation file.
Posted by mike at 4:37 PM
Crazy Evenings Juggling Birds of a Feather
After a long day packed with sessions at the MySQL Users Conference it seems like the evening would be a time to relax, right? Wrong.
The evenings seem to be more packed than the days. Tonight there were four birds of a feather sessions, all equally interesting but overlapping. Didn't make it to the "Server Failover" BoF because I wanted to check out the Perl BoF. Had to slip out of that to hit the "building a new SHOW command" BoF. Just when that was getting underway (and was looking really cool) I had to slip out to attend "How to have a good MySQL meetup." This week is turning out to be one of the most packed weeks I've had in a long time.
But this is a very good kind of busy.
Oh, and I finally crossed paths with Frank after a lot of hit and miss. So very cool to have met and hung out with many of the folks from Planet MySQL.
Posted by mike at 12:03 AM
April 25, 2006
High Order MySQL (Advanced Stored Routines techniques)
Last presentation of Tuesday at MySQL UC is Giuseppe Maxia talking about cool things you can do with dynamic SQL in MySQL. I've had a chance to hang out with Giuseppe a bit during the conference, a fun person to be around (great stories). Just learned that Giuseppe is a perlmonk, even more cool than before.
(Sheeri's doing a writeup two seats over, I think I'll just kick back and enjoy this!)
Posted by mike at 8:18 PM
Advanced UDFs in MySQL 5
John David Duncan is talking about UDFs in MySQL 5 (or SQL as a Glue Language) at the MySQL Users Conference. I just realized that this is JD.
Why do we like MySQL? Because it's open source, fast, reliable, and easy to use. This has been true since the early days of MySQL.
For many years MySQL did not have stored procedures which can make client/server apps more efficient. MySQL made up for this with lots of internal functions and UDFs.
In SQL standard (SQL92) there are 7 defined string functions. Oracle 8i had 11 string functions. MySQL 4 had 40+ string functions.
JD looks at PHP, Perl and MySQL and how they all are a part of a stack.
User-Defined Functions are fairly simple C programs loaded into the server at runtime. In MySQL you do CREATE FUNCTION metaphon RETURNS STRING SONAME "udf_example.so".
In MySQL 5 there are triggers and stored procedures, but no debugger. Anders Karlsson created MyProcDbg, a UDF that serves as a debugger. There's also a memcache UDF that ties MySQL into the memcached technology that comes from LiveJournal. The last example is a UDF that JD created for asynchronous messaging. Pretty interesting idea.
JD then goes into information on the pluggable parser for fulltext parsing. In the future there are hopes to have additional plugins including things like authentication.
Posted by mike at 7:26 PM
MySQL UDF for Debugging Stored Procedures
This is cool, although I haven't used it so it will have to be a theoretical coolness.
A few months back Anders Karlsson posted a UDF and accompanying application for debugging stored procedures on sourceforge. The idea is that you can insert statements into your stored procedure for debugging purposes. The statements use a UDF that sends UDP messages to a debug server (provided with the UDF) to allow you to discover information about what is happening as your stored procedure is executed.
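The UDP mechanism is simple enough to sketch in Python (this is a stand-in for the C UDF and the provided debug server, just to show the fire-and-forget datagram idea, not the actual tool's protocol):

```python
import socket

# A stand-in debug "server": listens for UDP debug messages.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))   # bind to any free port
server.settimeout(2)
addr = server.getsockname()

# What the UDF side effectively does: fire a datagram and move on,
# so the stored procedure is barely slowed by the instrumentation.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"entering proc calc_totals", addr)

data, _ = server.recvfrom(1024)
print(data.decode())  # entering proc calc_totals
client.close()
server.close()
```

Using UDP is a sensible design choice here: if no debug server is listening, the datagrams simply vanish and the stored procedure runs unaffected.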
Posted by mike at 7:23 PM
What's New in MySQL Cluster 5.1
Listening to Stewart Smith presenting details at MySQL UC 2006 about what's new in MySQL Cluster version 5.1. Stewart has a great personality for presentations, entertaining at the same time he covers the technical details. Reminds me a bit of Damian Conway (the Perl guru), and not just because they're both from Australia.
What was new in 5.0?
- engine condition pushdown - 5-10 times performance improvement
- batched read interface
- more metadata objects
- decreased index memory usage
- enabled query cache
New in 5.1 Cluster
- Variable size rows (used to be fixed, even if you specified varchar): Cluster can pack records a lot tighter.
- Online add-drop index: much performance improvement over previous index modifications.
- Data partitioning: can specify data storage based on slices of data (just like partitioning in other engines).
- Replication (for geographical redundancy or splitting processing load or whatever): You can also have a backup master and slave for redundancy if master or slave processes fail. If you want to failover to the slave cluster it's a manual process that takes 4 steps. Stewart goes through these steps but notes they are in the docs. Automating failover is very complex.
- Disk-based cluster: in 5.1 the data can be put on disk, but the indexes can only be in memory. The future is to allow indexes to be stored on disk too. No uncommitted data is written to disk.
- Node recovery: Traditional cluster (pre 5.1) copied all the data over the wire, in 5.1 you can use the optimized node recovery to copy just the data since node failure.
- Autodiscovery of DDL: if a table is created on one SQL node it is discovered on other SQL nodes, as long as the binary log is enabled.
Stewart goes into the details on each of these sections (I've attempted to capture the essence as he's a pretty quick speaker and is moving quickly to get through everything).
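The variable-size rows item is easy to illustrate with toy numbers (this isn't NDB's actual storage format, just the general idea of why packing records tighter saves so much space):

```python
def fixed_cost(values, declared=255):
    """Pre-5.1 style: every row reserves the full declared VARCHAR width."""
    return len(values) * declared

def packed_cost(values):
    """Variable-size style: a length byte plus the actual characters."""
    return sum(1 + len(v) for v in values)

names = ["bob", "alexandra", "li"]
print(fixed_cost(names))   # 765 bytes reserved
print(packed_cost(names))  # 17 bytes used
```

For short strings in wide VARCHAR columns the difference is enormous, which matters doubly for Cluster since the data lives in RAM.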
Q/A
To find memory being used by the cluster you can use a command in the manager that tells the storage nodes to write current memory information into the logs on the node. I think the command is ALL CLUSTERLOG STATISTICS=15.
Posted by mike at 5:54 PM
MySQL Runs 60 to 90 Percent Faster on Sun?
MySQL.com has a press release announcing some performance tests they've run on an 8-way Sun Fire V40z that indicate MySQL runs much faster on Solaris 10 than it does on Linux:
The primary difference between the two servers was in the underlying operating system, keeping the hardware configuration and database properties the same. During the read/write test, both systems reached their saturation point at eight CUC, at which point the server running the Solaris 10 OS was 30 percent faster. Additionally, the Sun Fire V40z server running the Solaris 10 OS was running database queries at a 64 percent better rate on average, when compared to the server running Linux.
The Solaris advantage was magnified during the read-only test, where performance exceeded the Linux test case by 91 percent. Remarkably, in this experiment, the peak performance under the Solaris 10 OS was achieved with 16 CUC, while the less robust Red Hat Linux tapered off at only eight CUC. Despite running at twice the load during the peak phase, the Solaris 10-based server was performing 53 percent more transactions per second than the Linux-based server.
I don't get to work with Sun stuff like I used to when I was at Tufts, but it seems to me that if you run Solaris on Sun hardware and compare it to Linux on Sun hardware you're favoring Sun. Maybe that's not true because of the AMD processors, but it still seems a bit unfair. I wonder if you took a serious box like this that was tuned for Linux and put Solaris 10 on it if you'd see the same results. Probably not.
So what's a fair comparison then? Maybe comparing a well-tuned Linux box comparable to the specs or price of this Solaris box.
Posted by mike at 5:39 PM
MySQL Cluster API: Hidden Magic
Second session of Tuesday morning at the MySQL UC is Johan Andersson's talk NDB API: Using MySQL Cluster's Hidden Magic. Sitting with Mikael (Ronstrom), one of the original cluster developers, who came over to MySQL from Ericsson and is noting some additional details about this stuff (included in parens below).
The NDB API is a transactional native C++ API for accessing cluster data nodes. Be aware, there is no (or very little) support for it on the forums.
(from Mikael) Using the API gives you monumentally better performance than going through the MySQL server (which is also built on this API). Mikael can get over 110,000 reads/second on his development box alone, running both storage nodes on the same machine (a dual-core 2.8 GHz machine with 2G RAM).
What does it give you?
- Direct access to data with minimum overhead
- synchronous and asynchronous modes of execution
(another note about no support on forums)
Batching and Modes of Operation
Johan describes on a slide making inserts as synchronous individual inserts, batching them, and batching them asynchronously.
(Mikael notes using batches to process data is one place where you can get a huge performance boost).
Life of a Transaction
- start the transaction (nothing sent to NDB)
- define operations (read/write etc)
- define columns
- commit (or execute) the transaction - the point where data is actually sent down to the data nodes
- close transaction (even if it fails)
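The lifecycle above can be mocked up like this (a toy Python sketch of the flow, not the actual NDB API classes or signatures): the key property is that nothing travels to the data nodes until the execute/commit step.

```python
class ToyTransaction:
    """Mock of the NDB transaction flow: operations are buffered locally
    and only shipped to the data nodes on execute()."""

    def __init__(self):
        self.ops = []       # defined but not yet sent
        self.sent = []      # what has "reached" the data nodes
        self.closed = False

    def define_op(self, kind, table, columns):
        self.ops.append((kind, table, columns))

    def execute(self):
        # the single point where data actually goes down to the data nodes
        self.sent.extend(self.ops)
        self.ops = []

    def close(self):
        # must always happen, even if the transaction fails
        self.closed = True

txn = ToyTransaction()
txn.define_op("write", "t1", {"id": 1, "name": "abc"})
print(len(txn.sent))  # 0: defining operations sends nothing
txn.execute()
print(len(txn.sent))  # 1: execute is when the data moves
txn.close()
```

Deferring the network round trip until execute is what makes the batched and asynchronous modes possible: many operations can ride on one trip to the data nodes.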
Johan shows a list of the main API classes. A pretty good list that's way too much to cover here. Hopefully slides will be somewhere.
Live Example
What do you need to tap into the API:
- MySQL source code
- configure --with-ndb-docs --with-ndbcluster --enable-shared
- make; make install
- headers and libs in /usr/local/mysql/include and /usr/local/mysql/include/ndbapi
DDL operations must be done through the MySQL server; you can do them through the API, but then you won't be able to see the tables in the MySQL server.
Johan works through a code example of connecting to the cluster management node, creating a cluster object, connecting to the storage nodes, specifying an operation, committing that operation, and closing the transaction.
This is pretty cool stuff, but likely not something I'm going to be using anytime soon. It is helpful to see the API interface to the cluster to help understand how the internals of MySQL Cluster work.
Posted by mike at 2:55 PM
JIT MySQL Session Coverage
Apologies to anyone following the MySQL UC who thinks a session writeup should only be posted once the session is complete. I've been posting at midpoints throughout the session as things unfold. This sometimes means that there are unfinished thoughts, but they eventually get corrected.
Perhaps it's the paranoid in me that wants to save and save often (and is too lazy to use the "Draft" option), or I have some delusion that if I post at midpoints throughout the session that someone somewhere is following the action as it unfolds and finds that this JIT session coverage helps satisfy their craving for the latest news from the conference.
(Or maybe it's just that I want to be first)
Posted by mike at 2:40 PM
Database Normalization and Joins
First session of Tuesday morning at the MySQL Users Conference (2006) is Mike Hillyer and Arjen Lentz presenting An Introduction to Database Normalization and Joins.
Not a new subject for me, but I've heard good things about this presentation (has been given at a number of conferences) so want to see what it's about.
What is the overall idea with database normalization?
- cure "spreadsheet syndrome"
- store minimal info
- remove redundancies
- remove anomalies (accidental differences because data is entered multiple times)
- restructure data
Presentation will cover the first three forms of normalization. Mike shows an example of a book table with lots of data piled up into single column rows.
First Normal Form
- all values atomic - every column contains the least amount of information possible - you must look at your needs - sometimes firstname lastname is OK in the same column, but most often it needs to be broken out
- each row must be unique
Solving it means breaking data into separate tables.
Forming Relationships
There are three forms of relationships:
- one to one - expect none or one row in the other table (ie blob data sitting in a separate table)
- one to many - expect one or many
- many to many (joining table) - contains two identifiers, one for each of the represented joined tables
Second Normal Form
Data must be in first normal form first. Second normal form is about composite keys. All columns in a row must refer to the entire primary key.
Third Normal Form
All columns must depend directly on the primary key. "The key, the whole key, and nothing but the key, so help me Codd." An example is an author table with address, city and zip. The zip isn't necessarily an attribute of the author; it belongs to a city. Mike points out that getting this form of normalization with the zip code example requires many tables, often more than is really needed.
(Arjen takes over for the second part of the talk on JOINs)
A JOIN is really about sets (Venn diagrams). Arjen shows a few diagrams with geometric shapes surrounding groups of numbers to demonstrate the data that will be returned in the result set.
Inner gets the sets of common data between two tables. Outer gets the common values plus the extras in the right or left depending on what you specify.
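A minimal sketch of that difference (tables hypothetical, continuing the book/author example):

```sql
-- Inner join: only authors that actually have books
SELECT a.lastname, b.title
FROM author a
INNER JOIN book_author ba ON ba.author_id = a.author_id
INNER JOIN book b         ON b.book_id = ba.book_id;

-- Left outer join: every author, with NULL titles for those without books
SELECT a.lastname, b.title
FROM author a
LEFT JOIN book_author ba ON ba.author_id = a.author_id
LEFT JOIN book b          ON b.book_id = ba.book_id;
```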
Presentation will be on Mike's site.
Posted by mike at 1:55 PM
Windows XP on Apple Hardware in the Wild
Well, I've seen it with my own eyes. Windows XP running on a MacBook Pro. Steve Lucas from Business Objects is giving a presentation at MySQL UC 2006 and is using XP on his Mac for doing a presentation and giving a demo of Crystal Reports (on Windows) using MySQL (on Windows).
It sounds like on Saturday (three days ago) Steve got his MacBook Pro and went through the process of installing Windows XP in order to demo Crystal Reports, which is Windows-based.
This supports my thinking about why it makes sense for Apple to allow Windows on the Mac. People who would have never purchased a Mac before are now able to buy them without risking the problems that come with trying to integrate with a business that has Windows-only products. Steve mentions that he will be trying OS X as well, which means you've given a Windows person a chance to have OS X readily available for playing with.
Posted by mike at 1:20 PM
State of the Dolphin: Interview with Kaj, Monty and David of MySQL
Tuesday morning's keynote at MySQL UC 2006 is a conversation with Kaj Arno (VP community relations), Monty Widenius (founder) and David Axmark (founder). The conversation is somewhat informal, Kaj asking questions from a card and being spontaneous with responses.
The storage engine architecture started with Monty being lazy because he had some data in one format but was having to move to another data format and didn't want to migrate the data.
What is Falcon? It's a Swedish beer. It's also a new storage engine from MySQL that uses memory in a new and significantly improved way. Monty is working on Falcon in a more hands-on development role than he's had for awhile. He's also working on a new version of MyISAM that will be ACID-compliant, have much faster indexes, and include data warehousing features. There are a bunch of other new storage engine ideas; Monty's favorite is Falcon.
Mike Zinner gets up to demo the MySQL Workbench, a GUI-based tool for designing databases. After showing some of the cool stuff (including a command-line interface) in the workbench Michael goes to the Query Browser to do a demo of the Falcon engine (I want to get my hands on that Falcon engine).
Back to Kaj, talking now about business/licensing. Staying with GPL, not sure what version 3 will be but will be looking toward it. Will continue with dual-licensing.
A brand new MySQL Network offering is coming, and it includes monitoring. It's driven by a data-collection mechanism that's fed into a rule-based advisor mechanism. Reports on security, administration, schema, performance and replication. The services are in beta now; you can contact a support rep to get the service. (It's too bad this isn't an open source tool I can run like Nagios).
How did the community grow? Start by making the database easy to run. There hasn't been time to accept code contributions in the last year; they are trying to make it easier for people to contribute with the MySQL Forge. The forge is a place to put snippets, schemas, stored procedures etc.
Last piece of the morning are community awards to Giuseppe, Roland, Marcus and Rasmus (Lerdorf). Congratulations.
Posted by mike at 11:53 AM
Lots of New MySQL Conference Attendees
The morning keynote session (at MySQL UC 2006) starts with some business from Arjen. Interesting to note that when he asks for a show of hands from people who are here for the first time there were a lot of hands. I'd guess 25% of the folks attending this morning raised their hands. Also a lot of folks who are back for their 2nd year.
Kaj Arno says registration is over 1500 attendees, pretty good. That's close to the numbers for OSCON just a few years back.
Posted by mike at 11:29 AM
Hosting Meetup Videos (Uses Bandwidth)
Looking at my weblog traffic summary tonight and was startled by this statistic. In recent months folks have grabbed around 1.2 terabytes of data off of this site. I guess that should be expected when posting monthly MySQL meetup videos that are anywhere from 100-300 MB. Doesn't take many of those to make bandwidth an issue.
The video hosting at Google is obviously the better choice from a data-transfer perspective. However, I've heard numerous people complain that the sound isn't right or the video freezes up on Google so I think it's nice to provide a Quicktime version that's downloadable to play it off a local machine. I had also been thinking it would be nice to provide a video in iPod format for folks on the run (and while I'm at it maybe an mp3 too for audio-only).
Is 1.2 TB a big deal? I'm not sure for everyone, but since the bandwidth is donated I like to think I'm being careful with it. Looking around it seems that there are some semi-inexpensive hosting options that provide monthly data transfers in the TB range. Might be an option if we want to provide more than just the Google version of the video.
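A quick back-of-the-envelope check (the average size per download is my assumption) shows how fast those videos add up:

```shell
# ~1.2 TB served; assume an average video download of 200 MB
total_gb=1200
avg_mb=200
downloads=$(( total_gb * 1024 / avg_mb ))
echo "roughly $downloads downloads"
```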
Posted by mike at 1:08 AM
April 24, 2006
Virtualization on OS X with Parallels
As a person who's required to have a copy of Linux readily available for site development and database stuff at work I was glad to see some folks offering tools to run virtual machines on OS X.
I've done this with VMWare on Windows, but there's nothing yet for Mac. Yes, there's the dual boot thing, but I need to be able to use my Mac OS for everything except running a development environment.
You might also suggest I have a server somewhere that I can access with SSH, but I'm not always on a good connection and don't want to have a machine exposed somewhere that reveals what's going on with our development. Yes, I'm convinced that having a virtual OS is the right way to do web-based UI development.
So I grabbed a beta copy of the newly available Parallels Workstation for OS X, which will allow me to install a wide variety of operating systems inside OS X.
So far it looks pretty good. I've got Linux installed and am grabbing the source tree. Two problems I've noticed so far:
- If I suspend OS X the virtual machine crashes. There is a suspend mode to control the machine, which I should try out next time I need to close things down on my laptop.
- You are required to specify a network device before boot, and it must be one of the physical devices on the machine. There is no internal network.
At $39 (discounted $10 for pre-order) it's a much cheaper alternative to a VMWare license. Of course there is no VMWare for OS X yet so it's kind of a moot point.
Update: I filed a complaint with Parallels about not having NAT and got a response saying it's currently in development. I hope it's available soon.
Posted by mike at 7:39 PM
Dual Core Processors are Cool
I was just looking for some information on how much memory certain processes on my new Mac laptop were taking when I noticed that the CPU graph is divided into two rows. Took me just a second to figure out that it wasn't some aggregate of multiple pieces of info on the same graph but a graph for each CPU core.
Something really cool about working on a laptop that has two CPU graphs (ie, dual CPUs). Am going to keep my eye on this graph (stuck it in the Dock) to see just how much I'm getting out of both of them.
Posted by mike at 7:08 PM
MySQL Replication for Scaling and High Availability
Jeremy starts with a general overview of replication and provides some detail on general things replication is and is not. There is no limit to the number of slaves, but there is a limit on the bandwidth that a server can serve so with too many slaves the master could be overloaded. If you need massive numbers of replicated machines it's better to not have them all pointed at one master. It's better to have a relay slave whose job it is to stand in the middle and serve data out to groups of machines.
Master to master replication is possible, but there are lots of problems. Primary keys can be carefully designated but the bottom line is that when it fails it's hard to know what state things are in.
Jeremy's got this cool diagram that shows the flow of replication from the master through the IO thread to the SQL thread, through the replication configuration filters and into the slave data.
A few things that Jeremy says to be aware of:
- make sure you have a unique server id
- enable binary logging
- if a slave is also a master, enable log-slave-updates
- take backups on a slave, not the master
- keep a spare slave available
A few things to not do:
- Don't put master configuration info into my.cnf; it doesn't get updated when the CHANGE MASTER TO SQL command is used
- Don't run backups on your master
Jeremy goes through the replication-related SQL commands which include:
- SHOW MASTER STATUS
- SHOW MASTER LOGS
- SHOW BINLOG EVENTS
- PURGE MASTER LOGS
- CHANGE MASTER TO
- SHOW SLAVE STATUS
- SLAVE STOP (hmm, I didn't know about this syntax, I use STOP SLAVE)
- SLAVE START (ditto)
- LOAD DATA FROM MASTER
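For example, pointing a slave at its master looks something like this (host, user, password and log coordinates are placeholders):

```sql
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000042',
  MASTER_LOG_POS = 4;
START SLAVE;
SHOW SLAVE STATUS\G
```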
The following options are used in configuring MySQL replication:
- server-id - unique id of the server
- log-bin - must be enabled for commands to be replicated
- log-slave-updates - updates that arrive on the slave get written to the slave's own binary log
- binlog-do-db - logs only changes to the specified databases
- binlog-ignore-db - logs all changes except for the specified databases
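A minimal my.cnf sketch using those options (the values are examples, not from the talk):

```ini
[mysqld]
server-id        = 1
log-bin          = mysql-bin
# on a relay slave that is also a master:
log-slave-updates
# optional filtering; note these work per-database
binlog-ignore-db = test
```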
Building Replication Failover
The goal is to build a system that can fail over. This is used *most* often in planned failovers; it's more likely that you'll need it for machine updates than for an actual failure (but it also works in a failure). DNS is not good for this because it's unreliable for changes, as it doesn't ensure clients notice the change.
You do not connect directly to the master; you connect to a proxy that connects you to the master. The proxy decides who the current master is based on its configuration. Stunnel with a null cipher would do the same.
The setup is multi-master, but you only write to one master at a time. Jeremy doesn't recommend automatic failover; there's a problem in determining when to fail over. The network could be down entirely, or the master could merely look down from the monitor's point of view.
(having to work on a somewhat important issue at work, missing a bit here)
Jeremy says that, as a Yahoo!, the purchase of InnoDB doesn't worry him.
Yahoo! uses two different camps of hardware. When having to use existing hardware it's typically a 32-bit 1.x GHz dual-CPU box with a few gigs of RAM. If they can buy new it's dual 2.x GHz AMD Opteron with 4G or 8G of RAM.
Yahoo! uses mostly version 4.0 still (5,000 on 4.0 and 1,000-2,000 on 4.1).
Posted by mike at 4:42 PM
MySQL Cluster Configuration, Tuning, and Maintenance
First tutorial of Monday morning at the MySQL Users Conference is Johan Andersson on MySQL Cluster Configuration, Tuning and Maintenance. Found a spot next to Giuseppe (the data charmer) and a power strip (which are in ample supply here).
Johan came to MySQL with the cluster acquisition from Ericsson in 2003 and works mostly as a consultant with other folks using MySQL Cluster for telecom, search engines and e-commerce. He likes to keep it simple, and to avoid areas where the cluster isn't the best fit (ie MySQL Cluster isn't great at joins).
Johan starts with a general overview, using the typical diagram of the cluster nodes and walks through the pieces of the cluster. MySQL is currently working on a new feature where you can add a new storage node online. The management server isn't critical to a running cluster, only used on startup or for failure conditions.
(some working out of logistics, turning up the mic, getting a laser pointer, etc)
Data Distribution
Data distribution is done by dividing each table into fragments, based on the number of storage nodes. One node will have the primary fragment of the data and another node will have the secondary fragment. The decision on where to store the data is based on a hash value of the primary key. Each data node in the cluster can handle transactions.
Node Failure
When a node fails the data from that machine is then pulled from the other node in that node group that has a copy. The other node then has to assume the full responsibility of serving that data until the failed node recovers. When the failed node comes back it knows that it's joining a running cluster and will grab a copy of all the information it needs from the running node. Once it has a complete copy of the data it will rejoin the cluster as a healthy node. 5.1 will have an option to recover a node by applying just the changes that have happened since the node failed.
Backups
Cluster has its own backup that's part of the cluster engine and runs on each node. You start the backup from the management server, which tells all of the nodes to take a backup. The nodes each have a copy of the data. When a backup is taken the fragments are written to disk and you are then responsible for copying those files to another location. You can take a backup from a 4-node cluster and restore it to a cluster with a different configuration.
The changelog holds data changes that happen during the backup process. You cannot use the changelog outside of a backup, so there's no log file that can be applied down the road to roll the database forward.
If a node fails during backup it will be aborted and the backup will have to be restarted.
If you lose all the nodes in a node group you have a system crash and you need to do a restore from the most recent checkpoint.
Transactions
Each data node has a transaction coordinator. The SQL node picks one of the data nodes (round-robin fashion) and tells it to start a transaction (5.1 may include an option to specify the node). Each transaction coordinator handles the two-phase commit protocol for a transaction. The cluster uses a linear two-phase commit.
It's important to note that there's a 50% chance that the transaction will start on a node that doesn't have the data it needs, so data nodes have to be able to communicate between each other.
For an update on a two-replica cluster the TC sends a message to the LQH (local query handler) on the node with the primary fragment for the tuple. Once that LQH gets the lock it needs, it sends the message along to the LQH on the other node. When the second node has the lock, its LQH returns a message to the transaction coordinator. After the prepare phase is done (locks obtained) the TC sends a commit. The transaction coordinator receives a notice from the first LQH once the change is committed. The TC then sends a message to the other LQH saying that the commit is finished.
Once the commit starts it must be finished. Even if a node crashes between one node committing and the other committing the transaction has to be completed.
What happens if the TC dies? Another transaction coordinator takes over and aborts transactions that haven't made it to the commit phase. The application must be able to handle this. If the commit has started the new TC will complete the transaction.
Locking
When working with ranges, a batch of records (likely a subset of all that is needed) is brought up to the MySQL client. The lock is obtained on the subset of records that is fetched, but that doesn't mean a lock is held on all records that will be processed. The subset is changed and returned to the cluster, where the lock is released; then the next batch is locked and sent up to the client.
If an operation takes a lock and doesn't unlock in a timely fashion, the TransactionDeadlockDetectionTimeout setting controls how long a second lock request will wait before aborting. The default is 1200ms. TransactionInactiveTimeout sets the maximum time between operations, or how long a transaction can sit holding a lock. The default is 0 (no limit), which is not good; a better setting is 5000-10000ms. Set TimeBetweenInactiveTransactionAbortCheck (perhaps 1000ms) to specify how often to check for inactive transactions.
Memory is split into two pieces, data and index. Data memory stores tuples and ordered indexes; index memory stores primary keys in a hash index. The data memory page size is 32KB, the index memory page size is 8KB. Max tuple size is 8052 bytes, except for blobs. Use the ndb_size.pl script to determine storage requirements for a given MyISAM or InnoDB table in your MySQL instance. Do not use more than 90% of your data or index memory; the cluster needs headroom.
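A back-of-the-envelope sizing sketch for that (the row count, tuple size, and node counts are made-up numbers; use ndb_size.pl for real estimates):

```shell
# Hypothetical table: 1M rows averaging 256 bytes per tuple
ROWS=1000000
BYTES_PER_ROW=256
REPLICAS=2      # NoOfReplicas: every fragment is stored twice
DATA_NODES=4

total_bytes=$(( ROWS * BYTES_PER_ROW * REPLICAS ))
per_node=$(( total_bytes / DATA_NODES ))
# stay under 90% usage: configure DataMemory around per_node / 0.9
data_memory=$(( per_node * 10 / 9 ))
echo "DataMemory per node: $(( data_memory / 1024 / 1024 )) MB"
```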
Disk usage is often neglected, but is important to think about because you have checkpoint data needing to be written to the disk. Each node is writing checkpoint data, and UNDO/REDO logs. Every change is put into the REDO log. Every time three checkpoints have passed the REDO log is cycled. A local checkpoint can be intensive. You can set the time between the checkpoints using TimeBetweenLocalCheckpoints. The default is 20, which means that reaching 4MB of data written to the REDO log triggers a checkpoint. During a checkpoint the UNDO log is written to provide data on changes that happen during the checkpoint.
In addition to writes, this data needs to be read from disk on recovery.
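Those knobs live in the cluster's config.ini; a sketch with the values mentioned above (the memory settings are example numbers of my own):

```ini
[ndbd default]
NoOfReplicas = 2
DataMemory   = 512M
IndexMemory  = 64M
# lock/transaction timeouts, in ms
TransactionDeadlockDetectionTimeout      = 1200
TransactionInactiveTimeout               = 10000
TimeBetweenInactiveTransactionAbortCheck = 1000
# 20 means ~4MB of REDO writes triggers a local checkpoint
TimeBetweenLocalCheckpoints = 20
```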
Johan gets into some pretty detailed diagrams of the timeline for disk writes and configuration of disk utilization. It appears to be quite a science to calculate how much disk space you should have available for your cluster. Some of this calculating goes away with 5.1 where the checkpoint has been rewritten and the UNDO log is no longer used.
The bottom line is to be aware of how many transaction records will be used in your largest transactions, identify how many operation records they will use, and be sure to have room in memory to process those records.
Transport Buffers
Each connection uses a transporter for sending and receiving data. SendBufferMemory/ReceiveBufferMemory can be configured to allow more memory for transporting data in cases where records are large.
Maintenance
A few maintenance items:
- logs are circular
- memory isn't reclaimed unless a table is dropped
- develop a backup strategy
- rolling restart will let you update config.ini and then restart each NDB node one at a time
- rolling restart resolves memory fragmentation
Hardware Recommendations
Get good hardware, with good disks. Data nodes should be dual-CPU 64-bit machines with SCSI disks and lots of RAM. The network should be gigabit Ethernet or SCI.
Best Practices
A few ideas for utilizing the cluster efficiently (not everything, just what catches my eye):
- use primary key operations, it's good at them
- scans are expensive
- maintaining indexes is expensive
- avoid joins
- denormalize data to have a more horizontal layout
- set engine-condition-pushdown=1 to push WHERE clauses down to the data nodes
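A sketch of checking that last item (table name hypothetical; 5.0-era syntax):

```sql
SET engine_condition_pushdown = 1;
EXPLAIN SELECT * FROM t1 WHERE b = 10;
-- the Extra column should show 'Using where with pushed condition'
-- instead of filtering every row up on the SQL node
```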
Posted by mike at 11:37 AM
MySQL Users Conference Day 0 (the day before)
Arrived at SFO at 9am this morning without event. Marcus and Roland were nice enough to pick me up at the airport with the intention of heading into San Francisco for the day. After a slight amount of driving confusion we ended up near Fisherman's Wharf. We hopped on one of the old-school, tourist trolleys and rode up (and then back down) to Union Square where we walked around a bit, had some lunch and visited the Apple store (they are also out of non-lime-green 15" laptop sleeves). The trolley ride back across town had a spot for hanging off, I couldn't resist. It was well worth the $5 each way to have the trolley experience and see some of San Francisco.
After taking the trolley back over to Fisherman's Wharf we hopped in the car and drove out across the Golden Gate Bridge, stopping at the vista point across the bay so we could walk back onto the bridge (which ended in a philosophical discussion about people who jump).
Last stop was at the beach on the west side of San Francisco near Golden Gate Park. We walked along the beach for some distance watching the surfers, dodging the surf, and continuing conversation on various topics.
After getting back to Santa Clara and getting checked in/registered I joined up with Marcus, Roland, Beat and Giuseppe for dinner at a pretty good fish place over on El Camino Real, a few miles from the conference center. Kicked back for a long time after dinner discussing various MySQL-related topics. A lot of fun to hang out with these Planet MySQL icons. Great guys, good conversation.
Now must get ready for Day 1, tutorials on cluster and replication.
Posted by mike at 1:26 AM
April 23, 2006
Off to San Francisco (free WiFi at Boston's Logan Airport)
Got up at 3:30am to make a 6:10 direct flight from Boston to San Francisco this morning. Excited to get there and be a part of the MySQL action. Will be spending the day up in San Francisco doing a little sightseeing and then headed down to Santa Clara in the evening.
Much to my delight the Logan airport WiFi, which has always required a registration (including $$), seems to have let me in without issue so I can get some stuff done while I wait for boarding.
Posted by mike at 4:58 AM
April 22, 2006
New Apple Laptop (MacBook Pro) in Hand
I did it. Tonight I stopped into a local Apple store and bought a new MacBook Pro (2.0GHz Intel Core Duo, 1G RAM, 100GB HD etc, etc). It's taken a bit of thinking to get to the point of committing to the kind of cash it takes to bring one of these things home.
After spending a week on my 12" PowerBook (after my realization that I'm no good at Windows) I threw it up for auction on eBay and got a respectable contribution toward the new laptop. I figured even if there have been problems with the MacBook Pro, and there's a chance Apple will have a new laptop coming in another few months, I needed something to work with now and there's no sense in waiting any longer. Chances are good that even when the new MacBooks are announced that I'd still prefer the larger screen. I do hope that there's not some major revision, or price reduction, of the MacBook Pro along with these other announcements.
I did a lot of reading in the past few days about the problems and it appears that most of them have been resolved over time and that as long as you buy a machine manufactured after the 11th week that the problems will have been resolved. I checked the serial number on the box at the Apple store. The first one they brought out was manufactured in week 14, but I knew there were even newer models so I asked the sales person to see if they had a newer one and sure enough they had one manufactured in week 15.
So far I like it a lot. It's at least 100x faster than my 12" PowerBook 867, but doesn't feel huge. The screen is nice and wide, but not much taller than the 12" (good for airplanes). Also much quieter than the fan on my 12".
If I notice any of those problems popping up I'll be sure to post.
Posted by mike at 10:41 PM
Feedback from PHP Quebec MySQL Cluster Talk
I've spoken at a variety of conferences and meetings over the past few years. For many of the gatherings the organizers work diligently at getting users to fill out some kind of quick feedback form for each session. I've always wondered about those forms, particularly when I've given a session.
The other day I got an email from the PHP Quebec folks with a document attached with the information gathered from those forms for my talk on MySQL Cluster. Ratings from 1-5 on a few different questions (like "How well did the presenter know the subject?) and then a number of feedback notes from the folks who attended.
I wish all conference organizers had that kind of time and dedication to get the feedback back to the presenters. In cases where the feedback is praising it's good for a speaker to know they did a good job and were on the mark. In instances where there were problems it's also good to know what issues to be working on for next time. Maybe that's a challenge for conference organizers who might stumble across this post (or maybe most are already doing this).
In this case I'm delighted to report that the marks for the MySQL Cluster talk were high, and the comments were all very positive about the approach and subject matter.
Posted by mike at 9:16 AM
April 20, 2006
Get Ready for the MySQL Onslaught
Each year my weblog gets bombarded with MySQL entries over the 5-day MySQL Users Conference. I've been posting a lot more about stuff happening with the MySQL database lately, but it doesn't compare to the 4-6 entries a day during the conference. Not only that, the entries will likely be more technical than other MySQL posts. My apologies to anyone who doesn't care about the MySQL stuff I post. You may want to put a filter on, or take a break for a week.
For planetmysql readers I'm just not sure how we're going to handle what I think is coming next week. The last year has been huge for the MySQL community. One year ago at this time Mike Hillyer was getting the conference aggregator ready and a few folks aggregated their posts, which seemed like a lot to follow.
Since then planetmysql has grown to over 100 syndicated weblogs, and many of those folks will be out at MySQL UC posting away as the action unfolds over the week. Since Arjen limited the front page to 10 posts at a time I'm guessing there will be times that posts will only see the front page of planetmysql for a few minutes (oh, yea, that already happens). Thank heavens (or whoever you prefer) for RSS feeds and good aggregators.
It's already started with folks posting their "I'm on my way" entries.
Posted by mike at 10:57 AM
April 18, 2006
automysqlbackup - Nifty MySQL Backup Script
Working through the MySQL mailing lists and forums tonight for questions/answers/problems related to backing up/restoring a MySQL database. Just one avenue for making sure the UC presentation on backups gets at the heart of what people are experiencing with MySQL backups.
Back to the matter at hand. I stumbled into a post recommending a backup tool that I hadn't used before, automysqlbackup. The release notes indicate it's been around since 2002, the most recent release coming in Jan 2006. I know I've seen this before but never looked closely at it.
It's an easily configurable Unix shell script that (among other things):
- is well documented with header and inline comments (along with info on how to restore from backups)
- wraps around mysqldump
- rotates daily, weekly, monthly backups
- backs up one or multiple databases
- mails report with configurable levels of detail (including attaching dump files up to a specified limit) from the backup
- runs user-defined pre and post scripts
This is cool if you're one of those mysqldump folks but need something more to throw in cron to organize and rotate your backups.
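The rotation idea can be sketched in a few lines of shell (paths, database name, and retention count are my assumptions; the real script does much more, including the actual mysqldump, compression, and mail reports):

```shell
# Minimal daily-rotation sketch in the spirit of automysqlbackup
BACKUP_DIR=./backups
DB=sakila
KEEP=7          # number of daily dumps to retain

mkdir -p "$BACKUP_DIR"
# Real use: mysqldump "$DB" | gzip > "$BACKUP_DIR/$DB.$(date +%F).sql.gz"
# (touch stands in for the dump here so the sketch runs anywhere)
touch "$BACKUP_DIR/$DB.$(date +%F).sql.gz"

# Drop everything but the newest $KEEP dumps
ls -1t "$BACKUP_DIR"/$DB.*.sql.gz | tail -n +$(( KEEP + 1 )) | xargs -r rm --
```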
Posted by mike at 11:33 PM
April 17, 2006
MySQL Storage Engines and Data Types at DBAZine
I can't be 100% sure (and please correct me if I'm wrong) but I think MySQL is fairly new to DBAZine. When I first went to look around the site to get an idea of what chapters might fit I couldn't find anything about MySQL anywhere. Conversations with Apress indicated that DBAZine was looking to expand to include some of the open source database offerings. I'm excited that MySQL is the first of the open source databases to get attention.
Posted by mike at 11:23 PM
solidDB Coming to MySQL
ZDNet (via slashdot) is running an article about MySQL and Solid working together to bring the solidDB storage engine to MySQL in June. There's an announcement linked from the Solid homepage and a letter from the CEO in PDF format. I don't see anything at MySQL.com, but the announcement from Solid seems to indicate that details will be unveiled next week at the MySQL Users Conference.
A short bullet list from the announcement about the capabilities of the solidDB storage engine:
- multi-version concurrency control
- full transactional integrity with robust ACID support
- online backup capabilities
- fast performance for mixed workloads
- robust security with database roles
- dynamically configurable parameters
I like the online backup; hopefully it will be something you can do without buying a license. And database roles? Will those be usable alongside MySQL's existing privilege management? Curious to see if/how storage engine-level roles come into play at the engine-agnostic server level.
This statement from the VP of Marketing for Solid might raise concerns for MySQLers looking to get free from InnoDB because of its ties to Oracle:
Our decision at the moment is to go into the open source track and use mySQL as a channel. So we’re going to make available code that works only with mySQL. On the side we have a proprietary line of products.
I'm not sure exactly what that means, the press release seems to indicate that the storage engine will be GPL. I'm sure some folks will be nervous about this.
And for those out there who love to be reminded MySQL doesn't have transactions:
Stacey Quandt, research director for the Aberdeen Group in Boston, told me the deal also opens new markets to mySQL. "For years mySQL has defined itself as not being a competitor (in the enterprise space), but with a transactional engine that gap can be narrowed," she said.
The filtered slashdot comments about the article are an interesting read.
Looking forward to more details at the MySQL UC.
Posted by mike at 1:04 PM
Time for a New Mac?
I'm all geared up to replace my 12" PowerBook G4 with a new Mac laptop. I've had this laptop for 3 years now and it's been through a lot.
Unfortunately, there's not a clear choice for a replacement. I'd love to have something in hand by the end of the week as I go off to the MySQL Users Conference (oh how I hate the thought of lugging my Dell out there). Even better would be to have a decision made so I can go purchase a new laptop at that iconic San Francisco flagship Apple store or the Apple headquarters in Cupertino.
The only currently available choice is the MacBook Pro. With Apple switching over to Intel processors, it would probably be foolish to get into a PowerPC laptop now. There's a chance I could have one of those new MacBook Pro laptops in hand by the end of the week, but there are some uncertainties.
First, rumors abound that Apple is close to announcing the MacBook, a smaller Intel-based laptop. There's a chance that the smaller laptop will better meet my needs, so I should for sure wait for that announcement to evaluate whether the MacBook is a better option than the MacBook Pro.
Secondly, there are some uncertainties about the MacBook Pro itself. The Ars Technica MacBook Pro review is pretty positive, but it seems there are mixed feelings based on the facts that Apple couldn't get a thin enough DVD drive, tries to avoid conversations about battery life, and hasn't mastered the MagSafe power connector.
Last, I'm not thrilled about having to upgrade applications to use the Intel processor. I was really in the pits about this, but as I built my list of applications and checked the various sites, most of them are already available as universal binaries. I use the Mac-included apps heavily (Mail, iTunes, iCal, iMovie, iDVD, iPhoto etc), so not including them in the list:
- mysql (done)
- Photoshop (sometime in 2007)
- Adium (done)
- Omnigraffle (done)
- Shrook (done)
- Microsoft Office (?? - do I really need it?)
- Firefox (done)
- Thunderbird (in prerelease)
- Final Cut Express (later this year)
Running Photoshop and Final Cut in Rosetta (the PowerPC->Intel emulator) is not going to be great, and I bet the cross-grade isn't going to be free (esp since I'm on an old version of Photoshop). Maybe it's time for GIMP (beta testing for universal binary just winding down) and Cinelerra (very shaky on Darwin).
But having looked through the applications folder, it doesn't seem that I'd be far from complete universal binary coverage for the applications I run.
I guess I just wait until the right choice becomes clear.
Posted by mike at 9:25 AM
April 14, 2006
Video of MySQL Meetup: Coding and Indexing Strategies for Optimal Performance
The Boston MySQL meetup this past week was well attended. At one point I counted just over 50 folks. Thanks to MIT for the space and MySQL for the food.
The presentation was excellent, but longer than my tape, so to see the full deal you'll have to catch Jay's presentation at the MySQL UC in April or OSCON in July. There are sections where questions are hard to hear because of the lapel mic we were using. It makes for much better audio from the speaker, but it's not so good for interactivity. At one point I unplugged the lapel mic to capture interaction at the chalkboard. I suppose if we got really serious we'd have a mixer with an audience mic and a speaker mic. Probably not going to happen. Maybe next time I'll try unplugging the lapel mic every time there's a question from the audience.
Yes, it's a tiny screen. That's the only way to get 90 minutes of video down to a manageable size in iMovie. Jay should have the slides posted at some point to let you follow along. Will link to them from here when they're up.
Posted by mike at 9:12 AM
April 13, 2006
Now there's a good feeling: taxes submitted (after letting them sit since mid-March, when I did a rough sketch of the numbers to see just where we stood).
When all was said and done we only gave 4.28% of our income for federal taxes. This number is on the TurboTax (online version) summary page. I believe it's a calculation based on the adjusted gross income and the total tax payment (after tax refund or payment). The state tax (Massachusetts) summary doesn't have a similar number, but it must be in the same ballpark.
That was a bit of a surprise to me. Of course, the bulk of the money that didn't go to the government went to other places we'd rather it not go, like mortgage interest.
So it's not so much of *if* but *where* that money goes.
Update: To not leave folks hanging, here's a little more detail on our deductions:
- we give a lot in tax-deductible donations
- we bought and sold a house, many of the fees are deductible and a new house always means heavy interest payments (the loan is *not* an interest-only loan)
- we have two kids in a private pre-school, a chunk of the tuition is tax-deductible
Posted by mike at 11:13 PM
Back on Mac
After giving Windows a good shot for a month I'm back to using my (very) old 12" PowerBook for the large chunk of time I spend at the computer. It hit me on Monday this week that I was really out of the loop on a bunch of stuff, and that I'd been struggling to be productive for almost a month.
I don't have the energy to go into what it is that makes such a big difference (I started to make a list, but it's complex to document why X is better than Y). Let's just say that the shift to a different UI and development setup was significant enough to put me out of my zone. I gave it a good run, but am just way too ingrained in doing things on a Mac. It's been many years now that I've been doing things the Mac way.
I am obligated to get some value from the Windows laptop. For now it still runs Linux in VMware, which I access from my Mac for doing development. It also still has my email, which will likely be switched over to Thunderbird or something else I can run on the Mac. I will still take the Windows laptop to the office on the one day I work there each week. Why? Because I'll need my Linux development environment with me and it's still on that machine. The Linux installation we use is standardized across all machines and it's important to keep close to that.
And yes, there is a chance that if I'm really not using the laptop for more than a development environment that I'll wipe it clean and install some flavor of Linux.
It's such a relief to be back where I really am most efficient.
Posted by mike at 7:37 AM
April 12, 2006
What happened to mysql.com Search?
I have no idea when this change happened, but over the past few weeks I've gone to MySQL.com to look for a particular keyword, like "sort_buffer_size" or "Created_tmp_tables", and noticed that the search results were close to useless. But I wasn't dismayed, because I figured these were obscure words best found by browsing to the page about configuration variables or server status.
Then just last week I was looking for information on creating descending indexes, which would most likely be found on the CREATE TABLE page. That search confirmed that things took a turn for the worse somewhere.
Let's see, if I search for "CREATE TABLE" I get 54 results on the first page. Are any of them the page dedicated to the CREATE TABLE command? No! Scrolling down to entry ~15 I see the DROP TABLE page is a match. I know if I click on that I'll get to a part of the manual where there's a link in the left nav to the CREATE TABLE page. Yes, that works, but it isn't quite as good as having the CREATE TABLE page show up in the results.
Later on I'm searching for another phrase, ALTER TABLE, again for ordered index details. Rather than risk scrolling through 50+ matches to see if it shows up, let's try Google to see if it can get me closer. As expected, Google's first match for the phrase "mysql alter table" is the page I'm looking for. Turns out that MySQL.com has it as result 15, which isn't quite as bad as the "CREATE TABLE" search. But why would a document titled "Problems with ALTER TABLE" show up first, second, and third?
So why was it good before? To the best of my memory mysql.com used to display both the search results as well as a small list of commands that might match my terms. Back in those days I never even bothered looking to the search results because that small list of commands always had what I was looking for.
As I look down the search results now, it seems like one factor in all of this is the splitting of the documentation into separate docs for the various versions. My search for CREATE TABLE pops up several of the "SHOW CREATE TABLE" documents for different versions. However, limiting the results to one particular version doesn't change the search relevance.
I should say that this is no reflection on the documentation itself, MySQL docs are excellent. It's just that getting to the specific documentation you need isn't what I think it was in the past.
So for now I guess I'm switching to Google (Yahoo! also gets it right, for you Yahoo! folks) as an interface for finding information about commands in MySQL, and hoping that the search gets better at some point.
Can we get back that nifty command-only list? And I do find it interesting that mysql.com uses Mnogo rather than MySQL's fulltext indexing. Would the fulltext produce better results?
Update: Jim Winstead comments on why search isn't so great and points out that getting to these SQL command pages is as simple as going to http://dev.mysql.com/<command>. So to get to the CREATE TABLE page I can use http://dev.mysql.com/CREATE TABLE. That's better than having to search.
Posted by mike at 5:38 PM
April 9, 2006
Where was this MySQL option when I needed it?
Digging around the MySQL 5.0 SQL mode docs tonight and see this option:
NO_ENGINE_SUBSTITUTION - Prevents automatic substitution of the default storage engine when a statement such as CREATE TABLE specifies a storage engine that is disabled or not compiled in. (Implemented in MySQL 5.0.8)
Where was that when I needed it?
Back when I switched data storage from the MyISAM to InnoDB storage engine we got burned by not having this option. I had enabled InnoDB on the development box to allow the programmers to start using foreign keys in their database schemas. There was a disconnect on pushing the new functionality up to the testing environment and it went up before InnoDB was enabled on the test database. Unfortunately none of the developers who were pushing the DDL statements up to testing noticed that InnoDB wasn't enabled because MySQL ignored the parts of the SQL that specified foreign keys. MySQL also translated ENGINE=InnoDB into ENGINE=MyISAM without batting an eye.
Of course it's very easy to enable InnoDB, and having to recreate the tables in the test environment didn't cause much of a ripple.
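For anyone who wants to see the difference, here's a quick sketch from the shell. This assumes a local MySQL 5.0.8+ server where InnoDB is disabled, and the schema/table names are made up for illustration:

```shell
# With the default SQL mode and InnoDB disabled, the ENGINE clause is
# silently ignored and the table is created with the default engine:
mysql -e "CREATE TABLE test.t1 (id INT PRIMARY KEY) ENGINE=InnoDB"
mysql -e "SHOW CREATE TABLE test.t1"   # shows ENGINE=MyISAM

# With NO_ENGINE_SUBSTITUTION in the SQL mode, the same statement
# fails with an error instead of quietly substituting the engine:
mysql -e "SET SESSION sql_mode = 'NO_ENGINE_SUBSTITUTION';
          CREATE TABLE test.t2 (id INT PRIMARY KEY) ENGINE=InnoDB"
```

With the option set, the mismatch between development and testing would have surfaced the moment the first DDL statement was pushed.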
Nice to see that now (once you're on 5.0.8 or later) you can prevent this kind of mishap.
Posted by mike at 10:36 PM
April 7, 2006
My SSH Private Key is in a Proprietary Format?
I am most annoyed (two days in a row now). I recently switched laptops, from an Apple PowerBook to an assigned Dell with Windows XP. I got a license and installed SecureCRT for getting access to our servers. Everything at work is done with SSH keys (no passwords), so I went through the process of generating a private and public key in SecureCRT. The public keys were propagated around to all the servers to give me access.
Today I was trying to do something with a few somewhat large log files and found that my laptop didn't have the space to handle them. So my brilliant idea is to put my private key on a different machine (of course this is experimental; I'm not an SSH or key guru, so I'm not even sure it will work, but I want to try).
So I get the private key onto another box, and when I try to use it I get a command-line prompt asking for my private key password. I enter the password just as in SecureCRT, but it is not accepted.
I stumbled onto this response to someone trying to do the same thing on a forum:
The problem that you are seeing is due in part to the way the SSH draft defines keys. The public key format is well defined and standardized by the IETF draft, whereas the private key format is not.
The private key format was not defined specifically, but left to the individual developers. Because of this, each developer writes their private key in a format that meets their needs. This means that the private keys are not going to be readable by another developers software.
With OpenSSH, we have the ability to read both the public and private keys because the format is known. This means that SecureCRT can use both the public and private keys generated by OpenSSH. But SecureCRT can not export to OpenSSH.
We have had people request this ability to be added to SecureCRT. If you would like to be notified if a version of SecureCRT is released with this ability, please send an email to Support@vandyke.com with a subject of Attn: Shannon Re: Forum Thread #3919
Please include your contact information so that we can contact you when a version of SecureCRT that has the ability to export/import public and private keys becomes available.
The forum member responds with:
Is there a description of SecureCRT private key file format available? In case of this document its easy to write converter without waiting for future releases.
The SecureCRT folks respond with:
Unfortunately, this is not an option. The format of SecureCRT's key is proprietary . . . the conversion tool is currently being considered for possible inclusion in a future release of SecureCRT. We would be glad to let you know if it becomes available!
I can't believe that. The key format is proprietary?
I can see the key, it looks a lot like an OpenSSH key with a few additional pieces of information:
---- BEGIN VAN DYKE SSH2 PRIVATE KEY ----
Subject: Mike Kruckenberg
Comment: "Mike Kruckenberg"
ModBitSize: 1024
<actual key snipped>
---- END VAN DYKE SSH2 PRIVATE KEY ----
I have tried removing those extra lines, but get the same result.
So I'm annoyed that they think the key I generated is somehow theirs, and am left to do more research (perhaps using OpenSSH to try moving a key from one server to another). Perhaps moving private keys is just not allowed, but I want to know for sure before I give up on the idea of reusing my private key.
Update: I successfully moved a private key between a Mac OS X box and a Linux box (both keys generated with OpenSSH) and had no problem logging in with the private key generated on the other machine. Further information on this is found on this forum.
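For reference, here's the OpenSSH flow that worked for me, sketched out with made-up hostnames and paths. A key generated this way is in OpenSSH's documented format, so it moves between OpenSSH machines without trouble:

```shell
# Generate an OpenSSH RSA key pair, protected by a passphrase
ssh-keygen -t rsa -b 2048 -f ~/.ssh/portable_key -N 'mypassphrase'

# Install the public half on the server (hostname is illustrative)
ssh-copy-id -i ~/.ssh/portable_key.pub user@server.example.com

# Copy the private key to the second machine and tighten permissions;
# ssh refuses keys that other users can read
scp ~/.ssh/portable_key otherbox:.ssh/
ssh otherbox chmod 600 .ssh/portable_key

# From the second machine, log in to the server with the copied key
ssh -i ~/.ssh/portable_key user@server.example.com
```

The same passphrase unlocks the key wherever it lives, since the passphrase only encrypts the key file itself.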
Posted by mike at 5:44 PM
April 6, 2006
Tired of Geek Talk
I'm tired of geek talk. I'm not talking about conversation that is centered around technology. Geek talk isn't so much about what the subject is as much as it's how the conversation proceeds.
Here's what I mean.
I spent the past few days engaged in a series of technical conversations, some at work and others at LinuxWorld. Some were formal meetings and others casual chats in the hall or on the exhibit floor. All of these conversations centered around technology, all the way from programming concepts to server configurations to trends in technology.
There were two kinds of conversations. In the first there's a feeling of cooperation and mutual interest in an exchange of ideas. The second kind is where people try to "outgeek" each other. The responses are less about an exchange of ideas and more about proving that you know more than the next person. Sometimes a comment demonstrates technical superiority by revealing some improvement; other times it dismisses an idea by proposing a new one that makes the original appear inferior.
I have to say this most often happens in groups of more than two, but it can happen one-on-one as well. It's also more annoying at some times than others, and there's a spectrum: some conversations are thick with geek talk, while in others it's very subtle.
An example of geek talk:
Geek 1: I've been fiddling with X for some time and really like it but it's missing Y
Geek 2: You're using version 1, last week on Slashdot there was a post saying that version 1 was crap
Geek 3: And using X was advised against by XYZ Security
Geek 2: No, you can use version 1 as long as you patch it with patch 1.1
Geek 3: Yea but then X consumes three times as much CPU and doesn't deallocate memory
Geek 2: Hardware is cheap, grab a few XZ200 boxes and throw in 10 sticks of memory
Geek 3: Even so, if you want to do Y there are 10 solutions better than X
Geek 2: I've been using A, it's much faster and has no disk-performance issues
Geek 3: A is good, but only when you compile it with the --little-known-fact option
Geek 1: Our team has been using X for a long time and it works pretty well except for it's missing Y
Geek 3: X is legacy, you should be off it by now
Geek 2: The predecessor to X, W, was written for DOS (ha ha ha), X inherited all that DOS-based junk
Geek 3: If you want Y you shouldn't be using X, it wasn't designed for that
Geek 2: (to Geek 3) You think there was a design behind X? (both laugh)
The alternative to this conversation might have been something like:
Geek 1: I've been fiddling with X for some time and really like it but it's missing Y
Geek 2: Interesting, what kind of things are you doing with X?
So you get the idea. One conversation is about one-upping the last comment to prove your geekiness; the other is about interest in the other person's experience.
I've been part of far too much geek talk over the past few days. I'm glad to have a break for at least a few days.
Posted by mike at 9:06 PM
MySQL nerds beat Oracle geeks in LinuxWorld Golden Penguin Bowl
LinuxWorld's Golden Penguin Bowl was a tight race, but in the end the MySQL nerds nailed a few critical questions to pull ahead and win. My favorite question was a quote about the MIT $100 laptop saying (essentially) that nobody in their right mind would want to use one of those slow, cheap, hand-cranked things. Neither the MySQL nor the Oracle folks guessed that it was Bill Gates.
The Golden Penguin is quite a piece of art in itself, a hefty glass sculpture that is quite a thing to lug around. Jay got lots of strange looks coming home on the subway, and when my kids saw it the next morning they were sure it was some kind of glass rocket with a bent tip. I can't find a photo for reference on Google; if I get a chance to take a photo of it I'll post it.
The photo (the best from a selection of pretty crappy shots) is of Brian Aker, Jay Pipes and Ted Ts'o singing a selection from an old IBM company songbook.
Posted by mike at 9:10 AM
Boston MySQL Meetup: Performance Tuning Best Practices
The next Boston MySQL meetup is just around the corner, Monday April 10th.
Jay Pipes is sticking around for a few days after LinuxWorld to present this session, which was given as a webinar back in March and was the 2nd-best-attended webinar in the history of MySQL webinars. It's also a preview of Jay's performance talk at the MySQL Users Conference. Jay really knows his stuff when it comes to the inner workings of MySQL; you don't want to miss this:
Learn where to best focus your attention when tuning the performance of your applications and database servers, and how to effectively find the "low hanging fruit" on the tree of bottlenecks. It's not rocket science, but with a bit of acquired skill and experience, and of course good habits, you too can do this magic!
The Apress folks have given us a bunch of copies of Pro MySQL to give away and MySQL AB will be there with pizza and soda.
Posted by mike at 8:46 AM
April 5, 2006
Apple Makes Room on its Hardware for Windows
It's hard to believe, but Apple has released Boot Camp, software that will come prepackaged with the next version of OS X and allows you to install Windows XP and run it natively on your Mac. I am so curious to see how this changes adoption of Mac hardware. Will there be a lot of folks who purchase Apple computers with no intention of running OS X? Will Apple get to a point where you can buy Windows along with a new computer?
This was probably a quick win for Apple now that they are moving onto Intel processors. Does VirtualPC still have a market? I realize that VirtualPC lets me run the Mac and the PC at the same time, but will folks find dual-boot good enough to satisfy their PC needs?
It's been just over a month since I had to give my 15" PowerBook back to Tufts and got a Dell with Windows XP from OpenAir. It's been interesting to use Windows again after many years of OS X. I'm hoping that over the next year I'll get myself back onto a Mac.
Posted by mike at 9:44 AM
April 4, 2006
MySQL Performance Tuning Underway
I got fully immersed in performance tuning on the databases at work this morning: a seriously overloaded MySQL machine. No time like a crisis to have the new guy try his hand at resolving a performance issue. I was in a meeting for part of the time, where I couldn't do much more than watch, and I noticed that there were anywhere from 2 to 10 queries stuck writing to temp tables, over and over, for more than an hour. Somewhere halfway through the meeting a query came through that took many minutes, mostly stuck writing to a temp table.
Upon further inspection I noticed that the myisam_sort_buffer_size was set fairly large, but the sort_buffer_size was quite small. To my knowledge, the myisam_sort_buffer_size is for table maintenance and optimization, but isn't used during normal database operations. The sort_buffer_size is used for sorting for normal operations. After getting consensus from other folks I upped the sort_buffer_size and have seen marked improvement in the lingering queries writing to temp tables. I need to do some additional research to look at the specific queries and determine if there are other tuning needs, but for now we're doing OK.
A quick summary of MySQL sorting: MySQL sorts data by loading it into a memory buffer, where it sorts based on the ORDER BY clause. Pre-4.1, the sort data is put into the buffer with a pointer to the row. After the buffer is sorted, the records are retrieved from the data file, which requires reading the records twice (first to get the sort data, second to get the records). In 4.1 and beyond, the sort puts the sort data, the row position, and the columns required by the query into the buffer, so there's no need to go back to the table; the columns can be read straight from the sort buffer.
In either case, if the sort buffer isn't large enough for the data, MySQL does a quicksort of what's in the buffer and then sticks it in a temporary table to free up the memory buffer for more sorting. That operation is expensive compared to keeping everything in memory.
This is covered pretty well in the MySQL internals doc on sorting.
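The variables and counters involved can all be checked from the shell. This is a sketch that assumes a reachable server, and the 4MB value is purely illustrative, not a recommendation:

```shell
# Compare the two buffers. myisam_sort_buffer_size only applies to
# index maintenance (REPAIR TABLE, ALTER TABLE, etc.), while
# sort_buffer_size is the per-session buffer used for ORDER BY and
# GROUP BY during normal operation.
mysql -e "SHOW VARIABLES LIKE '%sort_buffer_size'"

# Counters that climb when sorts and temp tables spill out of memory
mysql -e "SHOW STATUS LIKE 'Sort_merge_passes'"
mysql -e "SHOW STATUS LIKE 'Created_tmp_disk_tables'"

# Raise the sort buffer for new connections; since it's allocated
# per session, be conservative on a busy server.
mysql -e "SET GLOBAL sort_buffer_size = 4*1024*1024"
```

Watching Sort_merge_passes before and after the change is a reasonable way to confirm whether sorts have stopped spilling.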
Posted by mike at 12:56 PM
April 3, 2006
Thanks for the iPod (and the good times)
I had a blast playing with MySQL 5.1.7 and exploring the latest cluster stuff last week. This week I find out that I'm getting a new iPod for it. That is just sheer goodness. Thanks MySQL.
And yes, I do love to rock.
Posted by mike at 5:59 PM
Meeting with Oracle DBAs
Last week I was contacted by a person looking for a non-MySQL employee who would spend an hour or so talking to a group of employees at a large international company exploring MySQL. The request was to have someone that's been using MySQL "in the trenches" for some time to get on the phone and have a friendly but honest conversation about how MySQL works. I can think of a dozen people who would be excellent, and perhaps more qualified, but I'm going to give it a shot. Jay's in town for LinuxWorld (and conveniently staying at our place) so I might have to pick his brain about any deep MySQL "internals" questions.
I got a bunch of details on the phone today and it looks like the meeting will be a lunchtime (in my time zone) phone call with a handful of long-time Oracle DBAs who manage databases on the order of hundreds. The organization would like to move into using open source tools and thinks using MySQL on Linux is the place to start exploring. Good choice.
I'm more curious to hear what kinds of questions are on their mind, and I have a lot of questions of my own for them. I did reserve the right to take questions away from the meeting in case there's something I want to be sure about.
One thing I will do is put in a plug for the MySQL Users Conference. Where else can you get such rich information and meet other experienced MySQL folks? Nowhere.
I was glad to hear that the manager recently attended a MySQL training (not sure who it was offered through) and the instructor recommended Pro MySQL. There's another few copies in circulation.
Posted by mike at 5:30 PM
April 1, 2006
PHP Quebec MySQL Cluster Slides
I promised folks I'd put up the PDF of the slides (~401K) from the Introduction to MySQL Cluster session at PHP Quebec. The PDF doesn't capture much of the detail, and none of the conversation from the questions raised, but it does give a sense of what was covered.
I think the session was recorded; I'm not sure how/when that will be available. I think PHP Quebec produces a CD with all of the sessions in MP3. If it's generally available I'll post it here.
Posted by mike at 11:17 AM