March 31, 2006

PHP Quebec for MySQL Folks

I haven't tagged my series of posts from PHP Quebec as MySQL because there's a lot of non-MySQL stuff that may or may not be of interest to MySQL folks. They are the previous seven posts in the "Technology" category for anyone that's interested.

A few items worthy of note for MySQL folks:

Just a few highlights; it was a great conference and trip.

Posted by mike at 11:27 PM

PHP Quebec Winds Down

Made it back to Boston on a late flight having had a great time up with the folks at PHP Quebec in Montreal learning about PHP and related technologies. Although there was a lot about PHP specifically, there was also a good deal of conceptual ideas that carry across many languages. Watching Rasmus use callgrind to sort out details of where performance was weak gave me a lot of ideas for my own work.

I had almost no time to get out and see Montreal, just a few quick walks at the lunch breaks. Seems like a great, very international city (almost no English seen or heard on the streets around the hotel). Would love to get up there on a family vacation and have Heidi (who did many years of French in college) translate.

Posted by mike at 10:56 PM

Rasmus Lerdorf: Fast and Rich Apps

At PHP Quebec listening to Rasmus Lerdorf (creator of the PHP language) presenting on using PHP to create fast and rich web applications. The focus is on using things like XML and AJAX to create web pages with less user-noticed reloading (that's the rich part).

Rich

Gives an example of using XPath in PHP 5 and applying an XSLT stylesheet. A look at how Rasmus does XML processing in PHP (prefers SimpleXML). Gives a single-line example of grabbing a feed from Flickr to show recently uploaded photos.

Then goes on to providing and consuming SOAP services with code to demonstrate how it's done. A pretty simple snip of code to provide and use web services (for this example).

Rasmus throws up a slide that shows the top 15 searches of the day for Yahoo (there are a few comments about what people are searching for as the audience looks down the list). Then he takes that information, grabs the number one image for that search, plots those images along an ellipse and outputs in html in the browser. A cool looking ellipse of images that you can mouseover and see. Rasmus built it about a year ago and left it up for a few weeks. When he took it down he got enough complaints that he put it back.
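The ellipse layout is simple to work out. Here's a minimal sketch in Python (not Rasmus's PHP, and not taken from his slides) that spaces a list of images at equal angles around an ellipse and emits absolutely positioned image tags. The function names, image file names, and pixel dimensions are all made up for illustration.

```python
import math


def ellipse_positions(count, center_x, center_y, radius_x, radius_y):
    """Return (x, y) pixel positions spaced at equal angles around an ellipse."""
    positions = []
    for i in range(count):
        angle = 2 * math.pi * i / count
        x = center_x + radius_x * math.cos(angle)
        y = center_y + radius_y * math.sin(angle)
        positions.append((round(x), round(y)))
    return positions


def render_html(image_urls, positions):
    """Build one absolutely positioned <img> tag per image."""
    tags = []
    for url, (x, y) in zip(image_urls, positions):
        tags.append(
            f'<img src="{url}" style="position:absolute; left:{x}px; top:{y}px">'
        )
    return "\n".join(tags)


if __name__ == "__main__":
    # Hypothetical thumbnails standing in for the top image of each search.
    urls = [f"thumb{i}.jpg" for i in range(15)]
    points = ellipse_positions(len(urls), 400, 300, 300, 150)
    print(render_html(urls, points))
```

The mouseover behavior would be layered on top with a bit of JavaScript; this only covers the placement math.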

Rasmus shows another cool app that he built to move photos from his local web-based gallery to Flickr. All on a single screen in a web browser. Added a bit of eye-candy. The app shows the gallery images on the left and the Flickr account on the right and if you click on the image in the left it bounces over to the Flickr window.

Shows the geocode API from Yahoo! for creating a page that has a list of Yahoo! maps. Really cool navigation with Javascript and some Flash. He extends that by plotting earthquakes from an earthquake RSS feed.

The presentation goes into detail on using the MVC model and how it maps to web-based applications.

Fast

Rasmus gives a scenario of a sample web site with 500,000 users and looks closely at how many requests, response times, and number of servers you'd need to be responsive. Runs Apache in callgrind to follow the application and where it's spending time. Sort of like a debugger in that you step through the pages in the app and see what places in the code are taking what kind of time. Rasmus is using PostgreSQL, and finds in the callgrind output that much of the app time is spent negotiating an SSL connection with PostgreSQL. SSL is on by default, but it's not needed here. Turns it off and gets much better performance.
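The sizing part of that scenario is back-of-the-envelope arithmetic. I didn't write down the numbers Rasmus used, so the sketch below (in Python, with made-up inputs) only shows the shape of the calculation: turn users into peak requests per second, then divide by what one server can handle. Every parameter name and value here is an assumption.

```python
import math


def servers_needed(users, requests_per_user_per_day, peak_factor,
                   seconds_per_request, workers_per_server):
    """Estimate how many servers are needed to keep up with peak traffic."""
    # Average load, spread evenly over the 86,400 seconds in a day.
    average_rps = users * requests_per_user_per_day / 86400
    # Traffic is never even, so scale the average up for the busy hours.
    peak_rps = average_rps * peak_factor
    # Each worker process finishes 1 / seconds_per_request requests per second.
    per_server_rps = workers_per_server / seconds_per_request
    return math.ceil(peak_rps / per_server_rps)


if __name__ == "__main__":
    # Hypothetical inputs: 20 requests per user per day, peak at 5x average,
    # 10 worker processes per server.
    print(servers_needed(500_000, 20, 5, seconds_per_request=0.2,
                         workers_per_server=10))
    print(servers_needed(500_000, 20, 5, seconds_per_request=0.1,
                         workers_per_server=10))
```

With these inputs, halving the time per request halves the server count, which is why profiling the slow spots pays off so directly.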

Switches app over to MySQL, gets much better performance. Queries using MySQL non-persistent connections are faster than Postgres persistent connections.

Looks at APC and subtle ways to get pieces of code cached. Better off having longer, but fewer include files in PHP for performance.

Presentation is here. Includes the slick Web 2.0 app code.

Posted by mike at 1:03 PM

March 30, 2006

PHP Debugging with XDebug

Listening to a presentation on using Xdebug for PHP debugging by Derick Rethans at PHP Quebec.

If you have pear installed it's easy (pear install xdebug-install), otherwise you can download and do the install manually. After that there's a configuration option to add to php.ini. PHP --enable-versioning and most Zend extensions will prevent the debugger from working.

Xdebug shows the error and then the output of the stack trace of calls through the code.

The debugger gives web-based results but also has a shell-based debugger. Derick shows using the debugger in ActiveState's Komodo IDE, entering breakpoints and stepping through a particular script. Cool stuff.

Posted by mike at 3:47 PM

Introduction to PostgreSQL

First afternoon session for PHP Quebec is An Introduction to PostgreSQL by Robert Bernier.

"PostgreSQL isn't hard, it's just loaded with details." Robert says the goal of the presentation is to make the audience curious enough to go play around with PostgreSQL. PostgreSQL documentation is superb.

History

Relational databases started with Ted Codd breaking data into concepts using language for breakpoints. IBM, Oracle (or what was to become Oracle) and Cal Berkeley took the idea and started development. Postgres was an idea developed at Berkeley by lots of students. This was in the late 60s & 70s. Informix is based on PostgreSQL, and is now owned by IBM. Oracle and SQL Server have PostgreSQL code in them too. Pretty much every database except MySQL has some PostgreSQL in it.

Open sourced in 1996, Berkeley terminated the project. Core team of 4 took the code and for 2 years cleaned it up.

In 1996 Robert started working with PostgreSQL and MySQL. MySQL worked, PostgreSQL had better documentation.

MySQL is more popular, why isn't PostgreSQL? Two reasons:

Who uses Postgres? Every router by Cisco has a postgres database in it. Big companies are using Postgres, Wall Street included. Weather service adopted it. Robert gives numerous examples of folks using Postgres. Some pretty big uses.

Technical Details

Robert uses Debian with KDE. In Japan PostgreSQL is the number two database, Oracle is number one. MySQL is not used because of language issues.

After installing use the initdb to create a data cluster. A data directory is created with a set of folders and configuration files. New cluster is 30MB (Oracle is 1G empty database).

A recent project Robert was working on where a financial company was moving to PostgreSQL. They were getting 400 transactions/second with Oracle. On an untuned PostgreSQL were getting 740 transactions/second.

PostgreSQL won't run as root and automatically disables connections from outside the server it's installed on. Some permissions are stored in a configuration file. The default network access is not secure, passwords can be encrypted using another encryption option.

Robert gives an explanation of domain sockets and how they work. PostgreSQL uses sockets on local machines (like MySQL).

In PostgreSQL you can create template databases to be used when creating a new database. That's a pretty cool feature. I guess in MySQL it's a dump piped to another database.

Robert recommends going to the Client Applications and Server Applications in the documentation to start.

Example

Robert shows an example of a PDF upload tool. His table has a field that is the cidr datatype, which is cool.

The last slide is a picture of a dolphin being roasted on a skewer over a fire. Yikes . . . sorry Sakila.

Posted by mike at 2:03 PM

Zak Greant: Copyright, Contracts and Licenses (and more)

Second session of the morning at PHP Quebec is Zak Greant talking about copyright and software licensing. Not going to attempt to document the whole presentation but will note key ideas.

Zak starts by saying that before listening to anyone on copyright or licensing you should know who is giving the advice. Zak works for eZ systems and the Mozilla Foundation. This presentation is not legal advice, Zak is not a lawyer.

In general you must be diligent about copyright, licenses and contracts. Take the time to review contracts and licenses in detail, get legal advice.

Copyright protects expression, not ideas. It's possible to create a work after seeing another, with the same ideas reflected in the new work but a different expression, and be ok under copyright law. Getting a copyright doesn't require any work, it happens when the work is created. For further protection a person can go through a process to file a copyright which is free in most countries (US charges a small fee).

Fair use allows you to quote a copyrighted work in limited amounts and parody existing works. Correlates closely to political freedom.

Zak says the fundamental reason for copyright is to create an environment to encourage people to create (or close to that idea). Interesting idea, I'd like to hear what Larry Lessig has to say about that.

Software often is licensed, not sold. You enter into a contract saying you won't sell it, won't make modifications etc. Companies are trying to limit what you can do with purchased software. It's still not clear if this is a legal way to distribute software.

Zak mentions ChillingEffects.org as an interesting site to look at to find records of cease and desist letters from fanatic lawyers.

In general, patents do not help software. They no longer help innovation.

Every time we install a piece of software we are entering into a contract. The key thing with contracts is "did you or did you not agree to the contract." Just because you didn't read and sign doesn't mean you didn't agree.

Always read contracts that are personal (mention your name). Read it several times. Then find a lawyer who knows the area and read it with them. Lawyers who aren't in their area of expertise may give you bad advice.

Licenses (a quick look)

GPL requires distribution of the source code and the entire bundle if you make changes and ship them. So if you make any derivatives and use them along with the GPL licensed software you have to ship it all. If you develop something that stands alone but also works with GPL software you can ship separately, but if you ship with any of the GPL software you have to ship all of the GPL software. If you don't ship software that you modify you don't have to contractually distribute anything you make.

If I build a service on top of GPL code but I don't ship the code, just provide a service using it do I have to provide the source code? The answer is no.

General advice, don't piss off large organizations. If you get sued, even if you are in full compliance, you will spend tens of thousands of dollars proving it. Fight for your rights before entering into contracts.

Make a carve-out in your contract so that the things you want to be yours are specified as yours (or more specifically not the company's).

Keep a list of all of your agreements, how long they last etc.

Zak says the slides will be up on his site later.

Posted by mike at 11:07 AM

PHP XmlRead: Easier than SimpleXml but yet more flexible and powerful

Listening to Marcus Börger's presentation on XmlRead, an extension for PHP at PHP Quebec.

Some impressive work has been done to resolve gaps in using XML within PHP. Marcus found XML work in PHP 4 lacking and is part of a group who tackled XmlRead to improve this.

Marcus goes through a nice comparison of DOM and SAX parsing, and how PHP parsers have handled XML parsing using the different methods. XmlRead provides complete W3C compliance. He provides an overview of the interface to XmlRead and then gets into a slew of code examples.
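To make the DOM versus SAX distinction concrete, here's a minimal sketch using Python's standard library rather than PHP. It is not taken from Marcus's slides and says nothing about the XmlRead interface itself; the sample document and function names are made up. The DOM version loads the whole document into a tree and then queries it, while the SAX version reacts to events as the parser streams through.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
SAMPLE = "<feed><item>first</item><item>second</item></feed>"


def items_with_dom(xml_text):
    """DOM: build the full tree in memory, then pull out the nodes we want."""
    document = minidom.parseString(xml_text)
    return [node.firstChild.data
            for node in document.getElementsByTagName("item")]


class ItemHandler(xml.sax.ContentHandler):
    """SAX: collect <item> text from start/characters/end events."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._inside_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._inside_item = True
            self._buffer = []

    def characters(self, content):
        # Text can arrive in several chunks, so buffer until the end tag.
        if self._inside_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._inside_item = False


def items_with_sax(xml_text):
    """Stream through the document without ever holding the full tree."""
    handler = ItemHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.items


if __name__ == "__main__":
    print(items_with_dom(SAMPLE))
    print(items_with_sax(SAMPLE))
```

The DOM version is shorter and easier to query, but memory grows with the document. The SAX version keeps memory flat at the cost of tracking state by hand.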

Marcus' slides can be found here.

Posted by mike at 10:33 AM

PHP Quebec Underway

Made it to Montreal safely last night. Off the plane ~9pm but not to the hotel until ~11pm. The shuttle bus from the airport was quite packed and required a shuttle bus transfer downtown. I should have researched taking the subway, but figured that the airport shuttle for $11 was worth not having to figure out if there's a path via public transportation. Maybe next time I'm in Montreal I'll look harder.

It's cool to be immersed in PHP, even if for just a few days. I've used PHP heavily in the past on a few projects but it's been a little while since I've been actively writing PHP although I do periodic maintenance.

The schedule looks jam-packed with interesting sessions. I'm interested in hearing about what's happening with PHP and also in seeing some of the database presentations (primarily non-MySQL). The only other MySQL presentation is a Tour of MySQL 5 by Damien Seguy, but it's in French.

Will write as long as the battery lasts.

Posted by mike at 10:18 AM

March 29, 2006

MySQL Development Team's Response to Bugs

It's pretty rare that I come across a bug in MySQL, but the few times I have I'm impressed at the kind of response that comes from the developers. Yesterday's bugs (filed late last night) were verified by mid-morning. I had marked both of them as severity S3 (non-critical) but it turns out one was reclassified as S1 (critical) because it's causing a server crash (as opposed to a client disconnect as I thought). It seemed trivial, a connection loss caused by a syntax error on setting up disk-based storage. Then again, nobody wants to fear that if they type their SQL incorrectly it might bring the server down.

I guess verification and resolution are two different things, but it's nice to hear back quickly. I'm glad these aren't happening in a production environment because I'd be hoping for verification and a fix by now.

Posted by mike at 12:27 PM

March 28, 2006

Day Spent with MySQL 5.1

Today I took a break from software engineering. Besides a few meetings and a few small code tweaks I spent the bulk of the day looking at MySQL Cluster in 5.1.7.

I've set up a cluster a handful of times now so it's fairly familiar. From scratch to having three boxes (one Fedora Core and two Red Hat ES 4) and MySQL 5.1.7 installed took until ~10:30. At this point I was walking from the server room back to my desk when I noticed a conference room full of engineers. A 10am meeting that I thought was pushed back to 2 wasn't. Oops.

A few hours later I was loading sets of data from our production database to see just how well the data moves from 4.0 to 5.1. I haven't completed any comprehensive tests, but I wasn't impressed with performance when restoring from a dump file. Perhaps I need to do some cluster tuning, or the cluster is just slower at DDL. Next I'd like to grab some sample sets of queries from production and run them against the cluster to get a sense for performance based on real queries.

The bottom line is that I got the 5.1.7 MySQL Cluster up and running and loaded with sets of data in both in-memory and disk-based storage and am ready to put together a plan for testing our application against it.

I didn't find any major issues, but to be fair I did find some bugs along the way and filed them with the folks at MySQL.

Posted by mike at 11:12 PM

What's New with MySQL 5.1.x

There's a lot to be said about what's happening in MySQL 5.1. Quite a bit has already been written about 5.1, but as I'm starting to do some testing I thought I'd hit the release notes and see how things have progressed over the past few months. Some great things coming down the line. A few that I found worthy of note:

These highlights represent a small fraction of all the entries in the release notes for 5.1. Things seem to be moving along quickly with the 5.1 release, first public release in November 2005, into beta already by February 2006. Rumor is that 5.2 won't be far behind 5.1 and will include numerous exciting enhancements.

Posted by mike at 12:10 AM

March 27, 2006

No Longer a MySQL Cluster Hobbyist?

I've been a MySQL Cluster hobbyist for some time now. I say that because over the years I've followed what's happening with MySQL Cluster and have run it on various personal computers (and laptops). I threw a chapter into Pro MySQL about setting up and managing a cluster and am off to PHP Quebec on Wednesday to give a presentation on getting a cluster up and running. But I have yet to work with the cluster in a setting other than for personal intrigue.

The other day at work I was telling a few folks about the upcoming trip and mentioned my presentation. The CTO was along for lunch and had a slew of questions about the MySQL Cluster. To my delight one of the next things on the infrastructure list is to research alternatives to slicing data across many standalone machines and implement a clustering solution.

Later in the afternoon I got an invitation to spearhead the research and implementation of database clustering (there is no DBA or sysadmin, this kind of responsibility is shared by engineers). We had a brief meeting looking over alternatives and to talk about where MySQL is on its clustering features. The current stable MySQL Cluster (5.0) with in-memory storage may be all we need for now, but we'd be looking for the stable release of the cluster with disk-based storage down the road for a full implementation.

The current MySQL Cluster is just one piece of a bigger puzzle. Without disk-based storage we'll have to keep a set of servers for larger databases with failover to replicated machines. We've also thrown in looking at 3rd-party cluster alternatives as an interim solution to the lack of disk-based storage.

Now I still have software to engineer for a good chunk of my time, but am taking tomorrow off to set up the MySQL Cluster at the office for folks to start poking at. I'd like to start with 5.1 to play with the disk-based storage to see where that's at, but will likely end up back on 5.0.19 for real testing.

Looking forward to having some time at work to devote to this, you can only go so far as a hobbyist.

So what am I now . . . probably still a MySQL Cluster hobbyist.

Posted by mike at 2:27 PM

March 25, 2006

New Hampshire for the Weekend (there's WiFi in these woods)

We jumped in the car Friday immediately after work (working from home makes for a quick transition from the office to the car) and headed for New Hampshire. We're spending a few days at a lake house on Pea Porridge Pond near Conway/North Conway in the White Mountains. We came up in the fall when friends invited us to come up and use their house, had such a great time we couldn't wait to get back.

Last time we were here I had this thought about how cool it would be to have a job where my physical location wasn't that important. Since then I have switched jobs and am working remotely from the home office most of the week. There's a lot of wiggle room now as long as I can get a decent connection to keep in touch on email and make code commits.

Turns out I can get a very faint signal on the Blackberry, which at a minimum lets me keep going on email. It also could be tethered to the laptop for committing code changes and server configuration work (a bit slow). Much better than that . . . if you can believe it someone in the surrounding woods has an open WiFi network. There are other cabins not too far away . . . thanks to whoever is providing connectivity up here. Probably good to have a backup plan in case it's unreachable (signal strength is low to medium).

Posted by mike at 7:51 AM

March 24, 2006

March MySQL Meetup - No Video

I'm sorry to report that the video of the last Boston MySQL meetup isn't going to see the light of day. The meetup was good, ~20 in attendance to listen to a presentation on MySQL cluster that I put together and have a discussion about the technology.

I spent some time fiddling with the video we shot, but the fact that we had no microphone or tripod, and were in a huge auditorium at MIT where the camera had to be set at the back to capture the slides on the mega-screen all add up to a dark video with unacceptable (many times inaudible) audio. There are chunks of time where the video is almost pure black while hearing comments from attendees. Next time will be better.

As always, thanks to MySQL AB for pizza and soda. Also to the group member who scrambled to find a new room when the room we were supposed to use was double-booked.

Posted by mike at 10:32 AM

Oracle nerds vs. MySQL geeks at LinuxWorld

Not long now until LinuxWorld Boston gets underway.

This year I've been conned into helping at the Apress booth on Thursday afternoon for a bit and will be meeting up with Jay and Jason for lunch. In the past I've gone over for one afternoon just to check out the exhibits. This year I'm also going to head over on Tuesday at 4 for the Golden Penguin Bowl, which should be fun:

Back by popular demand, it's the Golden Penguin Bowl. Expect an epic battle this year, as we bring you the riveting spectacle of Oracle nerds vs. MySQL geeks. Find answers to age-old questions like "What is your name?", "What is your favorite color?", and "What is the terminal velocity of an unladen swallow?" This irreverent geek trivia game will determine who takes home the coveted Golden Penguin.

Rumor has it that Jay's on the MySQL team.

Posted by mike at 9:59 AM

March 18, 2006

Afternoon visit to the Harley Davidson Store

I've wanted to do this for a long time. The Boston Harley Davidson store (big, with lots of motorcycles) is on a road I frequent not far from where we live. This afternoon I got a chance to spend some time there looking at the bikes. I was waiting for a new tire to be put on our car and was told it would take at least an hour. The tire store was a 5-minute walk from the Harley store.

It's been many years now that I've been saying I will someday go and make an impulse purchase at that store (and not for a t-shirt). Most people that know me find it hard to believe. I don't have a history of riding motorcycles, but there have been points over the years where I've had access to various types and sizes of motorcycles and can never pass up the chance to go riding if the opportunity comes up.

Do I need a motorcycle now? Most definitely not. Am I looking for reasons why it would make sense to have one? Yes. The truth is that I think being able to go for a ride on a spring or summer day is reason enough to have one, but I'm waiting for a more practical application. Perhaps my next job will require non-public transportation travel to the office and have great parking for folks on bikes.

As far as the experience in the Harley Davidson store . . . it was very comfortable. I suspected I would walk in to a group of serious, bearded, leather-wearing bikers that would be unwelcoming. At the front door there was a person greeting customers and was very friendly. Some of the employees had the typical biker look but everyone was very friendly and welcoming. I must have been the most un-biker looking person in the store, but there was definitely a range of folks. A man looking at the same class of Harleys I'm most interested in looked like a business executive.

The Harley that I like the most is the Sportster 883. It's the simplest bike currently coming off the line, and is just about as big as I'd like to go. I'm not looking for something to ride across the country or race, just to cruise around town, run errands, and possibly get to work and various meetings.

I'm not 100% sure that a Harley is the right bike for me, but it seems like if I'm going to do it (get a bike) I should at least entertain the idea of getting a bike that has so much history. I also went over to the Honda bike shop (across the street), not nearly as impressive and much less friendly folks. I have seen some nice Honda Shadow motorcycles.

Of course, before I do anything I'll have to get licensed. This course looks like a good thing to take, and successful completion exempts me from taking the RMV test. Something to put on the list.

Posted by mike at 9:17 PM

March 10, 2006

Storing Binary Data in MySQL

The topic of storing binary data in MySQL rears its head every so often, a debate always ensues about if and when it's appropriate, and fades out without any resolution. Sheeri posed the question again a few days ago.

We did this at Tufts, had ~1.5 million images (~35G) that were stored and served from a MyISAM binary_data table. I think the grass is always greener on the other side. The performance was decent, and we were able to use replication to spread load but we regularly talked about moving to an NFS.

I should mention that I presented about this at OSCON, and was fairly pro database for binary data. Since that presentation our image database has more than doubled in size, and we gained access to a fully-redundant, multi-terabyte Network Appliance. Once we had some of our data on the netapp, we thought the grass was greener over in NFS land for our images for a few reasons:

  1. Database backups and snapshots are much easier without huge amounts of binary data. If you need to refresh a slave or sync data down to a development environment it adds a lot of extra time when the dump and restore includes the binary data. It took us many hours to get a snapshot and sync it down to another machine.
  2. The data center could back up and restore to the filesystem, but not a MyISAM table. If an image (or 10) got deleted, the filesystem backups could handle putting them back. With MySQL you had to drop the entire table, restore from snapshot and roll the binary logs forward. Just seemed like a lot of work to get images back.
  3. A corrupted disk with ~1.5million images is easier to deal with than a corrupted 35G MyISAM table. I feared the day that our binary_data table reported that it needed repair. I never had to run the repair on that table, but on tables a fraction of the size it was a painful wait.

Having said that, there were good things about having the images in the database. Being able to replicate and have images appear on many servers in near-real time was pretty cool. The load could be spread across several machines instead of having one server. We didn't use foreign keys, but having integrity with your binary files is nice.

To Sheeri's point about moving to MySQL because you don't want to get another NFS server to add more images, putting them in MySQL might not change that. A lot depends on how you design the system. If you need more space or horsepower at some point, you'll still need to swap in bigger drives or slice the data out onto another machine. There's a slide in my presentation that compares performance between filesystem, MyISAM and InnoDB retrieval. I did not compare inserts or updates.
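For anyone who hasn't seen the two approaches side by side, here's a minimal sketch of storing the same bytes in a database BLOB column and as a file on disk. It's written in Python against SQLite purely so it runs anywhere; it is not the Tufts code and not MySQL. The binary_data table name is borrowed from above, but the column names and helper functions are made up for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def store_in_database(connection, name, payload):
    """Write the bytes into a BLOB column keyed by file name."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS binary_data "
        "(name TEXT PRIMARY KEY, data BLOB)"
    )
    connection.execute(
        "INSERT OR REPLACE INTO binary_data (name, data) VALUES (?, ?)",
        (name, payload),
    )
    connection.commit()


def load_from_database(connection, name):
    """Read the bytes back, or return None if there is no such row."""
    row = connection.execute(
        "SELECT data FROM binary_data WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else bytes(row[0])


def store_on_filesystem(directory, name, payload):
    """Write the bytes to a plain file and return its path."""
    path = Path(directory) / name
    path.write_bytes(payload)
    return path


def load_from_filesystem(directory, name):
    """Read the bytes back from the plain file."""
    return (Path(directory) / name).read_bytes()


if __name__ == "__main__":
    image = b"\x89PNG placeholder bytes"  # stand-in for real image data
    db = sqlite3.connect(":memory:")
    store_in_database(db, "photo.png", image)
    with tempfile.TemporaryDirectory() as folder:
        store_on_filesystem(folder, "photo.png", image)
        same = (load_from_database(db, "photo.png")
                == load_from_filesystem(folder, "photo.png"))
        print("round trip matches:", same)
```

The code is about the same either way. The trade-offs listed above (backups, restores, repair time, replication) are all operational, which is probably why the debate never settles.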

We had a lengthy conversation about this on one of my last days at Tufts, will have to check in with the developers in a few months and see what they decided.

I do think the grass is often greener on the other side, if you're unhappy with NFS the database looks better, if you're unhappy with the database NFS is the answer.

Posted by mike at 10:03 PM

March 9, 2006

Fourth Year of the Mike Kruckenberg Weblog

This weblog recently entered into its fourth year. I've been meaning to make a note of it for some time.

On January 13th, 2003 I started this weblog at Pete's suggestion (he first posted in Oct. 2002). I wondered at the time if it would be a passing thing. I guess it still could become such. After three good years I'm tempted to say it's become a fairly permanent fixture in my ongoing activities.

Looking back on my first entry tonight it seems kind of silly. What did I know back then? So much has happened over the past three years. Things just seem so much clearer (something I notice every year, my thinking and observation becomes clearer).

I also noticed that that first entry had a comment from none other than the weblog and MySQL superstar Jeremy Zawodny.

Ah, the good old days.

Posted by mike at 11:08 PM

March 6, 2006

The Perfect Day

A few weeks ago we had what I would classify as the perfect day. It happened to be a Sunday in February when a snowstorm blew into town and closed almost everything down. Throughout the day I mentioned a time or two that it had the makings of a perfect day. Turned out to be somewhat picture-perfect.

I'm not sure what the perfect day looks like to you, but these were the happenings that led me to categorize it as such:

I guess that's the perfect winter day, probably a completely different set of criteria that makes a good summer day.

The photo is of my 3-year-old, Ezra, who's become quite a master at setting up and running my electric train (O gauge).

Posted by mike at 10:49 PM

Boston MySQL Meetup on MySQL Cluster

Monday, March 13th is the next MySQL meetup in Boston. I've volunteered to speak, giving a talk called "An Introduction to MySQL Cluster", a presentation that details getting a cluster installed, configured, and running. This will be a shortened version of the PHP Quebec talk.

If you're in town, please stop by. The meetup is now convening at MIT.

Will be carving out some time in the next few days to dig through the latest changes in the cluster and iron out the details of the presentation.

Posted by mike at 10:42 PM