April 21, 2005
Tim O'Reilly's Closing Thoughts at MySQL 2005
A few thoughts from Tim O'Reilly's closing remarks at MySQL 2005.
The new paradigm is that software is a service.
Part of the new application is user-contributed information.
Data is the Next "Intel Inside." NAVTEQ is the Intel inside maps. All map services use their data.
Tim talks about numerous software developments in flickr, amazon, google, yahoo etc. Tim shows a cool application that joins Craigslist and Google maps and lets you plot all matches from Craigslist on a Google map. Another one is the baby name wizard's voyager applet that graphically plots history of name popularity.
Posted by mike at 6:46 PM
MySQL Conference Wrap Up
Waiting in the lobby for the doors to open in the ballroom to hear Tim O'Reilly give some remarks as MySQL 2005 comes to a close. It is amazing that a single piece of software can pull together 1300 people for an entire week, with as many as 7 simultaneous sessions at any one time.
Like all good things, it's hard to see it end. The conference has been filled with useful sessions, in some cases more than one at a time that required choosing between two. On the flipside, I don't think I can process any more information and am ready to let everything sit for a few days before I have to give the abbreviated summary back at the office.
Favorite moments? It all seems like a blur. The two things I've been really fascinated by here have been first, hearing about how MySQL manages it's employees and development work. It's particularly amazing how diverse the employees are (spread around the world), yet how fast the projects move foward. I think it was the GUI team lead that said he's got three employees, one in Austria, one in Germany and one in South America.
The second thing that stands out is the progress being made on the database itself. Not only is 5.0 really getting there, a lot of talks hinted at 5.1 and the good things that already are, or will be happening there.
Posted by mike at 6:14 PM
Writing a Storage Engine for MySQL
Listening to Brian Aker talk about writing a storage engine for MySQL at MySQL 2005.
The CSV and BDB storage engines provide good examples for doing this. Look at the source files and copy the methods. There is also a lot of information on the MySQL site for how to work through this process. You might start with the example engine.
Directories/sql is for the kernel. mysys/ is the portable runtime, should make use of it to make your storage engine portable. mysql-test is the Q/A test suite. If writing code in the server, write test cases.
Termsstorage engine - code that stores data handler - code that controls the storage engine
Has anyone does this?Friendster Google Yahoo
Brian knows of ~25 storage engines.
Side note: In 5.1 MySQL is moving away from the text-based logs (slow query log, query log and error log) and using the CSV storage engine to create data files that are readable.
Another side note, timestamps are handled in the storage engine, not in the MySQL server.
In looking at a piece of code Brian takes a jab at "another open source database" because they use sscanf() and snprintf(). MySQL wouldn't because it doesn't perform.
Brian suggests that MySQL will be moving to a more modular design, where modules can be integrated in a more Apache-like way.
Posted by mike at 4:34 PM
MySQL 5.0 Query Optimizer
At MySQL 2005 Listening to Timour Katchaounov talk about the query optimizer in MySQL 5.0.
(I missed the introduction while enroute to the room).
The query processing steps:
- Parser (builds tree)
- Preprocessor (checks syntax, columns)
- Optimizer (generates query execution plan)
- query transformation
- search for optimal execution plan
- plan is refined
- Query sent to execution engine
- access methods (whether table scan or index)
- where conditions
- union, group, etc
MySQL uses a left-deep linear plan for executing a query. All of the tables fall into a single line. Many other systems use the bushy plan, which is more tree-like.
Timour shows a large query with 5 or 6 WHERE conditions and steps through the process of how the query is parsed.
In optimizing a SQL statement there is quite a bit of analysis of the cost of a query. The cost is calculated by looking at things like how many times the disk will need accessed, the number of pages per table, the length of the rows and keys and the data schema (key uniqueness etc). Determining costs involves mathematical operations to determine the cost using different methods. The type of storage engine isn't considered in the cost.
MySQL 5.0 has greedy searching. It doesn't consider everything, just gets enough information to find a good path and then moves on.
Posted by mike at 2:26 PM
MySQL Development Model
Maurizio Gianola, VP of Software Engineering at MySQL is talking at MySQL 2005 about how MySQL develops software.
Maurizio opens by talking about a book, Get Out of My Life, for parents of teenagers. MySQL has entered the teenage years, a lot of excitement and enthusiasm but needs to learn to grow into an adult.
FactsHeadcount 28% in North America, 43% in Western Europe, 25% in Eastern Eurpope (Russia, Ukrain), 4% other (Australia, South America). Employees are paid based on their region. If they move their salary get's adjusted.
Development GoalsOverall, MySQL aims for performance, reliability and ease of use. There are a number of other goals, a few of note. Increase release schedule predictability. More and more people are relying on MySQL and can't accept the former model of waiting forever for a release. Another thing that has been done is to say that at the beta level there are no new features, and controlling the amount of code changes in a beta in general.
OperationFunctional teams are grouped by expertise, not geography. Monty and Brian control the architecture very tightly.
Internal communications is done with IRC, email and VoIP (for last minute and emergency). Using IRC and email lowers the language barrier because typing gets the words across better than verbally. Each year they have one week where eveyone comes together and works at a central location. A monthly newsletter is sent around to keep people informed.
Recruiting is done mostly through the community and via word of mouth. Most already know and use MySQL and can be asked detail questions about how they would fix a bug or write a use-case to test some software.
Development starts with big meeting with engineering, marketing, sales and support where they hash out how things are going to work. Flagship projects are committed for a specific release. Rolling projects are worked on over time and committed to the release coming next after the feature is built. Sponsored features are those that a customer wants to contribute to for converting a rolling feature into a flagship. Time is assigned to projects during the design, not by the developer.
Agile development model, using scrum for the monthly development cycle. Bitkeeper is used for source control with a wiki for sharing information about plans, schedules and policies.
Design and code reviews are done by Monty, and he's picky.
The release tags:
- alpha - partial features are available and working
- beta - all critical features are implemented and working
- gamma - no fatal runtime bugs
General availability is backward from what commercial companies typically think. For MySQL GA comes with the first alpha release.
Q/A is done internally as well as by external sources and the community.
Tools and ProcessesWorklog is used to keep track of projects. All ideas are captured there, and then potentially turned into projects. Each developer takes a task at the beginning of the month, typically designed to consume 65% of their time, and works for one month. Each week a progress report for each project is created, and then the summary.
Worklog is a complex tracking system, can do quite a bit to coordinate between.
The build process for the community version goes out every 4-6 weeks. For the certified version they take the community version and maintain patches of only critical bugs for it for a long time.
Features are considered in a systematic way.
(For both the build and feature conversation there is a very complex graphic of the)
How to work with MySQLIf you want to add code, you must pass it through Monty, who is *very* picky. Contributions must be turned over to MySQL for reasons of their dual-license model.
Embracing Insanity is a great book to get up to speed with the open source community.
Posted by mike at 1:29 PM
April 20, 2005
Using and Extending Full Text in MySQL
Listening to Joe Stump talk about MySQL's full text indexing at MySQL 2005. Joe is at eNotes.com.
I looked at the full text search a year or so ago but it didn't feel like it was worth the switch from DBIx::FullTextSearch. The matching of MySQL's FULLTEXT didn't quite do the job, and weighting is important to us.
It appears that not much has changed in FULLTEXT since I last looked.
Joe has created a weighting workaround. He does multiple queries on different fields, and then takes the relevancy from each query and multiplies it by a factor based on the field that the match came from. Joe then combines the results and sorts by the calculated relevancy.
Joe indexes PDFs, Word docs and MP3s with special tools.
Posted by mike at 7:54 PM
A Tour of the MySQL Source Code
I'm a little out of my league here, but for the sake of Jay, the co-author on ProMySQL, I'm attending a presentation on the source code of MySQL by Brian and Monty (that's Aker and Widenius). If you're not in the loop Brian is the Director of Architecture (and wrote the slashdot system) and Monty is one of the founders who wrote the original code.
Monty starts by asking why people want to know about the source code? Some people are interested technically, but others are curious about the process of bringing all the development together.
Server DesignC, C++ and ASM. 80% in C for client library, networking, I/O, etc. The C++ isn't really object oriented, used like C but written in C++ for performance.
(a nice graphic to show how things interoperate)
MySQL -> Parse/Optimize/Retrieve Store -> Storage Engine
Storage EnginesA quick look at the list of storage engines.
FilesystemWhat's in MySQL directories.
Kernel/sql - running instance itself - file naming scheme matches the functionality (delete, update, etc) - written in C++ /libmysql - contains the embedded MySQL server
Storage Engine/myisam /heap /innobase /merge /heap
In 5.1 these will all be moved into a directory called storage.
Portable RuntimeThe portable runtime is a place to centralize functions to run on different systems.
/mysys - find out open, read, write
/strings (includes character sets)
/dbug - use to generate trace files
Clients/libmysql /libmysql_r /sql-common - common things between server and clients
librarieszlib regexp readline vio
testThe QA system that In 5.0 most of the config stuff is in config/acmacros. After 5.1 the file tree will be reorganized.
Adding a FunctionYou'd want to look at these files. lex.h sql_yacc.yy item_create.cc myfunc.cc myfunc.h Makefile.am
Adding Field TypesEdit field.cc and field.h, subclass Field and implement 12 methods.
Monty and Brian open it up to folks to suggest different fields they'd like. IPv4 IPv6, compressed blob, GUID come up.
Array datatype has been on the list, someone has been looking at it but hasn't been committed to a certain version.
Development ModelSCRUM (agile) used for development. Break down tasks into 30-day chunks. Developers pick up.
Three queues. Raw, worklog and sprint. The worklog is managed by an open source tool called worklog.
Do a high-level architecture document, then do a low-level document and then put it in the backlog queue. When a task is in the backlog queue any developer can pick it up, put it in sprint to work on it for a period of time and then put it back to backlog.
Once it's developed it's reviewed and then put into source. Even changes made from within the company must be approved. It often takes 3-4 iterations to be approved and then is finally put in the tree. A test case is required for approval. When it's put in the tree, the approving person is held accountable for the contribution, not the writer of the code. The senior developers allocate.
The code commits are sent to a mailing list so the community can review any changes in the system.
The developers of MySQL are expected to spend 80% of their time working on things assigned by the company, 20% of their time they can choose what to work on.
Authorization and authentication module is being written to enable LDAP authentication.
Posted by mike at 4:25 PM
MySQL Cluster Architecture
Listinging to Max Mether present on the architecture of MySQL cluster at MySQL 2005.
Data DitributionTables are horizontally partitioned. Spread equally across the data nodes. Each piece of data has a master and a secondary data node. If a node failes then it's secondary data node. If you loose too many nodes Max says "it's game over for the cluster". Max suggests that you should associate one node with one computer, having multiple nodes on a computer doesn't help with redundancy because if the computer fails all the nodes on that machine will fail.
Synchronous ReplicationThe cluster operates on the two phase commit principle. In the first phase all the data nodes are updated. If the first phase was successful on every node, the transactions are committed. The commit phase is the second phase.
Failure DetectionTwo methods used to determine if a node has failed.
- communication loss with node
- heartbeat failure - each node sends heartbeat information to tne next node in the cluster. if a node isn't sending a heartbeat, the node that should be getting it will inform the others. hearbeats are sent every 1.5 seconds by default. A node is failed if it misses three consecutive heartbeats
Node RecoveryA failed node needs to be recovered ASAP. The restored node:
- rejoins the heartbeat circle
- copy meta data (tables, indexes, cluster info)
- join possible transactions
- copy data from the primaty node
- regain the primary status
System RecoveryWant to avoid a complete system shutdown if possible. There are checkpoints that put the data to disk that can be used, but it's a bit of work.
On each committed transaction there is a redo log on disk that gets written. The data is flushed to disk on every global check point, which happens every 2 seconds. There is also a local checkpoint that makes a complete copy of the data in the cluster. The local checkpoint takes a lot of time, so they write dirty data but also create an undo log that can be applied to bring the data to a consistent snapshot. Each node has it's own backup, redo and undo log.
(I missed the rest, got caught up in a sysadmin issue dealing with a restore from backup back at the data center).
Posted by mike at 2:16 PM
April 19, 2005
MySQL Cluster New Features and Roadmap
Listening to John David Duncan present on MySQL cluster at MySQL 2005. Cluster was announced at last year's user conference, released with version 4.1.3.
The time since then has been spent on understanding the cluster, it's limitations and processes. Nobody initially understood how NDB would integrate into MySQL and a year later there is a much greater understanding.
4.1.10a has been released and is a milestone release. Easy to install binaries with RPMs. A chapter in the documentation was a part of the release. BLOBs data types were added.
At 5.0.3, NDB was put into the source tree. Where clauses are being pushed down to the storage engine. Improved space effencency. Includes significant speed optimizations and compatibility with the query cache.
MySQL server requests the data from the storage engine and when it gets all the data it applies the where clause. With NDB this is not a good arrangement so are now pushing the WHERE clause down to the NDB engine to allow the engine to return only the rows needed in the query.
Cluster in 5.1 will include integration with replication, partitioning (will apply to all storage engines), non-indexed attributes stored on disk, variable-length records and online schema and cluster changes.
Partitioning in MySQL 5.1 will allow you to have a table composed of many tables. Will be better than MERGE tables, supports all storage engines and can PARTITION BY range, hash, list or key. The partition work builds a foundation for parallel query execution (wouldn't that be cool!?).
Posted by mike at 7:02 PM
Ticketmaster's MySQL Implementation
Listening to Ed Presz talk about Ticketmaster's use of MySQL at MySQL 2005. Ticketmaster has 9000 clients, 500 arenas, 500 amphitheaters, 150 sports teams, etc. Have sold as many as 500,000 tickets in a single day. This year's U2 tour was crazy. 11,000 tickets per minute, 135,000 tickets/hour for U2 tour.
All event information is stored in MySQL. Venue information, seating charts, delivery (Ticketfast, delivery options, parking upsells), artist, directions, etc. All branding (ie MLB) information is in MySQL.
Converted from SQL Server to MySQL in 2002. Had a lot of expertise with MySQL in CitySearch, a sister company. Have seen improved performance since they moved from SQL Server.
Moved from SQL Server to Oracle for the business transaction (credit card, address information).
Using Linux AS 3 and MySQL 4.0.18 and InnoDB except where they need full-text search. They do not use autoextend, would rather manually allocate than have the database take resources in doing it. The event data stored in MySQL is ~20G, around 200 tables. Rely heavily on replication, using master-master replication with different table sets in some small areas. Put InnoDB tables on both local disks and NetApp server, find little different in performance between the two.
Ed goes through a bunch of tricks in managing their MySQL installation. Ways to defrag an InnoDB table, alter tables etc.
Core ticketing is 28 VAX machines spread across the country. An application feeds the data from the core ticketing VAX machines to two primary MySQL databases which are replicated down to four regional slaves which each have four slaves, meaning that when you hit the ticketmaster site you may hit any one of sixteen databases boxes. I guess that means that in total, there are 22 databases in their setup.
The search box on Ticketmaster uses MyISAM full-text search. Replaced third-party product. The order of indexes matter when creating these. Poor performance from tables where indexes are. Search database runs on a separate box from the other data. Set ft_min_word_length=1 and ft_stopword_file=/dev/null to be able to find all bands.
Ticketmaster has four utitility scripts that give them infmormation on the status of their machines.
- total connections/active connections, selects/second from SHOW STATUS
- Check binary log for deviations in binary_log positions to indicate slow or failed replication.
- Check free space on InnoDB.
- Parse through MySQL errror log looking for key words that indicate a problem.
Posted by mike at 7:01 PM
MySQL GUI Tools
Michael Zinner is presenting about GUI tools at MySQL 2005. He's the lead developer on the MySQL GUI Team. Was developing DBDesigner4 as an open source project two years ago and got swept up by MySQL AB. They are up to eight different tools now (both currently available and those in development/planning).
As far as the vision for the GUI team, Michael wants to develop a dedicated application for each task, to make it easier for new users to learn MySQL, and for advanced users to be able to achieve complex tasks in less time. Wants to make MySQL competitive with Oracle and SQL Server, both of which have slick user interfaces. Didn't want to use Java, preferring to use the native tools which means porting for each environment. The GUI team is only three developers, adding one more soon.
They created a central set of C libraries, which are used by different code specific to the native operating system libraries.
Worthy of note are a new version of the MySQL Administrator which includes more fully featured backup utility.
MySQL Migration Toolkit is a new tool for migrating data into MySQL. Currently the toolkit supports migrating data from Oracle and Access. Will be adding SQL Server, Informix and DB2 next.
Next on the list is a MySQL Workbench (for database design) and beyond that is a MySQL Enterprise Manager and a MySQL Cluster/Replication Manager.
Overall, the GUI team at MySQL AB has created some really nice tools. It's amazing to think that in just 18 months (since Michael joined MySQL) they've built such an incredible set of stable tools. The user interface on these tools is very clean looking.
Posted by mike at 4:45 PM
Does Larry Ellison Deserve a Dart in His Face?
I saw that one of the vendors at MySQL 2005 had a dart board with a picture of Larry Ellison on it, letting people take a turn at throwing darts.
This bothers me.
First, my impression is that there are a lot of Oracle people here considering MySQL as an alternative. I'm not sure what kind of message that sends. I'd guess it doesn't help people feel good about MySQL. I've been a long-time MySQL user, have been off of Oracle for many years and I'd fight to use MySQL anywhere. Yet I get a crappy feeling when I see something like Larry on a dart board.
Second, I think most people realize that there are a lot of choices out there and in some cases another database is actually better for their requrements.
I should lighten up, right? It's just fun, right? I don't see what's funny about creating an us vs. them in this way, especially when some of "them" are here and we'd like them to think that "us" are worth their time.
Posted by mike at 4:10 PM
Using MySQL for 600+ Machine Kiosk Network at Suzuki
At MySQL 2005, listening to Margo Zenk from Matrix Consulting talk about how they built a system to keep 600 machines in Suzuki dealership kiosks in sync. Matrix Consultants was chosen to build the tools because they were good with open source tools.
A little bit of background. Motorcycle buyers (especially ones who race for a hobby or processionally) do a lot of research before they buy a motorcycle. They don't go to a store to get a sales pitch. They read magazines, look everywhere on the net, owner opinions, tech specs etc. Uncovered every flaw with the rider experience with every model and brand and then go to the store. The buyer expect sales folks to have the same level of expertise, which is very challenging. Sales are seasonal, which makes it even harder because the sales force only works with the bikes during the spring and summer.
It used to be that information was sent out for sales guides, printed but were out of date. CDs were tried but the information got out of date quickly as well.
Needed a tool that had to up-to-date information (with alerts on new information), scripts for inexperiences sales folks. The solution also needed to monitor sales process and provide reports. For the customer, the sytem needed to allow customers to play with building their own bikes using 3-D models, and then be able to order the bike.
They created a kiosk with all this functionality, which is now in 60% of the dealerships. Those who opted to buy and install the kiosk from Suzuki found their bike sales went up as much as 35% since they got the kiosk. Won the kiosk of the year award.
The project couldn't rely on the internet because the 3-D models are ~800 Mb and many dealerships don't have dedicated broadband.
Application is in VB .NET, but SQL Server was to expensive so looked to MySQL. Why? Had experiences MySQL staff, easy to use with ODBC, was inexpensive and reliable.
They run one master database with many clients. Each night new content is fed to the clients using rsync. They send a dump of the recent changes to the database (binlog), PDF files, 3d-images in one packed up file via rsync. 15% are on dialup, and some of those have to be upgraded by CD only. Try to keep downloads to at most 150 Mb.
If a site is offline for a long time, which can be months) it simply starts pulling updates from the last time it was online and works forward to the latest version.
Posted by mike at 2:25 PM
Building Redundancy into the Human System
Listening to Andrew Cowie talk about operations professionalism. I attement the pre-conference tutorial.
We need a systematic way to learn from experiences, bridging between structure and flexibility.
The goal of the presentation: Awareness of operations professionalism, and we need to take pride in what we do.
In IT we have a tendency to wing it. True professionalism is being respectful enough to follow the procedures every time so no mistakes are made. When something does go wrong to take notes, document the problem.
Andrew suggests that in operations we could learn something about good practices from extremen programmming. Having two people sitting together to document and run operations will prevent numerous problems.
When the system breaks the programmer say, "Well, systems break. What did you expect?" The programmers have no direct role in the operations, even though they are running things that . Build teams to include systems administrators, database administratos and programmers.
You can't always hire good systems administrators. They might be experienced in some technology, but to make them good in your environment it takes time and training. The faster you can transfer knowledge to new people the faster they get up to speed.
Andrew spends some time talking about the decision-making process. OODA Loop as an example.
The scientific method is rarely used in the IT world. Example is performance tuning. Do we change one thing at a time to see what the performance changes are, or do we change a lot of stuff and then aren't sure about why the performance improved or declined.
Posted by mike at 1:31 PM
Michael Tiemann: Defining Open Source
Mike notes that in 1993 Sun was encouraged to open source Solaris and refused. Ironic that they are now doing it with Solaris 10.
Massive Change by Bruce Mau: Now that we can do anything, what will we do?
Mike points to a document written by the Swedish software review process (in their government) that indicates that the Swedish government finds open source software is of the same, if not higher quality than proprietary.
In Apache, 10-15 people do 85% of the work. Beyond that is a slew of other developers.
Mike suggest that to make good open source software, don't write to get around a problem, write to fix a problem. Writing around a problem leads to code that over time gets bad and when you've got many people developing the same application you need to
Eric von Hippel says that 85% of the best developments from requests and ideas provide by the user, not the provider.
Microsoft is a failure of the shared source code model. In 2002 they spend 100 M on security. In 2003 they spend 200 M on security. In 2004 they were up to 1 billion. For 2005 they've budgeted 2 billion.
In 1996 Linux was told that 2% of Fuzz Tests caused the kernel to fail. Someone thought that was unaceptable and a single person went and fixed all the problems to make it 100% fuzz-proof.
Posted by mike at 10:00 AM
State of the Dolphin at MySQL 2005
Am sitting in Tuesday morning's keynote, The State of the Dolphin at MySQL 2005 with David Axmark and Monty Widenius.
I wrote about this last year at MySQL 2004, some of the same ideas (and slides) are used.
HistoryThe most stable database has no features at all, MySQL only put features in that were/are needed.
Documentation was written with each question that came in.
Second Generation of MySQLStarted in 2001, a real company with a board of directors. Got funding in 2001 and 2003.
Principle of MySQL- Write code as good as you possibly can the first time. - Create a modulare architecture. - Prioritize bugs that can be repeated. - Fix small things - Hire experts regardless of location. - Have the same source tree for all compiles.
Storage Engine ConceptThe storage engine concept came from an early experience where a customer didn't want to support storing data in a certain way but wanted to move to a new format. Monty didn't want to force people to upgrade so wrote a layer that can interact with many different storage containers.
MySQL 5.0The coding is complete, now just finishing bugs.
They have a user who is storing 200,000 terrabytes of data.
GUI ToolsMichael Zinner is giving a demonstration of the Migration Tookit, which leads you through a series of steps to migrated data. He's migrating an Oracle database, including views and stored procedures.
The MySQL Workbench is a database design tool, is in development.
A MySQL cluster and replication manager is planned to start later this year.
MySQL NetworkA comprehensive set of proactive services that saves enterprise developers and DBAs time and effort.
- certified software
- knowledge base
- production support
- intellectual property protection
- Multi-vendor support
ContributionsMySQL doesn't accept many contributions, mainly because it's hard to develop code. They are much more interested in ideas and well documented bugs.
Posted by mike at 9:15 AM
April 18, 2005
Mastering Changes and Upgrades to Mission Critical Systems
Afternoon tutorial of the first day at MySQL User's Conference is Andrew Cowie talking about Mastering Changes and Upgrades to Mission Critical Systems. Andrew is a pretty engaging speaker. He's got some interesting papers and presentations, including a paper on recovering systems from disaster in lower Manhatten in September 2001.
I certainly didn't capture everything that Andrew said, but got some of the interesting points.
We need to think about changes and upgrades as a life cycle. Every so many years things will be replaced. Software upgrades, deploying patches, replacing hardware. It's not a rare curcumstance that we have to do these things, they can be planned for.
Our systems as a whole are very complex, and are often required to be up at all times, with nothing going wrong. The longer a system can't get refreshed, the more of a problem it becomes. When a cluster goes live and is working, it's hard to justify taking it down to make a configuration change or update to the code. The problem is that the longer you postpone doing something in the system the more risky it gets to do something in the system and more likely that it will break.
To look at how changes get made, it's important to think about:
Organizational Blueprints -> Process -> ProceduresProcedures are the actual steps executed, processes involve policies.
Andrew argues that when we're talking about management of our systems we're not in the technology business, we're in the operations business. We shouldn't be looking to programmers to learn how to best get things done, we should be looking to places like NASA for best practices on designing, documenting and carrying otu procedures. In IT there is a lot of ad-hoc, not following procedures. In mission critical systems we should look to experts in operations.
One mistake that techies makes is that they give too much procedural information to the manager, and not explain things in the context that a manager will need to make a decision. When making procedures you should provide an executive summary, especially before embarking on some risky change.
Beyond the summary, you need to put together a list of steps, organizes into sections and tasks.
Steps must be done concurrently, but named items within the steps can be done simultaneous.
Times are critical, and making a big picture estimate is good. But to assign specific times to all the small tasks is kind of pointless (Andrew doesn't like MS Project because everything must be assigned a date).
Andrew recommends good versioning, and not keeping things in Word docs that don't get versioned. Wiki is on his list of recommendations.
Senior people shouldn't be doing tasks all the time. Devolution should be happening, where tasks get moved to more junior people when they've been ironed out.
You do not need some large document for disaster recovery, you need people who have experience with your system who can think on their feet who have access to the process steps.
Posted by mike at 4:41 PM
MySQL 5.0 In Depth - Stored Procedures, Views, Triggers and Cursors
The first session of the MySQL User's Conference (for me) is a tutorial on MySQL 5.0 features by Jan Kneschke, a MySQL trainer from Germany. Jan is going to be working on a new super-secret project called Merlin, being announced tomorrow. I've been writing a lot about the 5.0 features and am interested to see if there's anything I don't understand properly. Not attempting to get everything, just things I think might not be clear from the documentation on mysql.com.
Hmmm. Looks like 5.0.4 was released Saturday, will have to upgrade today.
A lot of MySQL users both program and manage the database. When moving into MySQL advanced features it becomes more important to have a person who just does the database stuff.
Stored Procedure PurposeFor handling business logic at the database level, reduce network traffic and have application code in the database. There are security benefits for putting data access behind a procedure.
Usage ScenarioA few things where stored procedures can be very helpful. Recursion, like in a tree, is a good use for keeping logic in the database.
Jan gets into a demonstration of recursion, and goes through the concept of a merge sort (break large set of data into smaller pieces, sort them, and then merge them together comparing the first items in each of the smaller pieces and stacking them up together).
Business logic abstration is also a good use, allows for strict control over access to the procedure as well as strict checking of input parameters.
SyntaxMySQL uses SQL:2003 standard for creating, managing and using stored procedures.
Stored Procedures and Stored FunctionsExternal functions (UDFs) written in C have been available since MySQL 4.1. With C functions you can access everything in the system but not in the database. With internal functions, written in SQL, you cannot access the system but can interact with the data. Internal functions are available as of 5.0.
The main difference is that a procedure returns records, a function returns a single value.
The CREATE FUNCTION systax is used for both internal and external functions. If you're creating a UDF you include SONAME in the definition, if you're creating an internal procedure you leave out the SONAME syntax and put the statements in the body.
Neither stored procedures or functions are compiled in the database.
CursorsCursors are a pointer to a result-set. The FETCH gets data from the current row. In the case of recursion, the cursor can remain open while the procedure calls itself.
To get out of a cursor loop you need to DECLARE a HANDLER. You do not check to see if a FETCH fails, but let the HANDLER catch the NOT FOUND exception which sets a value that tells the loop it's done.
SP ExampleJan looks at a recursion procedure that does recursion to find a tree of data. The tutorial handout/booklet has printouts of the procedure, and all procedures from the presentation.
Issues with SPsProcedures don't scale. If you have man complex operations in stored procedures they will add to MySQL becoming a bottleneck. Procedures are executed in the context of the database. Procedures should be focused on the data.
With replication, SP and trigger create statements aren't replicated. If the slave has the same SP and triggers the insert or update statements that trigger the trigger will run on the slave as they did on the master. The replication of the create statements will be fixed soon.
Errors in procedures are hard to find, there is no debugger. Procedures can cause dead-locks.
Stored procedures must be written internally in SQL, no support for external languages. There is hope that down the road there will be support.
Currently you cannot generate a SQL error from within your stored procedure. What I've done is use an INOUT parameter @error that the application must check before moving on.
As of 5.0.4, there is no limit on the number of cursors that can be opened. If you go out of control on a recursion, or have a lot of data to recurse over your MySQL server will happily open new cursors until the database server crashes because there's not enough resources.
Lock Problems with SPLocks are taken when the function is called. So if you have a function that does a SELECT, a read lock is obtained. But if you call a function within that to do an INSERT, you have the wrong lock on the table.
The locking issues, found by Jan recently, will be fixed soon.
TriggersTriggers, for Jan, are kind of derivative of a SP. They use the same syntax for the body, but are created with language to tell the database when they should be run automatically. Jan looks at an example where when an email address is inserted, a trigger reverses the address for indexing purposes.
Currently there is no way to see if a trigger applies to a statement sent to the database. One would think that there would be something like EXPLAIN that would show that a part of the statement includes running a trigger.
Don't use triggers just for fun, or for something complex. Jan suggests that a trigger should not be used for generating content in the database, for putting business logic into the database or for calling SQL statements that take a long time to complete. Triggers should be used for small data integrity checks or changes.
A rollback will work even if a trigger has been involved in the update.
ViewsAre excellent for limiting visibility of columns in JOINs or limiting to certain rows. Also good for hiding new tables structures from legacy applications.
Views are created with a SELECT statement, which can get data from tables or views. Permissions can be granted separately for a view, which gives you ability to easily grant on columns across tables.
Using ORDER BY in a SELECT against a view will replace an ORDER BY that was.
Algorithm in ViewsMerge allow is more efficient and updatability. It merges the CREATE VIEW statement with the SELECT view statement and executes (essentially creates a new columns and where clause based on the two statements). Merge doesn't work with functions and certain keywords (DISTINCT, GROUP BY, HAVING). In these cases a TEMPTABLE is used.
Views are UpdatableThe algorithm must be MERGE, cannot have JOINs or subqueries and cannot have an updatable view in the FROM clause. In addition, an insert can happen only if there are no duplicate view column names and the column names are pure (no database prefix).
Actually, if the update only affects one of the tables in a join, it can be updated.
Posted by mike at 11:55 AM
Table-Reference Restriction Lifted from Functions and Triggers in MySQL 5.0
There is no longer a restriction accessing tables from within MySQL stored functions (the internal ones build with SQL statements) and triggers. The MySQL documentation still has the phrase about it, but Jan Kneschke just said (in his tutorial on 5.0) the documentation isn't up to date.
I tried referencing a table in a function in 5.0.1 or 5.0.2 (they blur together now) and it definitely didn't work, didn't try it in 5.0.3 because I didn't see it in the documentation. I just tried this in 5.0.3 and am pleased to report that Jan is correct:
delimiter //A SELECT using this function works as it should:
create function get_customer_count () returns INT
begin declare customer_count INTEGER;
select count(*) from customer into customer_count;
mysql > select get_customer_count();
| get_customer_count() |
| 300000 |
1 row in set (0.97 sec)
Posted by mike at 11:14 AM
I'm always optimistic that a conference will lend to enough free time to squeeze in a morning run. As I'm in California (Santa Clara) for the MySQL User's Conference I figured I'd at least bring my shoes and shorts.
Lo and behold, at 5am I could not longer sleep due to the time difference. I got up and got dressed in my running clothes, but found that it was still very dark. I stretched, read the paper, did a little writing and at 6:30 it was light enough to head out. I thought California would be warm, but it was pretty chilly out.
I ran from the hotel down past Misson College to a very large Yahoo! building (it has a Yahoo! sign on it, but I think the main "campus" is in Sunnyvale). I have no idea how far that is. I'd guess 2 miles round trip. I got through around 30 minutes of music so I figure somewhere around that.
On the way back into the hotel lobby the bell-hop handed me a towel wrapped around a bottle of cool spring water. How cool is that! I don't travel a lot, and don't always run when I'm staying somewhere, but I've never had that happen before.
Posted by mike at 10:53 AM
April 17, 2005
Day in San Francisco
Today I flew into San Francisco for the annual MySQL User's Conference, being held in Santa Clara. I've never been to San Francisco so I detoured north from the airport too see the city. To me it's always exciting to see things you've only heard about or seen in books, TV or movies. I'm a firm believer in experiencing things in person.
The only place I really "visited" was Fisherman's Warf, which is interesting but I think I'd agree with the literature that suggests it doesn't really represent San Francisco, but feels like any other tourist destination filled with street performers (magic shows, silver robots, drummers, pan-flute players), souvenier shops, people soliciting you to join their tour boat or bus, a specturm of people unaware of how socially akward it is to pass them on the street, etc. I did find an In-N-Out burger joint, which I've always wanted to experience.
I walked the entire stretch of Fisherman's Warf down to Aquatic Park (stopping to look at Alcatraz and the Golden Gate Bridge) and then walked south up Hyde Street. There was an open house on North Point St, I was tempted to take a tour and see what it was like but decided I'd rather be walking around outside.
After walking around for a few hours I went back to the car to start the drive to Santa Clara, but was too tempted and decided that after all the documentaries I'd seen on the construction of the Gloden Gate Bridge, I just had to drive across it. So I did. If I had been in the right lane as I was approaching from the south I would have gotten off at the park and taken some photos, but I was all the way in the left lane. On the other side of the bridge I got to the right side and was able to get into the north-side vista, which was more about the view of San Francisco than of the bridge, but I took photos of both.
Coming back I went through San Francisco on Lombard St and then down to Route 101 on Van Ness Ave.
The trip to Santa Clara was uneventful, but I did start to get a sense of what people mean by "the valley". More thoughts on that later.
Posted by mike at 11:09 PM
April 16, 2005
ucasefirst Function in MySQL
I've been playing with stored functions in MySQL 5.0.x since the first alpha was released. If you're using 5.0.x and have always wanted a function that will return a string with the first character in caps and the rest in lower case, here it is:
create function ucasefirst (phrase varchar(255)) returns varchar(255)The restult when calling:
mysql> select ucasefirst("tHIS will TEST the fUnction.");This function is common in most scripting languages I've used, although might be named something else; ucfirst, upperfirst, firsttoupper, firstuc etc. So why not in MySQL? Simple to do with stored functions.
| ucasefirst("tHIS will TEST the fUnction.") |
| This will test the function. |
1 row in set (0.00 sec)
The cool thing about stored functions in MySQL 5 is you can build functions right in the MySQL client, that operate just like the built in functions. It's no longer necessary to write code in C like you did with pre-5.0 UDFs.
Posted by mike at 12:32 AM
April 15, 2005
Off to MySQL 2005
Well, tomorrow I'm off to the MySQL 2005 User's conference in Santa Clara. The last time I was in California was OSCON 2002, before that was back in the early 90s. I have never been to San Francisco (other than for a connecting flight) or the bay area so am excited to see what it's like. I've certainly read a lot from the weblogs of folks who are working out there.
The conference agenda looks excellent, looking forward to the tutorials and presentations.
I'm also going to be maximizing free time to make some significant progress on the book. We're moving into the 2nd drafts (having gotten them back with comments from a technical reviewer and the editor). Unfortunately, I haven't finished all my first drafts.
Besides attending the conference, and spending long hours writing, I plan to spend at least a few hours in San Francisco (probably Sunday before I can check into the hotel). I also plan to go up and meet a few folks at Apress office in Berkeley. Oh, and I thought I'd try to get up to SBC park on Wednesday night and see the Giants play.
Posted by mike at 5:40 PM
April 7, 2005
CORDRA for Discovery of SCORM Content
Dan Rehak, from Canegie Mellon University, is delivering a presentation at the annual MedBiquitous conference about searching for SCORM content. How do we enable a Google-like search to find reusable learning objects?
SCORM is about content objects and meta data, but not about discovery and access.
Places where you might go to discover? MedEdPortal, PubMed, NLM, HEAL, RSNA, IVIMEDS, Meld, Medianet, PHIL.
Learning content should be available, easy to find, uncoupled from courses, accessible seamlessly, reusable, subject to rights management and managed.
Repositories hold items, registries are interfaces for content discovery, federations are collections of repositories and registries.
There are a lot of pieces of content, but there isn't a single point of access. How do we build to the larger community?
CORDRA is an open, standards-based model for how to design and implement software systems for the purposes of discovery, sharing and reuse of learning content through the establishment of interoperable features. The CORDRA model has many classifications of registries, and groups all of these registry types into one global interface.
CORDRA is a model for creating federations, and is on the way to becoming a working system.
A few issues:
- how to enable one-stop search (esp if the meta data is in different standards
- how to rank items
- federated search isn't scalable (CORDRA solves by federated search of the registry)
There's a demo site running to play with.
Posted by mike at 1:01 PM
RSS Syndication of Medical Content
At MedBiquitous 2005 listening to Howard Tanzman from the American College of Surgeons talk about how they are using RSS in a professional society.
MedScape offers a lot of RSS feeds. There are a lot of health weblogs as well.
Allan Bell from HighWire Press follows with a brief look at aggregation in special applications, web services and browsers. Allan shows NetNewsWire as an example of a good aggregator.
After I responded to a question about if there are ways to deliver RSS to an email inbox (yes you can both get and create RSS feeds via email), a man in front of me asks if there are aggregators for cell phones and PDAs. Yes, there are for both. Yahoo! has feeds for any WAP 2.0 cell phone browser and there's quite a collection of RSS readers for both PocketPC and Palm.
Posted by mike at 10:50 AM
RSS Feeds for Medical Education
I've just noticed that there is a lot of talk about RSS in the medical education community. I'm attending the annual MedBiquitous conference and have heard several people talk about the adoption of RSS for enhancing medical education.
One example, medical publishers are using RSS to feed information about the latest medical journal articled to subscribers.
I've thought about RSS for TUSK, our learning and content management system at Tufts University. At a mimimum, we could provide a student with a feed that has his or her announcements and timely information about evaluations, quizzes, grades etc. Even more advanced would be a feed of content that is made popular by their peers, or content that has been recently added or changed.
Funny I should mention it, looking at the day's sessions I see that there is a presentation later today that is dedicated to using RSS.
Posted by mike at 9:38 AM
Content Collaboration for Medical Education
This morning's first session at MedBiquitous is about Content Colaboration. The panelists are:
David Davies, IVIMEDS
Sandra McIntyre, HEAL
Grance Huang (Harvard), MedEdPortal
Marty Nachbar (NYU), COMET
IVIMEDSFirst presentation is from David Davies from IVIMEDS talking about how IVIMEDS is pulling together content from many schools, organizations, publishers etc.
HEALSecond presentation is Sandra McIntyre is from HEAL (Health Education Assetts Library). HEAL runs on SQL Server, but is being modified to run on MySQL. Have about 20,000 items in the repository. HEAL is both a web site and a local installation. The local installation is called HEAL_local, scheduled to be released in the fall. It will allow you to install HEAL in your library and use it to both store your own assetts as well as subscribe to the central HEAL repository.
HEAL has a peer review, and the reviewers are not anonymous.
New Collaborative Groups to HEAL
George Washington Partnerships for Training
Build the HEAL Metadata schema from IMS, but are moving toward LAM.
MedEdPortalThird presentation of the session is from Grace Huang, who's talking about MedEdPortal, a project from the AAMC.
(a few notes, doing more listening that writing)
MedEdPortal is coming up with a new search, that isn't just generic.
COMETThe fourth presentation of the session is Marty Nachbar from NYU talking about the COMET group, which is a collection of a dozen+ schools that are working in collaboration to build better surgery tools. The focus is on the virtual person, students don't see enough people.
Marty highlights some of the things that are happening at the different schools.
Posted by mike at 8:46 AM
April 6, 2005
Evening with the Orioles at Camden Yards
Since I was in Baltimore, and staying not far from Camden Yards, I got a ticket to tonight's game between the Baltimore Orioles and Oakland Athletics. It was a lot of fun.
The fans aren't quite as passionate in Baltimore as they are at Fenway Park, but I felt closer to home when the fans jeered at their own shortstop for not making enough effort to grab a grounder. I had planned on getting a hot dog in the park, but a vendor right outside was selling them 2 for $4 with a soda and I was pretty sure that inside the ballpark $4 would only get half a hot dog so I ate outside.
Camden Yards is a really nice facility. Designed to be like the parks built in the early 1900s. I had pretty decent seats, $27 will get you much closer to home plate in Baltimore than it will in Boston.
It was such a nice experience that I'm looking at trying to get up to see the Giants when out in San Francisco area for the MySQL 2005 conference.
Note about the photo: I took the photo using my Canon PowerShot s400's panorama feature, which shows you the right edge of the previously taken photo on the screen along with what's in the lense. This allows you to turn the camera to the right and line up the next shot with the previous photo's edge. It still took a bit of Photoshop to smooth out the seams and make the lines match up. It would look so much nicer on a wide-screen monitor, the 12" PowerBook screen is quite lacking.
Posted by mike at 11:38 PM
Virtual Patient Panel
Listining to a panel of speakers talking about their work with patient simulation at MedBiquitous 2005. The panel is led by Chris Candler of AAMC.
Virtual patients is about helping the learner interact with patients by training with onnline. AAMC is interested in promoting the sharing of virtual patient cases.
Grace Huang, Harvard Medical School
Hemal Thakore, University College Dublin
David Davies, IVIMEDS
Marc Triola, New York University
James B. McGee, University of Pittsburgh School of Medicine
Rachel Ellaway, University of Edinburch
To begin with, there is disagreement on how people define a virtual patient. The panel proposed that it's a a set of human characteristics in data rendered in an application. Could be a medical record, a mathematical model for prescriptions or an actual clinical encounter. Medical students are getting less time with actual patients, so the hope is to give students encounters with more patients with certain characteristics or medical conditions.
The difference between the case and patient? A case will have one or more patients, a patient doesn't always have a case. It might be a medical record where there is no case interaction.
Rationale for using virtual patients? Instead of presenting the content, you present the problem and get a lot more validity and retention. The LCME (medical accredation) demands documentation of exposure to a certain number of clinical cases. In cases where medical training is for war, there's no way for a medical student to get hands-on training so virtual experiences is important for them before they are sent into a conflict.
(numerous other examples, too fast to type)
AAMC is trying to gather information from all US, and eventually internationally. The MedEdPortal is the gathering place for this information.
Now we turn to the technical details of sharing virtual patient information. James McGee is talking about the XML Schema they've developed at University of Pittsburgh School of Medicine for representing virtual patient information.
Hey, that's TUSK on the screen (the project I work on at Tufts University). James is talking about how the data from the Pitt database could be exported in a valid XML format and used in TUSK's case tool.
David Davies is giving a demo of the virtual patient tool from IVIMEDS. The virtual patient can be exported in IMS Content Packaging format or in SCORM. The IVIMEDS system is written in Java on MS Access (seems like an odd choice). The authoring tool is Access forms.
This is cool, the virtual patient allows you to define localized terms and have them specialized when rendered.
Last person on the panel to talk is Marc Triola who's showing SIMM (Surgery Interactive Multimedia Modules) from NYU. SIMM is web-based, built on Apache/Linux backed with MySQL. They have a team of 3-D sumilators. SIMM is defined with an XML format, designed for structure not for function.
Posted by mike at 11:10 AM
Medical Professional Competence and Patient Safety
Wendesday mornings second presentation at MedBiquitous 2005 is Carolyn Clancy, the Director at the Agency for Healthcare Research and Quality. Carolyn is talking about patient safety and is going to tie it into how standards and IT relate to helping medical professionals not make mistakes.
Safety has become quite an issue for the public, people wanting to bring majic markers to the hospital to mark the area that's a problem so there won't be an error. Of people surveyed 1/3 have been or have a family member who've been involved in a preventable medical error.
There are some challenges in trying to solve this problem:
- adverse events are relatively rare
- since things change over time, it's difficult to isolate the key components
AHRQ has invested about $160 million since FY 2001 in a Patient Safety Initiative. Numerous initiatives undertaken, both in research and in proactive measures.
One major finding fron the research is that serious medical errors fell when interns work shifts were reduced from 30 hours to 16 hours. Chance of an accident was also twice as likely when interns were driving home after a 30-hour shift.
One of the primary findings in the research was that medical information wasn't being exchanged, and that better IT was needed to make sure medical professionals were able to access information from prior visits with physicians. A significant number of respondents indicated that during a visit the physician had to go back to the records office to try to find more information on the patient.
Presciption issue technology also needs some IT improvement.
Posted by mike at 9:44 AM
Enabling Collaboration for Healthcare Education
The 2005 MedBiquitous conference gets started this morning with an update from Peter Greene, the Executive Director of MedBiq and Associate Dean for Emerging Technologies at Johns Hopkins University School of Medicine.
Peter is giving an overview of many of the MedBiquitous efforts. I'm engaged in listening (and writing about yeterday's SCORM meeting) and am not catching everything. A few of the main points.
MedBiq has gone through the process of becoming a authorized standards setting body via ANSI.
A lot of work going into online teaching and trying to create a virtual experience that closely represents the real experience interacting with patients. HL7 is an existing standard for clinical data.
The Medical community has made contributions to SCORM 2004 to make sure that content can be packaged with all the relevent information necessary in health education.
A standard way of describing and reporting professional education and maintenanace of certificates. MEMS (Medical Education Metrics) is a group focused on standards for reporting.
Need to have a technical format. Working with IEEE Reusable Competency group.
A common format for exchanging information about healthcare professionals (including education, training, cerfification, memberships etc). Having this information in a standard facilitates cross-organization collaboration.
A Number of Guideline Documents have been created:
- Single sign on using SAML (Security Assertions Markup Language).
- RSS Syndication to say what's new in a news feed
How are MedBiq Standards Developed?
- ANSI process which means opennes, transparency, consensus and due process
- Focus on near term, achievable goals
- Partner with industry leaders
- Build on existing standards
- Partner with other standards groups
Posted by mike at 9:16 AM
Best Practices for Developing Reusable E-learning Content Using SCORM
Yesterday I attended a pre-conference tutorial on using SCORM to create sharable pieces of content. The presentation was done by Nina Pasini Deibler from Carnagie Mellon University. She's an instructional designer but has been involved in the process of moving SCORM towards a standard and is quite familiar with the technical details of the specification.
Why SCORM? If you develop content and store it in SCORM format you can drop it on a CD and play it in any of a dozen SCORM players. You can also give it to any organization and as long as it's passes the verification tests you can be assured it will play in any other
SCORM 2004 is the standard to be using. The major addition to the specification is sequencing, the ability to create dependancies between different content objects.
The tutorial was a nice mix of technical information, content design and talk about organizational uses for properly packaged content. It was pretty informal and included a few exercises where we got into groups and had to think through the process of creating some SCORM content.
Posted by mike at 9:03 AM
April 5, 2005
In Baltimore for MedBiquitous
Flew down to Baltimore, Maryland this morning for the annual MedBiquitous conference. MedBiquitous is a consortium of organizations focused on creating standards to advance healthcare education.
Today's pre-conference seminar on using SCORM (Sharable Content Object Reference Model) gets underway at 1. We've been eyeing SCORM for a long time as a potential way to exchange information between our external installations. Interested in seeing it in action.
Taking some time in the hotel lobby to get some work done before heading over.
Posted by mike at 9:39 AM
April 4, 2005
AppleScript to Get Around Unregistered Airfoil
Over the weekend I was listening to the audio from a series of presentations available as streaming Windows Media. When I've done this in the past I've plugged my laptop into the home stereo so I'm not glued to a headset somewhere.
It seemed like foolishness to have to plug in, when the Airport Express was already there and plugged in. With a little searching around I found Airfoil. Airfoil is a small application that will let you stream any audio (not just iTunes) to your Airport Express. It works very well, but is not free. If you don't pay the $25 registration fee, Airfoil will stream 10 minutes of audio and then add static into the stream to make the stream inaudible.
At first I thought about paying the $25, but since it's rare that I need to stream anything other than iTunes I decided it was a bit over my price range. I decided that for the few hours I was listening, I could just stop and start the stream every 10 minutes. This got kind of annoying too as I was often not at my laptop when the static started up.
After a handful of manual restarts I decided this was a job for AppleScript. What I needed was a script to press the button twice every 10 minutes, once to stop the stream and another to start. Having never written an AppleScript before I did some poking around and found that there are a nice collection of sample scripts installed with OS X that I could pull from to do just what I needed. I ended up with this short script that activates Airfoil, holds down the command button and presses "t" (for transmit off), holds down the command and presses "t" again (for transmit on) and then waits for 10 minutes before looping back to do the same thing:
tell application "Airfoil"This works really well, if you are willing to have a few seconds of silence every 10 minutes while Airfoil stops and starts it's broadcast. If that's not acceptable . . . it's probably worth your $25 to register Airfoil.
tell application "System Events"
if UI elements enabled then
tell process "Airfoil"
set frontmost to true
keystroke "t" using command down
keystroke "t" using command down
Posted by mike at 2:55 PM