Implement logging in your applications with Log4[insert-language]

In my Information Architecture class we’ve been discussing the benefits of logging and best practices for implementing logging in your applications. Because logging seems to be such a tedious task, it is something that is often overlooked or neglected in custom software development projects.

It seems that it is too much work to open up a handle to write to a log file just to record that an event happened on your server. Add the complications of rolling log files to older versions as soon as you log files reach a certain size, and it seems that logging events in your application is overkill. I used to think this way until I was introduced to Log4J and its ports.

Log4J is a Java library that was written to solve all of your logging needs. It provides a simple interface for logging messages as easy as logger.debug("Error encountered..."); which can be configured to output to a log file, an SMTP email message, a database record, a network socket, a console, or even a Jabber client for instant message alerts.

Log4J offers different levels of debugging, which allows developers to log debugging information all over the application, and then simply switch off that level of log messages with a simple configuration file.

The levels of logging include:

  • Trace
  • Debug
  • Info
  • Warn
  • Error
  • Fatal

If the configuration file is set to trace-level logging, then messages will be logged for everything below it as well. Debug-level logging will ignore trace statements, and info-level logging will ignore both trace and debug.

Making log calls at the different levels is extremely easy. It’s just a matter of calling the right method on your logger object. Here are some example calls that can be used:

logger.trace("Use this instead of print lines");
logger.debug("SQL: " + sqlStatement + "Executed in: " + executionTime );
logger.info("invalid login attempted at: " ipAddress);
logger.warn("a warning statement");
logger.error("could not connect to socket");
logger.fatal("application crashed");

This type of logging is pretty simple and straight forward.

Another really cool thing about Log4J is that you can specify the format of your log messages in the log file. If you want to capture timestamps, class name and line number of a log message, etc. you can set that up all in the configuration file.

Log4J can also automatically roll your log files at certain size limits and set to only keep X number of rolled log files so that you don’t eat up your entire hard disk and crash your system.

Because Log4J has made logging so simple and easy, Log4* ports have been created for several other languages including C, C++, .NET, PHP, Ruby, Perl, and even JavaScript.

I wish I would have known about this years ago so that I could have taken advantage of easy logging in php and Java.

Mobilefish.com has posted a simple tutorial on using Log4J. I recommend checking it out.

Toys 2.0 – Mashups at their best

The ability to mash two or more concepts into a new product has sparked some really creative ideas that have received a lot of attention in the web 2.0 boom. Recently, toys have seen some thrilling success, as movie brands have been mashed up with traditional toys of the past.

My favorite mashup has been Star Wars mashing with Legos. I’ve always been a huge Star Wars fan, so naturally I get really excited when traditional toys get mashed up with Star Wars themes. I’m kind of jealous of kids now days, although I have a 1366 piece Imperial Star Destroyer sitting in my spare bedroom, waiting to be assembled.

The mashup concept was taken to a new level as Lego Star Wars: the video game was released. I don’t play many video games, but when I visit our in-laws, I usually have a good time playing some games with my brothers-in-law. Lego Star Wars I and II bring together 3 great concepts into an extremely creative, fun game.

My sister sent me a link this morning to a new mashup, Spider Spud. Wow! This one is awesome! I already have Darth Tater and Spud Trooper, but I’ll probably have to get this one too.

Spider Spud

I may have just revealed too much of my underlying nerdiness, if I hadn’t done so already with this blog. In case you’re wondering, yes, all of my action figures are cherry.

MySQL Optimization Video

Jay Pipes, a MySQL employee, gives a lecture at Google on MySQL Optimization. I’ve been in several discussions on how to optimize large queries, especially during my time working on TagJungle*.

This video has opened my eyes to some query practices that I used to do that are sub-optimal, and how to correct those problems.

One of my favorite optimizations that Jay Pipes demonstrates is that you can actually run a query like: SELECT * FROM users WHERE email LIKE '%@gmail.com'; and have MySQL use an index to help perform that query. It’s really quite clever, and I used to think that you were just out of luck with a slow query if you ever wanted this type of information. I would post the solution to this problem here, but the video is beneficial enough to just recommend watching the entire video.

Another big eye opener for me was the concept of Correlated Subqueries (which aren’t a good thing to do). Here’s an example of a correlated subquery:

SELECT name,
(SELECT COUNT(*) FROM Reservation WHERE confirmed<>'N' AND campId=C.campID) AS Confirmed,
(SELECT COUNT(*) FROM Reservation WHERE confirmed='N' AND campId=C.campID) AS Unconfirmed
FROM Camp C

This is bad because each of the subqueries must be executed for each camp record. This makes it so the query doesn’t scale well as the number of camps grow.

To fix this problem, it is suggested to think of these types of queries in sets rather than as procedural operations. The above query could be executed like so:

SELECT C.name, Confirmed, Unconfirmed
FROM Camp C LEFT JOIN
(SELECT campId, COUNT(*) as Confirmed
FROM Reservation WHERE confirmed<>'N'
GROUP BY campId) AS ConfTable
ON C.campID=ConfTable.campId
LEFT JOIN
(SELECT campId, COUNT(*) as Unconfirmed
FROM Reservation
WHERE confirmed='N'
GROUP BY campId) AS UnconfTable
ON C.campID=UnconfTable.campId;

This is provides much faster execution as only 3 lookups need to be made to the db and then just joined together. This will scale up as more records are added to the database.

Here’s the video:

* Disclaimer: I no longer am an employee of Tag Jungle or 42Co.

Club Penguin – Safe for kids?

My sister alerted me of a new Internet game that is gaining popularity among kids called “Club Penguin.” In a sense, it’s a virtual world much like Second Life, except instead of being whatever you want, you’re constrained to being a penguin.

Within Club Penguin, you can buy items, such as pets, clothing, food, etc, and get jobs to work for money. You can send chat messages to the people who are in your virtual world. You can throw snowballs, dance, etc. You can add a particular penguin to your buddy list and send that penguin emails.

It has won lots of “awards” for being Kids Safe, but I still don’t trust it. Even though there is live moderation and message filtering, it’s the same game where people try to break the controls that are constraining them. Someone is going to find a way to infiltrate the system.

I decided to take a look at the security behind the application itself. I found that Club Penguin was sending messages over port 6112 in plain-text. This was surprising to me as I though there would at least be some encryption involved.

I did some simple packet sniffing on port 6112 to see what I could discover about the messaging protocol being used. At first, it looked a bit cryptic, but later discovered that there were some simple codes being used such as np, rp, sp, and sm which represented “new player,” “remove player,” “move player,” and “send message.” There is a user id tied to each of these actions, and it gives coordinates on the current map.

Anything that is said can be easily read in plain text. For example, I sent out the following message, “anyone hear me?” and the transmission that I was able to sniff was: %xt%m%sm%18%23348762%anyone hear me?%.

What does this mean? Any predator in your neighborhood can just tap in and listen to port 6112 and see who is playing Club Penguin in the neighborhood. He/she can watch everything that goes on, spoof the Club Penguin server, and send un-moderated, un-filtered messages to your child.

Scary!

My advice is to watch what your children are doing online and become informed of potential dangers. There are people who would like to exploit this game. Several visitors have arrived at this post searching for “how to hack club penguins database,”  “club penguin packet sniffers,” “club penguin 6112,” and “clubpenguin mail spoofing.” I’ll let you be the judge.

[Update: I've closed comments because of vandalism by people not mature enough to leave respectful comments]

MySQL’s LOAD DATA INFILE

In my Information Architecture class, we’re doing MySQL query optimization, and are required to import data from an Excel spreadsheet into the appropriate MySQL database schema. Many people are trying to use NaviCat to do the import of the data, but are having some problems.

I propose a better way to go about doing this is exporting the data as delimited text files, and importing the data with MySQL’s LOAD DATA INFILE.  I first came upon this mysql functionality as I was working on building an International geocoder. I was importing millions of records in tab-delimited format from the National Geospacial Intelligence Agency.

I first tried importing all of this data with a php script, doing a few million insert statements. This was very slow, as I could only get 100 or so rows to insert per second. I remember it would have taken something like 16 hours to get all of the records inserted into the database.  It just wasn’t feasible.

MySQL’s LOAD DATA INFILE command handles this stuff with ease. You just point it to your csv file, define the data a little bit, and let it go. I was able to import all of the data from the geospacial tab-delimited file in minutes rather than hours.

Omniture Web Analytics Competition – We won!

Yesterday, Ben Robison, Ben Swanson, and I competed in the final round of the Omniture Web Analytics Competition held by BYU’s eBusiness Center. We were given access to analytics data inside of SiteCatalyst for CostumeCraze.com. We then were required to dive into the data, define Key Performance Indicators, find an area of the site that needs improving, and make recommendations based on analytic data findings. The grand prize was $5,000 to the winning team.

We worked really, really hard on this. In just the last few days before the finals, we spent at least 13 hours together as a team, and 10 or more hours individually putting together reports, screen shots, and trying to make our recommendations bullet-proof.

The competition went well. We had practiced hard, and we had anticipated and prepared for many of the questions that we were asked. My team was awesome. Both Bens contributed a ton of insight into the whole process.

We only got to see one other presentation during the finals, since we were the 3rd of 4 to present. The presentation that we saw was really good. I wish I could have seen the other two, because I heard those were really good also and focused on areas that we hadn’t even considered in our research/presentation.

Anyways, we live in the Information Era, and the web and web analytics are becoming increasingly important. I’m glad we had this experience, and I now feel confident enough to do SiteCatalyst implementations and analysis for anyone who needs it.

Learn more by doing

I’ve come to realize that you learn a million times more by doing than by just studying. This may seem like a no brainer, but I really believe that experience is one of the best teachers. This applies to just about everything in life.

A couple of examples that have really driven this home for me has been the current Omniture competition that I’m involved in, the Adwords campaigns that we ran in Paul Allen’s Internet marketing class, and my recent starting of a business.

This year is my first year actually competing in the Omniture Analytics competition. I’ve attended all of the past final presentations, but always thought I was too busy to compete. I’ve watched, taken notes, and studied the charts, findings, and recommendations that prior teams have given, thinking that I could learn to be an effective data analyzer by just studying what the top teams have done. I’m sure this has helped us in getting into the finals this year, but I’ve gotta say, actually diving into Omniture’s Site Catalyst application and trying to pull out some gems is harder than it looks.

Our team spent hours and hours during the preparation time before the preliminary rounds just trying to figure out which direction to go. We debated key terms, key performance indicators, success events, and such. It actually took a good amount of time to narrow what we wanted to look for. Once we put it together, it all seemed obvious, but it was hard to not get distracted by all of the cool data found inside Site Catalyst. We watched training videos, read knowledge-base articles, but still, applying what we had studied was a whole new ballgame.

Last semester, we ran an Adwords campaign for WorldVitalRecords.com during our Internet marketing class. I had read plenty about how Adwords works, but I had never actually done it. It almost seemed intimidating to actually spend money on ads, but actually doing it was one of the best experiences that I had that semester. I learned more from doing than from learning about Adwords through articles and case studies.

Starting a business has been another awesome learning experience for me. I have always been fascinated by start-up businesses, but until last year had no experience with an actual start-up. I was able to spend 8 good months at Provo Labs, learning about what it took to run LDSAudio.com, LDSLibrary.com, and WorldHistory.com. I learned a lot of valuable lessons from that, but because ProvoLabs was venture funded, I missed out on a lot of the bootstrapped start-up experience.

My time at 42Co was a totally different experience, as we worked to get proposals to clients for contract work, pulled several all-nighters, and later pursued venture funding for TagJungle. I’m no longer with 42Co, but the time I spent there was very valuable. I was associated with a great team, and I moved closer to my goal of becoming an entrepreneur.

Now, Brian Corrales and I are starting our own venture, and things have been great. I’ve stepped into a new realm of managing my own business. It feels great to take charge of my future. I think the uncertainty and risk that comes along with entrepreneurship brings a lot of excitement and feeling of accomplishment when things come through.

I guess the point of all this is that experience is a great teacher. I know I’ll especially come to realize this in the next couple of days as all we’ve studied about being parents will fly out the window and we’ll be learning by doing.