Performance Monitoring - A Thirty Day View

I remember when I saw the movie "Super Size Me" by Morgan Spurlock.  I recall thinking about how much can happen in the course of just thirty days, not to mention how gross that food looked.  For people, 30 days seems like a long time at first, until you look back on what happened over the course of those 30 days.  It's easy to forget things that happened a week ago, not to mention a month back.  At CopperEgg, things move so fast that I really have to think hard to recall what we were focused on 30 days ago in our performance monitoring product development process.

Today, I was surprised by something really interesting.  First, let me ask this question: do you recall what you did to your business or product 30 days ago?  How about your website?  I bet you can't.  I bet you would have to go back to your email, or a code repository, or a calendar to find out.  What if something you did *30 days ago* was just now starting to cause you trouble?  Would you know where to look?

Well, one of our customers recently had this *exact* issue.  It turns out that a very small change can make a big difference over the course of time.  In fact, it took weeks to notice it.  Here's a performance monitoring screenshot of what their memory usage looked like on RevealCloud Pro’s 1 hour view:

1 Hour:

1 hour graph

Seems pretty flat - everything looks great. Now, if we zoom out a bit, to a day, here's what it looks like:

1 Day:

1 day graph

Still looks fine.  This system should be quite stable over time.  Right?

Let's look at a week:

1 Week:

1 week graph

Hmm. Maybe something is eating a little memory, but it's hard to say.

How about 30 days?

1 Month:

30 day graph

Holy smokes!  Something is leaking memory in a bad way!

Wait a second - we had to look at a 30 day time span to notice that!  Just think how much has happened on this system in 30 days - code deployment, logs, bug fixes, etc.  Without performance monitoring and a historical view of this information, there would be NO WAY to know when the problem started, or even that a problem was slowly building.  This system would likely have simply crashed, without anyone knowing why, and restarted, only for the same thing to happen again.

The real issue would have lurked for many more months, causing outage after outage, until someone finally decided to try to track this down.  Think of the lost clients, customers, or data, whatever your site handles - this can be a real nightmare.
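
To put rough numbers on why the short views looked flat, here is a minimal Python sketch.  It is not how RevealCloud Pro works internally, and the data is synthetic: I am assuming hourly samples and a made-up leak of one percentage point of memory per day, starting from 40%.  The function names (`change_over_window`, `growth_per_day`, `days_until_full`) are hypothetical helpers written for this illustration.

```python
from statistics import mean


def change_over_window(samples, window_hours):
    """Memory change between the newest sample and the oldest one inside the window."""
    newest_hour, newest_pct = samples[-1]
    in_window = [pct for hour, pct in samples if hour >= newest_hour - window_hours]
    return newest_pct - in_window[0]


def growth_per_day(samples):
    """Least-squares slope of (hour, memory_percent) samples, scaled to points per day."""
    hours = [hour for hour, _ in samples]
    used = [pct for _, pct in samples]
    hour_avg, used_avg = mean(hours), mean(used)
    num = sum((h - hour_avg) * (u - used_avg) for h, u in samples)
    den = sum((h - hour_avg) ** 2 for h in hours)
    return 24 * num / den


def days_until_full(samples, limit_pct=100.0):
    """Days until memory reaches limit_pct if the trend holds; None if it is not growing."""
    slope = growth_per_day(samples)
    if slope <= 0:
        return None
    return (limit_pct - samples[-1][1]) / slope


# Synthetic data (an assumption, not customer data): hourly samples for 30 days,
# starting at 40% memory used and leaking one percentage point per day.
samples = [(hour, 40.0 + hour / 24) for hour in range(30 * 24 + 1)]

for label, window in [("1 hour", 1), ("1 day", 24), ("1 week", 168), ("30 days", 720)]:
    print(f"{label:>8}: {change_over_window(samples, window):+.2f} points")

print(f"trend: {growth_per_day(samples):.2f} points/day")
print(f"days until full: {days_until_full(samples):.0f}")
```

The same leak that moves the line by a tiny fraction of a point in the 1 hour view adds up to a large swing across 30 days, and fitting a trend to the long history gives at least a rough estimate of how long you have before the system runs out of memory.  Real memory graphs are noisier than this straight line, so treat the projection as a hint about where to look, not a prediction.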

At CopperEgg, we really think collecting performance monitoring data and keeping it for historical analysis is a critical requirement for anyone deploying applications or services, whether that is in the cloud, in a private datacenter, or on in-house systems.