Monitor Your Mesos Cluster with StackState

This post is part 2 of a 4-part series about container monitoring. Post 1 dives into some of the new challenges that containers and microservices create and the information you should focus on. This article describes how to monitor your Mesos cluster.

Apache Mesos is a distributed systems kernel at the heart of Mesosphere DC/OS and is designed to operate at very large scale. It abstracts the entire data center into a single pool of computing resources, which simplifies running distributed systems.

Mesos supports many types of workloads for building modern applications. These distributed workloads include containers and container orchestration (such as Mesos containers, Docker and Kubernetes), analytics (Spark), big data technologies (Kafka and Cassandra) and more.

Read the full article here.

The Container Monitoring Problem

This post is part 1 of a 4-part series about Docker, Kubernetes and Mesos monitoring. This article dives into some of the new challenges that containers and microservices create and the metrics you should focus on.

Containers solve the problem of getting software to run reliably when it is moved from one environment to another. A container is often compared to a lightweight virtual machine: it isolates software, but it shares the host operating system's kernel instead of running its own.

So why are containers such a big deal?

Containers make it easier for developers and operators to know that their software will run, no matter where it is deployed. We see companies moving from physical machines to virtual machines and now to containers. This architectural shift looks very promising, but it can also introduce problems you didn't see coming.

Read the full article on http://blog.stackstate.com/the-container-monitoring-problem

Automate incident investigation to save money and become proactive

How many hours did your best engineers spend investigating incidents and problems last month? Did those engineers get a big round of applause when they solved the issue? Most likely the answers are "a lot" and "yes"…

Investigating problems and incidents is hard because you usually have to search through multiple tools, correlate the data from all of them and then interpret it.

Click here to read the full post.

Why don’t monitoring tools monitor changes?

Changes in applications or IT infrastructure can lead to application downtime. Downtime not only hits your revenue; it also damages your reputation.

Everybody in IT understands the importance of having the right monitoring solutions in place. From infrastructure to business, we rely on monitoring tools to give us the right information.

Read more →

Let Operational Analytics improve your business

Products and services are getting smarter. Google's car can drive itself. Your phone knows how to take the best selfie, and it even tells you when to leave so you arrive on time for that important meeting. The systems behind these services use and understand data in a very smart way. Now it's time for IT operations to get smarter.

Today's DevOps teams lack the ability to use data from different systems in a smart way. They don't have advanced, data-science-driven technologies to see what's happening in their stack, see what changed, troubleshoot issues and understand the relations and dependencies between all the applications and systems in the stack.

Every DevOps team faces the same problem: too much data, too many complicated graphs, and too many alerts and dashboards from different tools, with too few insights. Understanding your operations can be critical to business success. The role of Operational Analytics tools is to automatically detect, fix and eventually prevent problems. In this article, I explain what Operational Analytics is and how it can supercharge your IT Operations team so it stays ahead of the competition.

Read more →

Our Answer To the Alert Storm: Introducing Team View Alerts

As a developer or operator, it's hard to focus on the things that really matter. Applications, systems, tools and other environments generate more notifications, more often, than you can cope with. It's a problem for every Dev and Ops professional.

Alerts are used to identify trends, spikes or dips in your metrics and events, for example to detect low free memory, a high page-fault rate or unavailable database servers. With the right alerts in place, you are notified of problems before they escalate, so you can respond before a business service goes down and your customers are affected.
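To make this concrete, here is a minimal Python sketch of the two kinds of checks described above: a fixed threshold (for example, free memory dropping below a limit) and a spike/dip detector that compares each sample with the mean of the samples just before it. The function names, the 5-sample window, the 3-standard-deviation cutoff and the sample data are illustrative assumptions, not how StackState implements its alerts.

```python
from statistics import mean, stdev


def threshold_alert(value, limit, below=True):
    """Fire when a metric crosses a fixed limit.

    below=True means the alert fires when the value is too low
    (e.g. free memory); below=False fires when it is too high.
    """
    return value < limit if below else value > limit


def spike_alerts(series, window=5, k=3.0):
    """Return the indices of samples that deviate more than k standard
    deviations from the mean of the preceding `window` samples.

    This catches both spikes (sudden rises) and dips (sudden drops).
    """
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat histories, where any change would divide by zero.
        if sigma > 0 and abs(series[i] - mu) > k * sigma:
            alerts.append(i)
    return alerts


# Hypothetical free-memory samples in MB, with a sudden dip at index 6.
free_memory_mb = [2048, 2010, 2030, 2025, 2040, 2035, 300, 2020]

print(spike_alerts(free_memory_mb))   # indices of anomalous samples
print(threshold_alert(300, 512))      # True: 300 MB is below a 512 MB limit
```

A fixed threshold is simple to reason about but needs a sensible limit per metric; the rolling-window check adapts to each metric's normal level, at the cost of also firing on legitimate but sudden changes.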

But most companies don’t have the right alerts.

When problems occur, teams have to manually correlate all the alerts, metrics, events and log files from different tools to get context and understand the problem they are dealing with. How do you know which alerts to focus on and which to ignore?

To read the full blog post, please visit blog.stackstate.com.