Why the data center needs an operating system (2022)

Developers today are building a new class of applications. These applications no longer fit on a single server, but instead run across a fleet of servers in a data center. Examples include analytics frameworks like Apache Hadoop and Apache Spark, message brokers like Apache Kafka, key-value stores like Apache Cassandra, as well as customer-facing applications such as those run by Twitter and Netflix.

These new applications are more than applications; they are distributed systems. Just as it became commonplace for developers to build multithreaded applications for single machines, it’s now becoming commonplace for developers to build distributed systems for data centers.

But it’s difficult for developers to build distributed systems, and it’s difficult for operators to run distributed systems. Why? Because we expose the wrong level of abstraction to both developers and operators: machines.

Machines are the wrong abstraction

Machines are the wrong level of abstraction for building and running distributed applications. Exposing machines as the abstraction to developers unnecessarily complicates the engineering, causing developers to build software constrained by machine-specific characteristics, like IP addresses and local storage. This makes moving and resizing applications difficult if not impossible, and turns data center maintenance into a highly involved and painful procedure.

With machines as the abstraction, operators deploy applications in anticipation of machine loss, usually by taking the easiest and most conservative approach of deploying one application per machine. This almost always means machines go underutilized since we rarely buy our machines (virtual or physical) to exactly fit our applications, or size our applications to exactly fit our machines.

By running only one application per machine, we end up dividing our data center into highly static, highly inflexible partitions of machines, one for each distributed application. We end up with a partition that runs analytics, another that runs the databases, another that runs the web servers, another that runs the message queues, and so on. And the number of partitions is only bound to increase as companies replace monolithic architectures with service-oriented architectures and build more software based on microservices.

What happens when a machine dies in one of these static partitions? Let’s hope we over-provisioned sufficiently (wasting money), or can re-provision another machine quickly (wasting effort). What about when the web traffic dips to its daily low? With static partitions we allocate for peak capacity, which means when traffic is at its lowest, all of that excess capacity is wasted. This is why a typical data center runs at only 8-15% efficiency. And don’t be fooled just because you’re running in the cloud: you’re still being charged for the resources your application is not using on each virtual machine (someone is benefiting — it’s just your cloud provider, not you).
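
To make the waste concrete, here is a back-of-the-envelope sketch in Python. The two workloads and their hourly demand figures are invented for illustration, not measurements. The only point is that sizing each partition for its own peak requires more machines than sizing one shared pool for the peak of the combined load.

```python
# Hypothetical hourly demand (in machines) for two workloads whose peaks
# fall at different times of day. All numbers are made up for illustration.
web_demand = [20, 10, 10, 40, 80, 100, 90, 50]
batch_demand = [90, 100, 80, 30, 10, 10, 20, 60]


def static_capacity(demands):
    """Machines needed when each workload gets its own partition sized for its peak."""
    return sum(max(d) for d in demands)


def pooled_capacity(demands):
    """Machines needed when all workloads share one pool sized for the combined peak."""
    return max(sum(hour) for hour in zip(*demands))


def utilization(demands, capacity):
    """Average fraction of the provisioned machines that are actually busy."""
    hours = len(demands[0])
    used = sum(sum(d) for d in demands)
    return used / (capacity * hours)


demands = [web_demand, batch_demand]
static = static_capacity(demands)
pooled = pooled_capacity(demands)

print(f"static partitions: {static} machines, {utilization(demands, static):.0%} utilized")
print(f"shared pool:       {pooled} machines, {utilization(demands, pooled):.0%} utilized")
```

The absolute percentages depend entirely on the invented inputs; the gap between the two lines is what matters.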

And finally, with machines as the abstraction, organizations must employ armies of people to manually configure and maintain each individual application on each individual machine. People become the bottleneck for trying to run new applications, even when there are ample resources already provisioned that are not being utilized.

If my laptop were a data center

Imagine if we ran applications on our laptops the same way we run applications in our data centers. Each time we launched a web browser or text editor, we’d have to specify which CPU to use, which memory modules are addressable, which caches are available, and so on. Thankfully, our laptops have an operating system that abstracts us away from the complexities of manual resource management.

In fact, we have operating systems for our workstations, servers, mainframes, supercomputers, and mobile devices, each optimized for their unique capabilities and form factors.

We’ve already started treating the data center itself as one massive warehouse-scale computer. Yet, we still don’t have an operating system that abstracts and manages the hardware resources in the data center just like an operating system does on our laptops.

It’s time for the data center OS

What would an operating system for the data center look like?

From an operator’s perspective it would span all of the machines in a data center (or cloud) and aggregate them into one giant pool of resources on which applications would be run. You would no longer configure specific machines for specific applications; all applications would be capable of running on any available resources from any machine, even if there are other applications already running on those machines.
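
A minimal sketch of that pooling idea, assuming a simple first-fit policy and tracking only CPUs and memory. The `Machine`, `Task`, and `place` names are hypothetical and chosen for this example; a real scheduler would weigh far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    free_cpus: float
    free_mem_gb: float
    apps: list = field(default_factory=list)  # applications currently placed here


@dataclass
class Task:
    app: str
    cpus: float
    mem_gb: float


def place(task, machines):
    """First-fit placement: use the first machine with enough free resources."""
    for m in machines:
        if m.free_cpus >= task.cpus and m.free_mem_gb >= task.mem_gb:
            m.free_cpus -= task.cpus
            m.free_mem_gb -= task.mem_gb
            m.apps.append(task.app)
            return m.name
    return None  # nothing in the pool can fit the task right now


# Two machines form one pool; no machine is reserved for a particular application.
pool = [Machine("m1", free_cpus=4, free_mem_gb=16), Machine("m2", free_cpus=4, free_mem_gb=16)]
workload = [
    Task("web", cpus=2, mem_gb=4),
    Task("analytics", cpus=2, mem_gb=8),
    Task("db", cpus=2, mem_gb=8),
    Task("queue", cpus=1, mem_gb=2),
]
placements = {t.app: place(t, pool) for t in workload}
print(placements)
```

Four applications end up sharing two machines, where one application per machine would have needed four.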

From a developer’s perspective, the data center operating system would act as an intermediary between applications and machines, providing common primitives to facilitate and simplify building distributed applications.

The data center operating system would not need to replace Linux or any other host operating systems we use in our data centers today. The data center operating system would provide a software stack on top of the host operating system. Continuing to use the host operating system to provide standard execution environments is critical to immediately supporting existing applications.

The data center operating system would provide functionality for the data center that is analogous to what a host operating system provides on a single machine today: namely, resource management and process isolation. Just like with a host operating system, a data center operating system would enable multiple users to execute multiple applications (made up of multiple processes) concurrently, across a shared collection of resources, with explicit isolation between those applications.

An API for the data center

Perhaps the defining characteristic of a data center operating system is that it provides a software interface for building distributed applications. Analogous to the system call interface for a host operating system, the data center operating system API would enable distributed applications to allocate and deallocate resources, launch, monitor, and destroy processes, and more. The API would provide primitives that implement common functionality that all distributed systems need. Thus, developers would no longer need to independently re-implement fundamental distributed systems primitives (and inevitably, independently suffer from the same bugs and performance issues).
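
As a rough illustration of the shape such an interface could take, here is a toy in-memory model in Python. The class and method names (`DataCenterOS`, `allocate`, `launch`, `status`, `destroy`, `deallocate`) are assumptions made for this sketch; they mirror the verbs in the paragraph above and are not the API of Mesos or any other real system.

```python
import itertools


class DataCenterOS:
    """Toy in-memory model of a data center OS API. Hypothetical; not a real API."""

    def __init__(self, total_cpus):
        self.free_cpus = total_cpus
        self._allocations = {}  # allocation id -> CPUs reserved
        self._processes = {}    # process id -> [allocation id, command, state]
        self._ids = itertools.count(1)

    def allocate(self, cpus):
        """Reserve CPUs from the shared pool and return an allocation id."""
        if cpus > self.free_cpus:
            raise RuntimeError("not enough free resources in the pool")
        alloc_id = next(self._ids)
        self.free_cpus -= cpus
        self._allocations[alloc_id] = cpus
        return alloc_id

    def launch(self, alloc_id, command):
        """Start a process inside an existing allocation and return its id."""
        if alloc_id not in self._allocations:
            raise KeyError(f"unknown allocation {alloc_id}")
        pid = next(self._ids)
        self._processes[pid] = [alloc_id, command, "running"]
        return pid

    def status(self, pid):
        """Monitor a process: report its current state."""
        return self._processes[pid][2]

    def destroy(self, pid):
        """Stop a process; its allocation stays reserved until deallocated."""
        self._processes[pid][2] = "destroyed"

    def deallocate(self, alloc_id):
        """Destroy processes still using the allocation, then return it to the pool."""
        for pid, (owner, _command, _state) in self._processes.items():
            if owner == alloc_id:
                self.destroy(pid)
        self.free_cpus += self._allocations.pop(alloc_id)


dc = DataCenterOS(total_cpus=100)
alloc = dc.allocate(cpus=8)
pid = dc.launch(alloc, "run-web-server")
state_while_running = dc.status(pid)
dc.deallocate(alloc)
print(state_while_running, dc.status(pid), dc.free_cpus)
```

Note that the application never names a machine: it asks for resources and gets back handles, much as a process asks a host operating system for memory and gets back an address.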

Centralizing common functionality within the API primitives would enable developers to build new distributed applications more easily, more safely, and more quickly. This is reminiscent of when virtual memory was added to host operating systems. In fact, one of the virtual memory pioneers wrote that “it was pretty obvious to the designers of operating systems in the early 1960s that automatic storage allocation could significantly simplify programming.”

Example primitives

Two primitives specific to a data center operating system that would immediately simplify building distributed applications are service discovery and coordination. Unlike on a single host where very few applications need to discover other applications running on the same host, discovery is the norm for distributed applications. Likewise, most distributed applications achieve high availability and fault tolerance through some means of coordination and/or consensus, which is notoriously hard to implement correctly and efficiently.

Developers today are forced to pick between existing tools for service discovery and coordination, such as Apache ZooKeeper and CoreOS’ etcd. This forces organizations to deploy multiple tools for different applications, significantly increasing operational complexity and the maintenance burden.

Having the data center operating system provide primitives for discovery and coordination not only simplifies development, it also enables application portability. Organizations can change the underlying implementations without rewriting the applications, much like you can choose between different filesystem implementations on a host operating system today.
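
The portability argument can be sketched in a few lines of Python. Everything here is hypothetical: `DiscoveryBackend`, `ServiceRegistry`, and the two toy backends are names invented for this example, and neither backend is ZooKeeper or etcd. The sketch only shows that an application written against the primitive keeps working when the implementation underneath is swapped.

```python
import sqlite3
from typing import List, Protocol


class DiscoveryBackend(Protocol):
    """What the registry needs from any key-value store (a hypothetical interface)."""

    def put(self, key: str, value: str) -> None: ...

    def get_prefix(self, prefix: str) -> List[str]: ...


class DictBackend:
    """Backend that keeps registrations in a plain dictionary."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get_prefix(self, prefix: str) -> List[str]:
        return [v for k, v in self._data.items() if k.startswith(prefix)]


class SqliteBackend:
    """Backend that keeps registrations in an in-memory SQLite table."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

    def put(self, key: str, value: str) -> None:
        self._db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))

    def get_prefix(self, prefix: str) -> List[str]:
        rows = self._db.execute(
            "SELECT value FROM kv WHERE substr(key, 1, ?) = ?", (len(prefix), prefix)
        )
        return [value for (value,) in rows]


class ServiceRegistry:
    """Application-facing primitive; it never names a concrete backend."""

    def __init__(self, backend: DiscoveryBackend) -> None:
        self._backend = backend

    def register(self, service: str, address: str) -> None:
        self._backend.put(f"/services/{service}/{address}", address)

    def lookup(self, service: str) -> List[str]:
        return sorted(self._backend.get_prefix(f"/services/{service}/"))


def demo(backend: DiscoveryBackend) -> List[str]:
    """Application code: identical no matter which backend is plugged in."""
    registry = ServiceRegistry(backend)
    registry.register("web", "10.0.0.5:8080")
    registry.register("web", "10.0.0.6:8080")
    registry.register("db", "10.0.0.9:5432")
    return registry.lookup("web")


print(demo(DictBackend()))
print(demo(SqliteBackend()))
```

Both calls return the same addresses for the `web` service, even though `demo` has no idea which store answered.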

A new way to deploy applications

With a data center operating system, a software interface replaces the human interface that developers typically deal with when deploying their applications today. Rather than asking a person to provision and configure machines to run their applications, developers launch their applications using the data center operating system (e.g., via a CLI or GUI), and the applications execute using the data center operating system’s API.

This supports a clean separation of concerns between operators and users: operators specify the amount of resources allocatable to each user, and users launch whatever applications they want, using whatever resources are available to them. Because an operator now specifies how much of any type of resource is available, but not which specific resource, a data center operating system, and the distributed applications running on top, can be more intelligent about which resources to use in order to execute more efficiently and better handle failures. Because most distributed applications have complex scheduling requirements (think Apache Hadoop) and specific needs for failure recovery (think of a database), empowering software to make decisions instead of humans is critical for operating efficiently at data-center scale.
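
Here is one way that separation of concerns might look in code, as a sketch under simplifying assumptions: CPUs are the only resource, the operator's policy is a per-user quota, and the placement rule is "most free CPUs first". The `Cluster` class, its methods, and the user and task names are all hypothetical.

```python
class Cluster:
    """Toy model: operators set quotas, software decides which machine to use."""

    def __init__(self, machine_cpus, quotas):
        self.free = dict(machine_cpus)  # machine -> free CPUs
        self.quota_left = dict(quotas)  # user -> CPUs the operator still allows
        self.tasks = {}                 # task name -> (user, cpus, machine)

    def _pick_machine(self, cpus):
        """Choose the machine with the most free CPUs that can fit the task."""
        candidates = [m for m, free in self.free.items() if free >= cpus]
        return max(candidates, key=lambda m: self.free[m], default=None)

    def launch(self, user, name, cpus):
        """Users say how much they need; they never say which machine."""
        if cpus > self.quota_left.get(user, 0):
            raise PermissionError(f"{user} would exceed their quota")
        machine = self._pick_machine(cpus)
        if machine is None:
            raise RuntimeError("no machine has enough free CPUs")
        self.free[machine] -= cpus
        self.quota_left[user] -= cpus
        self.tasks[name] = (user, cpus, machine)
        return machine

    def machine_failed(self, machine):
        """Drop the failed machine and relaunch its tasks elsewhere in the pool."""
        del self.free[machine]
        moved = {}
        for name, (user, cpus, where) in list(self.tasks.items()):
            if where != machine:
                continue
            target = self._pick_machine(cpus)  # None means the task must wait
            if target is not None:
                self.free[target] -= cpus
            self.tasks[name] = (user, cpus, target)
            moved[name] = target
        return moved


# The operator specifies how much each user may consume, not where.
cluster = Cluster({"m1": 8, "m2": 8, "m3": 8}, quotas={"alice": 10, "bob": 4})
cluster.launch("alice", "web", cpus=4)
cluster.launch("alice", "db", cpus=4)
cluster.launch("bob", "cron", cpus=2)

# A machine dies; software, not a person, decides where its tasks go next.
moved = cluster.machine_failed("m1")
print(moved)
```

Because no task was ever pinned to a specific machine, recovering from the failure is a scheduling decision rather than a manual re-provisioning exercise.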

The “cloud” is not an operating system

Why do we need a new operating system? Didn’t Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) already solve these problems?

IaaS doesn’t solve our problems because it’s still focused on machines. It isn’t designed with a software interface intended for applications to use in order to execute. IaaS is designed for humans to consume, in order to provision virtual machines that other humans can use to deploy applications; IaaS turns machines into more (virtual) machines, but does not provide any primitives that make it easier for a developer to build distributed applications on top of those machines.

PaaS, on the other hand, abstracts away the machines, but is still designed first and foremost to be consumed by a human. Many PaaS solutions do include numerous tangential services and integrations that make building a distributed application easier, but not in a way that’s portable across other PaaS solutions.

Apache Mesos: The distributed systems kernel

Distributed computing is now the norm, not the exception, and we need a data center operating system that delivers a layer of abstraction and a portable API for distributed applications. Not having one is hindering our industry. Developers should be able to build distributed applications without having to reimplement common functionality. Distributed applications built in one organization should be capable of being run in another organization easily.

Existing cloud computing solutions and APIs are not sufficient. Moreover, the data center operating system API must be built, like Linux, in an open and collaborative manner. Proprietary APIs force lock-in, deterring a healthy and innovative ecosystem from growing. It’s time we created the POSIX for distributed computing: a portable API for distributed systems running in a data center or on a cloud.

The open source Apache Mesos project, of which I am one of the co-creators and the project chair, is a step in that direction. Apache Mesos aims to be a distributed systems kernel that provides a portable API upon which distributed applications can be built and run.

Many popular distributed systems have already been built directly on top of Mesos, including Apache Spark, Apache Aurora, Airbnb’s Chronos, and Mesosphere’s Marathon. Other popular distributed systems have been ported to run on top of Mesos, including Apache Hadoop, Apache Storm, and Google’s Kubernetes, to list a few.

Chronos is a compelling example of the value of building on top of Mesos. Chronos, a distributed system that provides highly available and fault-tolerant cron, was built on top of Mesos in only a few thousand lines of code and without having to do any explicit socket programming for network communication.

Companies like Twitter and Airbnb are already using Mesos to help run their data centers, while companies like Google have been using in-house solutions they built almost a decade ago. In fact, just as Google’s MapReduce spurred an industry around Apache Hadoop, Google’s in-house data center solutions have had close ties with the evolution of Mesos.

While not a complete data center operating system, Mesos, along with some of the distributed applications running on top, provides some of the essential building blocks from which a full data center operating system can be built: the kernel (Mesos), a distributed init.d (Marathon/Aurora), cron (Chronos), and more.

Interested in learning more about or contributing to Mesos? Check out mesos.apache.org and follow @ApacheMesos on Twitter. We’re a growing community with users at companies like Twitter, Airbnb, HubSpot, OpenTable, eBay/PayPal, Netflix, Groupon, and more.

Cropped image on article and category pages by Karl-Ludwig Poggemann on Flickr, used under a Creative Commons license.

Article information

Author: Frankie Dare

Last Updated: 12/27/2022
