What’s a “virtual machine”?

There has been quite a bit of discussion about virtual servers in the past couple of weeks.  But what exactly are virtual servers, and why should we care about them?

Computers are basically big calculators.  The user enters a calculation, and the computer makes the calculation and shows a result.  Modern computers are much more complex than simple calculators of course, but this is the basic principle.  You give the computer a command, and it shows you a result.  This can be the sum of two numbers (or an entire Excel column), or something that at first glance doesn’t appear to be a calculation at all, like returning a web page.

Computers are very good at doing calculations fast.  But they need to have a request made before they can start the calculation.  You enter 2+4 on a calculator, and it runs very quickly to get the result, then waits for you to give it another problem to solve.  The amount of time the calculator spends actually calculating is very small compared to the amount of time it spends waiting for the next question.

Desktop computers face this problem a little less, because the operating system is running a lot of different processes in the background.  However, if you’ve ever checked your CPU usage, you’ll know that the processor is usually idle unless something really intensive is running, like McAfee running a full virus scan.  Servers further reduce the inefficiency, because there are hundreds, thousands, maybe millions of people requesting calculations on the server at any given time.  But they still spend a lot of time waiting.  This is where virtualization comes in.
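To make the idle-time idea concrete, here is a rough sketch in Python.  The numbers are made up for illustration (a server that works for 3 seconds out of every minute, and a host we are willing to keep 60% busy); they are not measurements from our systems.  The sketch also assumes the workloads are equally busy and never all peak at once, and it ignores memory and the overhead of the virtualization software itself.

```python
def busy_percent(busy_seconds: float, idle_seconds: float) -> float:
    """Share of total time spent doing work, as a percentage."""
    return 100 * busy_seconds / (busy_seconds + idle_seconds)


def workloads_per_host(per_workload_percent: int, target_percent: int) -> int:
    """How many equally busy workloads fit on one host without exceeding the target."""
    return target_percent // per_workload_percent


# Hypothetical server: works for 3 seconds, then waits 57 seconds for the next request.
lone_server = busy_percent(busy_seconds=3, idle_seconds=57)

# Hypothetical host: we are comfortable keeping it 60% busy.
shared_host = workloads_per_host(per_workload_percent=5, target_percent=60)

print(f"A lone server is busy {lone_server:.0f}% of the time")
print(f"One host could carry {shared_host} such workloads")
```

Under these assumptions, a dozen mostly idle servers could share one machine, which is the inefficiency that virtualization exploits.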

At the Library, we use 7 physical servers, along with some sophisticated software, to run more than 40 virtual servers.  Those 7 physical servers are much beefier than the typical server, and cost quite a bit more, but we can save a whole lot of money (and power, and space, and time administering the systems, and lots more) by exploiting the inefficiency described above.  When our storage and infrastructure upgrade project is completed this fall, we will actually be running just 3 physical servers, with the capacity to host up to 200 virtual servers.
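The consolidation numbers above are easier to compare as a ratio of virtual servers to physical servers.  The short Python sketch below uses only the figures from this post; since "more than 40" and "up to 200" are not exact counts, the results are rough bounds rather than precise values.

```python
def vms_per_host(virtual_servers: int, physical_servers: int) -> float:
    """Average number of virtual servers hosted on each physical server."""
    return virtual_servers / physical_servers


# "More than 40" virtual servers on 7 hosts, so this is a lower bound.
current = vms_per_host(virtual_servers=40, physical_servers=7)

# "Up to 200" virtual servers on 3 hosts, so this is an upper bound.
planned = vms_per_host(virtual_servers=200, physical_servers=3)

print(f"Current: at least {current:.1f} virtual servers per physical server")
print(f"Planned: up to {planned:.1f} virtual servers per physical server")
```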

Virtualization is saving us hundreds of thousands of dollars.

And hundreds of kilowatt hours per day.

And at least 100 square feet of high-cost environmentally controlled server room space.

And at least 80 hours of work per week in administration overhead.

So what’s the drawback of using virtual servers?  When the environment gets overloaded, as our current production environment is, it can slow down all the services that are running in the virtual environment.  We can’t add more resources (processing power and memory) to existing virtual machines because we’ve used up all the physical resources.

The solution is to upgrade the physical hardware.  Our new infrastructure, which we will be migrating services to this fall, uses hardware that’s more than 5 years newer than our current hardware.  As the project progresses, you should expect to see searches complete faster, files copy faster, Library websites load faster, and generally improved performance for all Library IT services.

IT ticket response policy

Library IT has always held a certain standard for responding to IT tickets, but we’ve never formally announced the policy.  In times of staff turnover, we haven’t always done well in communicating the unwritten rules to new employees, and our response rates might have suffered for it.  To fix this, we have published an official policy for responding to OTRS tickets.  Highlights below:

  • You will get an immediate automated response when you submit a ticket.  If you don’t get this response, you may have entered your email address incorrectly.
  • You will get a response from a human within 2 business days.  In most cases, this will be a resolution to the problem, but it may also include requests for follow-up information.
  • If it will take more than 3 days to resolve the issue, you will receive an update on the progress every 5 business days (once per week).
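If you want to work out when to expect a reply, the business-day counting can be sketched in Python as below.  This is only an illustration, not how OTRS calculates anything: it assumes business days are Monday through Friday, ignores holidays, and counts the 5-day update interval from the first human response, which the policy highlights above do not spell out.  The function name and the example date are made up.

```python
from datetime import date, timedelta

FIRST_RESPONSE_DAYS = 2   # human response within 2 business days
UPDATE_INTERVAL_DAYS = 5  # progress update every 5 business days


def add_business_days(start: date, days: int) -> date:
    """Return the date that falls `days` business days (Mon-Fri) after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


submitted = date(2024, 1, 4)  # example only: a Thursday
first_response_due = add_business_days(submitted, FIRST_RESPONSE_DAYS)
next_update_due = add_business_days(first_response_due, UPDATE_INTERVAL_DAYS)

print("Human response due by:", first_response_due.isoformat())
print("Next progress update by:", next_update_due.isoformat())
```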

Sometimes a ticket will be taken “offline”, and closed in OTRS to be dealt with outside the ticketing system.  If you are unsure whether you should follow up with OTRS or reply directly to an IT professional, we recommend that you do BOTH!  We believe that it is better to over-communicate than to have your problem forgotten.

Storage and Virtual Infrastructure Project Underway

The first step in the Storage and Virtual Infrastructure project has been completed!  The new hardware was delivered and installed last week.  Tim Vruwink documented the installation at the shared data center in DCL with a series of photos.

Storage array

The artistic view from above

The installation took Tim, Chuck Kibler, Michael Tipsword, and a host of other CITES technicians more than a day to complete.  By 7 PM, we had nearly filled two racks with servers and disks (14 shipping pallets of equipment).

Disks

Each little white square represents one hard disk

The new system features 12 disk enclosures, each housing 24 disks.  There are two different disk types: fast disks that hold 600 GB, and slow disks that hold 2 TB.  Altogether, this storage array has a raw capacity of 220 TB.  Currently, the Library manages about 100 TB of production and preservation data.
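For the curious, the storage figures can be sanity-checked with a little arithmetic, sketched here in Python.  The sketch assumes every bay in every enclosure holds a disk.  It does not try to guess how many disks are fast and how many are slow; it only shows that the stated 220 TB falls between an all-fast and an all-slow array.  Keep in mind that raw capacity is larger than usable capacity once redundancy is set aside.

```python
ENCLOSURES = 12
DISKS_PER_ENCLOSURE = 24
FAST_DISK_TB = 0.6   # 600 GB fast disks
SLOW_DISK_TB = 2.0   # 2 TB slow disks
RAW_CAPACITY_TB = 220
CURRENT_DATA_TB = 100

# Assumption: every bay is populated.
total_disks = ENCLOSURES * DISKS_PER_ENCLOSURE  # 288 disks

# Bounds on raw capacity if the array held only one disk type.
all_fast_tb = round(total_disks * FAST_DISK_TB, 1)
all_slow_tb = round(total_disks * SLOW_DISK_TB, 1)

# Raw capacity compared with the data the Library manages today.
headroom = RAW_CAPACITY_TB / CURRENT_DATA_TB

print(f"Total disks: {total_disks}")
print(f"All fast: {all_fast_tb} TB, all slow: {all_slow_tb} TB")
print(f"Raw capacity is {headroom:.1f} times current data")
```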

Virtual Servers

The top server is used for managing and keeping statistics on the servers, and the bottom three run the entire Library infrastructure

Our current infrastructure needs seven servers to provide enough processing power and memory for all the services the Library provides.  With this upgrade, we will consolidate that to just three servers, plus one management console, and have room to more than triple our capacity.  Which we plan to do.

Controllers

Redundant storage controllers, used for connecting the disks to the servers so the storage can actually be used

The next step will be to upgrade the Main Library so it can communicate with this new storage and virtual server array in DCL.  We will then shift production services to DCL, and use the Main Library as an off-site backup.  This configuration was chosen because DCL offers better facilities in its data center, and therefore should provide a more stable and secure environment than the Main Library building.

There are still many challenges ahead, but this installation represents a major milestone toward a truly world-class digital infrastructure.