Where to store files?

There’s actually quite a bit of documentation on the G and H drives already available, but not much of it has been consolidated into an easy-to-digest format.  To hopefully clear things up, the Digital Library Access, Repository, and Scholarly Commons Services group has put together a summary document.  It is currently housed on the CITES wiki, but will eventually move to a more permanent knowledge base – which just happens to be a great topic for a later post!  High fives to Betsy Kruger and Kyle Rimkus for helping to create the first draft.

The G and H drives are great places to store data that needs to be protected from accidental deletion or corruption.  The most common backup-related IT request is that someone lost a file or made an accidental edit they couldn’t undo.  For the G and H drives, IT can handle these requests quickly, easily restoring files that were corrupted or deleted up to seven days ago.  We also have a policy of keeping backups for one year, but due to technology limitations, restoring from those older backup sets is much more difficult.
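
For the curious, here is roughly what one of those restores can look like behind the scenes.  This is a minimal sketch in Python, assuming the file server keeps Volume Shadow Copy snapshots and exposes them through the @GMT-prefixed paths Windows uses for a share’s “previous versions”; the server, share, and snapshot names below are made up.  In practice, you just file a ticket and IT does this for you.

    # A hypothetical restore of a damaged file from a nightly snapshot.
    # The server, share, and snapshot timestamp below are placeholders.
    import shutil

    damaged = r"\\files.example.edu\gdrive\projects\budget.xlsx"
    snapshot = r"\\files.example.edu\gdrive\@GMT-2013.07.01-04.00.00\projects\budget.xlsx"

    # Copy the snapshot's version over the damaged file, keeping timestamps.
    shutil.copy2(snapshot, damaged)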

Box.com looks very similar to G and H at first glance.  The major drawback to Box is that you cannot get an on-demand restore.  Your data is protected from disasters, such as a data center losing power in an earthquake, but the service does not let you restore a file you accidentally deleted.  On the other hand, Box is the easiest way to share files with people outside the University.

The G and H drives are maintained by the Library, and we consistently increase the quota to meet the needs of our units and patrons.  Therefore, we frown upon using them for personal documents, such as your MP3 collection.  Box, however, was commissioned as a replacement for Netfiles and is specifically intended as common-good storage for staff and students.  So feel free to use Box for practically any (legal) storage need you have.

We are eagerly looking forward to the upcoming storage and virtual infrastructure upgrade, which will allow us to restore files to specific times of the day and to go back much further than we currently can.  As we implement features of the new system, we hope to offer self-service restores.  That would let you roll back to an earlier version of a file without having to involve OTRS or IT.

What’s a “virtual machine”?

There has been quite a bit of discussion about virtual servers in the past couple of weeks.  But what exactly are virtual servers, and why should we care about them?

Computers are basically big calculators.  The user enters a calculation, and the computer performs it and shows a result.  Modern computers are much more complex than simple calculators, of course, but this is the basic principle: you give the computer a command, and it shows you a result.  That result can be the sum of two numbers (or an entire Excel column), or something that at first glance doesn’t appear to be a calculation at all, like returning a web page.

Computers are very good at doing calculations fast, but they need a request before they can start calculating.  You enter 2+4 on a calculator, it computes the result almost instantly, and then it waits for you to give it another problem to solve.  The amount of time the calculator spends actually calculating is tiny compared to the amount of time it spends waiting for the next question.

Desktop computers suffer from this problem a little less, because the operating system runs a lot of different processes in the background.  However, if you’ve ever checked your CPU usage, you know that the processor is usually idle unless something really intensive is running, like McAfee performing a full virus scan.  Servers reduce the inefficiency further, because hundreds, thousands, maybe millions of people are requesting calculations from the server at any given time.  But even servers spend a lot of time waiting.  This is where virtualization comes in.
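
If you would like to see that idleness for yourself, here is a quick sketch that samples CPU utilization once per second for a minute.  It uses the third-party psutil package (pip install psutil), which is just an illustration on my part, not something we run in production; on most desktops the average will be in the single digits.

    # Sample CPU utilization once per second for 60 seconds,
    # then report the average and the peak.
    import psutil

    samples = [psutil.cpu_percent(interval=1) for _ in range(60)]
    print(f"average CPU use: {sum(samples) / len(samples):.1f}%")
    print(f"peak CPU use:    {max(samples):.1f}%")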

At the Library, we use 7 physical servers, along with some sophisticated software, to run more than 40 virtual servers.  Those 7 physical servers are much beefier than the typical server, and cost quite a bit more, but we can save a whole lot of money (and power, and space, and time administering the systems, and lots more) by exploiting the inefficiency described above.  When our storage and infrastructure upgrade project is completed this fall, we will actually be running just 3 physical servers, with the capacity to host up to 200 virtual servers.
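
To put numbers on that consolidation, here is the back-of-the-envelope arithmetic, using only the figures above:

    # Virtual-to-physical consolidation ratios, from the figures above.
    current = 40 / 7     # about 5.7 virtual servers per physical host today
    planned = 200 / 3    # about 66.7 per host after the upgrade
    print(f"today:   {current:.1f} VMs per physical host")
    print(f"planned: {planned:.1f} VMs per physical host")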

Virtualization is saving us hundreds of thousands of dollars.

And hundreds of kilowatt hours per day.

And at least 100 square feet of high-cost environmentally controlled server room space.

And at least 80 hours of work per week in administration overhead.

So what’s the drawback of using virtual servers?  When the environment gets overloaded, as our current production environment is, it can slow down all the services running in that environment.  We can’t add more resources (processing power and memory) to existing virtual machines, because we’ve used up all the physical resources.

The solution is to upgrade the physical hardware.  Our new infrastructure, which we will be migrating services to this fall, uses hardware more than 5 years newer than what we run today.  As the project progresses, you should see searches complete faster, files copy faster, Library websites load faster, and performance improve across Library IT services generally.

IT ticket response policy

Library IT has always held itself to a certain standard for responding to IT tickets, but we’ve never formally announced the policy.  In times of staff turnover, we haven’t always done well communicating the unwritten rules to new employees, and our response rates may have suffered for it.  To fix this, we have published an official policy for responding to OTRS tickets.  Highlights below:

  • You will get an immediate automated response when you submit a ticket.  If you don’t get this response, you may have entered your email address incorrectly.
  • You will get a response from a human within 2 business days.  In most cases, this will be a resolution to the problem, but it may also include requests for follow-up information.
  • If it will take more than 3 days to resolve the issue, you will receive a progress update every 5 business days (once per week).

Sometimes a ticket will be taken “offline” and closed in OTRS, to be dealt with outside the ticketing system.  If you are unsure whether you should follow up through OTRS or reply directly to an IT professional, we recommend that you do BOTH!  We believe it is better to over-communicate than to have your problem forgotten.

Storage and Virtual Infrastructure Project Underway

The first step in the Storage and Virtual Infrastructure project has been completed!  The new hardware was delivered and installed last week.  Tim Vruwink documented the installation at the shared data center in DCL with a series of photos.

Photo: the storage array; an artistic view from above.

The installation took Tim, Chuck Kibler, Michael Tipsword, and a host of other CITES technicians more than a day to complete.  By 7 PM, we had nearly filled two racks with the servers and disks that had arrived on 14 shipping pallets.

Photo: the disk enclosures.  Each little white square represents one hard disk.

The new system features 12 disk enclosures, each housing 24 disks.  There are two disk types: fast disks that hold 600 GB each, and slow disks that hold 2 TB each.  Altogether, this storage array has a raw capacity of 220 TB.  For comparison, the Library currently manages about 100 TB of production and preservation data.
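
For those who like to check the arithmetic, the sketch below shows how a raw figure like that adds up.  The fast/slow split shown is hypothetical, chosen only to land near the stated total:

    # 12 enclosures x 24 disks = 288 disks in all.
    enclosures, disks_per_enclosure = 12, 24
    total_disks = enclosures * disks_per_enclosure   # 288

    # Hypothetical fast/slow mix, chosen to land near 220 TB raw.
    fast_disks, slow_disks = 254, 34
    raw_tb = fast_disks * 0.6 + slow_disks * 2.0     # 600 GB and 2 TB disks
    print(f"{total_disks} disks, {raw_tb:.1f} TB raw")  # 288 disks, 220.4 TB raw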

Photo: the virtual servers.  The top server is used for managing and keeping statistics on the others; the bottom three run the entire Library infrastructure.

Our current infrastructure needs seven servers to provide enough processing power and memory for all the services the Library provides.  With this upgrade, we will consolidate that to just three servers, plus one management console, and have room to more than triple our capacity.  Which we plan to do.

Photo: the redundant storage controllers, which connect the disks to the servers so the storage can actually be used.

The next step will be to upgrade the Main Library so it can communicate with this new storage and virtual server array in DCL.  We will then shift production services to DCL and use the Main Library as an off-site backup.  This configuration was chosen because DCL offers better data center facilities, and therefore should provide a more stable and secure environment than the Main Library building.

There are still many challenges ahead, but this installation represents a major milestone toward a truly world-class digital infrastructure.

PaperCut, the New Face of Public Printing

The Library is in the process of upgrading the public printing system.  Our old system, LibPrint, was developed by Library IT staff more than a decade ago.  LibPrint has served the Library user community quite well, but we’ve identified a software product called PaperCut that has proven it can offer everything LibPrint did, and more.

PaperCut offers many features that improve on LibPrint, including fast printing with minimal waiting, double-sided printing, and an improved administrative interface that helps Library staff quickly diagnose and fix printing problems.  LibPrint would sometimes become so bogged down that patrons had to wait 15 minutes for a print job to come out.  As we continue to train Library staff on the new system, we expect that wait to disappear entirely.

As of July 1, we have installed PaperCut in the Undergraduate, International and Area Studies, Map and Geography, and Classics Libraries. This month, we plan to upgrade Funk-ACES, Grainger, and MPAL.  By August 31, all Library public printing will be managed using PaperCut.

I’d like to thank the UGL staff and faculty for their patience and assistance with the pilot PaperCut installation.  Gregg Homerding and Paula Adams deserve special recognition for coordinating communications and drafting and reviewing documentation.  Also, this transition would not have been possible without the efforts of the Workstation and Networking Support group, including Eric Mosher, Jake Metz, Bryan Choi, Jackson Deremiah, and Elzabad Kennedy.