Identity

This is the first of a two-part series introducing the Identity and Access Management (IAM) project at the University of Illinois.  Read part 2, Access Management.  More information can be found on the IAM project website.

Authentication

When someone tells you who they are, do you believe them?  If it’s a casual introduction, “Hi, my name is Jason,” then you probably believe them every time.  If it’s during a transaction, like checking out a book from the Library, you probably ask for some proof (like showing a library card).  If the person is trying to access their bank account online, they are asked for a password.

The process of determining whether someone is who they claim to be is called authentication.  There are certain levels of trust associated with authentication, which are appropriate for different types of interaction.  When I introduce myself to someone on the bus, there’s no logical reason to complicate things by making me prove who I am.   When I access my credit card accounts, though, I want some pretty strong safeguards in place so I can be reasonably sure no one else is getting into my financial transactions.

The University currently has several methods for authentication, which are very loosely coupled and do not rely on the same basic requirements.  There have been reports of people using, for example, a guest library card to convince someone at Campus Rec to also issue a day pass for the ARC.

The IAM project aims to address this situation by creating a single source of information, a central identity store, which includes information on how trusted each identity is.  When a student applies for undergraduate admission, they may be entered into the identity store with a low trust level – we only know their email address, but that’s good enough for now.  The applicant then submits some paperwork, which may be checked against other sources (the Social Security Administration, credit bureaus, the high school the applicant attended, etc.).  This increases trust, because it’s much more difficult to fake those records than it is to fake an email address.

One Person, One Identity

Currently, each U of I campus maintains its own identity store.  Different units may also have their own systems – the Library uses a database to differentiate between patron types, for example.  Effectively, this means that a person could have several different identities with different levels of trust.  In order for that person to function on campus, all of these systems need to communicate with each other, and people who grant access to services need to be familiar with each of these other systems.

IAM will greatly reduce the complexity of identity management by ensuring that each and every person has one, and only one, identity.  The central identity store will be queried in various ways, depending on the level of trust required for the service being requested.
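To make the idea concrete, here is a minimal sketch of how a service might query a central identity store that tracks trust levels.  This is purely illustrative – the trust scale, names, and functions below are hypothetical and not part of the actual IAM design.

```python
from dataclasses import dataclass

# Hypothetical trust levels -- the real IAM project defines its own scale.
TRUST_SELF_ASSERTED = 1   # we only know an email address
TRUST_DOCUMENTED = 2      # paperwork has been submitted
TRUST_VERIFIED = 3        # records checked against outside sources

@dataclass
class Identity:
    person_id: str
    trust_level: int

# A stand-in for the single, central identity store.
IDENTITY_STORE = {
    "applicant123": Identity("applicant123", TRUST_SELF_ASSERTED),
    "student456": Identity("student456", TRUST_VERIFIED),
}

def can_use_service(person_id: str, required_trust: int) -> bool:
    """Grant access only if the person's one identity meets the trust
    level that the requested service requires."""
    identity = IDENTITY_STORE.get(person_id)
    return identity is not None and identity.trust_level >= required_trust

# A casual service needs little proof; a financial one needs much more.
print(can_use_service("applicant123", TRUST_SELF_ASSERTED))  # True
print(can_use_service("applicant123", TRUST_VERIFIED))       # False
```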

Identity and the Library

Creating and maintaining this store has several implications for the Library.  We currently gather patron information from ICard, CITES, and Active Directory.  Each feed has some delay associated with it, which means our patron information is not always up to date.  In the coming years, the IAM project will allow us to change how we query patron information, which should greatly simplify the current process and improve the accuracy and timeliness of our data.

But other issues raise new questions.  As a public library, we do not require that patrons be affiliated with the University in order to use some services.  Will we need to expand the identity store to include anyone who might be a patron?  Will we need to run a separate identity store for people who are not in the central one?  Will our service models change to allow a basic level of service for people without a U of I identity, which doesn’t require any identity proofing at all?  These are just a few of the questions we will have to address as the project moves forward.

Continue with part 2, Access Management

Where to store files?

There’s actually quite a bit of documentation on the G and H drives already available, but not much of it has been consolidated into an easy-to-digest format.  To help clear things up, the Digital Library Access, Repository, and Scholarly Commons Services group has put together a summary document.  It is currently housed on the CITES wiki, but will eventually move to a more permanent knowledge base – which just happens to be a great topic for a later post!  High fives to Betsy Kruger and Kyle Rimkus for helping to create the first draft.

The G and H drives are great places to store data that needs to be protected from accidental deletion or corruption.  The most common IT request (relating to backups) is that someone lost a file, or made an accidental edit that they couldn’t undo.  IT can quickly handle these requests for the G and H drives, easily restoring files that were corrupted or deleted up to seven days ago.  We also have a policy of keeping backups for one year, but due to technology limitations, it is much more difficult to restore from those older backup sets.

Box.com looks very similar to G and H at first glance.  The major drawback to Box is that you cannot get an on-demand restore.  The data is protected from disaster, such as a data center losing power from an earthquake, but the service does not allow you to restore a file that you accidentally deleted.  On the other hand, Box is the easiest way to share files with people outside the University.

The G and H drives are maintained by the Library, and we consistently increase the quota to meet the needs of our units and patrons.  Because that storage exists to support Library work, we frown upon using it for personal documents, such as your MP3 collection.  Box, however, was commissioned as a replacement for Netfiles, which was specifically intended as common-good storage for staff and students.  So feel free to use Box for practically any (legal) storage need you have.

We are eagerly looking forward to the upcoming storage and virtual infrastructure upgrade, which will allow us to restore files to specific times of the day and go back much further than we currently can.  As we implement features of the new system, we hope to offer self-service restores, which will let you roll back to an earlier version of a file without opening an OTRS ticket or involving IT.

What’s a “virtual machine”?

There has been quite a bit of discussion about virtual servers in the past couple of weeks.  But what exactly is a virtual server, and why should we worry about them?

Computers are basically big calculators.  The user enters a calculation, and the computer makes the calculation and shows a result.  Modern computers are much more complex than simple calculators, of course, but this is the basic principle.  You give the computer a command, and it shows you a result.  This can be the sum of two numbers (or an entire Excel column), or something that at first glance doesn’t appear to be a calculation at all, like returning a web page.

Computers are very good at doing calculations fast.  But they need a request before they can start calculating.  You enter 2+4 on a calculator, and it runs very quickly to get the result, then waits for you to give it another problem to solve.  The amount of time the calculator spends actually calculating is very small compared to the amount of time it spends waiting for the next question.

Desktop computers face this problem a little less, because the operating system runs a lot of different processes in the background.  However, if you’ve ever checked your CPU usage, you’ll know that the processor is usually idle — unless something really intensive is running, like McAfee performing a full virus scan.  Servers further reduce the inefficiency, because hundreds, thousands, maybe millions of people may be requesting calculations at any given time.  But they still spend a lot of time waiting.  This is where virtualization comes in.
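Before looking at our numbers, if you’d like to see for yourself just how idle a typical machine is, here is a small sketch using the third-party psutil package (an illustration, not a tool Library IT supports) that samples how busy your processor actually is:

```python
# Sample overall CPU utilization to see how idle a typical desktop is.
# Requires the third-party psutil package: pip install psutil
import psutil

# Take one reading per second for ten seconds.
samples = [psutil.cpu_percent(interval=1) for _ in range(10)]

print("CPU busy each second (%):", samples)
print(f"Average over 10 seconds: {sum(samples) / len(samples):.1f}%")
# On most desktops this average stays in the single digits unless
# something intensive, like a full virus scan, is running.
```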

At the Library, we use 7 physical servers, along with some sophisticated software, to run more than 40 virtual servers.  Those 7 physical servers are much beefier than the typical server, and cost quite a bit more, but we can save a whole lot of money (and power, and space, and time administering the systems, and lots more) by exploiting the inefficiency described above.  When our storage and infrastructure upgrade project is completed this fall, we will actually be running just 3 physical servers, with the capacity to host up to 200 virtual servers.
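As a rough back-of-the-envelope illustration of how that kind of consolidation is possible, consider how many mostly idle workloads can share one physical host.  The utilization figures below are assumptions for the sake of the example, not measurements of our environment:

```python
# Back-of-the-envelope consolidation estimate.
# Both figures below are illustrative assumptions, not measured values.
avg_utilization = 0.10    # assume a typical server is busy about 10% of the time
safety_headroom = 0.70    # plan to keep each physical host under 70% busy

# How many mostly idle virtual servers could share one physical host?
vms_per_host = safety_headroom / avg_utilization
print(f"Roughly {vms_per_host:.0f} virtual servers per physical host")

# Seven hosts at that ratio comfortably cover 40+ virtual servers,
# and beefier next-generation hosts can aim much higher.
print(f"7 hosts -> headroom for about {7 * vms_per_host:.0f} virtual servers")
```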

Virtualization is saving us hundreds of thousands of dollars.

And hundreds of kilowatt hours per day.

And at least 100 square feet of high-cost environmentally controlled server room space.

And at least 80 hours of work per week in administration overhead.

So what’s the drawback of using virtual servers?  When the environment gets overloaded, as our current production environment is, it can slow down all the services running in it.  We can’t add more resources (processing power and memory) to existing virtual machines because we’ve used up all the physical resources.

The solution is to upgrade the physical hardware.  Our new infrastructure, which we will be migrating services to this fall, uses hardware that’s more than 5 years newer than our current hardware.  As the project progresses, you should expect to see searches complete faster, files copy faster, Library websites load faster, and generally improved performance for all Library IT services.

IT ticket response policy

Library IT has always held itself to a certain standard for responding to IT tickets, but we’ve never formally announced the policy.  In times of staff turnover, we haven’t always done well in communicating the unwritten rules to new employees, and our response rates might have suffered for it.  To fix this, we have published an official policy for responding to OTRS tickets.  Highlights below:

  • You will get an immediate automated response when you submit a ticket.  If you don’t get this response, you may have entered your email address incorrectly.
  • You will get a response from a human within 2 business days.  In most cases, this will be a resolution to the problem, but it may also include requests for follow-up information.
  • If it will take more than 3 days to resolve the issue, you will receive an update on the progress every 5 business days (once per week).

Sometimes a ticket will be taken “offline” and closed in OTRS to be dealt with outside the ticketing system.  If you are unsure whether you should follow up in OTRS or reply directly to an IT professional, we recommend that you do BOTH!  We believe it is better to over-communicate than to risk your problem being forgotten.

Storage and Virtual Infrastructure Project Underway

The first step in the Storage and Virtual Infrastructure project has been completed!  The new hardware was delivered and installed last week.  Tim Vruwink documented the installation at the shared data center in DCL with a series of photos.

Storage array

The artistic view from above

The installation took Tim, Chuck Kibler, Michael Tipsword, and a host of other CITES technicians more than a day to complete.  By 7 PM, we had nearly filled two racks of equipment (14 shipping pallets) with servers and disks.

Disks

Each little white square represents one hard disk

The new system features 12 disk enclosures, each housing 24 disks.  There are two disk types: fast disks that hold 600 GB each, and slow disks that hold 2 TB each.  Altogether, this storage array has a raw capacity of 220 TB.  Currently, the Library manages about 100 TB of production and preservation data.
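As a quick sanity check on those numbers – the enclosure count, disks per enclosure, and per-disk sizes come from this post, and the rest is simple arithmetic – the quoted raw capacity falls between the two extremes of an all-fast and an all-slow array:

```python
# Quick sanity check on the storage numbers above.
enclosures = 12
disks_per_enclosure = 24
total_disks = enclosures * disks_per_enclosure      # 288 disk slots

fast_tb = 0.6   # "fast" disks hold 600 GB each
slow_tb = 2.0   # "slow" disks hold 2 TB each

# Bounds on raw capacity, depending on how the mix of disks skews.
all_fast = total_disks * fast_tb    # if every slot held a fast disk
all_slow = total_disks * slow_tb    # if every slot held a slow disk

print(f"{total_disks} disks total")
print(f"Raw capacity range: {all_fast:.0f} TB to {all_slow:.0f} TB")
# The array's quoted 220 TB raw capacity sits inside this range, which
# suggests the mix is weighted toward the smaller, faster disks.
```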

Virtual Servers

The top server is used for managing and keeping statistics on the servers, and the bottom three run the entire Library infrastructure

Our current infrastructure needs seven servers to provide enough processing power and memory for all the services the Library provides.  With this upgrade, we will consolidate that to just three servers, plus one management console, and have room to more than triple our capacity.  Which we plan to do.

Controllers

Redundant storage controllers, used for connecting the disks to the servers so the storage can actually be used

The next step will be to upgrade the Main Library so it can communicate with this new storage and virtual server array in DCL.  We will then shift production services to DCL and use the Main Library as an off-site backup.  This configuration was chosen because DCL offers better data center facilities, and therefore should provide a more stable and secure environment than the Main Library building.

There are still many challenges ahead, but this installation represents a major milestone toward a truly world-class digital infrastructure.

Good News About Storage!!!

I’m writing with some welcome news about support for the Library’s storage and servers.  For some time Library IT has wrestled with how best to support the Library’s servers and its rapidly growing use of file storage as we become an increasingly digital library.  The quick news is that the Library will replace its storage, server, and file backup infrastructure in the next several months.  Library IT, specifically Jason Strutz and the Infrastructure Management and Support (IMS) group, will be communicating with you about the specific activities of this replacement.  Normally this would be a “back office” operation, but it is significant for two reasons.  If you’re interested, read on – there is reason for the Library to celebrate, and for the campus to celebrate with us.

The new storage acquisition is significant because it represents a landmark step for the Library and the campus toward more centralized support for data storage.  Why is this significant?  Because data storage has become like the network – it’s essential to supporting virtually all the work that goes on in any organization.  Library IT supports nearly 300 terabytes of storage – the Library is one of the top ten storage users on campus, a list that includes large colleges like LAS and research centers like the Beckman Institute and the Institute for Genomic Biology (IGB).  With the exciting digitization of books for the Internet Archive, the Google Book digitization project, special collections, archives, film, still images, and audio, the Library’s storage needs are growing at an unprecedented rate.  For some time we’ve kept pace with this growth, but storage, networking, and servers are becoming increasingly complex, requiring more staff to support and manage these systems.

At the same time the Library was working on its storage challenge, the campus IT community started two initiatives in which Library IT professionals and faculty got involved – the Data Center Consolidation committee and the Data Storage Task Force.  Briefly stated, the campus IT community is focused on providing robust and reliable storage.  In some cases, that means centralizing storage operations and services.  The Library, the Data Center Consolidation committee, and CITES have come to a tentative agreement to move the Library’s primary storage and server equipment into the CITES data center in the DCL building.  Some storage and server equipment will remain in the Library, and Library IT professionals will continue to manage and administer the servers and storage for the Library.

What are the benefits of making this change?  In a nutshell, the Library’s production servers and storage will be housed in a robust data center with more reliable power, cooling, and network connections than the Main Library can now provide.  We will move into a facility that is dedicated to keeping servers and storage running 24/7.  And we will have taken a giant step toward the campus goal of centralizing storage, so that all of us in the Library can concentrate on managing the technology, systems, and services we provide best on top of this infrastructure.

A number of people have helped us get to this point, including Paul Hixson, Panit Lisy, Charley Kline, and Bob Booth at CITES and the Office of the CIO; in addition, the Operations and Executive committees for Data Center Consolidation have helped make our transition into the CITES data center possible.  Many in Library IT have invested significant time and energy in getting us to this point, including Jason Strutz, Tim Vruwink, Chuck Kibler, Lee Galaway, Bill Mischo, Tom Habing, and Robert Ferrer.  We also work with an important group of stakeholders in the Library to determine and support storage needs, including Digital Content Creation, Preservation, Archives, and the Rare Book and Manuscript Library.  Please stay tuned for more specific updates!


PaperCut, the New Face of Public Printing

The Library is in the process of upgrading the public printing system.  Our old system, LibPrint, was developed by Library IT staff more than a decade ago.  LibPrint has served the Library user community quite well, but we’ve identified a software product called PaperCut that has proven it can offer everything LibPrint did, and more.

PaperCut offers many features that improve on LibPrint, including fast printing with minimal waiting, double-sided printing, and an improved administrative interface that helps Library staff quickly diagnose and fix printing problems.  LibPrint would sometimes become so bogged down that patrons might have to wait 15 minutes for a print job to come out.  As we roll out PaperCut and continue to train Library staff on the new system, we expect that wait to disappear entirely.

As of July 1, we have installed PaperCut in the Undergraduate, International and Area Studies, Map and Geography, and Classics Libraries. This month, we plan to upgrade Funk-ACES, Grainger, and MPAL.  By August 31, all Library public printing will be managed using PaperCut.

I’d like to thank the UGL staff and faculty for their patience and assistance with the pilot PaperCut installation.  Gregg Homerding and Paula Adams deserve special recognition for coordinating communications and drafting and reviewing documentation.  Also, this transition would not have been possible without the efforts of the Workstation and Networking Support group, including Eric Mosher, Jake Metz, Bryan Choi, Jackson Deremiah, and Elzabad Kennedy.

Welcome

Welcome to the Library IT blog.  Technology is an important component in the way the University of Illinois Library provides access to collections, services, and a staff of experts across many subjects and functions.  The Library’s Gateway Web site is visited over 100 million times a year by users worldwide.  We all know that technologies in libraries change on a daily basis.  The technologies that support our Library’s services comprise over 50 distinct systems and services, supported either here in the Library or externally.  Staff across the Library work directly with Library IT to make sure these systems meet our collective needs.  Our goal is to provide Library staff and users (both in-person and virtual) with the most effective technologies possible to enable forward-reaching service, research, and education programs.

This blog opens up a channel for the Library to share information about local projects and services, campus IT activities, and technology news in the broader library and technology community.  It won’t take the place of the Library IT Help Desk web site, the OTRS ticket system, or the frequent posts Library IT makes to the LIBNEWS-L listserv.  Through the blog we seek to keep Library staff and users up to date on Library technology activities, new products and services, and existing systems and services.  I invite you to tune in and to share your perspectives and ideas.