Workshop – Data Infrastructure: The Importance of Quality and Integrity

Registration Now Open for CENDI NFAIS Workshop —
Data Infrastructure: The Importance of Quality and Integrity

Thursday, November 20, 2014   9:00 am – 4:30 pm
This One-Day Workshop is Co-sponsored by CENDI and NFAIS, and is Hosted by NTIS at the US Patent and Trademark Office
Madison Auditorium, USPTO, 600 Dulany Street, Alexandria, VA 22314


This one-day workshop is a must for anyone involved in creating, managing, or using scientific data. The Open Data movement is dramatically changing the flow of data as well as the roles of all involved. This workshop will explore the technical, financial, political, and social/cultural forces that must be considered when assessing the quality and integrity of the data. Together with major stakeholders, we will investigate how we can rely on the quality and integrity of the data that are becoming increasingly available. Mark your calendar now to reserve the date of this informative workshop. Registration will open September 15, 2014, to accommodate those who need to pay before the new fiscal year begins.

Speakers will be announced very soon. They are industry and government experts who will make this day engaging, informative, and constructive. Mark your calendars and stay tuned for further developments as the agenda is completed. Registration opens September 15, 2014.

For More Information visit original announcement at

Help Obtaining Data is Available from the Library

This fall marks the fifth annual Data Purchase Program, in which the University Library accepts applications from campus researchers to purchase data that will be useful in their research. The data must cost under $5,000, must be used for teaching or research, and must be available to all of campus. Some vendors are willing to sell access to only one person, but we can often negotiate campus access.

The library has purchased a large variety of data: from tax assessor’s data for the Chicago area to satellite imagery of a river in Argentina and the locations of villages in the state of Himachal Pradesh, India.  A full list of purchased data is on the program description page at

The deadline for first consideration is September 29, but the Data Services Committee will consider applications that come in later as long as we have funds available and can complete the purchase by the end of the fiscal year.

If you are interested in applying for the Data Purchase Program, the online application is at  If you have questions about the program or need help identifying data for your research, please contact Karen Hogenboom, Numeric and Spatial Data Librarian, at  We look forward to connecting you with the data you need!

New Open Access Data Policy from the DOE

In response to the Office of Science and Technology Policy (OSTP)’s 2013 memo, “Expanding Public Access to the Results of Federally Funded Research,” the U.S. Department of Energy (DOE) has released its Public Access Plan.

The DOE is one of the first federal agencies to release its new OSTP-mandated open data policy. The DOE describes its Public Access Plan as one that:

“commits the Department to deploying a web-based portal, the DOE Public Access Gateway for Energy and Science (DOE PAGES), that will make scholarly scientific publications resulting from DOE research funding publicly accessible and searchable at no charge to readers; and to instituting data management principles and requirements that ultimately will apply to proposals for research funding submitted to all DOE program offices.”

Read the full DOE Public Access Plan here.

H/T to Scholarly Kitchen for the news.

New Blog from the Associate Director for Data Science (ADDS), NIH

Philip E. Bourne, Associate Director for Data Science (ADDS) at NIH, has launched a new blog, PEBOURNE, with the goal to “be transparent and informative” and “welcome your input at any time.” The “Ten Weeks as ADDS” posting provides highlights from a 2012 report of the Data and Informatics Working Group and outlines Dr. Bourne’s vision for the future of Data Science at NIH.

Dr. Bourne identifies five major themes that will drive activities for the near term:

  1. BD2K – fostering innovation through partnership with the extramural biomedical research community. BD2K seeks to develop better ways to tackle the challenges (and harness the potential) of biomedical big data, with the goal of establishing a national infrastructure to support biomedical research.
  2. Sustainability – partnering with the community to address the challenges of maintaining the rapidly growing digital assets that are generated as part of biomedical research.
  3. Training – preparing the workforce to address the challenges and opportunities of biomedical research as a digital enterprise.
  4. Evaluation & Reward – defining the means to evaluate the value of data scientists, data, software and other digital assets to the research enterprise and getting all scholars to appreciate that value.
  5. Communication & Outreach – working with partners – other federal agencies, the private sector, both nationally and internationally, inside and outside of biomedicine, to learn from experience and maximize the value of the digital enterprise, within and across disciplines.

For additional details visit PEBOURNE: Professional Developments worth Sharing.

Data Citation Index

The University Library now subscribes to the Data Citation Index from Thomson Reuters (which also provides Web of Science).  You can also access the Data Citation Index by searching for it in the Library’s Online Journals & Databases system.

The goal of the Data Citation Index is to support data discovery, reuse and interpretation.  To achieve this, the Data Citation Index brings together results from data repositories across disciplines.  The rough breakdown of repositories by discipline is: life sciences (48%), physical sciences (23%), social sciences (20%), arts & humanities (7%), and multidisciplinary (2%). Examples of repositories included are: Gene Expression Omnibus, WormBase, Dryad, NOAA National Geophysical Data Center, Inter-University Consortium for Political and Social Research (ICPSR), Archaeology Data Service, and figshare.

The Data Citation Index provides suggested citations for the data, based on the data citation recommendations of

The Data Citation Index also provides links between the data and the articles that cite it.  For example, search for “GSE2814” to see mouse liver tissue expression data that has been cited by 6 articles in Web of Science.  Because data citation is not standardized or common practice, most data in the Data Citation Index has not been cited very often.  So currently, this is not a very robust feature, but it has interesting potential.

Meet the new GIS Specialist

Please join the Library in welcoming James Whitacre, who started in June as the new GIS Specialist.  He previously worked at the Carnegie Museum of Natural History in Pennsylvania, and has a master’s degree in Geography from Indiana University of Pennsylvania.

He is available for consultations with researchers and students from across campus who are using GIS, and also will teach campus-wide workshops on GIS data and tools. Overall, he will work to build up the Library’s GIS data services for all users.

You can find James in the Scholarly Commons. Feel free to contact him with your GIS questions!

Social Science Data Repositories

The various disciplines of the social sciences yield research data as diverse as the repositories that preserve these data sets and make them accessible. The many fields that make up the social sciences (archaeology, geography, sociology, economics, political science, and psychology, to name a few) produce numeric and spatial data that document a vast array of research on human behavior, culture, society, landscapes, and economic structures. Given the number of options, social science researchers must consider which repository is best suited for the long-term preservation and curation of their data.

Some social science data repositories are broad in scope; the ICPSR (Inter-university Consortium for Political and Social Research) at the University of Michigan, for instance, preserves data related to geography and environment, economic behavior and attitudes, community and urban studies, and education. Other repositories provide data curation services for a single discipline, such as tDAR (The Digital Archaeological Record) for archaeology.

Different social science repositories have different requirements for data deposit. The ICPSR, for example, provides a comprehensive guide for preparing data for preservation and archiving. Such guidelines outline specifics such as acceptable file formats for data submission, requirements for metadata descriptions, and deposit of supporting documentation (such as laboratory notebooks) that helps illustrate the context in which the data were created.

A list of social science repositories/resources for finding and depositing data can be found below:


ADS (Archaeology Data Service)

CoPAR (Council for the Preservation of Anthropological Records)

Open Context

Registry of Anthropological Data wiki

tDAR (The Digital Archaeological Record)

Criminal Justice

NACJD (National Archive of Criminal Justice Data – associated with ICPSR)

U.S. Bureau of Justice Statistics

Demographics/Government Data Repositories

Minnesota Population Center, University of Minnesota

U.S. Bureau of Labor Statistics

U.S. Bureau of Transportation Statistics

U.S. Census Bureau


National Bureau of Economic Research


National Center for Education Statistics

General Social Sciences

Australian Data Archive

CESSDA (Consortium of European Social Science Data Archives)

ICPSR (Inter-university Consortium for Political and Social Research) at the University of Michigan. ICPSR also manages the NCAA Student-Athlete Experiences Data Archive.

IQSS (The Institute for Quantitative Social Science) at Harvard University

Pew Research Center

UK Data Archive

UK Data Service

UNESCO Institute for Statistics





National Oceanographic Data Center



Sociology/Public Opinion

American National Election Studies

National Data Archive on Child Abuse and Neglect at Cornell University

Roper Center Public Opinion Archives

This is not a complete list; other data repositories can be found through data clearinghouses such as OpenDOAR and Databib. The Scholarly Commons also maintains a list of Geospatial Data Repositories and Numeric Data Repositories. Likewise, the College of Liberal Arts and Sciences’ ATLAS provides a list of repositories relating to public opinion and government data.

For more information on the Scholarly Commons’ services relating to the social sciences, please visit the Numeric and Spatial Data Services site.


Stay Current with the IASSIST Blog

If you are interested in data archiving or social science data in general, bookmark the blog of the International Association for Social Science Information Services and Technology (IASSIST) at  The blog covers:

Blog entries summarize and review reports about these topics. Conferences and meetings are announced on the blog, and there are a few “think pieces” about issues related to social science data.  Posts are by members of IASSIST, who are data archivists, data librarians, IT professionals, and users of social science data.  IASSIST’s membership is worldwide, so the blog is also a good place to find out what new developments are taking place in other countries.

“The Signal”: The Library of Congress’ Blog on the Digital Past and Future

Perhaps one of the most informative and engaging resources on digital preservation and stewardship is the Library of Congress’ blog, The Signal. Through interviews, discussions on digital preservation tools and resources, videos and tutorials, and highlights of the work of Library of Congress staff members, The Signal engages with archivists, data curators, and digital asset managers as they negotiate best practices and methodologies in the changing digital landscape.

The Signal profiles educational resources and professional reports to provide much-needed guidance for the archival and digital curation communities. These include features on the National Digital Information Infrastructure and Preservation Program (NDIIPP) report, Preserving.exe: Toward a National Strategy for Preserving Software; the Digital Preservation Coalition Technology Watch report Preserving Email, authored by Chris Prom from the University of Illinois; and Personal Digital Archiving guidelines. The blog even considers Wikipedia as an emerging resource on digital preservation.

The Signal interweaves interviews with leading practitioners and theorists–such as Matthew Kirschenbaum, Wolfgang Ernst, Lisa Green, Emily Gore, and Cal Lee–with reviews of digital curation tools and new vital resources like BitCurator and DROID, a file analyzer that uses the PRONOM technical registry for identifying files with unknown formats.

A blog on digital preservation and stewardship would not be complete without discussions on digital data. The Signal’s writers discuss recent initiatives like the National Agenda for Digital Stewardship’s File Formats Action Plans, and examine issues such as how to define “Big Data,” whether data as cultural objects will elicit the same emotional responses as their analog counterparts, Open Data, the importance of scaling high-performance computing resources to meet changing researcher needs, and data infrastructure projects. The blog also explores ways we can analyze data, including through visualization tools like Viewshare for archival collections.

Juxtaposing the importance of saving the technologies of yesterday with the technologies of today, The Signal is a rich resource for those who want to learn about new tools and resources in the new information economy.