New Open Access Data Policy from the DOE

In response to the 2013 memo from the Office of Science and Technology Policy (OSTP), “Expanding Public Access to the Results of Federally Funded Research,” the U.S. Department of Energy (DOE) has released its Public Access Plan.

The DOE is one of the first federal agencies to release its new OSTP-mandated open data policy. The DOE describes its Public Access Plan as one that:

“commits the Department to deploying a web-based portal, the DOE Public Access Gateway for Energy and Science (DOE PAGES), that will make scholarly scientific publications resulting from DOE research funding publicly accessible and searchable at no charge to readers; and to instituting data management principles and requirements that ultimately will apply to proposals for research funding submitted to all DOE program offices.”

Read the full DOE Public Access Plan here.

H/T to Scholarly Kitchen for the news.

Data Citation Index

The University Library now subscribes to the Data Citation Index from Thomson Reuters (which also provides Web of Science).  You can also access the Data Citation Index by searching for it in the Library’s Online Journals & Databases system.

The goal of the Data Citation Index is to support data discovery, reuse and interpretation.  To achieve this, the Data Citation Index brings together results from data repositories across disciplines.  The rough breakdown of repositories by discipline is: life sciences (48%), physical sciences (23%), social sciences (20%), arts & humanities (7%), and multidisciplinary (2%). Examples of repositories included are: Gene Expression Omnibus, WormBase, Dryad, NOAA National Geophysical Data Center, Inter-University Consortium for Political and Social Research (ICPSR), Archaeology Data Service, and figshare.

The Data Citation Index provides suggested citations for the data, based on the data citation recommendations of DataCite.org.
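As a rough illustration of what such a suggested citation looks like, the sketch below assembles a string following one form of the pattern DataCite recommends (Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier). The function name, its parameters, and the sample record are hypothetical, and this is not how the Data Citation Index itself generates citations.

```python
def format_data_citation(creators, year, title, publisher, identifier,
                         version=None, resource_type=None):
    """Build a DataCite-style citation string.

    Pattern: Creator (Year): Title. [Version.] Publisher. [ResourceType.] Identifier
    Version and resource type are optional and skipped when not given.
    """
    if not creators:
        raise ValueError("at least one creator is required")
    parts = [f"{'; '.join(creators)} ({year}): {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    if resource_type:
        parts.append(f"{resource_type}.")
    parts.append(identifier)
    return " ".join(parts)


# Hypothetical record, for illustration only.
example = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2014,
    title="Example Survey Dataset",
    publisher="Example Data Repository",
    identifier="https://doi.org/10.1234/example",
    version="2",
    resource_type="Dataset",
)
print(example)
```

Keeping the identifier as the last element means a reader can always find the dataset from the citation, even if the optional fields are missing.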

The Data Citation Index also provides links between the data and the articles that cite it.  For example, search for “GSE2814” to see mouse liver tissue expression data that has been cited by 6 articles in Web of Science.  Because data citation is not standardized or common practice, most data in the Data Citation Index has not been cited very often.  So currently, this is not a very robust feature, but it has interesting potential.

Social Science Data Repositories

The various disciplines of the social sciences yield research data as diverse as the repositories that preserve these data sets and make them accessible. The many fields that make up the social sciences (archaeology, geography, sociology, economics, political science, and psychology, to name a few) produce numeric and spatial data that document a vast array of research on human behavior, culture, society, landscapes, and economic structures. Given the number of options, social science researchers must consider which repository is best suited for the long-term preservation and curation of their data.

Some social science data repositories are broad in scope; the ICPSR (Inter-university Consortium for Political and Social Research) at the University of Michigan, for instance, preserves data related to geography and environment, economic behavior and attitudes, community and urban studies, and education. Other repositories provide data curation services for a single discipline, such as tDAR (The Digital Archaeological Record) for archaeology.

Different social science repositories have different requirements for data deposit. The ICPSR provides a comprehensive guide for preparing data for preservation and archiving. Such guidelines outline specifics such as acceptable file formats for data submission, requirements for metadata descriptions, and the deposit of supporting documentation (such as laboratory notebooks) that helps illustrate the context in which the data were created.

A list of social science repositories/resources for finding and depositing data can be found below:

Archaeology/Anthropology

ADS (Archaeology Data Service)

CoPAR (Council for the Preservation of Anthropological Records)

Open Context

Registry of Anthropological Data wiki

tDAR (The Digital Archaeological Record)

Criminal Justice

NACJD (National Archive of Criminal Justice Data – associated with ICPSR)

U.S. Bureau of Justice Statistics

Demographics/Government Data Repositories

Data.gov

Minnesota Population Center, University of Minnesota

U.S. Bureau of Labor Statistics

U.S. Bureau of Transportation Statistics

U.S. Census Bureau

Economics

National Bureau of Economic Research

Education

National Center for Education Statistics

General Social Sciences

Australian Data Archive

CESSDA (Consortium of European Social Science Data Archives)

ICPSR (Inter-university Consortium for Political and Social Research) at the University of Michigan. ICPSR also manages the NCAA Student-Athlete Experiences Data Archive

IQSS (The Institute for Quantitative Social Science) at Harvard University

Pew Research Center

UK Data Archive

UK Data Service

UNESCO Institute for Statistics

Geosciences 

GeoCommons

GeoGratis

GeoNames

National Oceanographic Data Center

PostGIS

ShareGEO

Sociology/Public Opinion

American National Election Studies

National Data Archive on Child Abuse and Neglect at Cornell University

Roper Center Public Opinion Archives

This is not a complete list; other data repositories can be found through data clearinghouses, such as OpenDOAR and Databib. The Scholarly Commons also maintains a list of Geospatial Data Repositories and Numeric Data Repositories. Likewise, the College of Liberal Arts and Sciences’ ATLAS provides a list of repositories relating to public opinion and government data.

For more information on the Scholarly Commons’ services relating to the social sciences, please visit the Numeric and Spatial Data Services site.

 

“The Signal”: The Library of Congress’ Blog on the Digital Past and Future

Perhaps one of the most informative and engaging resources on digital preservation and stewardship is the Library of Congress’ blog, The Signal. Through interviews, discussions on digital preservation tools and resources, videos and tutorials, and highlights of the work of Library of Congress staff members, The Signal engages with archivists, data curators, and digital asset managers as they negotiate best practices and methodologies in the changing digital landscape.

The Signal profiles educational resources and professional reports to provide much-needed guidance for the archival and digital curation communities. Features have covered the National Digital Information Infrastructure and Preservation Program (NDIIPP) report, Preserving.exe: Toward a National Strategy for Preserving Software; the Digital Preservation Coalition Technology Watch report Preserving Email, authored by Chris Prom from the University of Illinois; and Personal Digital Archiving guidelines. The blog even considers Wikipedia as an emerging resource on digital preservation.

The Signal interweaves interviews with leading practitioners and theorists–such as Matthew Kirschenbaum, Wolfgang Ernst, Lisa Green, Emily Gore, and Cal Lee–with reviews of digital curation tools and new vital resources like BitCurator and DROID, a file analyzer that uses the PRONOM technical registry for identifying files with unknown formats.
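To give a sense of how signature-based identification of this kind works in principle, here is a much-simplified sketch that matches the first bytes of a file against a few well-known "magic number" signatures. It does not use DROID or the PRONOM registry; the `SIGNATURES` table and the function names are hypothetical and cover only four formats.

```python
from typing import Optional

# A tiny, hand-written signature table (assumption: illustration only;
# a real registry holds far more formats and more detailed signatures).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
]


def identify_bytes(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first bytes of a file and identify it, ignoring its extension."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))


print(identify_bytes(b"%PDF-1.4\n"))     # PDF document
print(identify_bytes(b"plain text..."))  # None
```

Because the check looks at content rather than the file name, it still works when an extension is missing or wrong, which is the situation the paragraph above describes.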

A blog on digital preservation and stewardship would not be complete without discussions of digital data. The Signal’s writers discuss recent initiatives like the National Agenda for Digital Stewardship’s File Formats Action Plans and examine issues such as how to define “Big Data,” whether data as cultural objects will elicit the same emotional responses as their analog counterparts, Open Data, the importance of scaling high-performance computing resources to meet changing researcher needs, and data infrastructure projects. The blog also explores ways to analyze data, including visualization tools like Viewshare for archival collections.

Juxtaposing the importance of saving the technologies of yesterday with the technologies of today, The Signal is a rich resource for those who want to learn about new tools and resources in the new information economy.

New Version of the DMPTool Now Available

A new version of the DMPTool is now available!  The DMPTool is an online wizard for creating data management plans (DMPs), and it supports plans for a variety of funding agencies, including individual National Science Foundation (NSF) directorates and divisions, the National Institutes of Health (NIH), and the National Endowment for the Humanities (NEH) – Office of Digital Humanities.

New DMPTool features and functionality are highlighted in a short video and include:

  • Co-Owners: Plan creators can designate individuals as co-owners of specific plans, which allows the co-owners to edit and provide feedback on the plans within the DMPTool.
  • Reviews: Plan creators can request feedback on their plans.  At UIUC, librarians who are knowledgeable about data management requirements and resources will conduct the reviews.  Reviewers will comment on plans within the DMPTool.
  • Institutional Customization: The University Library added language and links for institutional resources, such as IDEALS (the institutional repository), and contact information for local data management assistance.
  • Updated Interface: The new version displays resource links and suggested responses in tabs and has new visuals.

Whether you are a new or existing DMPTool user, log in and see what the new version has to offer!  Since UIUC is a contributing institution, UIUC faculty, staff and students can log in to the DMPTool with their NetID and password.

If you would like additional assistance with creating a DMP, you can email the Library (researchdata@library.illinois.edu) or call (217-244-1331).

Want Your Data to Last? Pay Attention to File Formats

When managing your own digital research data, do not take the file formats you use for granted. A file format is a standardized way to structure the data stored in a computer file, and is most easily recognized by the dot and “extension,” typically two to four letters, at the end of the file name (for example, birthdayParty.jpg indicates a JPEG image file). The long-term usability of your data often hinges on storing it in a well-chosen archival file format. As mentioned on the UIUC Scholarly Commons’ Data Management webpage on File Formats and Organization:

“Our ability to preserve digital objects is dependent, among other things, on whether the file format used:

  • Is openly documented (more preservable) or proprietary (less preservable);
  • Is supported by a range of software platforms (more preservable) or by only one (less preservable);
  • Is widely adopted (more preservable) or has low use (less preservable);
  • Is lossless data compression (more preservable) or lossy data compression (less preservable); and
  • Contains embedded files or embedded programs/scripts, like macros (less preservable).”

[Image: data management format matrix]

Confidence in particular file formats may differ between domains.  If interested in learning more, seek out best practices documentation in your field, or request a consultation in the Scholarly Commons.
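The five criteria quoted above can be read as a simple checklist. The sketch below reads a file's extension and counts how many criteria point toward "more preservable." The `FormatProfile` class, the function names, and especially the per-format judgments in `PROFILES` are illustrative assumptions, not an authoritative assessment of any format.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class FormatProfile:
    openly_documented: bool  # vs. proprietary
    multi_platform: bool     # supported by a range of software platforms
    widely_adopted: bool     # vs. low use
    lossless: bool           # no lossy compression
    embeds_scripts: bool     # embedded files, programs, or macros

    def score(self) -> int:
        """Count how many of the five criteria favor preservation (0 to 5)."""
        return sum([
            self.openly_documented,
            self.multi_platform,
            self.widely_adopted,
            self.lossless,
            not self.embeds_scripts,
        ])


# Illustrative judgments only (assumptions, not a registry of record).
PROFILES = {
    ".csv": FormatProfile(True, True, True, True, False),
    ".jpg": FormatProfile(True, True, True, False, False),  # lossy compression
}


def extension_of(filename: str) -> str:
    """Return the lower-cased extension, including the dot."""
    return Path(filename).suffix.lower()


def preservability(filename: str) -> Optional[int]:
    """Score a file by its extension, or return None for unknown formats."""
    profile = PROFILES.get(extension_of(filename))
    return profile.score() if profile else None


print(extension_of("birthdayParty.jpg"))    # .jpg
print(preservability("birthdayParty.jpg"))  # 4
print(preservability("results.csv"))        # 5
```

An unweighted count is the simplest possible choice; a field that cares more about, say, open documentation than about adoption could weight the criteria differently.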

Data Literacy – New England Collaborative Data Management Curriculum (Soutter)

The New England Collaborative Data Management Curriculum (NECDMC) is a rich collection of data management instruction materials. The project is an example of collaboration at its best: it is led by the Lamar Soutter Library at the University of Massachusetts Medical School in partnership with the Marine Biological Laboratory and the Woods Hole Oceanographic Institution, Northeastern University, Tufts University, and the University of Massachusetts at Amherst.

The course’s six modules address issues and shared challenges related to data management and incorporate best practices. The content is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License, and users are free to adapt the content as needed.

In addition to the six modules, the partners have developed Research Cases, Activities, and Data Management Plan examples. Information about joining the collaboration can be found here.

Module Titles:

  • Module 1: Overview of Research Data Management
  • Module 2: Types, Formats, and Stages of Data
  • Module 3: Contextual Details Needed to Make Data Meaningful to Others
  • Module 4: Data Storage, Backup, and Security
  • Module 5: Legal and Ethical Considerations for Research Data
  • Module 6: Data Sharing & Reuse Policies
  • Module 7: Archiving and Preservation (In Progress)

OCLC Research Report on Data Management

OCLC Research recently published Starting the Conversation: University-wide Research Data Management Policy, a “call for action that summarizes the benefits of systemic data management planning and identifies the stakeholders and their concerns.” While the report suggests that library directors may be uniquely poised to initiate the conversation, given their experience with data on all levels, it offers much more in the way of identifying key stakeholders and considerations for university data management.

The report outlines several benefits of such a policy that provide context for the conversation:

  • Clear expectations that will ease the way for data managers.
  • Uniform requirements that will facilitate data understandability and sharing among researchers.
  • Consistent data management standards training and tracking programs that can foster harmony within the university.
  • A standardized approach to data management that will ease compliance and improve management of and access to the university’s intellectual assets.
  • Positive impacts and efficiencies that can benefit all research conducted at the university, not just that funded by agencies that require a data management plan.

The report also identifies key campus stakeholders and provides a list of questions that could frame the conversation.

The report is a relatively short 25 pages and is worth a look for anyone involved in the development and implementation of university-wide data services.

Download the report (pdf)

 

DataCite – Find, Identify, and Cite Datasets

DataCite is a non-profit organization created to establish easier access to research data, increase acceptance of research data as legitimate, citable contributions to the scholarly record, and support data archiving. The organization seeks to bring institutions, researchers, and other interested groups together to address the challenges of making research data accessible and visible. Through this collaboration, researchers find support in locating, identifying, and citing research datasets with confidence.

Data centers are provided with persistent identifiers for datasets, plus workflows and standards for data publication. Journal publishers receive support for linking research articles with data. DataCite works with organizations, data centers, and libraries that host data to assign persistent identifiers to datasets.

Data citation is important for data reuse, verification, and tracking. Citable datasets become legitimate contributions to scholarly communication, paving the way for new metrics and publication models that recognize and reward data sharing. More information on DataCite services, resources, and events can be found at https://www.datacite.org/.

Joint Declaration of Data Citation Principles

The Data Citation Synthesis Group of Force11 (Future of Research Communications and eScholarship) has released the Joint Declaration of Data Citation Principles, which are intended to encourage “good practice” with respect to data citation.  From the preamble:

Sound, reproducible scholarship rests upon a foundation of robust, accessible data.  For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record.  In other words, data should be considered legitimate, citable products of research.  Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.

The group includes representation from major publishers, the National Academies, DataCite, the Research Data Alliance, and others. The Data Citation Synthesis Group is seeking endorsement of the document from individuals and organizations concerned with data use and attribution.