OCLC Research Report on data management

OCLC Research recently published Starting the Conversation: University-wide Research Data Management Policy, a “call for action that summarizes the benefits of systemic data management planning and identifies the stakeholders and their concerns.” While the report suggests that library directors may be uniquely positioned to initiate the conversation, given their experience with data at all levels, it offers much more in the way of identifying key stakeholders and considerations for university data management.

The report outlines several goals that provide context for the conversation:

  • Clear expectations that will ease the way for data managers.
  • Uniform requirements that will facilitate data understandability and sharing among researchers.
  • Consistent data management standards training and tracking programs that can foster harmony within the university.
  • A standardized approach to data management that will ease compliance and improve management of and access to the university’s intellectual assets.
  • Positive impacts and efficiencies that can benefit all research conducted at the university, not just that funded by agencies that require a data management plan.

The report also identifies key campus stakeholders and provides a list of questions that could frame the conversation.

The report is a relatively short 25 pages and is worth a look for anyone involved in the development and implementation of university-wide data services.

Download the report (pdf)


DataCite – Find, Identify, and Cite Datasets

DataCite is a non-profit organization created to establish easier access to research data, increase acceptance of research data as legitimate, citable contributions to the scholarly record, and support data archiving.  This organization seeks to bring institutions, researchers and other interested groups together to address the challenges of making research data accessible and visible.  Through collaboration, researchers find support in locating, identifying, and citing research datasets with confidence.

Data centers are provided with persistent identifiers for datasets, plus workflows and standards for data publication. Journal publishers receive support to enable research articles to be linked with data.  DataCite works with organizations, data centers, and libraries that host data in efforts to assign persistent identifiers to datasets.

Data citation is important for data re-use, verification and tracking.  Citable datasets become legitimate contributions to scholarly communication, paving the way for new metrics and publication models that recognize and reward data sharing. More information on DataCite services, resources and events can be found at https://www.datacite.org/.

Identify yourself! Use the ORCID Registry

Your name identifies you as the author of your work, but does it do so unambiguously?  Does it appear in the same form on all your work? Has your name changed through life transitions?  Do others have the same or a very similar name?  Could someone interested in your contributions to the scholarly record easily use your name to find your work?  The reality is that names are neither consistent nor unique, and thus make poor identifiers.  This is an issue that affects all involved in the research process: funders, institutions, publishers, and you.

The Open Researcher and Contributor ID (ORCID) was established in 2010 to address the challenge of identity for researchers.  Working “for the benefit of all stakeholders, including research organizations, research funders, organizations, publishers, and researchers,” the ORCID community strives to establish a “permanent, clear and unambiguous record of research and scholarly communication by enabling reliable attribution of authors and contributors.”

The ORCID Registry is free to individuals, and to date, there are more than 550,000 researchers with ORCID identifiers in the Registry.  Click here to add yourself to the ORCID Registry and obtain your ORCID identifier.  The Registry can hold your name, education, institutional affiliation(s), corresponding websites, funding you have received, and your work.  You can control which information in your record is public or private.  You can import the citations for your work from a variety of partnering organizations including professional societies and commercial databases such as Web of Science and Scopus.  If you already have a ResearcherID or Scopus Author ID, you can import information from those records, and include those IDs in the ORCID Registry.  The ORCID identifier is a 16-digit string preceded by “http://orcid.org/” to create a URI for the corresponding record in the registry; e.g., http://orcid.org/0000-0001-5109-3700.  Once you have your ORCID identifier, you can include it on manuscript or proposal submissions to unambiguously identify the work as yours.
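
The identifier format described above can be sketched in a few lines of Python. This is a minimal illustration, not ORCID's own code: the function names are hypothetical, and the check-character rule (ISO 7064 MOD 11-2, under which the final character is a digit or the letter X) comes from ORCID's public documentation rather than from this post.

```python
import re

ORCID_BASE = "http://orcid.org/"  # URI prefix as given in the post


def check_character(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(identifier: str) -> bool:
    """Return True for strings shaped like NNNN-NNNN-NNNN-NNNC with a correct check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", identifier):
        return False
    digits = identifier.replace("-", "")
    return check_character(digits[:15]) == digits[15]


def orcid_uri(identifier: str) -> str:
    """Build the registry URI for a well-formed identifier (hypothetical helper)."""
    if not is_valid_orcid(identifier):
        raise ValueError(f"not a well-formed ORCID identifier: {identifier!r}")
    return ORCID_BASE + identifier


if __name__ == "__main__":
    # The example identifier from the post.
    print(orcid_uri("0000-0001-5109-3700"))
```

A check like this only catches typing errors; it says nothing about whether the identifier is actually registered to a particular person.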

ORCID identifiers are compatible with the ISO Standard (ISO 27729), also known as the International Standard Name Identifier (ISNI).  ORCID identifiers are used by the Clearinghouse for Open Research of the United States (CHORUS) and SHared Access Research Ecosystem (SHARE) systems to facilitate mandated sharing of funded research.  Application programming interfaces (APIs) for communication between the ORCID Registry and other systems are available, and the community is committed to the development of Open Source code to facilitate communication across systems.

“ORCID identifiers are part of a larger community effort to create interoperable research infrastructures through adoption and use of trusted persistent identifiers and standard vocabularies and record formats to promote data quality in the collection, management, exchange and aggregation of research information.”

The ORCID members and sponsors include research institutes, scholarly societies, commercial publishers, academic institutions and their libraries, digital repositories, and government agencies and funders such as the National Institutes of Health.  As of November 2013, there were more than 90 ORCID member organizations, and more than 50 ORCID integrators had added ORCID identifier functionality to their systems. SciENcv, an inter-agency researcher profile system in use by the National Institutes of Health and coming to the National Science Foundation in 2014, includes ORCID identifiers.  Going forward, researchers should expect an increasing number of submission venues to accept and make effective use of ORCID identifiers.

International Association for Social Science Information Services and Technology

The International Association for Social Science Information Services and Technology (IASSIST) is an organization for “professionals working in and with the field of information technology and data services to support research and teaching in the social sciences.”

IASSIST members come from diverse fields such as data archives, statistical agencies, libraries, government, and universities. IASSIST helps to connect the interests of the following communities:

  • Social science researchers and scientists
  • Information specialists
  • Methodologists and computing specialists

The stated goals of IASSIST are to:

  • “Promote a network of excellence for data service delivery;
  • “Improve the infrastructure in the field of social sciences;
  • “Provide opportunities for collegial exchange of sound professional practices.”

IASSIST offers many resources for research data education and networking with other professionals.  The IASSIST annual meeting is held at sites around the world. The 2014 meeting will be June 3-6, 2014 in Toronto, Canada. Members can subscribe to the IASST-L discussion list, and join one of IASSIST’s many committees and interest groups, such as the Data Citation and Data Visualization interest groups.

Learn more about IASSIST at http://www.iassistdata.org/ (information also available in French, German, Portuguese and Spanish).

Joint Declaration of Data Citation Principles

The Data Citation Synthesis Group of Force11 (Future of Research Communications and eScholarship) has released the Joint Declaration of Data Citation Principles, which are intended to encourage “good practice” with respect to data citation.  From the preamble:

Sound, reproducible scholarship rests upon a foundation of robust, accessible data.  For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record.  In other words, data should be considered legitimate, citable products of research.  Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.

The group includes representation from major publishers, the National Academies, DataCite, the Research Data Alliance, and others.  The Data Citation Synthesis Group is seeking endorsement of the document from individuals and organizations concerned with data use and attribution.


Research Data Alliance

The Research Data Alliance (RDA) was established as an organization to promote the “cross-disciplinary and cross-border” sharing of research data through a network of focused Interest Groups and Working Groups. Participants include researchers, scientists, data practitioners and other interested stakeholders from academia, industry and government around the world.

The RDA was founded in 2013 in an international collaboration between the European Commission, the National Science Foundation and National Institute of Standards and Technology in the United States, and the Australian Government’s Department of Innovation.

The RDA Council provides oversight of working groups to ensure sustainability and alignment with RDA goals. Working Groups and Interest Groups facilitate knowledge exchange and sharing, and also discuss barriers and possible solutions to enhance global data sharing.  The third biannual plenary meeting of the RDA will be in Dublin, Ireland on March 26-28, 2014.

Participation is open to any organization or individual willing to abide by the RDA principles, so it’s easy to get involved:  Go to the RDA website to register as a member and then subscribe to a Working Group or Interest Group to learn about the latest developments in research data for your specialized area of interest.

March 15: Computational Social Science Workshop


90% of the world’s data was generated over only the past 2 years. The volume of data, the variety of data available, and the velocity with which it is being generated create both challenges and opportunities for researchers. New methods of analysis are necessary to make sense of our data-rich world, and so the Scholarly Commons and I-CHASS are offering a FREE workshop for graduate students that gives an overview of cutting edge computational data analysis approaches and tools so that you can get started using these exciting new research methods in your own work.

The workshop is on Saturday, March 15th, from 9:00am to 4:30pm at the I-Hotel. Lunch, snacks, and coffee breaks will be provided. The topics that will be covered tentatively include: text analysis, network analysis, geographical information systems (GIS), data visualization, machine learning, and the locating, preparing, and managing of large datasets on and offline.

More information is at http://chass.illinois.edu/index.php/workshops/ . If you have any questions about the workshop, please contact Harriett Green at green19@illinois.edu. This workshop is free and open to all graduate students. Registration is limited, so register early at this link.

Sponsored by the Institute for Computing in Humanities, Arts, and Social Science, the Scholarly Commons, and the Department of Communication.

Digital Library Federation

The Digital Library Federation (better known as the DLF) is a professional organization focused on advancing “research, teaching and learning through the application of digital library research, technology, and services.”  The DLF exists as a program within the Council on Library and Information Resources (CLIR), an independent non-profit organization based in Washington, D.C. that serves as a neutral body to support the work of libraries and information providers. Members of the DLF are institutions only, but anyone can participate in their annual conference, the DLF Forum.

The annual DLF Forum provides a place for librarians, information scientists, IT professionals, and other interested researchers to discuss the latest issues surrounding digital technologies, education, and libraries, focusing on issues such as data curation, linked data, learning technologies, and digital humanities. The DLF supports other professional events and workshops throughout the year as well, including the upcoming Leadership, Technology, and Gender Summit.

The DLF and CLIR also have published a number of expert reports and studies on research data management and digital scholarship. Here are a few reports on research data that might be of interest to you:

Research Data Management: Principles, Practices, and Prospects.

The Problem of Data

“Rome Wasn’t Digitized in a Day”: Building a Cyberinfrastructure for Digital Classicists

Digital Forensics and Born-Digital Content in Cultural Heritage Collections

One Culture. Computationally Intensive Research in the Humanities and Social Sciences.

Databib: A Directory of Research Data Repositories

Databib is a collaborative, annotated directory of research data repositories.  It currently includes over 600 data repositories, which are international in scope and cover a variety of disciplines.

Databib’s repository records include a URL, subject tags, a brief description, and information about how the existing data can be reused and whether new data can be deposited.  The records can be searched using a basic keyword search or via an advanced search targeting specific fields. They also can be browsed alphabetically or by broad subjects.

With its useful search functionality and regularly updated content, Databib can be a helpful tool to discover new research data repositories and to identify appropriate repositories where you can submit or acquire data.

Databib can also help increase the visibility of research data repositories:  Anyone can submit a new record or edit an existing record, and an editorial board then reviews additions and changes before they are accepted into Databib.

Openness is one of the guiding principles of Databib:  All Databib data are made available to the public domain using the Creative Commons Zero protocol.  All of the records can be downloaded in RDF/XML format.  Databib also supports OpenSearch, which enables users to search Databib directly from their browsers without having to return to the Databib website.

Originally sponsored by an IMLS Sparks! Innovation National Leadership Grant awarded to Purdue University and Pennsylvania State University, Databib is now guided and maintained by international advisory and editorial boards.  The Purdue University Libraries hosts Databib.  Databib is endorsed by DataCite, which features the Databib list of repositories on its website.

It is also worth noting that the Registry of Research Data Repositories at re3data.org is a similar initiative, funded by the German Research Foundation DFG for 2012-2014.  The goal of re3data.org is to create a global registry of research data repositories.

If you have questions about using Databib to submit or acquire data, contact the Library at researchdata@library.illinois.edu.

Using EZID to obtain DOIs for your data – a pilot research data service

The University Library is undertaking a pilot for assigning DOIs to data resources created by researchers at the University of Illinois at Urbana-Champaign.  The pilot will take place during the spring and summer of 2014, and will be open to University of Illinois at Urbana-Champaign students, faculty, and staff.

What is a DOI?

The Digital Object Identifier System (DOI) is an ISO standard (ISO 26324) which is well-established and in current use by more than 5,000 naming authorities, including most publishers of scholarly content. DOIs frequently appear in bibliographic citations, and can be used to access a cited work. For example, the article with the DOI 10.1016/j.egypro.2013.06.482 can be accessed using the URL http://dx.doi.org/10.1016/j.egypro.2013.06.482. The DOI standard is the foundation of CrossRef’s reference linking service, which allows location and tracking of both cited and citing references in the scholarly record.
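
The relationship between a DOI and its resolver URL in the example above can be shown with a short Python sketch. The function names are hypothetical, and the prefix check (a DOI starts with "10." and has a slash separating the naming authority's prefix from the suffix) is a simplification of the full DOI syntax.

```python
from typing import Tuple
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # resolver address used in the example above


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash, with a basic sanity check."""
    prefix, slash, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and slash and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Turn a DOI into a resolvable URL, percent-encoding any unsafe characters."""
    split_doi(doi)  # raises ValueError if the string is not DOI-shaped
    return RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    print(split_doi("10.1016/j.egypro.2013.06.482"))
    print(doi_to_url("10.1016/j.egypro.2013.06.482"))
```

Because the DOI stays the same when the resource moves, only the registry entry behind the resolver needs updating; citations that use the DOI keep working.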

Why are we doing this?

A critical component of data management is a “persistent approach to access, identification, sharing, and re-use” of data. DataCite was founded in 2009 with three fundamental goals:

  • establish easier access to scientific research data on the Internet,
  • increase acceptance of research data as legitimate, citable contributions to the scientific record, and
  • support data archiving that will permit results to be verified and re-purposed for future study.

DataCite allows the assignment of unique and persistent identifiers for data, in the form of Digital Object Identifiers (DOIs), by establishing a persistent association between a character string and a resource (e.g., data). Use of DataCite’s registry to mint DOIs for University of Illinois data resources was among the recommendations of the eResearch Services Task Force and the Campus Data Stewardship Committee’s proposal for our Campus Research Data Service.

How will we do it?

Libraries are using the EZID (easy-eye-dee, http://ezid.lib.purdue.edu/ezid/) user interface and web Application Programming Interface (API), developed by the California Digital Library, to provide access to the DataCite metadata registry service.  EZID provides an easy way to add metadata to the DataCite registry to describe resources and obtain and manage persistent identifiers, and is a means of implementing the eResearch Task Force and Campus Data Stewardship Committee recommendations.
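
As a rough sketch of what "adding metadata to the DataCite registry" involves, the Python snippet below serializes a metadata record into the line-oriented "name: value" text that EZID's API documentation describes for request bodies. The element names (`_target`, `datacite.creator`, and so on) follow that documentation as we understand it and should be checked against the current EZID reference. The sample values, the example.org URL, and the function names are hypothetical, and no request is actually sent.

```python
import re


def _percent_encode(match: "re.Match") -> str:
    """Replace one matched character with its %XX form."""
    return "%%%02X" % ord(match.group(0))


def escape_name(name: str) -> str:
    """Encode characters that would break a 'name: value' line, including colons."""
    return re.sub(r"[%:\r\n]", _percent_encode, name)


def escape_value(value: str) -> str:
    """Encode percent signs and line breaks so the value stays on one line."""
    return re.sub(r"[%\r\n]", _percent_encode, value)


def to_anvl(metadata: dict) -> str:
    """Serialize a metadata dict as one 'name: value' pair per line."""
    return "\n".join(
        f"{escape_name(name)}: {escape_value(value)}"
        for name, value in metadata.items()
    )


if __name__ == "__main__":
    # Hypothetical record: where the data lives, plus basic descriptive metadata.
    record = {
        "_target": "http://example.org/data/survey-2014.csv",
        "datacite.creator": "Doe, Jane",
        "datacite.title": "Example survey responses",
        "datacite.publisher": "University of Illinois at Urbana-Champaign",
        "datacite.publicationyear": "2014",
    }
    print(to_anvl(record))
```

In practice most pilot participants would use the EZID web interface; a script like this matters mainly for research groups that want to register many resources at once.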

Who may participate?

Individual students, faculty and staff as well as research groups/programs at the University of Illinois at Urbana-Champaign that have created data resources are eligible to participate in the pilot.

What types of resources are eligible for inclusion in the pilot?

Data resources such as simple data sets (e.g., spreadsheets, CSV files), visualizations, software/code, or collections are eligible.  We will exclude resources which already have DOIs, and resources that are not data resources (e.g., article preprints or reports).  As a requirement of the pilot, the resource must be accessible via the World Wide Web, and the creator must supply basic metadata about the resource to include in the DOI registry.  Where appropriate, participants will be strongly encouraged to deposit their data resources into our institutional repository, IDEALS, to ensure their long-term preservation.  If IDEALS deposit is not an option (e.g., for data sets that are actively growing), participants will be responsible for keeping the DOI registry up to date in the event the resource is moved to a new URL.

Apply to participate in the EZID Pilot!