The tables in the following subsections depict which criteria from the evaluation rubric we were able to express via SPARQL queries.
We want to stress that some queries could be formulated less strictly, i.e. OPTIONAL blocks could be inserted for triple patterns that join elements which the maDMP schema defines as optional. However, as this project is a proof of concept, we leave this to future work: it could easily be done when extending or building upon the work at hand, so as not to “lose” any results or information about the corresponding maDMP.
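The effect of wrapping a triple pattern in OPTIONAL can be illustrated without a triple store. The following is a minimal sketch that emulates it in plain Python over toy records: a strict match drops every dataset that lacks an optional field, whereas an optional match keeps the dataset and leaves the field unbound. The records, the function names and the field names title and description are illustrative assumptions and are not taken from the project's queries or input files.

```python
# Toy records standing in for query solutions; the data is hypothetical.
datasets = [
    {"id": "ds1", "title": "Survey results", "description": "Raw survey answers"},
    {"id": "ds2", "title": "Interview transcripts"},  # optional field missing
]


def strict_match(records, required, optional):
    """Emulate a query without OPTIONAL: every listed field must be bound."""
    fields = required + optional
    return [
        {f: r[f] for f in fields}
        for r in records
        if all(f in r for f in fields)
    ]


def optional_match(records, required, optional):
    """Emulate OPTIONAL: required fields must be bound, optional ones may be None."""
    return [
        {**{f: r[f] for f in required}, **{f: r.get(f) for f in optional}}
        for r in records
        if all(f in r for f in required)
    ]


strict = strict_match(datasets, ["id", "title"], ["description"])
relaxed = optional_match(datasets, ["id", "title"], ["description"])
print(len(strict), len(relaxed))
```

With the strict variant the second dataset disappears from the result entirely, which is the kind of loss the paragraph above refers to.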
The queries can be found in the corresponding directory in the GitHub repository. In the next subsections, the queries are referred to by their filename without extension, e.g. a reference to 5-a-1 points to the query file 5-a-1.sparql.
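The naming convention can be sketched as a small helper. This is only an illustration: the function name query_path and the default directory name queries are hypothetical, since the text does not name the repository directory.

```python
from pathlib import PurePosixPath


def query_path(reference, directory="queries"):
    """Map a query reference such as '5-a-1' to its .sparql file.

    The default directory name is a placeholder, not the repository's actual layout.
    """
    return PurePosixPath(directory) / f"{reference.strip()}.sparql"


print(query_path("5-a-1"))
```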
Requirement | Covered In | Remarks |
Administrative information |
Provide information such as name of applicant, project number, funding programme, version of DMP. | 0-1, 0-2, 0-3 | Query 0-1 returns the basic information, i.e. the author, title, creation date and language of the maDMP as well as the ID of the corresponding DMP. Query 0-2 gathers all important information available for the corresponding project, whereas query 0-3 collects information about the funding of the project. |
Data Description and Collection or Re-Use of Existing Data
Requirement | Covered In | Remarks |
1a How will new data be produced and/or how will existing data be re-used? |
Explain which methodologies or software will be used if new data are collected or produced. | / | This information would be provided by the methodology field of the dataset structure. However, this field is only specified in the funder extension and is not included in the RDA-DMP Common Standard; it is therefore not translated when the JSON files are converted to JSON-LD and, in consequence, cannot be queried. |
State any constraints on re-use of existing data if there are any. | / | Not really covered by maDMP. |
Explain how data provenance will be documented. | / | Not really covered by maDMP. |
Briefly state the reasons if the re-use of any existing data sources has been considered but discarded. | / | Not really covered by maDMP. |
1b What data (for example the kind, formats, and volumes) will be collected or produced? |
Give details on the kind of data: for example, numeric (databases, spreadsheets), textual (documents), image, audio, video, and/or mixed media. | 1-b-1 | Queries all declared datasets and displays their title, type and identifier. |
Give details on the data format: the way in which the data is encoded for storage, often reflected by the filename extension (for example pdf, xls, doc, txt, or rdf). | 1-b-2 | Returns the data formats of each specified distribution (including the respective access URL and description of the distribution). |
Justify the use of certain formats. For example, decisions may be based on staff expertise within the host organisation, a preference for open formats, standards accepted by data repositories, widespread usage within the research community, or on the software or equipment that will be used. | / | Not really covered by maDMP. |
Give preference to open and standard formats as they facilitate sharing and long-term re-use of data (several repositories provide lists of such ‘preferred formats’). | / | Not directly covered by maDMP; difficult to cover with a simple SPARQL query. |
Give details on the volumes (they can be expressed in storage space required (bytes), and/or in numbers of objects, files, rows, and columns). | 1-b-3 | Displays for each defined distribution its size in bytes. |
Documentation and Data Quality
Requirement | Covered In | Remarks |
2a What metadata and documentation (for example the methodology of data collection and way of organising data) will accompany the data? |
Indicate which metadata will be provided to help others identify and discover the data. | 2-a-1, 2-a-2 | 2-a-1 collects all information provided by the metadata field, i.e. a description (optional), the standard used and the language. 2-a-2 displays the specified keywords for each defined dataset. |
Indicate which metadata standards (for example DDI, TEI, EML, MARC, CMDI) will be used. | 2-a-1 | Information about the metadata standards used is covered by this query. |
Use community metadata standards where these are in place. | 2-a-3 | Example query for testing whether certain community standards (Dublin Core, DDI, EML, TEI or MARC) are used. This can be arbitrarily modified based on which standards are preferred. |
Indicate how the data will be organised during the project mentioning, for example, conventions, version control, and folder structures. Consistent, well-ordered research data will be easier to find, understand, and re-use. | 2-a-4 | Displays whether the given distribution hosts support versioning. The other information is not really covered by maDMP; if it is included in the maDMP at all, then probably in the data_quality_assurance field, which is covered by query 2-b-1. |
Consider what other documentation is needed to enable re-use. This may include information on the methodology used to collect the data, analytical and procedural information, definitions of variables, units of measurement, and so on. | / | If anything, this information would be included in the methodology field of the dataset structure. However, this field is only specified in the funder extension and is not included in the RDA-DMP Common Standard; it is therefore not translated when the JSON files are converted to JSON-LD and, in consequence, cannot be queried. |
Consider how this information will be captured and where it will be recorded (for example in a database with links to each item, a 'readme' text file, file headers, code books, or lab notebooks). | / | Not really covered by maDMP. |
2b What data quality control measures will be used? |
Explain how the consistency and quality of data collection will be controlled and documented. This may include processes such as calibration, repeated samples or measurements, standardised data capture, data entry validation, peer review of data, or representation with controlled vocabularies. | 2-b-1 | The closest match in the maDMP is the data_quality_assurance element, which this query returns. |
Storage and Backup During the Research Process
Requirement | Covered In | Remarks |
3a How will data and metadata be stored and backed up during the research? |
Describe where the data will be stored and backed up during research activities and how often the backup will be performed. It is recommended to store data in at least two separate locations. | 3-a-1 | Retrieving information about backups is only possible by querying the host element. If provided, the query returns the backup type and frequency for each specified host, as well as some information about the host. |
Give preference to the use of robust, managed storage with automatic backup, such as provided by IT support services of the home institution. Storing data on laptops, stand-alone hard drives, or external storage devices such as USB sticks is not recommended. | / | Not really covered by maDMP. |
3b How will data security and protection of sensitive data be taken care of during the research? |
Explain how the data will be recovered in the event of an incident. | / | Not really covered by maDMP. |
Explain who will have access to the data during the research and how access to data is controlled, especially in collaborative partnerships. | 3-b-1 | The closest match in the maDMP is the security_and_privacy field, which this query returns. Information about the availability of data hosts is included in query 3-a-1. |
Consider data protection, particularly if your data is sensitive (for example containing personal data, politically sensitive information, or trade secrets). Describe the main risks and how these will be managed. | 3-b-2 | Descriptions of risks and countermeasures are not really covered by maDMP. Whether data are sensitive is covered by this query. |
Explain which institutional data protection policies are in place. | / | This information would be provided by the related_policy field of the dmp structure. However, this field is only specified in the funder extension and is not included in the RDA-DMP Common Standard; it is therefore not translated when the JSON files are converted to JSON-LD and, in consequence, cannot be queried. |
Legal and Ethical Requirements, Code of Conduct
Requirement | Covered In | Remarks |
4a If personal data are processed, how will compliance with legislation on personal data and security be ensured? |
Ensure that when dealing with personal data, data protection laws (for example GDPR) are complied with. (including sub-points) | 5-a-2 | If anything, information about consent for preservation or sharing and about anonymization would be included in the preservation_statement, which is already covered by query 5-a-2. The other aspects are not really covered by maDMP. |
4b How will other legal issues, such as intellectual property rights and ownership, be managed? What legislation is applicable? |
Explain who will be the owner of the data, meaning who will have the rights to control access. (including sub-points) | 3-b-1, 5-a-3 | If anything, access restrictions would be included in the security_and_privacy field, which is already covered by query 3-b-1. Descriptions of the licenses in place are returned by query 5-a-3. |
Indicate whether intellectual property rights (for example Database Directive, sui generis rights) are affected. If so, explain which and how will they be dealt with. | / | Not really covered by maDMP. |
Indicate whether there are any restrictions on the re-use of third-party data. | / | Not really covered by maDMP. |
4c What ethical issues and codes of conduct are there, and how will they be taken into account? |
Consider whether ethical issues can affect how data are stored and transferred, who can see or use them, and how long they are kept. Demonstrate awareness of these aspects and respective planning. | 4-c-1, 4-c-2 | Query 4-c-1 checks whether ethical issues exist. Query 4-c-2 returns a description of the specified ethical issues, if there are any, as well as the ethical issues report, if there is one. |
Follow the national and international codes of conduct and institutional ethical guidelines, and check if ethical review (for example by an ethics committee) is required for data collection in the research project. | / | Not really covered by maDMP. |
Data Sharing and Long-Term Preservation
Requirement | Covered In | Remarks |
5a How and when will data be shared? Are there possible restrictions to data sharing or embargo reasons? |
Explain how the data will be discoverable and shared (for example by deposit in a trustworthy data repository, indexed in a catalogue, use of a secure data service, direct handling of data requests, or use of another mechanism). | 5-a-1 | Displays information about the distributions, such as host, access URL and distributed file formats. |
Outline the plan for data preservation and give information on how long the data will be retained. | 5-a-2 | The closest match in the maDMP is the preservation_statement field. However, note that this query does not return anything for our input files in JSON-LD format, probably because the preservation_statement field in the JSON files is ignored by the DCSO-JSON tool (see "Known issues" in the README of the tool) and hence not converted. In consequence, this field cannot be queried. |
Explain when the data will be made available. Indicate the expected timely release. Explain whether exclusive use of the data will be claimed and if so, why and for how long. Indicate whether data sharing will be postponed or restricted for example to publish, protect intellectual property, or seek patents. | 5-a-3 | Gathers information about the data usage constraints (license, embargo period, data access, release date). |
Indicate who will be able to use the data. If it is necessary to restrict access to certain communities or to apply a data sharing agreement, explain how and why. Explain what action will be taken to overcome or to minimise restrictions. | / | Not really covered by maDMP. Any related information contained in the maDMP is already queried by 5-a-3. |
5b How will data for preservation be selected, and where will data be preserved long-term (for example a data repository or archive)? |
Indicate what data must be retained or destroyed for contractual, legal, or regulatory purposes. | / | Not explicitly covered by maDMP; possibly addressed by 5-a-2. |
Indicate how it will be decided what data to keep. Describe the data to be preserved long-term. | / | Not explicitly covered by maDMP; possibly addressed by 5-a-2. |
Explain the foreseeable research uses (and/or users) for the data. | / | Not covered by maDMP. |
Indicate where the data will be deposited. If no established repository is proposed, demonstrate in the DMP that the data can be curated effectively beyond the lifetime of the grant. It is recommended to demonstrate that the repositories policies and procedures (including any metadata standards, and costs involved) have been checked. | 5-b-1 | Enumerates all information available about the hosts mentioned in the maDMP. |
5c What methods or software tools are needed to access and use data? |
Indicate whether potential users need specific tools to access and (re-)use the data. Consider the sustainability of software needed for accessing the data. | 5-c-1 | A stripped-down version of 5-a-1, with a focus on the distributed file formats. This is the best we can get from the maDMP, since file formats indicate which software or tools are needed to read the files. |
Indicate whether data will be shared via a repository, requests handled directly, or whether another mechanism will be used. | 5-a-1 | There is no dedicated field in the maDMP for this. However, 5-a-1 returns data from which the requested information can be inferred. |
5d How will the application of a unique and persistent identifier (such as a Digital Object Identifier (DOI)) to each data set be ensured? |
Explain how the data might be re-used in other contexts. Persistent identifiers (PIDs) should be applied so that data can be reliably and efficiently located and referred to. PIDs also help to track citations and re-use. | 5-d-1 | This is a slightly modified version of 5-a-1, with an emphasis on the employed PID system. |
Indicate whether a PID for the data will be pursued. Typically, a trustworthy, long-term repository will provide a persistent identifier. | 5-d-2 | Tests whether there exists a distribution with a host that specifies the use of a PID system. |
Data Management Responsibilities and Resources
Requirement | Covered In | Remarks |
6a Who (for example role, position, and institution) will be responsible for data management (i.e. the data steward)? |
Outline the roles and responsibilities for data management/stewardship activities, for example data capture, metadata production, data quality, storage and backup, data archiving, and data sharing. Name responsible individual(s) where possible. | 6-a-1, 6-a-2 | The queries show all available information about the contact person and contributors. |
For collaborative projects, explain the co-ordination of data management responsibilities across partners. | 6-a-2 | Returns information about the contributors defined by the maDMP. |
Indicate who is responsible for implementing the DMP, and for ensuring it is reviewed and, if necessary, revised. | / | Not explicitly covered by maDMP, but 6-a-1 and 6-a-2 give a good indicator of who might be responsible. |
Consider regular updates of the DMP. | / | Not covered by maDMP. |
6b What resources (for example financial and time) will be dedicated to data management and ensuring that data will be FAIR (Findable, Accessible, Interoperable, Re-usable)? |
Explain how the necessary resources (for example time) to prepare the data for sharing/preservation (data curation) have been costed in. | 6-b-1 | Not explicitly covered by maDMP; related information may be found with 6-b-1. |
Carefully consider and justify any resources needed to deliver the data. These may include storage costs, hardware, staff time, costs of preparing data for deposit, and repository charges. | 6-b-1 | Lists everything related to costs that is captured by the maDMP. |
Indicate whether additional resources will be needed to prepare data for deposit or to meet any charges from data repositories. If yes, explain how much is needed and how such costs will be covered. | 6-b-2 | Specifies equipment needed or used to create or process the data. |
Summary
Overall, the Science Europe Evaluation Rubric defines 6 broad categories in its assessment guideline. The following table gives an overview of the spectrum we were able to cover with our queries.
Category | Number Of Subitems | Largely Covered Subitems | Percentage |
0 General Information | 1 | 1 | 100 % |
1 Data Description and Collection or Re-Use of Existing Data | 9 | 3 | 33 % |
2 Documentation and Data Quality | 7 | 5 | 71 % |
3 Storage and Backup During the Research Process | 6 | 3 | 50 % |
4 Legal and Ethical Requirements, Code of Conduct | 6 | 3 | 50 % |
5 Data Sharing and Long-Term Preservation | 12 | 8 | 67 % |
6 Data Management Responsibilities and Resources | 7 | 5 | 71 % |
Sum | 48 | 28 | 58 % |
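The percentages in the table can be reproduced from the two count columns. The following is a minimal sketch that recomputes them; the counts are copied from the table, while rounding to the nearest whole percent is an assumption about how the table was derived.

```python
# (category, number of subitems, largely covered subitems), copied from the table.
rows = [
    ("0 General Information", 1, 1),
    ("1 Data Description and Collection or Re-Use of Existing Data", 9, 3),
    ("2 Documentation and Data Quality", 7, 5),
    ("3 Storage and Backup During the Research Process", 6, 3),
    ("4 Legal and Ethical Requirements, Code of Conduct", 6, 3),
    ("5 Data Sharing and Long-Term Preservation", 12, 8),
    ("6 Data Management Responsibilities and Resources", 7, 5),
]


def percentage(covered, total):
    """Coverage rounded to the nearest whole percent (assumed rounding rule)."""
    return round(100 * covered / total)


total_items = sum(n for _, n, _ in rows)
covered_items = sum(c for _, _, c in rows)

for name, n, c in rows:
    print(f"{name}: {c} of {n} = {percentage(c, n)} %")
print(f"Sum: {covered_items} of {total_items} = {percentage(covered_items, total_items)} %")
```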