Controlled vocabularies
Controlled vocabularies for metadata help organise and manage information systematically, particularly when describing datasets. A controlled vocabulary consists of predefined terms with assigned meanings, which ensures consistency when describing and categorising information.
Persistent identifiers (PIDs) serve a similar purpose: they provide robust links and avoid ambiguity when referring to individuals, organisations or places.
A selection of best practices for controlled vocabularies and persistent identifiers is listed below. Following them makes it more likely that Researchdata.se can reuse the metadata when harvesting from a data source.
Organisations
For organisations, a ROR ID is the preferred persistent identifier.
Examples:
https://ror.org/05ynxx418
https://ror.org/040wg7k59
https://ror.org/03zttf063
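Data providers who generate metadata programmatically may want to catch malformed ROR IDs before publishing. The following is a minimal sketch in Python. The pattern is an assumption based on the commonly documented shape of a ROR ID (a leading 0, six Crockford base32 characters, and two checksum digits), and the function name is hypothetical. It only checks the format; it does not verify the checksum or that the ID is registered.

```python
import re

# Assumed shape of a ROR ID URL: "0", then six Crockford base32
# characters (digits and letters except i, l, o, u), then two digits.
ROR_PATTERN = re.compile(r"https://ror\.org/0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}")


def looks_like_ror_id(value: str) -> bool:
    """Return True if value has the shape of a full ROR ID URL."""
    return ROR_PATTERN.fullmatch(value.strip().lower()) is not None


for candidate in (
    "https://ror.org/05ynxx418",
    "https://ror.org/040wg7k59",
    "https://ror.org/03zttf063",
):
    print(candidate, looks_like_ror_id(candidate))
```

The check deliberately requires the full URL form, since that is the form used in the examples above.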
Persons
To identify a person, use ORCID, a PID for researchers and other contributors.
Examples:
https://orcid.org/0000-0002-9227-8514
Structure of the ORCID Identifier
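The last character of an ORCID iD is a check character calculated with the ISO 7064 MOD 11-2 algorithm, so typing errors can often be detected before the metadata is published. The following Python sketch illustrates the check; the function names are hypothetical, and it only validates the structure, not whether the iD is actually registered.

```python
import re

ORCID_PREFIX = "https://orcid.org/"


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Accept a bare ORCID iD or its https://orcid.org/ URL form."""
    bare = value.strip()
    if bare.startswith(ORCID_PREFIX):
        bare = bare[len(ORCID_PREFIX):]
    # Four groups of four characters; only the last may be "X".
    if re.fullmatch(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]", bare) is None:
        return False
    compact = bare.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("https://orcid.org/0000-0002-9227-8514"))
```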
Subjects & keywords
There are multiple vocabularies used to categorise and tag resources. Some of them cover several disciplines, while others are specific to a limited set of research domains.
AAT – The Art & Architecture Thesaurus
https://www.getty.edu/research/tools/vocabularies/aat/
Example term:
http://vocab.getty.edu/page/aat/300191918
AGROVOC – Vocabulary for Agricultural Sciences
https://agrovoc.fao.org/browse/agrovoc
Example term:
http://aims.fao.org/aos/agrovoc/c_2536
ALLFO – Allmän finländsk ontologi (General Finnish Ontology)
Example term:
http://www.yso.fi/onto/yso/p13693
ELSST – The European Language Social Science Thesaurus
Example term:
https://elsst.cessda.eu/id/4/a74cd285-d1c6-4e55-8c0d-faf0fe94399f
Example usage
<subjects>
  <subject
    subjectScheme="ELSST The European Language Social Science Thesaurus"
    schemeURI="https://elsst.cessda.eu"
    valueURI="https://elsst.cessda.eu/id/4/a74cd285-d1c6-4e55-8c0d-faf0fe94399f"
    classificationCode="urn:ddi:int.cessda.elsst:a74cd285-d1c6-4e55-8c0d-faf0fe94399f:4"
    xml:lang="en">ENVIRONMENT</subject>
</subjects>
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "keywords": [
    {
      "@type": "DefinedTerm",
      "@id": "https://elsst.cessda.eu/id/4/a74cd285-d1c6-4e55-8c0d-faf0fe94399f",
      "inDefinedTermSet": "https://elsst.cessda.eu",
      "termCode": "urn:ddi:int.cessda.elsst:a74cd285-d1c6-4e55-8c0d-faf0fe94399f:4",
      "name": "ENVIRONMENT"
    }
  ]
}
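The XML and JSON-LD serialisations above carry the same pieces of information: the scheme, the term URI, a term code and a label. A data source that exports both formats could build them from one shared record. The following Python sketch shows one way to do that; the function names and the keys of the `elsst_environment` dictionary are hypothetical and chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def datacite_subject(term: dict) -> ET.Element:
    """Build a <subject> element with the attributes used in the XML example."""
    subject = ET.Element("subject", {
        "subjectScheme": term["scheme_name"],
        "schemeURI": term["scheme_uri"],
        "valueURI": term["uri"],
        "classificationCode": term["code"],
        f"{{{XML_NS}}}lang": term["lang"],  # serialised as xml:lang
    })
    subject.text = term["label"]
    return subject


def schema_org_keyword(term: dict) -> dict:
    """Build the DefinedTerm object used in the JSON-LD example."""
    return {
        "@type": "DefinedTerm",
        "@id": term["uri"],
        "inDefinedTermSet": term["scheme_uri"],
        "termCode": term["code"],
        "name": term["label"],
    }


# Values copied from the ELSST example; the key names are illustrative.
elsst_environment = {
    "scheme_name": "ELSST The European Language Social Science Thesaurus",
    "scheme_uri": "https://elsst.cessda.eu",
    "uri": "https://elsst.cessda.eu/id/4/a74cd285-d1c6-4e55-8c0d-faf0fe94399f",
    "code": "urn:ddi:int.cessda.elsst:a74cd285-d1c6-4e55-8c0d-faf0fe94399f:4",
    "lang": "en",
    "label": "ENVIRONMENT",
}

print(ET.tostring(datacite_subject(elsst_environment), encoding="unicode"))
print(json.dumps(schema_org_keyword(elsst_environment), indent=2))
```

Keeping a single record per term means the URI and code cannot drift apart between the two exports.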
EnvThes – Environmental Thesaurus
https://vocabs.lter-europe.net/envthes
Example term:
http://vocabs.lter-europe.net/EnvThes/20800
FISH Thesaurus of Monument Types
https://collectionstrust.org.uk/resource/thesaurus-of-monument-types-fish
Example term:
http://purl.org/heritagedata/schemes/560/concepts/142104
GCMD Vocabulary for Earth Science
https://www.earthdata.nasa.gov/learn/find-data/idn/gcmd-keywords
Example term:
https://gcmd.earthdata.nasa.gov/kms/concept/b3b14df8-5197-4a26-ae61-882fdba706f3
Example usage
<subjects>
  <subject
    subjectScheme="GCMD Vocabulary for Earth Science"
    schemeURI="https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords"
    valueURI="https://gcmd.earthdata.nasa.gov/kms/concept/b3b14df8-5197-4a26-ae61-882fdba706f3"
    classificationCode="b3b14df8-5197-4a26-ae61-882fdba706f3"
    xml:lang="en">FOOD STORAGE</subject>
</subjects>
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "keywords": [
    {
      "@type": "DefinedTerm",
      "@id": "https://gcmd.earthdata.nasa.gov/kms/concept/b3b14df8-5197-4a26-ae61-882fdba706f3",
      "inDefinedTermSet": "https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords",
      "termCode": "b3b14df8-5197-4a26-ae61-882fdba706f3",
      "name": "FOOD STORAGE"
    }
  ]
}
GEMET – General Multilingual Environmental Thesaurus
http://www.eionet.europa.eu/gemet/
Example term:
http://www.eionet.europa.eu/gemet/concept/1500
ICD-10 – International Classification of Diseases
https://icd.who.int/browse10/2019/en
Example term:
https://icd.who.int/browse10/2019/en#/I20-I25
INSPIRE glossary
https://inspire.ec.europa.eu/glossary
Example term:
http://inspire.ec.europa.eu/glossary/Aqueduct
LCSH – Library of Congress Subject Headings
http://id.loc.gov/authorities/subjects/
Example term:
https://id.loc.gov/authorities/subjects/sh2009009655
MeSH – Medical Subject Headings
Example term:
http://id.nlm.nih.gov/mesh/D003069
NASA Thesaurus / NASA STI Thesaurus
https://sti.nasa.gov/nasa-thesaurus
SSIF 2025 – Standard för svensk indelning av forskningsämnen 2025 (Standard for the Swedish classification of research subjects 2025)
Geography
GeoNames
A commonly used identifier service for countries, regions, cities and other places.
Examples:
https://sws.geonames.org/2661886/ - Sweden
https://sws.geonames.org/2699050/ - Kronoberg
https://sws.geonames.org/2701727/ - Karlshamn
Example usage
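As an illustration, a GeoNames URI can be attached to a dataset description as a place. The Python sketch below builds a schema.org JSON-LD object in the same style as the keyword examples. Using the `spatialCoverage` property with a `Place` is an assumption made for this sketch, not a confirmed requirement of Researchdata.se, and the helper name is hypothetical.

```python
import json


def geonames_place(uri: str, name: str) -> dict:
    """Describe a place by its GeoNames URI and a human-readable name."""
    return {"@type": "Place", "@id": uri, "name": name}


dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    # Assumed property: schema.org spatialCoverage holding Place objects.
    "spatialCoverage": [
        geonames_place("https://sws.geonames.org/2661886/", "Sweden"),
        geonames_place("https://sws.geonames.org/2699050/", "Kronoberg"),
    ],
}

print(json.dumps(dataset, indent=2))
```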
Language
Use ISO 639-3 language codes.
Examples: eng, deu, swe
Example usage
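Source systems often store two-letter ISO 639-1 codes or the bibliographic ISO 639-2 variants, so a small normalisation step may be needed before export. The Python sketch below shows the idea; the lookup tables cover only the three example languages and are purely illustrative, and the function name is hypothetical. A real export would need the complete ISO 639-3 code tables.

```python
# Illustrative subset only: the three languages used in the examples.
KNOWN_ISO_639_3 = {"eng", "deu", "swe"}

# Common alternative codes mapped to their ISO 639-3 equivalents.
ALIASES = {
    "en": "eng",
    "de": "deu",
    "ger": "deu",  # ISO 639-2 bibliographic code for German
    "sv": "swe",
}


def to_iso_639_3(code: str) -> str:
    """Normalise a language code to ISO 639-3, or raise ValueError."""
    normalised = code.strip().lower()
    if normalised in KNOWN_ISO_639_3:
        return normalised
    if normalised in ALIASES:
        return ALIASES[normalised]
    raise ValueError(f"Unknown language code: {code!r}")


print([to_iso_639_3(c) for c in ("en", "GER", "swe")])
```

Raising an error for unknown codes, rather than passing them through, keeps invalid values out of the harvested metadata.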
License
For recommendations on open licenses and marks in a Swedish context, see the following summary by Digg:
Rekommendation om öppna licenser och immaterialrätt (Recommendation on open licences and intellectual property rights)
Examples:
https://creativecommons.org/publicdomain/zero/1.0/
https://creativecommons.org/licenses/by/4.0/
Dataset type
To mark a specific type of dataset, the EU Vocabulary Dataset Type provides a way to classify a dataset as synthetic data, test data, etc.
A good use case for providing a dataset type is to mark synthetic datasets.