Data quality

Ongoing cleanup of master data

Aarhus University has been using PURE for more than 15 years. Throughout this time, researchers, secretaries and others with PURE access have been able to create master data. Examples of master data are external organisations, journals and publishers. Master data is the foundation for good data quality and for easy registration of publications and activities. Good data quality, in turn, is a prerequisite for analysing publishing channels, collaboration partners and much more.

Over time, large amounts of master data have been created, some of it of relatively low quality. This makes it more difficult to register activities and research output, and also to analyse the relationships mentioned above. Look at the example to the right. A search for external organisations with the words "Oxford University" yields several poor results. One problem is 'clusters', i.e. several organisations combined into a single organisation. Clusters make it impossible to correctly measure how many times AU has co-published with the University of Oxford. In the bottom result, the "external organisation" is actually a free-text description of an activity, which is now listed in PURE as an external organisation.

The PURE team has started cleaning up organisations like these. This will make it easier to register publications, activities, etc., and will also make valid analyses possible. We will concentrate our efforts on cleaning up external organisations and journals, and in the long run also publishers. As a result, some information on your publications, activities, etc. may change or disappear. If important information is missing, you can enter it in the free-text fields or contact the PURE team to discuss what should happen to the individual registration.

Below you can read more about how you can support good data quality, as well as what our cleanup work can mean for your content.

When you register an activity

Much of the poor data quality stems from activities. When registering an activity, you are often required to add a type of master data that does not necessarily make sense for that particular activity.

For example, if you want to register that you have been a member of a PhD assessment committee, you will be asked to select an Event, Organisation or External organisation. Your choice becomes the title of your activity. It may therefore seem natural to create a new external organisation for each assessment committee you participate in. However, these fields are not free-text fields.

Creating a new external organisation for each committee is a major problem for data quality, and it makes it harder for everyone else to find and register the correct external organisations.

To solve these problems, we have set up grouping organisations for several types of assessment committees (see the picture to the right for examples). Notice that the bottom of each of these organisations says "Temporary organisation". This describes the temporary nature of the committee, not the organisation's registration in PURE. The label is also a strong indication that the organisation has been set up centrally by the PURE team, which can help if you are looking for other grouping organisations. These grouping organisations will primarily be used for retrospective clean-up.

If you have been part of a PhD assessment committee, you should not choose one of these grouping organisations. Instead, choose an existing external organisation, for example "University of Oxford", and then describe the committee in the free-text field "Description". You can, of course, also add people and organisations under "People/Organisations".

Remember to always fill in "Information on secondary employment".

When you register research output

At Aarhus University, researchers and employees collaborate with universities, hospitals, NGOs etc. from all over the world. To analyse these collaborations meaningfully, we need to focus on parent organisations.

On most publications, an author's affiliation is not just to a university, but also to a department, a centre or perhaps a specific laboratory. We cannot support this level of organisational hierarchy, either organisationally or analytically. Therefore, enter only the university, hospital, etc.

If an author has more than one affiliation, do not combine them into a 'cluster'. Add each affiliation as a separate organisation.

Journals and book series

As a general rule, we do not store journals and book series without an ISSN. If you can't find the journal or book series you've published in, you can create a new one. Remember to add its ISSN. If you are in doubt, you can add the grouping journal "Ikke-valideret tidsskrift" (Non-validated journal). The journal will then be checked when the PURE team validates the publication.

Consequences for your content

Activities

As mentioned under "When you register an activity", external organisations that we cannot validate and maintain as stand-alone organisations may be merged into grouping organisations. This affects both how much information is kept and how it is displayed on the website.

There is a risk that essential details will be lost if they were not stored elsewhere in the registration. For example, if an activity was registered with an external organisation called "Member of PhD assessment committee by University of Oxford", the information that it took place at Oxford disappears when that organisation is merged into the grouping organisation. This is a regrettable consequence of improper use of master data. Use the free-text fields for anything that isn't master data.

Grouping organisations also result in a less detailed display on the website, as shown in the image to the right. This is because the organisation is used as the title of the activity. The same problem arises with editorial work on journals and book series when these are merged into "Ikke-valideret tidsskrift". We are in dialogue with the system supplier about the option to choose a title for these activities, which would allow a better display.

Publications

In the ongoing clean-up of journals and book series, some journals and book series that are either discontinued or impossible to validate will be deleted or changed to the consolidated journal "Ikke-valideret tidsskrift" (Non-validated journal). On several publications, affiliations, especially those of external authors, will change or disappear altogether. This happens most often on older publications. The affiliations are only visible in the backend and are used solely for analyses of collaborations.