Pseudonymisation of DICOM files
- Last Updated on Tuesday, 15 May 2012 08:15
- Written by P.F.C. Groot
dcm2nn-gui is a file-conversion tool for removing sensitive information from DICOM files. Good Clinical Practice (GCP) protocols require anonymisation (also called de-identification or pseudonymisation) of data used in scientific research. However, most applications that support anonymisation of DICOM files offer limited functionality (e.g. they remove only a limited number of predefined tags), are user-unfriendly (e.g. they require elaborate manual editing) or are expensive (e.g. they are part of a PACS). dcm2nn-gui was developed to address these drawbacks and to give researchers a flexible and efficient tool.
How does dcm2nn-gui work?
Input specification
Folder: After downloading and installing (see below), start the application by selecting its icon from the Start menu. dcm2nn-gui opens with a dialog window for specifying the DICOM input files. Recursive directory lookup and wildcard lookup are supported, and regular expressions can be used to find files whose names must match a very specific pattern. Typically, DICOM files have no filename extension (e.g. when exported to a CD/DVD medium) or use .dcm to indicate the file type.
Filter file: this is an optional reference to a text file that can be used to specify which individual DICOM attributes should be removed, changed or maintained. An example file is provided in the directory where dcm2nn-gui was installed (dcmfilterdict.txt).

File search pattern: When only a part of a set of files should be included it is sometimes possible to use a wildcard expression. Accepted wildcards are an asterisk * (to indicate zero or more arbitrary characters) or a question mark ? (to indicate a single arbitrary character). For example: to search only for files that start with IM_ you could use the wildcard expression IM_*.
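The wildcard rules described above can be tried out with a short Python sketch. It uses Python's standard fnmatch module as a stand-in; this is an assumption for illustration only, not the matching code used inside dcm2nn-gui, and the filenames are made up. Note that fnmatch also accepts [seq] character sets, which the text above does not mention for dcm2nn-gui.

```python
from fnmatch import fnmatchcase


def match_files(filenames, pattern):
    """Return the filenames that match a wildcard pattern.

    '*' matches zero or more arbitrary characters,
    '?' matches exactly one arbitrary character.
    Matching here is case-sensitive (an assumption for this sketch).
    """
    return [name for name in filenames if fnmatchcase(name, pattern)]


# Hypothetical filenames, as they might appear on an exported CD/DVD.
names = ["IM_0001", "IM_0002.dcm", "XX_0001", "DICOMDIR"]

starts_with_im = match_files(names, "IM_*")     # every file starting with IM_
four_chars_only = match_files(names, "IM_000?")  # IM_000 plus exactly one character
print(starts_with_im)
print(four_chars_only)
```

In this sketch the pattern IM_* selects both IM_ files, while IM_000? selects only the file without an extension, because ? stands for exactly one character.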

More complex filename expressions can be entered by using a regular expression (for more information, see the Qt documentation).
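As an illustration of what a regular expression can express that a wildcard cannot, here is a minimal Python sketch. It uses Python's re module, whose syntax agrees with Qt's regular expressions for simple patterns like this one, but that equivalence and the example pattern are assumptions. Whether dcm2nn-gui matches the whole filename or only part of it is not stated here; the sketch matches the whole name.

```python
import re

# Hypothetical pattern: "IM_" followed by exactly four digits,
# optionally followed by the extension ".dcm".
PATTERN = re.compile(r"IM_\d{4}(\.dcm)?")


def matches(name):
    """Return True when the complete filename matches PATTERN."""
    return PATTERN.fullmatch(name) is not None


for name in ["IM_0001", "IM_0002.dcm", "IM_12", "IM_0001.txt"]:
    print(name, matches(name))
```

A wildcard such as IM_* would accept all four names; the regular expression accepts only the first two.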

After clicking the 'Find' button, a new progress window will be opened. This window displays status information during the collection of DICOM files. Each matching DICOM file will be opened to read a few specific DICOM attributes that are required to assemble related files into studies and series (i.e. using DICOM Study and Series Instance UIDs). Also, a few additional attributes - such as Patient Name - are loaded so they can be edited in the next window.
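The grouping step described above can be sketched in Python as follows. This is not the dcm2nn-gui implementation: the records are plain dictionaries with made-up UIDs and filenames, standing in for the attributes that would be read from each DICOM file (reading real files would require a DICOM library, which is left out here).

```python
from collections import defaultdict


def group_by_study_and_series(records):
    """Group file paths by Study Instance UID, then by Series Instance UID."""
    studies = defaultdict(lambda: defaultdict(list))
    for rec in records:
        studies[rec["StudyInstanceUID"]][rec["SeriesInstanceUID"]].append(rec["path"])
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {study: dict(series) for study, series in studies.items()}


# Hypothetical records; the UIDs are shortened placeholders.
records = [
    {"path": "IM_0001", "StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1"},
    {"path": "IM_0002", "StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1"},
    {"path": "IM_0003", "StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.2"},
    {"path": "IM_0004", "StudyInstanceUID": "1.2.4", "SeriesInstanceUID": "1.2.4.1"},
]

grouped = group_by_study_and_series(records)
for study_uid, series in grouped.items():
    for series_uid, paths in series.items():
        print(study_uid, series_uid, paths)
```

Each inner list corresponds to one series, which is the kind of unit shown as a row in the table of the next window.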

Output specification
When all matching DICOM files have been analysed, a new window opens with a table showing all recognised DICOM series. At this point you have to replace patient-related information with pseudonymised values, such as study IDs and subject numbers. Use the checkbox at the left side of the first column to include or exclude specific studies from the conversion. Below the table, several parameters are available to configure the pseudonymisation process.

Options:
Output folder: Enter the directory path for the output files in this text box. This must be an existing path.
Names and structure: Select one of the predefined options listed below, or define your own output naming and structure. Predefined options are:
- Keep original folder naming
dcm2nn-gui will automatically use the same subdirectory hierarchy as the input directory for the output files.
- Use PatientID_PatientName\StudyDate_StudyInstUID\SOPClass\SeriesNr_SeriesDescr\InstanceNr
This will create a recognisable output tree with explicit patient and scan information. However, in some cases the output paths of the individual files are not unique. So, when using this option you should make sure that different input files are not mapped to the same output file (i.e. do not use the 'Overwrite existing files' option).
- Use StudyUID\SeriesUID\InstanceUID
With this option dcm2nn-gui will create folder names and filenames using the (unique) DICOM identifiers.
- Use PatientName(PatientID)-StudyDate\SeriesUID\InstanceUID
This option is almost the same as option 3 (Use StudyUID\SeriesUID\InstanceUID). The only difference is that the top-level folder contains patient and study date information. (This will be based on the de-identified patient information.)
- Use DICOMDIR structure
dcm2nn-gui will create files and folders with DICOMDIR-compatible names. A DICOMDIR index file will be created automatically.
- Custom output path
Up to 5 user-defined output paths can be predefined. A custom output path can be used to build a new folder structure and new filenames from DICOM attribute values. A dialog window assists with defining (symbolic) output paths.
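The idea of building output paths from attribute values, and the risk of non-unique paths mentioned above, can be sketched in Python. The {Name} placeholder syntax is hypothetical and chosen only for this sketch; the symbolic path syntax that dcm2nn-gui actually uses is not described here. The attribute values are made up.

```python
from collections import Counter


def build_output_path(template, attributes):
    """Fill a path template with attribute values.

    The '{Name}' placeholders are a hypothetical syntax for this sketch.
    """
    return template.format(**attributes)


def find_collisions(paths):
    """Return the output paths that more than one input file maps to."""
    counts = Counter(paths)
    return sorted(path for path, n in counts.items() if n > 1)


# Two hypothetical files from one series that share the same instance number.
files = [
    {"StudyUID": "1.2.3", "SeriesUID": "1.2.3.1", "InstanceUID": "1.2.3.1.1",
     "SeriesNr": 1, "InstanceNr": 1},
    {"StudyUID": "1.2.3", "SeriesUID": "1.2.3.1", "InstanceUID": "1.2.3.1.2",
     "SeriesNr": 1, "InstanceNr": 1},
]

unique_template = r"{StudyUID}\{SeriesUID}\{InstanceUID}"
risky_template = r"{SeriesNr}\{InstanceNr}"

unique_paths = [build_output_path(unique_template, f) for f in files]
risky_paths = [build_output_path(risky_template, f) for f in files]

print(find_collisions(unique_paths))  # no collisions: instance UIDs differ
print(find_collisions(risky_paths))   # both files map to the same path
```

Checking for collisions before writing any file is one way to notice that a chosen naming scheme does not produce unique paths.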
Remove private fields: 'private fields' are manufacturer specific attributes that are not part of the DICOM standard. Some manufacturers include sensitive information in private fields, so it's good practice to exclude those attributes unless you require the information included in private attributes. Note that the filter file can be used for per-attribute configuration.
Remove unspecified fields: 'unspecified fields' are the attributes that are not explicitly configured in the filter file. Unspecified attributes will be removed from the output when this option is checked.
New unique IDs: normally this option should be enabled to make sure that the new files will get new unique study, series and instance IDs. This is important to make sure that the pseudonymised files will not have the same 'identity' as the original files.
Overwrite existing files: only check this option when you would like to overwrite existing files in the output directory. Note that you might overwrite newly created files when this option is enabled and the selected file and directory naming does not generate unique paths. (Select 'Keep original folder naming' or include InstanceUID in the filename to prevent this.)
Filename extension: use this button group to specify the filename extension of the new DICOM files:
- Keep extension (i.e. copy extensions from input files)
- Set to .dcm (i.e. force the extension to .dcm)
- No extension (i.e. remove any extension)
- Other: (append a custom file extension)
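The combined effect of 'Remove private fields', 'Remove unspecified fields' and 'New unique IDs' can be illustrated with a small Python sketch. It is a simplified model, not the dcm2nn-gui code: attributes are a plain dictionary keyed by (group, element) tags, the set of "specified" tags stands in for the filter file (whose format is not shown here), and the order in which the options are applied is an assumption. Two facts from the DICOM standard are used: private attributes have an odd group number, and a UID may be derived from a UUID by writing it as a decimal number after the prefix 2.25.

```python
import uuid

# Study Instance UID, Series Instance UID, SOP Instance UID
UID_TAGS = {(0x0020, 0x000D), (0x0020, 0x000E), (0x0008, 0x0018)}


def is_private(tag):
    """Private attributes have an odd group number."""
    group, _element = tag
    return group % 2 == 1


def new_uid():
    """Create a new UID from a random UUID (prefix 2.25., at most 64 characters)."""
    return "2.25." + str(uuid.uuid4().int)


def apply_options(attributes, specified, uid_map,
                  remove_private=True, remove_unspecified=True, new_unique_ids=True):
    """Return a filtered copy of 'attributes' (precedence of options is assumed)."""
    result = {}
    for tag, value in attributes.items():
        if remove_private and is_private(tag):
            continue
        if remove_unspecified and tag not in specified:
            continue
        if new_unique_ids and tag in UID_TAGS:
            # Reuse the same replacement for the same original UID, so that
            # files of one study or series still belong together afterwards.
            if value not in uid_map:
                uid_map[value] = new_uid()
            value = uid_map[value]
        result[tag] = value
    return result


# Hypothetical attribute values for one file.
attributes = {
    (0x0010, 0x0010): "SUBJ01",      # Patient Name (already pseudonymised)
    (0x0020, 0x000D): "1.2.3",       # Study Instance UID
    (0x0008, 0x0080): "Hospital X",  # Institution Name (not in the filter)
    (0x2001, 0x1003): "vendor data", # a private attribute (odd group)
}
specified = {(0x0010, 0x0010), (0x0020, 0x000D)}
uid_map = {}

cleaned = apply_options(attributes, specified, uid_map)
print(sorted(cleaned))
```

Passing the same uid_map for every file of a conversion keeps the mapping from old to new UIDs consistent across files.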

Acknowledgement
dcm2nn-gui uses the dcmtk open source C++ library for reading and writing DICOM files. The platform independent Qt library is used for the graphical interface.
Known issues and wish list
1) Attributes like Patient Name and Patient ID can only be changed by editing the values manually. The upcoming version will have a feature to automatically generate new study attributes.
Downloads:
Setup for Windows XP/Vista/7: dcm2nn-gui-setup.msi (version 1.1.2, 2012-05-03)
Linux versions will be available soon...
Free Software Disclaimer
The free software programs provided by 3TMRI.nl may be freely distributed, provided that no charge above the cost of distribution is levied, and that the disclaimer below is always attached to it.
The programs are provided as is without any guarantees or warranty. Although the author has attempted to find and correct any bugs in the free software programs, the author is not responsible for any damage or losses of any kind caused by the use or misuse of the programs.
The author is under no obligation to provide support, service, corrections, or upgrades to the free software programs. For more information, please send an email to the support mailbox.
