Attachment Migration

This appendix describes an ExtraView command-line utility that moves file attachments from internal storage within the database, to the file system.

By default, file attachments uploaded to ExtraView are stored as binary objects in the database. The ExtraView Administrator has the choice to configure the system to store some or all of the uploaded attachments to a file system accessible by the application server.

Within ExtraView, the program keeps track of where all particular attachment are located. Prior to the conversion, all attachments are marked as being stored “internally” – within the database. As part of the migration process, as each individual file is extracted and copied to the final location on the external file system, the system marks that attachment as now being stored “externally”. This allows the migration process to be done as a background task, without impacting the availability of ExtraView or its attachments to the end users.

This provides flexibility – the Administrator can choose to have all files stored internally in the database, all files stored externally in the file system, or have a mixture of some file types stored within the database, and some file types stored on the file system.

Configuring External Storage

Note: It is strongly recommend that you back up your database prior to performing any of the procedures described below.

To configure attachment storage options, log in to ExtraView with the Administrator role. Click on the Admin navigation button, then the System Controls tab, then select All Behavior Settings, and edit the following settings:

  • ATTACHMENT_REPOSITORY_OPT - This setting controls whether attachments, document field and image fields are stored internally within the database, externally on the file system, or in some combination of the two methods.

    If the value of this setting is INTERNAL (the default value), then all attachments, document fields and image fields are stored internally within the database. If the value is set to EXTERNAL, then all the attachments, document fields and image fields are stored on the external file system.

    Alternatively, you may provide a comma separated list of file extensions and then all files with these extensions are stored in the database while all others are stored externally in the file repository.

    For example, if you set the value of this setting to txt, doc, docx, htm, html then files of these types will be stored within the database. This strategy leaves the files stored internally in the database as searchable by keywords, while image and video files with extensions such as jpg and mp4 are stored externally.

    Regardless of the setting, ExtraView will continue to store dynamically generated thumbnail-sized image files in the database as these are relatively small files and for performance reasons must be more readily available.

  • ATTACHMENT_REPOSITORY_ROOT – This setting defines the location outside the database where attachments, document fields and image fields will be stored. You must ensure that the path is valid from the application server(s) that are running ExtraView and that you have all the permissions to read and write to the storage. In a clustered or load-balanced environment, you must ensure that all instances of ExtraView can read and write to the same file system using this path.

    Please note – if you change this setting, new attachments, document field and image fields will be saved to the new location, and existing attachments, document fields and image fields will still reference the old location. You will need to ensure that the previous file path is still accessible by the ExtraView instance, or these existing attachments will not be available to end users.

    Set up a separate backup method for this external storage as backing up your database will no longer backup the file attachments

  • ATTACHMENT_REPOSITORY_DMAX - This setting defines the maximum number of files or directories that will be created under one node of the external directory structure. The default for this value is 999. It is not likely that this value needs to be altered.

Sizing the file system

You should ensure that there is adequate space for future growth when sizing the file system to be used for storing attachments externally. To do this, have your DBA connect to your ExtraView database using Query Analyzer (MSSQL) or SQL Plus (Oracle) and run the following queries:

-- bring back the total size of the attachments stored in the database, in bytes.
select sum(file_size) from attachment;
-- bring back the number of attachments in the database
select count(*) from attachment;

Migration Utility Options

There are several forms of the doAttachmentMigration command. The utility is found in the directory named WEB-INF/data. All require one or more parameters passed as part of the ExtraView command line utility:

  • Move a single attachment by providing a single attachment ID:

    doAttachmentMigration instanceName -id 10000
     

  • Move a batch of attachments by providing a range of attachment IDs:

    doAttachmentMigration instanceName -from 10000 -to 50000
     

  • Move attachments according to their content type:

    doAttachmentMigration instanceName -type content-type ...

    The content type may contain valid database wildcard characters. This typically means that a percentage sign (%) will match multiple characters and an underscrore (_) will match a single character. Further, you can specify multiple -type parameters in a single command. For example, the following will migrate all the files that contain binary and zip within their content types:

    doAttachmentMigration instanceName -type %binary% -type %zip%
     

  • Migrate a list of attachments, from a file containing the list of ID's:

    doAttachmentMigration instanceName -id_list filename

    The filename contains a list of the IDs that are to be migrated.

Attachment ID values can be found by having your DBA run the following SQL query against the ExtraView database:

select min(attachment_id), max(attachment_id) from attachment;

Optional parameters can be added inside the double quotes:

-report <filename> - Generates a specific location for the report file. By default the report will be dumped into ExtraView log file.

-delete - If specified, deletes the attachment content blob from the database after migration. We would not generally recommend this if you have a large number of attachments, as it is much more efficient to simply have your DBA truncate the table when you have completed the migration and you have validated your results.

-testOnly - This allows a dry run migration of the attachments without any modification to the database or the repository.

-noValidation - This prevents the CRC-32 check on the stored migrated attachments, and will be somewhat faster. Use of this parameter is not recommended, because the extra time to validate the file content vis-à-vis the blob content is nominal, and the consequences of storing a blob in the file system incorrectly due to I/O error or network error could include a loss of valuable data. If the CRC-32 validation fails, the attachment is not migrated and an error is posted to the log and/or report file.

Running the Migration Utility

The migration command line utility is included in the ExtraView application package. It requires Java be installed on the application server to run. We provide shell and batch file scripts that you can use to run the utility.

If you have deployed ExtraView in a WAR/EAR file, you will need to set up an exploded directory to run this script. The scripts are located in the evj/WEB-INF/data folder on your application server.

Microsoft Windows with Apache Tomcat

Edit the doAttachmentMigration.bat file

Locate the lines

set JAVA_HOME=C:\ExtraView\java\jdk1.7.0
set TOMCAT_HOME=C:\ExtraView\Tomcat7.0
set EV_BASE=C:\ExtraView\Tomcat7.0\webapps\evj

and modify the paths to match your installation

Save and exit the edit session.

Open a Command window in the evj\WEB-INF\data folder and type the following command:

doAttachmentMigration instanceName "parameters"

where "parameters" is a double-quote enclosed list of parameters as per the options listed above.

Linux / Unix with Tomcat

Edit the doAttachmentMigration.sh file

Locate the lines

TOMCAT_HOME=/usr/local/extraview/tomcat
JAVA_HOME=/usr/local/extraview/java
EV_BASE = $TOMCAT_HOME/webapps/evj

 

Set TOMCAT_HOME, JAVA_HOME and EV_BASE to the correct paths for your installation.

Save and exit the edit session.

At the command prompt, type the following command:

sh doAttachmentMigration.sh instanceName "parameters"

where "parameters" is a double-quote enclosed list of parameters as per the options listed above.

For All Platforms

Run the script you just edited.  You must provide some additional parameters as shown here:

REM required:
REM -from (the starting attachment id for the migration) and -to (the ending attachment 
id for the migration)
REM or
REM -id (an attachment id for the migration)
REM or
REM -type content_type [-type content_type2[ ...]] (content type may be a wild card using
 database wild cards -- i.e. '%')
REM or
REM -id_list <file> (the name of a file containing attachment id's - one per line)

If you do not know the attachment IDs you may run this query:

select min(attachment_id), max(attachment_id) from attachment;

You might also simply substitute 1 and a very high number.

Linux / Unix with Weblogic

Edit the doAttachmentMigrationWeblogic.sh file

Locate the lines JAVA_HOME=/usr/local/extraview/java
WL_HOME=/usr/local/bea/weblogic81
EV_BASE=/path/to/extraview/application/evj52-x.jar

Set JAVA_HOME, WL_HOME and EV_BASE to the correct paths for your installation.

Save and exit the edit session.

At the command prompt, type the following command:

sh doAttachmentMigrationWeblogic.sh instanceName "parameters"

where "parameters" is a double-quote enclosed list of parameters as per the options listed above.

Validation and Completion

While the migration utility runs, it will print out information to the console. It will also write detailed log entries to a file in the evj/WEB-INF/log folder or to the file defined in the -report parameter option. Once the process has completed, you can check the log file to determine if there were any problems.

If all attachments were migrated, you can verify that the attachments are available to download and view through the ExtraView web interface. Once you are satisfied, you can have your DBA remove the binary attachments from the database.

If you have selected the option to have all attachments stored in the file system, your DBA can simply run the command

update attachment set thumbnail_id = null where thumbnail_id is not null;
truncate table attachment_content;

Note: this will remove all thumbnails. The thumbnail will be regenerated dynamically the next time the attachment is referenced, e.g., as part of a detailed report.

If you have selected the option of having some types of file stored in the database and some files stored externally, your DBA will need to run some variant of

delete from attachment_content
where attachment_id in
( select attachment_id from attachment where STORED_INTERNAL = 'N' );

Note: We strongly recommend that you back up your database prior to performing any of the procedures described above (particularly before deleting any attachments). Please contact support@extraview.com if you would like assistance with this step.