Cass, Inc. Making Data Work

Sample Data Processing Specification

Multiple input files in disparate formats need to be cleaned and combined to produce a single file which is formatted for further use.

1) Data acquisition: Files are transferred via email, FTP, diskette, Zip disk, CD-ROM, tape, etc. Files may also be compressed.

2) Data setup: Each file is loaded onto the processing system and converted to a uniform format. A wide variety of input formats is supported, including ASCII (fixed and delimited), EBCDIC, spreadsheet, word processing, database, list image, and more.

3) Initial processing: Files are scanned to ensure that they contain only legal ASCII characters, input record counts are noted by file, file source names/codes are assigned, and files are visually checked for errors and discrepancies.

4) Data repair: Fix bad ZIP codes where possible (for example, ZIP codes that were stored in Excel lose their leading zeros), split full names into components as needed (prefix, first, middle, last, suffix), and split combined city/state/ZIP fields into components.

5) Data enhancement: Code data using the USPS (United States Postal Service) ZIP+4 database, optionally code data using the USPS NCOA (National Change of Address) database, convert data to proper case (using intelligent upper/lower case processing), abbreviate or expand data fields as necessary (VP to Vice President), determine contact gender and append prefixes (Mr./Ms.), and create salutations if required (Dear Mr. Smith, Dear John, Dear Customer, etc.). Gift array processing, including gift upgrade calculations, is also available.

6) Code: Code records by source (filename, salesperson, etc.), by region (based on state, ZIP, etc.), or by a combination of field values and variables. Associate conditional text based on complex criteria for personalized communication.

7) Process: Merge/Purge – intelligent duplicate elimination based on customizable criteria within the file and/or among the separate files. Suppressions – remove records that match a “suppression” file, which might contain clients or competitors, for example. Merge data from the component files (one file might have the contact’s phone number, another file might have the contact’s address). Select a partial set of records (for example, Nth selection).

8) Report: Provide record counts and reports, sort and profile by component fields, list all unique values in a field with frequency counts (all titles, all companies, etc.), and provide cross-tabs (for example, company names by state).

9) Output: Reconfigure fields to match output requirements (field names, lengths, order), separate foreign and other records if required, and convert data to the required file format for output use.
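
As an illustration of the selection and reporting steps (items 7 and 8), the following is a minimal Python sketch of Nth selection, per-field frequency counts, and a simple cross-tab. It assumes records have already been loaded as dictionaries; the function names and the dictionary layout are hypothetical and are not part of any Cass, Inc. system.

```python
from collections import Counter


def nth_select(records, n, start=0):
    """Nth selection: keep every n-th record, beginning at index `start`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return records[start::n]


def field_frequencies(records, field):
    """List each unique value of `field` with its count, most common first."""
    return Counter(rec[field] for rec in records).most_common()


def cross_tab(records, row_field, col_field):
    """Count records for each (row value, column value) pair."""
    return Counter((rec[row_field], rec[col_field]) for rec in records)
```

For example, cross_tab(records, "company", "state") would give the "company names by state" counts mentioned in item 8, provided each record carries those two (assumed) keys.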

Data Enhancement Example:

Input:
DAVID CASS
PRES.
NEC CORP
3 MASS AV
ACTON, MA 1720

Output:
Mr. David Cass
President
NEC Corporation
3 Massachusetts Avenue
Acton, MA 01720-5521
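
Part of the transformation above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the lookup tables are hypothetical and tiny, and simple capitalization will mishandle names such as McDonald. The "Mr." prefix and the ZIP+4 suffix are not produced here, because they depend on a gender lookup and the USPS ZIP+4 database.

```python
import re

# Hypothetical lookup tables for illustration only.
TITLE_ABBREVIATIONS = {"PRES.": "President", "VP": "Vice President"}
COMPANY_ABBREVIATIONS = {"CORP": "Corporation"}
STREET_ABBREVIATIONS = {"MASS": "Massachusetts", "AV": "Avenue", "AVE": "Avenue"}
KEEP_UPPER = {"NEC"}  # assumed list of acronyms that must stay upper case


def proper_case(text, expansions=None):
    """Expand known abbreviations and capitalize the remaining words."""
    expansions = expansions or {}
    words = []
    for word in text.split():
        if word in expansions:
            words.append(expansions[word])
        elif word in KEEP_UPPER:
            words.append(word)
        else:
            words.append(word.capitalize())
    return " ".join(words)


def repair_city_state_zip(line):
    """Split 'CITY, ST ZIP' and restore leading zeros lost in a spreadsheet."""
    match = re.fullmatch(r"\s*(.+?),\s*([A-Za-z]{2})\s+(\d{3,5})\s*", line)
    if match is None:
        raise ValueError(f"unrecognized city/state/ZIP line: {line!r}")
    city, state, zip_code = match.groups()
    # zfill pads on the left, so a truncated ZIP regains its leading zeros.
    return proper_case(city), state.upper(), zip_code.zfill(5)
```

Each field is passed through proper_case with the table that suits it, so that an abbreviation such as "AV" is only expanded in a street line and not in a name or company.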

Duplicate Processing Example:

Input records:
BETH SMYTHE, 3 MASS AVE, PO BOX 44, ACTON, MA 01720
B SMITH, PO BOX 44, ACTON, MA 01720
MS. BETH K. SMITH, 3 MASS AVE, ACTON, MA 01720

All three records are detected as duplicates, and only one of the three is output, depending on client-selected file priorities and/or preferences for specific field values such as highest sale, most recent transaction, etc. In addition, it is possible to generate a list of duplicates and manually select the preferred record.
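
One way such a grouping could work is sketched below in Python. The matching rule is an assumption chosen for illustration, not the actual merge/purge criteria: two records match when they share a ZIP code, a first initial, at least one address line, and a similar surname (difflib similarity of 0.7 or more). Matches are then chained, so the second and third records end up in one group because each matches the first.

```python
from difflib import SequenceMatcher
from itertools import combinations

PREFIXES = {"MR", "MS", "MRS", "DR"}  # assumed name prefixes to ignore


def parse(line):
    """Split 'NAME, ADDRESS LINE(S), CITY, ST ZIP' into comparable parts."""
    parts = [p.strip() for p in line.split(",")]
    tokens = [t.strip(".") for t in parts[0].split()]
    tokens = [t for t in tokens if t not in PREFIXES]
    return {
        "raw": line,
        "initial": tokens[0][0],
        "surname": tokens[-1],
        "address": set(parts[1:-2]),
        "zip": parts[-1].split()[-1],
    }


def is_match(a, b, threshold=0.7):
    """Illustrative rule: same ZIP and initial, shared address line, similar surname."""
    similarity = SequenceMatcher(None, a["surname"], b["surname"]).ratio()
    return (
        a["zip"] == b["zip"]
        and a["initial"] == b["initial"]
        and bool(a["address"] & b["address"])
        and similarity >= threshold
    )


def group_duplicates(lines):
    """Group records so that chained matches land in the same group."""
    records = [parse(line) for line in lines]
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's representative index.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if is_match(records[i], records[j]):
            parent[find(j)] = find(i)

    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec["raw"])
    return list(groups.values())
```

Choosing which record of each group to keep (by file priority, highest sale, most recent transaction, and so on) would be a separate step applied to the groups this returns.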
