[SPC09] Document Management Improvements

October 21, 2009

On the second day of the SharePoint Conference in Las Vegas, I attended two sessions that focused on document management improvements in SharePoint 2010:

“Growing SharePoint From Small Libraries to Large Scale Repositories and Archives” by Adam Harmetz (Microsoft)
“Document Management Deep Dive” by Ryan Duguid (Microsoft)

The main takeaways from these two sessions are:

There are new SharePoint features that make it easier to manage metadata and apply it to content:
- Unique Document Numbering
- Metadata Service for centralized management of content types and metadata
- Tagging and Rating
- Taxonomy navigation based on the current metadata
There are also improvements to the file record / archive features:
- Multiple “Send To” locations
- Automatic routing of the documents, based on the metadata
- Automatic document obsoletization and archiving

There are four main approaches to using these features in document management, depending on the scale of the document repository that we need:

| Document management approach | Number of documents stored | How SharePoint 2010 helps |
|---|---|---|
| Ad Hoc Team Library | Up to 200 documents | Focus on easy, lightweight features while the metadata is centralized. Content types are now centralized in the Metadata Service, across the site collection boundary. Taxonomy can be used in columns and is hierarchical and multilingual. Default values for columns can be set at the document library level, which allows the library to automatically populate the values of a content type that is shared across the organization. Managed Keywords are suggested as you type. Tags are now surfaced in the “Save document” shell window. Tags appear in a tag cloud for the site and can be drilled into, subscribed to and filtered. Tag security: only private and public tags are covered. |
| Managed Library | Hundreds to thousands | More structured than the ad hoc library (Document Center). The system should help the user to use and create metadata. Unique document IDs for the enterprise can be generated. You can get a document by its ID using a “Get Document by ID” web part. Metadata navigation can be used as a "virtual" folder structure. You can still use static folders. Key Filters can be used to further slice the navigation result set. These settings are defined in the document library settings. The taxonomy is defined by each user, but the structure is centralized. Metadata can be centrally managed or user-managed (“folksonomy”). Document archiving can now be done directly in the library. |
| Repository / Archive | Millions to tens of millions | There should be a best-practice team that manages the metadata for the repository. Users submit finished documents for broad consumption. The end users don't even know what they are looking for. For instance, a Knowledge Management Repository. Content type and metadata classification is essential. Additional filters are now indexed in the background (as compound indexes in SQL Server). Metadata navigation results are now shown in batches (and announced to the user), so as not to hamper performance when navigating huge archives. Furthermore, the navigation results are cached across users. Search can be contextual to the navigation results (very cool feature). Content Organizer settings: content can be sent to the archive manually, by a workflow, by upload etc. The document can be routed to the corresponding folder by the Content Organizer component. Folders can be used to set permissions on a larger basis. The retention policy is also bound to the folder. Content Query Web Part (CQWP) is now suited for document library results. The demo shows "Suzie recommends" and "Newest added biographies" CQWP results. |
| Massive, Distributed Archive | Hundreds of millions of documents | The documents are added automatically. High-volume back-end systems are involved instead of the end user. Logical organization and hierarchy are key. Content Organizer can route content to the correct site collection in the archive. Content Type Syndication enables central management of a distributed archive. FAST search is used to retrieve content. |
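To make the idea of metadata-based routing more concrete, here is a minimal conceptual sketch in Python. It is not the SharePoint Content Organizer API. The `RoutingRule` class, the `route_document` function, the field names and the folder paths are all hypothetical, chosen only to illustrate "first matching rule decides the destination".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical rule: match one metadata field, send to one destination."""
    field: str
    value: str
    destination: str


def route_document(metadata: dict[str, str],
                   rules: list[RoutingRule],
                   default: str = "/archive/unsorted") -> str:
    """Return the destination of the first rule whose field and value match.

    Falls back to `default` (an assumed catch-all folder) when nothing matches.
    """
    for rule in rules:
        if metadata.get(rule.field) == rule.value:
            return rule.destination
    return default


# Illustrative rules only; the paths and field names are made up.
rules = [
    RoutingRule("ContentType", "Biography", "/archive/biographies"),
    RoutingRule("Department", "Legal", "/archive/legal"),
]

print(route_document({"ContentType": "Biography", "Author": "Suzie"}, rules))
print(route_document({"ContentType": "Memo"}, rules))
```

Rules are evaluated in order, so a more specific rule should be listed before a more general one. In a real deployment the rules would be configured in SharePoint itself rather than written in code.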

There are several back-end optimizations that make these scale-out scenarios easier: database reorganization, compound indexing, remote BLOB storage in SQL Server 2008, and per-item throughput maximization.

Office 2010 can query the SharePoint Metadata Service in the background, which allows the user to create new documents based on the latest content type document templates.

Word Automation Services is also available to automate Office document conversion (into PDF, XPS and so on), printing and document composition.