Monday, December 15, 2008

First Cut on Knowledge Management

A Brief Treatise on Knowledge Management
by Chris Manning
copyright Manning Applied Technology, all rights reserved
www.appl-tech.com
14 December 2008

Introduction

This white paper addresses knowledge management. The subject is multifaceted, touching on artificial intelligence, project management, information management, and collaboration tools. The paper is cast in terms of engineering, but the concepts will be useful in almost all fields of endeavor. Critiques of this document are actively solicited from interested parties; some reviewers will have useful insights on where technology is going in this area and may be able to point out relevant new developments.

Humans have only scratched the surface of how to use computers efficiently. The terminus of that trajectory is artificial intelligence, when the systems that humans have designed continue to optimize and extend themselves. "Real men write self-modifying code" is how one prescient programmer put it. Few organizations do knowledge creation, management, use, and reuse as well as Toyota; their development of personnel is equally noteworthy. The instant project in particular, and collaborative projects in general, will benefit greatly from effective knowledge management. Any approach undertaken at present requires discipline in two key areas: first, the knowledge must be entered into a data system in a useful format, and second, users must access and employ the knowledge base appropriately. Both of these tasks hinge on user interfaces and training. If it were easy, everyone would do it. Engineers are intrinsically adept at doing design work and at discussing design work and project history, but perhaps are not adept at entering data in computer-readable formats. The modest fraction of engineers and scientists who write well tend to become managers.

Purposes

The advantages of effective information management in an engineering project can be summarized succinctly, roughly in order of importance:

1. efficient design execution; avoiding duplication of effort
2. management of subsequent manufacturing processes
3. troubleshooting of fielded systems
4. extension of system capabilities (i.e., subsequent reengineering)
5. knowledge and design reuse
6. estimation of return on investment (i.e., estimating overall project cost)
7. estimation of related project costs (similar future efforts)
8. failure analysis, forensics

Countless related areas can be identified. These include early identification of critical problems, sharing vendor/part information and design memes between projects, and avoiding duplication of effort. In a sense there are two types of duplicated effort, and each is reduced by the corresponding feedback: positive feedback shares successful design elements, while negative feedback shares problems to be avoided.

To the extent that project files, knowledge and information comprise company-proprietary information, access must be closely guarded. Further, to the extent that these files contain critical information that would be very expensive to replace, they must be backed up at independent locations. Access control is a straightforward function, as is backup. Both are routinely practiced in corporate information technology (IT) departments. However, for collaborative projects, access must be extended to subcontractors. Corporate IT policy must be satisfied by any vehicles used, and by their method of use.

Traditional solutions

Knowledge and information traditionally have been managed by writing reports and by compiling build-data repositories that hold assembly manuals, schematics, board layouts, and software files. These approaches are an acceptable baseline and enforce the capture discipline, to the extent that knowledge and information can be and are embedded successfully in these vehicles. A widespread and useful approach to managing data in these formats is version control software; the two most common examples are Subversion from the open-source community and SourceSafe from Microsoft.
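
A rough sketch of how such a repository might be driven from a script (a sketch only: it assumes the command-line svn client is installed and that the project directory is already a checked-out working copy; the directory name and log message below are hypothetical):

    import subprocess

    def publish(workdir, message):
        """Version any new files in workdir, then commit with a log message."""
        # 'svn status' flags unversioned files with a leading '?'
        status = subprocess.run(["svn", "status", workdir],
                                capture_output=True, text=True).stdout
        for line in status.splitlines():
            if line.startswith("?"):
                subprocess.run(["svn", "add", line[1:].strip()])
        subprocess.run(["svn", "commit", workdir, "-m", message])

    publish("./widget-2000", "Add board layout rev B and assembly notes")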

Improvements to the traditional approach

Thus, some aspects of improved knowledge/information management are straightforward. Simply putting project files in a common data structure and providing search tools (e.g., Google desktop tools) is a powerful step in the right direction. Each of the purposes described above may require navigating different paths through the data set. Ideally, each of the navigation paths could be customized and captured in a suitable user interface. This aspect of the subject area could be summarized as information management.
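
As a toy illustration of that step (a sketch only: the directory name is hypothetical, and a real desktop search tool adds ranking, incremental updates, and readers for many file formats), a few lines of Python can walk a common project tree and build a crude keyword index:

    import os, re
    from collections import defaultdict

    PROJECT_ROOT = "/shared/projects"    # hypothetical common file store

    index = defaultdict(set)             # word -> set of files containing it
    for dirpath, _, filenames in os.walk(PROJECT_ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="ignore") as f:
                    for word in re.findall(r"[a-z0-9]+", f.read().lower()):
                        index[word].add(path)
            except OSError:
                pass                     # skip unreadable files

    print(sorted(index["resistor"]))     # every file mentioning "resistor"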

A further improvement to knowledge management that can be implemented now is recording conferences and meetings, then putting the resulting files in the same data structure with the other project files. To be useful, these files must be readily searchable. Recently, Matt Williamson, one of MAT's contractors who is very knowledgeable about tech gadgets, demonstrated Google's new voice recognition features on an iPhone. Not only is Google already using speaker-independent speech-to-text capability for internet searching, it is also beginning to use speech-to-text tools to make the archive of YouTube videos readily (text) searchable without the need for manual keyword entry. Audio capture with speech-to-text conversion will be a very powerful tool for capturing knowledge from humans, who are very comfortable discussing projects, their history, and why things were done in a particular way. In the near future, it will be feasible to extract rules embodying the knowledge from the captured speech; this will be another important step on the path to artificial intelligence (AI), a key aspect of which is pattern recognition and matching. Text search over transcripts will make it convenient to find audio segments (e.g., "please find the segment where Mike asked about 33 ohm resistors on the telecon last month"). Google's roadmap to the future started with text pattern matching and now includes audio pattern matching; it can be anticipated that Google's business plan includes picture pattern matching, then video pattern matching, and ultimately artificial intelligence.
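
A minimal sketch of how such a search might look once meetings are transcribed, assuming (hypothetically) that a speech-to-text tool writes one "HH:MM:SS<TAB>utterance" line per segment into one transcript file per meeting:

    import os

    TRANSCRIPT_DIR = "/shared/projects/widget/meetings"   # hypothetical location

    def find_segments(phrase):
        """Print every transcript line containing the phrase, with its timestamp."""
        for name in sorted(os.listdir(TRANSCRIPT_DIR)):
            with open(os.path.join(TRANSCRIPT_DIR, name)) as f:
                for line in f:
                    timestamp, _, text = line.partition("\t")
                    if phrase.lower() in text.lower():
                        print(name, timestamp, text.strip())

    find_segments("33 ohm resistors")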

One state-of-the-art practice at MAT is scanning scientific papers and documents using a Canon MP-780 all-in-one fax/copier/printer/scanner. The software provided with the MP-780 generates PDF files with embedded text generated via optical character recognition (OCR). The resulting files are text-searchable using either the built-in Windows file search capability (slow) or Google desktop tools (fast), and both the files and the search can readily be served over a network. This approach is very powerful and clearly useful for project information management, particularly for dealing with legacy paper; packing lists, for example, are routinely scanned. In the past, MAT has written many comprehensive project reports, which already are computer files, and the use of Google tools makes that information readily available via keyword search. MAT also has several internal wiki pages: one is used for the routine task of tracking lunch orders, and another is used for experimenting with knowledge capture.
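
For readers without a desktop search product, the OCR'd files themselves can be searched with a short script. This sketch assumes the pdftotext utility (from the Xpdf/Poppler packages) is installed; the scan folder is hypothetical:

    import os, subprocess

    SCAN_DIR = "/shared/scans"    # hypothetical folder of OCR'd PDFs

    def search_pdfs(keyword):
        """Print each PDF whose embedded text layer contains the keyword."""
        for name in os.listdir(SCAN_DIR):
            if not name.lower().endswith(".pdf"):
                continue
            # with "-" as the output file, pdftotext writes to stdout
            text = subprocess.run(
                ["pdftotext", os.path.join(SCAN_DIR, name), "-"],
                capture_output=True, text=True).stdout
            if keyword.lower() in text.lower():
                print(name)

    search_pdfs("packing list")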

Commercial and open-source implementations

Collaboration tools can be divided, though not very cleanly, into two areas: on-line (real-time) and off-line. Some packages may include both types. On-line tools include videoconferencing and desktop sharing, which allow participants at different locations to access and share any information that can be displayed on a computer screen or placed in front of a video camera. The off-line aspect of collaboration tools is routine information sharing, which includes conference and meeting scheduling, tracking of project information and milestones, and access to project files. It is quite likely that fuel prices will move to record highs after the current worldwide recession; combined with dropping bandwidth costs, this will drive significantly more business to videoconferencing.

A number of real-time collaboration tools have been on the market for several years to a decade. One of the earliest was Microsoft NetMeeting, which supported desktop videoconferencing with screen sharing at least as early as 1998. Desktop videoconference tools have become much more popular in recent years; WebEx was acquired by Cisco in 2007 after achieving roughly a 2/3 market share. Competitors in this space include Infinite Conferencing, Vidyo, Hewlett-Packard, and many others. Simple and free tools include the various instant messenger offerings, such as MSN Messenger, which support video, audio, and text, but not screen sharing. It seems reasonable to guess that real-time tools will be integrated into the overall collaboration toolset, which will facilitate data capture and archiving of conference sessions; Gmail chat already allows archiving of sessions.

A wiki is a system of essentially blank web pages with a simple editor that allows users to create content and hyperlinks between pages. It satisfies many of the criteria described above, including the potential to embed knowledge and flexible navigation. The pages can include links to drawings, schematics, and any other files that are part of a design process, and for each different type of navigation required, it is possible to set up a separate wiki page, or set of wiki pages, that provides that navigation. Confluence, a wiki-based tool offered by Atlassian (San Francisco, CA), is one of the simplest solutions to the problem of knowledge capture and information access; it can be installed locally or run on Atlassian's servers.
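
The essence of the mechanism fits in a few lines (a sketch only: the page names and the double-bracket link syntax here are illustrative, not any particular wiki's). Note how a "navigation" page is simply another page whose links trace one path through the project:

    import re

    pages = {
        "PowerSupply":    "Design notes... see [[Schematic-PS1]] and [[LessonsLearned]]",
        "Schematic-PS1":  "Pointer to the schematic file under version control...",
        "LessonsLearned": "The 33 ohm series resistors cured the ringing...",
        # one navigation page per path through the project
        "Nav-Troubleshooting": "Start at [[LessonsLearned]], then [[PowerSupply]]",
    }

    def links(page):
        """Return the pages that a given page links to."""
        return re.findall(r"\[\[([\w-]+)\]\]", pages[page])

    print(links("Nav-Troubleshooting"))   # ['LessonsLearned', 'PowerSupply']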

Another tool category that combines information, project, and knowledge management is product lifecycle management (PLM) software. Generally, PLM products also are advertised as collaboration tools; in a sense this is a tautology, because almost all product development is collaborative. Examples include Teamcenter (Siemens), CATIA (Dassault Systemes), Windchill (Parametric Technology Corporation), and numerous others. Teamcenter began as a joint-venture enhancement to computer drafting/CAD tools and was eventually acquired by Siemens. Teamcenter is used by several major automobile manufacturers, while CATIA is used by Boeing and Ford Motor, among others. These packages include a suite of tools addressing most, if not all, aspects of project management. The scale and expense of these enterprise-scale solutions may be inappropriate for small projects and small companies, but this space is clearly highly competitive, so it can be anticipated that the tools will continue to mature.

Redmine (www.redmine.org) is a rapidly maturing free and open-source (FOSS) project that combines wikis, roadmaps, bug tracking, issue tracking, feature requests, and forums with documents, other files, and data stored in a source control management (SCM) tool such as Subversion or Git. It is slightly programmer-oriented, but any small- to medium-sized company can benefit from having all of these features packaged into a single service. From an SBIR point of view, a project-planning tool that can capture the essence of a project would be very helpful when writing proposals.

Arguably, knowledge management is a database problem in which the knowledge elements are catalogued. The critical component that goes beyond simple database technology is expressing the relationships between the elements. The complexity of the connections is the essence of intelligence.
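
A toy sketch of that distinction (all element names and relation types here are hypothetical): the triples are ordinary database fare, and the value comes from traversing typed links between elements:

    # (element, relation, element) triples: the "knowledge base"
    relations = [
        ("R33",                "part_of",      "PowerSupply"),
        ("R33",                "chosen_for",   "RingingSuppression"),
        ("PowerSupply",        "designed_by",  "Mike"),
        ("RingingSuppression", "discussed_in", "Telecon-2008-11"),
    ]

    def related(element, relation=None):
        """Follow links out of an element, optionally filtered by relation type."""
        return [(rel, dst) for src, rel, dst in relations
                if src == element and relation in (None, rel)]

    print(related("R33"))                                  # why is R33 there?
    print(related("RingingSuppression", "discussed_in"))   # where was it discussed?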

Conclusions

Significant value can be realized by effective use of knowledge management tools. At present, no clear winner has emerged, particularly for small projects; the market for these tools is not yet mature. One approach to the instant project is to put together inexpensive, off-the-shelf open-source tools, such as wikis and Subversion, or to use the Redmine open-source toolset. Combined with Google desktop tools, this approach is very inexpensive and powerful. Clearly, all companies will have to choose standardized toolsets in the near future. Because the effort invested in these tools scales with the number of projects, larger companies must choose sooner, and choose better.

Acknowledgements
Stefan Natchev for encouragement and for providing information on Redmine
Tom Old and Bob Hertel for providing input on Teamcenter
Sami Nuwayser for general critique and lots of good ideas

7 comments:

  1. Interesting ideas. The key challenge that I deal with in the current paradigm is not access to data, but difficulty in limiting the incoming data stream to what I would consider high priority for a given task or question.

    [moderator: thus, one function of intelligence is to filter down to only the relevant information]

  2. I read your white paper on Knowledge Management (KM). Thanks for sending it to me. It reads well and is a good overview of the available systems. The paper could be turned into a useful guide for companies who want to compete with larger companies; this is probably an unmet market need.

    I run the xxx at our xxx in xxx and have outsourced this function to the xxx since 1999. Many of the things that you refer to in your paper as knowledge, we would classify as data or information in our model: the organization of the data would be information, and analysis and interpretation would be knowledge. However, the definition of KM has grown to include many terms.

  3. I need to add comments on Skype, LIMS, RFID/ultrasonic tool/inventory tracking, and lean concepts

  4. Interesting paper. When I was Tools Guru at Thinking Machines and Sun, I worked on a lot of these problems.

    1. Companies can sometimes get too obsessed with design reuse. When a project is behind schedule, no manager will spend one extra man-hour on making a design more reusable than it needs to be to meet the existing design target. When confronted with a preexisting design that is similar to one needed for the current project, the engineer will dutifully copy the previous design and customize it to his needs, perhaps rewriting it completely, and make no effort to keep it backward compatible with the previous design. To be reusable, a design must also be maintainable; that means it needs a regression test suite that verifies that any future modification will not break the earlier uses of the design. A simple case in point: how many different spark plugs do they stock at a car parts store? Worse yet, how many different spark plugs can you find behind the parts counter at a single franchise (yes, even Toyota stocks more than one type)? The cost advantages should be obvious to any automobile CEO.

    What is valuable is good documentation on how the design works and, after the fact, lessons learned during its design. What is more reusable is design, verification, and simulation techniques, rather than specific designs. A well-documented and modular design is more likely to be copied and modified, not to mention more maintainable in the long run.

    2. Access control and backups SHOULD be practiced in corporate settings, but I would venture to guess that this is at best true for a majority of Fortune 500 companies. Thousands of smaller businesses have IT departments run on-demand by local tech gurus in their spare time. With no formal training on best practices in these very two areas, thousands of companies are tripping over the same pitfalls every day.

    Just this month, I had this happen to me. There is a company that specializes in CVS (a precursor to SVN); I have bookmarked a page of their wiki/website for its good on-line reference to CVS commands (because we at xxx still use CVS, blech). Their wiki went off-line because their ISP (yes, the paid professionals) did not have working backups. Two weeks later, the wiki is still down. Two groups of professionals (the CVS services company and their ISP) failed in the basic duty of backing up data properly.

    Given that Windows runs the vast majority of desktop and laptop computers today, what stops someone from copying data to a USB memory stick and transferring it to another computer? Once the data is on that USB stick, there is no longer any access control.

    Access control and backups would seem to be simple problems, but they are not. Corporate search engine technology (e.g., Google) makes the access problem worse. When Google answers a query, it typically shows URLs for all documents, not limiting the results to those you are personally authorized to see; otherwise it would need to maintain a different index for each actual combination of access rights found in a company. The fact that many of the world's biggest web sites have a special backdoor for Google to use, so their sites can easily be searched without supplying a username and password, is a big problem. It is an even bigger problem in the corporate world, with the tight security controls required by Sarbanes-Oxley. Some people would say that the target URL found by the Google search is in itself a security breach: even if I cannot access that URL with my current rights, I know where to attack.

    Remember that the biggest threat to security is the employee, not the external hacker. Most data theft is done by people inside the firewall.

    3. Google is good for searching unstructured data. While it might seem valuable to search all your recorded conversations for a reference to 33 ohm resistors, more valuable is a quick search of all your schematics for the same part, which today will probably take a man-day to complete, because Google cannot read your schematics and possibly not your BOMs either. Has Cadence provided a Google desktop extension to read their schematics?

    4. Videoconferencing. It is nice to have your son talk to the grandparents on Skype, but engineers rarely need to see each other's faces. The value of conferencing (with or without video) depends on the exact details: one to one, one to many, many to many. Voice is the most important aspect; it must be clear and full-duplex.

    Some of the worst examples are when you have a conference room filled with engineers and a speakerphone, plus 2-3 more engineers working from another location. People in the conference room do not speak clearly enough or loudly enough, and the people in the remote location end up being second-class citizens of the meeting. "OK, the meeting is over. Dave on the other end? You still there? We are ending the meeting now." It takes real training and awareness to properly manage a meeting in this environment.

    For presentations, the sharing of slides in real time is critical, and much more valuable than talking heads. Most videoconferencing presentations end up pointing the camera at the wall screen where the slides are, with very poor results, as the video camera is not intended to pick up the 8-point font used on the slides. And it is not bandwidth-efficient to share slides in this manner. Much better is for the remote people to see the same actual presentation (PowerPoint, OpenOffice, or PDF) as everyone else.

    4a. Desktop sharing: see VNC. It is open-source and free, and it works on and across all platforms: Mac, Linux, Solaris, and yes, even a Windows version is available if you insist.

    5. What I do find to be very valuable is corporate instant messaging. More valuable than I first expected.

    Sam: got a sec
    Chris: otp, 1 sec
    Sam: no prob
    Chris: ok I am off the phone, whats up
    Sam: I am working on that business prezzo for next week. Did we say the next years projected profits are 20 Million or 20 Billion?
    Chris: Billion
    Sam: Great thanks, I will send the draft 2 you soon. l8r

    6. Wikis are nice, but my experience is that only the techno people actually edit them; the average office worker will not click the edit button. My recent experience with SharePoint was not positive: it is entirely too technical and is probably a Windows-only solution.

    7. You mention SVN and VSS. The real professional granddaddy is ClearCase, and it is far more powerful than the others. That said, there are other commercial products that are popular in engineering environments (like Perforce).

    8. Knowledge management. So perhaps what we need is something that at first glance looks like a wiki, but a search engine reads it on a constant basis, then finds and augments the wiki pages with additional information that may be of interest to the wiki's readers. Then some people vote on the value of the information, and the search engine LEARNS how to identify better things to add to the wiki in the future. The engine is extensible with an open API, so companies like ProE and Cadence can supply extensions for searching additional file types. And it also leverages a centralized directory server to hold authentication and authorization information, so that results are always limited to those which you are allowed to see.

  5. add secure email, email archiving, remote hardware operation, robotics

  6. I checked out your blog on information management. Very interesting. I have a few comments (mostly from the Devil's advocate side):

    1) Congrats on scanning your papers so they are searchable. This is probably the biggest "bang for the buck" you can get!

    2) I am a big advocate on the return-on-investment issue. I look at your white paper as a "Gold Dredge Proposal". A gold dredge is used in places where the mother lode (in our case drawings, code comments, etc.) has already been carted out, and it is not economical to mine in the conventional manner. The gold dredge is an efficient way of getting the next order of magnitude of gold (in our case information) out of the land. In other words: great idea, and I expect the value is there, but it will have a ceiling.

    I say this because in my world, really the only thing I can imagine being captured in addition to what I record is meetings. Typically these are at the level of "where are we with respect to the schedule, and what do we need to keep on track" - not really valuable technical info. Most of my technical discussions occur over email, and indeed I search these once in a while for "33 ohm resistors".

    3) Unlike most of the people in the technical world, I am a bit of a curmudgeon when it comes to talking about Moore's law. There hasn't been a lot of innovation in computer development in the last 30 years - just a reduction in component size. Atoms have a finite size, so chips, memory, and computer speed will soon approach asymptotic limits. When this happens, I believe we will either get a new paradigm (say, optical computers) or learn to lean up our data and applications. To be perfectly fair, though, I don't see what you are talking about as being a space or speed pig; it seems doable with almost present-day technology. Soooo ... ignore what I said, as I was just thinking out loud ...

    4) I don't really see your connection between AI and information management. Do you mean the application of speaker-independent transcription software and search algorithms?

  7. I was thinking you might spend a moment on the taxonomy of the discussion up front, before reviewing examples and recommendations. Knowledge Management is big stuff for us here at (big player), and we've got a whole organizational area devoted to it. One of their first efforts was to define exactly what we would mean by the term, and differentiate things like information management and other productivity strategies from the overall concept. You've packed a lot of heavy thinking into this piece, but there doesn't seem to me to be a framework from which to make sense of the "whole" of it, and I found myself simply taking it as a list of somewhat random thoughts around a general subject.

    "Knowledge Management" is taken here as trying to find a way to store and organize what usually gets lost between being stuck in people's heads, or stuffed in uncatagorized documents on disorganized servers somewhere in the bowels of the beast. It's information management, certainly, but it's also the methods by which people document what they do and why they do it, so that others can leverage what's being done more easily. It involves a lot of templates, and a lot of structural discussions about project work and related unstructured data (powerpoints, spreadsheets, etc. etc.) that needs to be managed consistently in order to be recombined further down the line. It's especially important when trying to firehose a product portfolio of hundreds of products into the hands of a single unprepared salesperson, in order to present a sensible and cohesive whole to a prospect or customer. If the sales presentations aren't consistent, they can't be chopped up and re-purposed to respond to the real unanticipatable needs that always differ from the way it was planned to be rolled out.

    The art is in balancing fealty to the "process", as opposed to remaining focused on a particular goal. Having to slow something down to follow a template is costly, but, many times, having a bunch of spaghetti to throw at a wall hardly leads to better results.

    Would it be possible to separate the conversation into main themes? E.g. purely "process" focused initiatives, like the Google phone example, as opposed to data or information management techniques, like insisting that all internal documents follow a set template.

    Also important here is a vetting of all terms used both internally and externally. This enforces a consistency of language whose absence is deceptively destructive. All branding gets consistently enforced this way, and all pieces work together as opposed to creating potential confusion based on differences in product names, feature descriptions, etc. Customers need consistency or they get confused by the sheer volume of stuff we put out, much like trying to understand a conversation in a crowded room, even when we're just our own crowd. (x0,000 employees with multi-million dollar marketing budgets can be very noisy). Outputs in this area include a regular update to the encyclopedia of approved names and terms, and a workflow process for new ones to get registered, and old ones to be retired.

    We also have councils that manage internal portfolio processes for categorizing and organizing our understandings of our markets and our products, along with the opportunities that exist at the intersections. Efforts are made to connect this to the development planning process, too. Everything gets put in a database repository, and nothing can get discussed until it's entered into the record and quantified in all the required fields. (Market size, target customer profile, functional scope both from a business and a technology perspective, competitors, etc.)

    Lastly, I would say to your conclusion and its discussion of the possibility of a tool (singular) to help with all this, that KM has to be an "all of the above" endeavor. Tools can certainly help, but it's the process and discipline around the use of those tools that's proved to be even more important.
