Microsoft Research (MSR) is dedicated to basic and applied research in computer science. We develop new technologies for improving the way computers interact with people. We have outstanding researchers in a wide variety of disciplines, both from leading academic institutions and industrial research labs. We are committed to supporting and participating in the worldwide research community, and we collaborate with many universities worldwide.
We founded Microsoft Research in 1991 because we felt that the growth of the PC software industry and Microsoft in particular created a unique opportunity to do basic and applied research in computer science which could make a positive difference in people's lives and set new directions for the field. We started small, with a handful of scientists working primarily in natural language technology and programming environments. MSR has since grown to include over a dozen research groups and nearly 200 researchers in areas as diverse as speech recognition, decision theory and 3D graphics and animation. Our impact on Microsoft's products has already been felt. If you use Microsoft Windows 95 and Microsoft Office 97 you have likely taken advantage of features which have been enhanced by technology developed within Microsoft Research.
Our researchers actively serve on conference program committees, editorial boards and advisory panels. They publish at conferences and in journals, and they write for magazines that serve the research community. We believe that such professional service is an important component of our work.
Advanced Interactivity and Intelligence
Decision Theory and Adaptive Systems
The Decision Theory & Adaptive Systems Group (DTAS) is investigating the use of probability and utility theory to enhance computer applications and platforms. Explicit consideration of user preferences and key uncertainties associated with particular tasks and contexts is a central element of DTAS projects. The group has a special focus on extending the flexibility and responsiveness of operating systems and user interfaces. Areas of attention include automated reasoning and inference under uncertainty, learning models from data, data mining and knowledge discovery in databases, information retrieval, automated diagnosis and decision support, and automated learning for custom-tailoring software to user work patterns and preferences.
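As a rough illustration of how probability and utility can be combined, the sketch below updates a belief that a user needs assistance and then picks the action with the higher expected utility. The probabilities, utility values, and function names are hypothetical and are not taken from any DTAS system.

```python
def posterior_needs_help(prior, likelihood_if_needed, likelihood_if_not):
    """Bayes' rule: P(user needs help | observed evidence)."""
    numerator = prior * likelihood_if_needed
    denominator = numerator + (1.0 - prior) * likelihood_if_not
    return numerator / denominator


def expected_utility(p_needs_help, utility_if_needed, utility_if_not):
    """Average the utility of one action over the two possible user states."""
    return p_needs_help * utility_if_needed + (1.0 - p_needs_help) * utility_if_not


def should_offer_help(p_needs_help, utilities):
    """Offer help only when doing so has higher expected utility than staying silent."""
    offer = expected_utility(
        p_needs_help, utilities[("offer", True)], utilities[("offer", False)]
    )
    silent = expected_utility(
        p_needs_help, utilities[("silent", True)], utilities[("silent", False)]
    )
    return offer > silent


# Hypothetical utilities: interrupting a user who is fine costs something,
# and so does ignoring a user who is stuck.
UTILITIES = {
    ("offer", True): 1.0,
    ("offer", False): -0.5,
    ("silent", True): -0.5,
    ("silent", False): 0.0,
}

# Assumed evidence model: a long pause is seen 70% of the time when help is
# needed and 10% of the time when it is not.
belief = posterior_needs_help(prior=0.2, likelihood_if_needed=0.7, likelihood_if_not=0.1)
print(should_offer_help(0.2, UTILITIES), should_offer_help(belief, UTILITIES))
```

With these assumed numbers the assistant stays silent on the prior alone and offers help only after the evidence raises its belief.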
- Information access and management. We are pursuing principles and applications of technologies that allow users to access, filter, and manage information. As part of our early work in this area, DTAS developed the algorithms and assessment methods used in the Answer Wizard, a free-text help facility unveiled in Office '95 products.
- Intelligent user interfaces. DTAS is working on methods, languages, and architectures for integrating multiple sources of information to enhance user interfaces. DTAS' Lumiere Project has focused on the construction and integration of Bayesian models of a user's needs for assistance. Lumiere research led to the Office Assistant, a Bayesian help system in Office '97. Ongoing work on intelligent user interfaces includes integrating consideration of acoustic and visual events into analyses of user goals.
- Diagnostics and troubleshooting. We have developed and applied diagnostic reasoning methods to a range of problems, extending from software debugging to troubleshooting software and hardware systems. In our collaboration with Microsoft Technical Support, we have developed decision-theoretic troubleshooters that are available via the World Wide Web. Visit Microsoft Technical Support Troubleshooters to access several decision-theoretic troubleshooters that have been deployed in an operational setting. Another diagnostics project, named Aladdin, is a collaboration between DTAS and Microsoft Technical Support that has explored the application of decision-theoretic case-based reasoning to troubleshooting and customer support.
- Learning models from data. Several research and development efforts focus on the development of methods for building predictive models from data. Some of this work builds on foundations of Bayesian statistics.
- Data mining and knowledge discovery in databases. Drawing upon related work in learning, we are investigating methods, tools, and applications of data mining and knowledge discovery in databases (KDD) for discovering useful relationships in large datasets.
- Optimization of computational processes. We are exploring the use of flexible computational methods and decision theory to identify bottlenecks, and to optimize the functionality of operating systems and applications.
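The decision-theoretic troubleshooting mentioned above can be illustrated with a deliberately simplified model. The sketch assumes exactly one faulty component, a known fault probability and inspection cost for each component, and inspections that always find the fault if it is there. Under those assumptions, checking components in decreasing order of probability divided by cost minimizes the expected cost. The component names and numbers are invented for illustration, and the deployed troubleshooters are not claimed to work this way.

```python
from itertools import permutations


def expected_cost(order, prob, cost):
    """Expected total cost of inspecting components in the given order,
    stopping as soon as the single faulty component is found."""
    still_unfound = 1.0  # probability the fault has not been located yet
    total = 0.0
    for component in order:
        total += still_unfound * cost[component]
        still_unfound -= prob[component]
    return total


def plan_repairs(prob, cost):
    """Order components by decreasing probability-to-cost ratio."""
    return sorted(prob, key=lambda c: prob[c] / cost[c], reverse=True)


def best_by_brute_force(prob, cost):
    """Cheapest expected cost over every possible order (small inputs only)."""
    return min(expected_cost(order, prob, cost) for order in permutations(prob))


# Hypothetical printing problem: fault probabilities sum to 1.
PROB = {"cable": 0.2, "driver": 0.5, "hardware": 0.3}
COST = {"cable": 1.0, "driver": 5.0, "hardware": 20.0}

order = plan_repairs(PROB, COST)
print(order, expected_cost(order, PROB, COST))
```

The cheap cable check comes first even though the driver is the more likely culprit, because its cost is so low relative to its probability.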
Natural Language Processing
Natural Language Processing should make it possible for people to use computers in much the same way that they would use a human assistant to get their work done. This is not easy for the machine. It's ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.
The goal of the NLP group is to design and build a computer system that will analyze, understand, and generate natural languages. Our system takes input text, and moves through various stages of linguistic processing, from lexical/morphological analysis through syntax, semantics, and eventually pragmatics and discourse. Our approach makes heavy use of the information available in online dictionaries and other written works, from which we are able to extract a rich knowledge base, which can then be used to bootstrap into increasingly advanced stages of machine understanding. The programming system, and the underlying linguistic principles, apply to all natural languages. We are empirically oriented, and do not follow any of the currently received linguistic theories in detail. However, we are happy to use good linguistic ideas wherever they can be found.
The Microsoft Speech Technology Group engages in research and development of spoken language technologies. We are interested not only in creating state-of-the-art spoken language components but also in how these disparate components come together to form a unified, consistent whole in the multimodal computing environment. Below is a list of projects we are pursuing to help us reach our vision of a fully speech-enabled multimodal conversational computer.
- (Whisper) Speech Recognition
- (Whistler) Speech Synthesis (Text-to-Speech)
- (LEAP) Spoken Language Understanding
- (Dr. Who) Multimodal Conversational User Interface
Our group is interested in telepresence: being there without really being there (or then!). We are interested in real-time conferencing tools (video conferencing, audio conferencing, whiteboards) and on-line access to stored audio/video data.
The eagerly anticipated convergence of digital computers, communications, and consumer electronics will bring computing to vast numbers of new people and nurture the development of software very different from today's desktop, productivity-oriented applications. The rapid widespread acceptance of this technology will depend, to a large extent, on the ease-of-use and appeal of the user interface, as well as attractive and compelling applications. The User Interfaces group is exploring the nature of these new interfaces, a number of the key underlying technologies, and selected applications of these technologies.
We are exploring new metaphors, architectures, and the incorporation of a variety of multisensory and multimodal inputs and outputs such as speech, natural language, animation and sound.
- Development of social user interfaces based on life-like computer characters which can interact and build a rapport with the user. These characters may be assistive agents or used within interactive entertainment scenarios.
- Incorporating speech recognition, natural language understanding, and conversation (or discourse) management into a user interface to allow spoken conversation with a computer agent. An example of this is Persona: Conversational Interfaces.
- Reactive 3D-animation including complex animated behavior, real-time specification and synchronization of various time-based streams.
The Virtual Worlds Group (VWG) experiments with technologies that support the use of the Internet as a social medium. We are structured as a product incubation group, which includes both a research and a development effort.
Before 1996, the group developed two chat interfaces: Microsoft V-Chat and Comic Chat (now Microsoft Chat). More recent work has focused on a Virtual Worlds Platform and the development of two Virtual Worlds applications with different interfaces: 3D Virtual Environments and Online Learning/Town Meetings. The platform provides multi-user synchronous and asynchronous communications, object persistence, web integration, and easily customizable user interfaces. It is built on Microsoft's COM, ActiveX and DirectX technologies. VWG's technologies can be applied to information delivery, social support, learning, commerce, and entertainment.
Much of our work has been informed by MUDs and MUD architecture. We have also worked closely with sociologists to understand the implications of various features on emergent group behavior.
- Microsoft V-Chat is a multimedia, multi-user, social environment that lets people communicate online from within a 2D or 3D environment using graphical representations of themselves, known as avatars. Microsoft V-Chat avatars have a full range of gestures that allow users to express themselves online more fully. V-Chat users can select from a wide variety of existing avatars, or create and publish their own custom avatars using the V-Chat Avatar Creation Wizard. Sounds, animation, and visual imagery create mood and context for these social environments. V-Chat is a front end user interface to IRC, and users can also view conversations in a text-only mode.
- Microsoft Chat. In Microsoft Chat, your online conversations are the beginning of an interactive comic strip that unfolds in real time. As in other chat clients, you type text to communicate. Comic style balloons display your conversation, and gestures generated by conversation semantics give your character a variety of emotions and movements. The character you have selected, along with other comic characters, comes alive panel by panel. The Microsoft Chat program interprets key words and symbols to draw your character and integrate it into each panel. Microsoft Chat offers a wide variety of comic characters and backgrounds. The original Microsoft Chat artwork was created by Jim Woodring and the Microsoft Chat user interface was designed and patented by David Kurlander. Microsoft Chat is also a front end user interface to IRC.
- Virtual Worlds Platform is a research platform for developing technology that facilitates the creation of multi-user, distributed applications on the Internet. It is being designed as a general-purpose system. The Virtual Worlds platform exploits Microsoft’s ActiveX and DirectX technologies, allowing great flexibility in the design of user interfaces and the support of multimedia.
The Virtual Worlds Platform provides a synchronous, persistent, distributed object system in which users can easily program and prototype shared environments. Changes that are made by one user in an environment can immediately be perceived by other users of the environment. The platform supports multiple media types including text, graphics, voice and video, and dynamic end-user object creation with behaviors. A variety of shared environments have been built ranging from immersive 3D environments where people can see and "directly" interact with other people, to 2D environments that have synchronous communication within a virtual classroom.
Recognizing that virtual worlds applications will include a variety of experiences, we have architected our system in such a way that the core technology infrastructure is separate from the interface, allowing for very diverse and customizable interfaces. These interfaces can be tuned to specific uses of the technology. Additionally, the core technology is flexible and extensible, enabling developers to create enhanced functionality.
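The separation between a shared core and interchangeable interfaces can be sketched in a few lines. The example below runs in a single process with no networking or persistence to disk, and every class and method name is hypothetical rather than part of the Virtual Worlds Platform. It shows only the general pattern: one shared object store notifies every connected interface of each change, and each interface presents that change in its own way.

```python
class SharedWorld:
    """Core layer: holds object state and notifies every connected interface."""

    def __init__(self):
        self._objects = {}    # object name -> {property name: value}
        self._listeners = []  # callbacks registered by client interfaces

    def connect(self, listener):
        self._listeners.append(listener)

    def set_property(self, obj, name, value):
        self._objects.setdefault(obj, {})[name] = value
        for listener in self._listeners:
            listener(obj, name, value)  # every client sees the change at once

    def get_property(self, obj, name):
        return self._objects[obj][name]


class TextClient:
    """One possible interface: a text log of everything that changes."""

    def __init__(self, world):
        self.lines = []
        world.connect(self.on_change)

    def on_change(self, obj, name, value):
        self.lines.append(f"{obj}.{name} is now {value}")


class MapClient:
    """Another interface over the same core: tracks positions only."""

    def __init__(self, world):
        self.positions = {}
        world.connect(self.on_change)

    def on_change(self, obj, name, value):
        if name == "position":
            self.positions[obj] = value


world = SharedWorld()
text_view = TextClient(world)
map_view = MapClient(world)
world.set_property("lamp", "position", (1, 2))
world.set_property("lamp", "color", "green")
```

Neither client knows about the other, and a third interface could be added without touching the core.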
Sample Applications (Customizable User Interfaces)
Snow Crash, by Neal Stephenson, is a science fiction book that paints a picture of 3D cyberspace worlds that are inhabited by our virtual selves. As communication bandwidth increases for home desktop PCs and desktops become more powerful multimedia graphics devices, 3D virtual worlds are increasingly possible. 3D environments have many useful applications including guided tours, simulations, and realistic environments.
Social Support System in the Healthcare Industry
We are collaborating with The Fred Hutchinson Cancer Research Center to develop a virtual world called "Hutch World" that is based on the actual Hutch outpatient lobby. This password protected world extends the social support network for Hutch patients, their friends and loved ones. Participants will include families and friends of patients, patients, and the Fred Hutchinson staff and volunteers. Hutch World incorporates such features as an auditorium, mail room and a school for children, all modeled on existing facilities at the Hutch.
Collaboration and Education in the Financial Industry: The Electronic Mercantile Exchange is a prototype environment built by Acknowledge Systems using the Virtual Worlds Platform. Acknowledge Systems developed online environments for the Financial Services Industry and worked with a major Futures Exchange in the creation of this prototype. Among the features of the environment are a collaborative research center and a simulated trading floor that allows users to take part in a multi-user mock trading session, complete with 3D graphics and audio. Used for marketing and education, the environment allows visitors to see, hear, and participate in simulations of these complex markets.
Interactive Music: The Sound Space is a collaborative musical and graphical environment where avatars join together to explore a live, on-line 'album' of multiple songs. Within each soundscape, avatars can mix sounds and compose their own musical patterns that are shared in real-time. Even though all of the avatars can affect the music individually, all the changes they make are instantly updated across the network, guaranteeing that the music is truly collaborative and heard the same way for all participants. The space exemplifies the capabilities of V-Worlds to host a dynamically changing environment where avatars can be expressive and change their world in real-time.
Multi-player Games: Multi-user role-playing and social games on the Internet are becoming increasingly popular. The Virtual Worlds Platform provides the underlying infrastructure for developers and designers of these games.
2D Business, Online Learning and Town Meeting Interface
Flatland is an application for remote collaboration and presentation built on the Virtual Worlds Platform technology. It offers: real-time and stored video presentation, a URL or PowerPoint display window, and dynamic audience feedback tools including chat, voting, and real-time Q&A. The user interface can be customized to the type of presentation and presenter style.
Although the world is full of visual information, most computers are completely blind. What if your computer could recognize you, understand your facial expressions and your gestures? What if you could build 3D models of objects just by pointing a camera at them? What if you could edit images and video by referring to which object you wanted to manipulate rather than which pixels?
The Vision Technology group was established in 1995 to investigate and develop these and similar technologies.
- We have developed a Vision Software Development Kit, to support computer vision research and development under Windows. This toolkit, called VisSDK, supports images of any pixel type (through C++ templates), and is fast enough for real-time image processing. It also features a device-independent image capture interface.
- Image-Based Modelling and Rendering addresses one of the fundamental problems of computer vision: given multiple images of a scene from different, possibly unknown vantage points, how can the images be used to infer the 3D structure of the scene? Over the past several years, the group has made steady progress on this problem. Along the way, they have been successful at creating seamless Image Mosaics, which stitch together overlapping images to form a single panoramic view of an indoor or outdoor scene. Image mosaics offer a compact representation of the data contained in many images and also allow users to view scenes from perspectives other than from those of the original images.
- Intelligent Video Analysis addresses the problem of analyzing the static and dynamic information present in video sequences. This research aims at defining and extracting the fundamental components of scene information captured in video. For instance, some of the questions asked are: how can we separate the static 2D and 3D visual information from the dynamic information about moving objects? How can we compactly represent information about action and behaviors? Can we define data organization(s) for video information that allows access to these key information components and develop tools to manipulate them? Based on the answers to these types of questions, we aim to develop technology that will be useful for Interactive Video Production and Manipulation, including the seamless integration of real and synthetic visual information; Video Storage and Access, including access from distributed databases; Efficient Video Transmission, including video over the Internet; and New Ways to Browse and View Video.
- Vision-Based User Interfaces allow computers to recognize people and interpret what they are doing, using fast algorithms for real-time detection and recognition of people and their gestures. Work by this group has resulted in intelligent AVI (movie) players which recognize when a person is facing the monitor, and interactive educational programs which allow children to play virtual musical instruments.
- The EasyLiving project seeks to build intelligent computing resources into a home or office environment so that useful human interaction with the environment can occur in a natural, intuitive manner. Many of the advances made in vision-based user interfaces will figure prominently in the environment's observation of people. Some of the central issues include coordination of cameras aimed at different portions of an indoor environment, robust tracking of people as they travel from room to room, and general architectures for combining observations from multiple sensory modalities.
Cryptography is the discipline which studies methods to enhance user and system privacy and security. The Cryptography group within Microsoft Research serves multiple roles:
- researching new cryptographic methods and applications,
- working with standards bodies to develop security protocols, and
- providing internal consulting on Microsoft products.
Current cryptography projects within Microsoft Research include:
- Electronic Cash and related Electronic Commerce Infrastructure
- Internet Security Protocols
- High-performance Encryption Methods
- Public-Key Cryptography and Infrastructures
- Randomness and Computational Complexity
- Theoretical Cryptography
- Computational Number Theory
- Intellectual Property and Content Protection Mechanisms
We define signal processing as the science and art of converting signals from sensors into useful information. Such information is often rendered as an output signal that is consumed by a human, but we also view signal processing as providing sensory data to computers so they can make intelligent decisions. Examples of signals include:
- audio and video - sounds and images; for communications, storage, or broadcasting.
- biometric information - temperature profiles, muscle activity, body motion, etc.
- communication signals - modulated carriers, radar, sonar.
- signals associated with other physical phenomena - geophysical, underwater, etc.
One aspect of signal processing involves filtering, denoising and compression of signals. Another aspect is signal understanding: extracting higher-level semantic meanings from signals. We believe that combining both of these aspects of signal processing can create state-of-the-art systems that will enhance the user experience of personal computers.
The goal of the Signal Processing group is to develop technologies, algorithms, architectures, and systems that allow efficient processing of signals by personal computers. We also intend to advance the state-of-the-art in the theoretical foundations for signal processing.
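To make the filtering and denoising aspect concrete, the sketch below applies a moving-average filter, one of the simplest denoising methods, to a synthetic noisy sine wave. The signal, the noise level, and the window size are arbitrary choices made for illustration, and the filter is not claimed to be one the group uses.

```python
import math
import random


def moving_average(samples, window=5):
    """Replace each sample with the mean of its neighbours.
    Near the edges the window is truncated to the samples that exist."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def demo(seed=0, n=500, noise_sigma=0.3):
    """Return (error of the noisy signal, error after filtering)."""
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * i / 100) for i in range(n)]
    noisy = [s + rng.gauss(0.0, noise_sigma) for s in clean]
    filtered = moving_average(noisy, window=5)
    return mean_squared_error(noisy, clean), mean_squared_error(filtered, clean)


noisy_error, filtered_error = demo()
print(noisy_error, filtered_error)
```

Averaging works here because the sine wave changes slowly compared with the window, while the noise is independent from sample to sample and partly cancels out.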
The theory group is a new fundamental research group at MSR. It focuses on a statistical physics approach to discrete probability theory, combinatorics and theoretical computer science.
Traditionally, statistical physics describes systems in nature containing many degrees of freedom. Canonical examples include the many molecules in a glass of water or the many elementary magnets in a piece of ferromagnetic material. There are two important consequences of the large number of degrees of freedom. First, even though the precise behavior of a single constituent cannot be predicted particularly well, the average behavior of a constituent can be predicted with remarkable precision. This is analogous to the fact that the behavior of a large number of voters can be predicted with much higher accuracy than that of any given voter. Second, systems with large numbers of degrees of freedom are capable of exhibiting phase transitions: sudden changes in behavior in response to small changes in external parameters. A typical example is the solidification of water as the temperature falls below the freezing point.
As computer systems and networks become increasingly large and complex, statistical physics becomes the appropriate language to describe them. Probabilistic methods can be used to predict the average behavior of these systems, and to identify potential phase transitions. Combinatorial methods can be used to "count" configurations contributing to a given type of behavior.
Ultimately, the theory group will have members in statistical physics, probability theory, combinatorics and theoretical computer science.
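A standard example of a phase transition in a discrete system is the appearance of a giant connected component in a random graph once the average number of links per node exceeds one. The simulation below is a small sketch of that effect. The graph size, the seed, and the two average degrees are arbitrary illustrative choices, and the example is not drawn from the group's own work.

```python
import random


def largest_component_fraction(n, avg_degree, seed=0):
    """Add avg_degree * n / 2 random links among n nodes and return the
    fraction of nodes in the largest connected component (union-find)."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for _ in range(int(avg_degree * n / 2)):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a != b:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a
            size[a] += size[b]
    return max(size[find(v)] for v in range(n)) / n


sparse = largest_component_fraction(2000, avg_degree=0.5)
dense = largest_component_fraction(2000, avg_degree=2.0)
print(sparse, dense)
```

Below the threshold the largest cluster is a tiny fraction of the network; above it, a single cluster contains most of the nodes, even though the rule for adding links has not changed.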
Systems and Architecture
The charter of the Microsoft Research Database Group is to increase the usefulness of database systems to both business users and individuals by creating, extending, and applying database technology. To that end, we consult with the database product groups at Microsoft and have initiated two exploratory research projects. We are located in Redmond, Washington, which is in the greater Seattle area.
- AutoAdmin: The long term goal of AutoAdmin is to make database systems self-administering and self-tuning in all their dimensions. Initially, the project is focusing on the physical database design problem (index and materialized view selection). Surajit Chaudhuri, Paul Larson, and Vivek Narasayya collaborate on the AutoAdmin project.
- Phoenix: The long term goal is to improve application availability and error handling robustness. Initially, the project is focusing on exploiting database recovery techniques for enabling applications to survive system crashes. David Lomet and Roger Barga work on the Phoenix project.
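The idea of borrowing database recovery techniques so that an application survives a crash can be sketched with a redo log. The example below is a generic illustration, not the Phoenix design: the class names are hypothetical, and a Python list stands in for stable storage that would outlive the crashed process.

```python
class DurableLog:
    """Stand-in for stable storage that survives a crash (here, just a list)."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)


class RecoverableAccounts:
    """Application whose in-memory state can be rebuilt from the log."""

    def __init__(self, log):
        self.log = log
        self.balances = {}  # volatile state, lost when the process dies
        self._replay()

    def _apply(self, account, delta):
        self.balances[account] = self.balances.get(account, 0) + delta

    def _replay(self):
        # Recovery: redo every logged operation in its original order.
        for account, delta in self.log.records:
            self._apply(account, delta)

    def deposit(self, account, amount):
        self.log.append((account, amount))  # log first ...
        self._apply(account, amount)        # ... then change volatile state


log = DurableLog()
app = RecoverableAccounts(log)
app.deposit("alice", 10)
app.deposit("bob", 5)
app.deposit("alice", -3)

del app                               # simulate a crash: volatile state is gone
recovered = RecoverableAccounts(log)  # a restart replays the log
print(recovered.balances)
```

Writing the log record before changing memory is what makes replay safe: anything the application acted on is guaranteed to be in the log.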
Our research addresses a wide range of topics in 3D graphics and animation. These range from low level graphics technology concerning how to quickly and accurately draw primitives such as triangles to the screen, to high level graphics issues such as how to create and control human-like figures in a virtual 3D setting. We believe that interactive 3D graphics and animation will be an important component of future user interfaces directed towards business, consumer and entertainment applications. Our projects include advanced rendering approaches, image based rendering, multi-resolution representations and modeling techniques, geometry compression, and human figure animation. We are also working to define declarative methods of specifying complex, interactive temporal behavior, as well as defining an overall systems architecture to support high performance interactive graphics.
Billions of clients will need millions of servers. Most servers will be tiny, but some will be huge. Ideally, the whole spectrum can be built from modular components. We are exploring techniques to build large servers as arrays of commodity processors, disks, and interconnects: Scalable Networks and Platforms (SNAP). The resulting computer cluster should be as easy to program, manage, and use as a single system. In addition, by using spare modules and redundant storage, the cluster should mask component failures and so provide highly available services. We are working with the NTclusters group to help define the requirements for clusters. We are working with the SQL Server database team to add fault tolerance, scalability, and parallelism to SQL Server. And we are working with the Distributed Transaction Coordinator to help add ACID transactions to NT and Microsoft's Component Object Model (COM) infrastructure. In each case, we are building prototypes to demonstrate our ideas.
We believe you can build supercomputers as a cluster of commodity hardware and software modules. A cluster is a collection of independent computers that is as easy to use as a single computer. Managers see it as a single system, programmers see it as a single system, and users see it as a single system. The software spreads data and computation among the nodes of the cluster. When a node fails, other nodes provide the services and data formerly provided by the missing node. When a node is added or repaired, the cluster software migrates some data and computation to that node.
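One way to picture how cluster software might spread data across nodes, mask a failure, and move only some data when a node joins is rendezvous (highest-random-weight) hashing with two copies of every item. This is a technique chosen here for illustration, not the mechanism used in our prototypes, and the node and key names are hypothetical.

```python
import hashlib


def _score(node, key):
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).hexdigest()
    return int(digest, 16)


def owners(key, nodes, replicas=2):
    """The nodes that hold a copy of key: the top scorers for that key."""
    ranked = sorted(nodes, key=lambda node: _score(node, key), reverse=True)
    return ranked[:replicas]


NODES = ["n1", "n2", "n3", "n4"]
KEYS = [f"key{i}" for i in range(200)]

before = {k: owners(k, NODES) for k in KEYS}

# Node n3 fails: each key it held still has a surviving copy, and the next
# node in that key's ranking becomes the new second replica.
after_failure = {k: owners(k, [n for n in NODES if n != "n3"]) for k in KEYS}

# Node n5 joins: only the keys for which n5 ranks in the top two move to it.
after_join = {k: owners(k, NODES + ["n5"]) for k in KEYS}
moved = [k for k in KEYS if "n5" in after_join[k]]
print(len(moved), "of", len(KEYS), "keys gained a copy on n5")
```

Because each key ranks the nodes independently, removing or adding one node never reshuffles data between the nodes that were not involved.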
Systems and Networking
The Systems and Networking Research Group in Redmond explores advanced and speculative systems and networks and systems-related problems. We build real systems and networks to test and evaluate the ideas we explore.
Systems and Networking research is also conducted at both our Cambridge, U.K. and Bay Area research labs.
- Millennium: We are building a new self-organizing, self-tuning distributed system. A short position paper describes our long-term goals for the project.
- Consumer Real-Time: This project's goal is to make it possible to develop real-time applications independently of one another, while enabling their predictable concurrent execution, both with each other and with non-real-time applications. This is related to the earlier Rialto work, but uses Windows NT instead of the Microsoft Interactive TV kernel.
- IPv6: We are building a prototype implementation of IP version 6 (the next generation Internet Protocol). Please see our web page.
- MCoM: We are investigating issues in providing location-transparent and location-aware tether-less access to distributed information via heterogeneous computing and wireless communication devices. Our goal is to create a software architecture with an accompanying set of algorithms that allows the system to adapt to the radical differences in the communication substrate. The adaptation will be transparent to the user so that applications are not only unaware of the device's mobility but can also benefit from it. As part of this project we are building hardware that enables HPCs and PDAs to form both managed and ad-hoc multimedia wireless networks while dealing with continuous and on-off wireless connectivity.
- MMLite is an object architecture that stresses adaptability, minimalism, and reusability. Components that are typically designed into an operating system, such as virtual memory management and interprocess communication, will be loadable in this system. We explore object mutation for interposition, dynamic software upgrades, runtime code generation, code specialization, and object mobility. An initial version is used in some DirectX accelerator boards.
- Gigabit Networking: We are building systems support for gigabit networking. One contribution was helping to author the specification for the Virtual Interface Architecture.
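For the Consumer Real-Time item above, one ingredient of predictable concurrent execution is admission control: an application asks for a guaranteed share of the CPU, and the request is refused if the total would exceed what the machine can deliver. The sketch below shows only that utilization test. The class, its interface, and the example numbers are hypothetical, and a real scheduler involves much more than this check.

```python
from fractions import Fraction


class ReservationManager:
    """Grants CPU reservations of the form 'amount_ms out of every period_ms'."""

    def __init__(self, capacity=Fraction(1)):
        self.capacity = capacity  # fraction of the CPU that may be reserved
        self.reservations = {}    # application name -> reserved CPU share

    def utilization(self):
        return sum(self.reservations.values(), Fraction(0))

    def request(self, name, amount_ms, period_ms):
        """Admit the reservation only if total reserved share stays within capacity."""
        share = Fraction(amount_ms, period_ms)
        already_held = self.reservations.get(name, Fraction(0))
        if self.utilization() - already_held + share > self.capacity:
            return False
        self.reservations[name] = share
        return True

    def release(self, name):
        self.reservations.pop(name, None)


manager = ReservationManager()
audio_ok = manager.request("audio", 2, 10)   # 20% of the CPU
video_ok = manager.request("video", 15, 30)  # 50% of the CPU
game_ok = manager.request("game", 4, 10)     # 40% more would exceed 100%
print(audio_ok, video_ok, game_ok, manager.utilization())
```

Because each application states its own needs and the manager checks only the total, applications can be written without knowledge of one another. Any share left unreserved remains available to non-real-time work.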
- Operating System Directions for the Next Millennium
- The Architecture of a Distributed Virtual Worlds System
- From MUDs To Virtual Worlds
- Design Principles for Online Communities
- Planning-Based Control of Interface Animation
- Microsoft Windows Highly Intelligent Speech Recognizer: Whisper
- Lumiere Project: Bayesian Reasoning for Automated Assistance
- Proposal for a Bayesian Network Interchange Format
- MMLite: A Highly Componentized System Architecture
- Distributed Schedule Management in the Tiger Video Fileserver
- CPU Reservations and Time Constraints: Efficient, Predictable Scheduling
- Anonymous Communication and Anonymous Cash
- Cryptographic Defense against Traffic Analysis
- Dense Probabilistic Encryption
- On the Power of Quantum Computation
- General Linear Secret Sharing
- Intentional Programming - Innovation in the Legacy Age