Scalable Quantum Approaches in Language Representation

Quantum mechanics is a highly successful scientific theory for making predictions about systems with inherent ambiguity. That natural language resembles such a system is at least plausible. Modelling based on quantum mechanics is being applied to domains such as artificial intelligence, human language, cognition, information retrieval, and social interaction. The most recent advances in theory and experimentation for applying quantum mechanics include:

  • Use of quantum algorithms to address, or to solve more efficiently, problems in non-quantum domains (including contrasts between classical and quantum methods),

  • Practical applications of quantum methods in domains such as artificial intelligence, information retrieval, and language modelling.

The promise of quantum modelling is improved methodologies that capture the subtleties and ambiguities of human language, resulting in optimised algorithms for text processing. The purpose of the project is to investigate methods borrowed from quantum mechanics in a wide range of large-scale language technology applications.

The latest trends indicate the rise of heterogeneous platforms in which multi-core CPUs and GPUs work together with distributed-memory parallelism. CPU-based parallelism has been utilized for decades, and while not without its own problems, it is a mature field, and a multicore CPU enables developing faster algorithms with reasonable effort. In this paradigm, there is considerable overhead in dividing the problem, distributing the parts across a small number of CPU cores, then collecting and merging the results. This type of parallelism is available in a wide range of programming languages, but in any case the source code needs to be modified to some extent. GPU-based parallelism is a completely different approach: the overhead of splitting the work is minimal and the number of cores is massive, but the kinds of computation that can be split are limited to simple, single-pass operations.

This heterogeneous computing environment has to be studied at different levels to find a scalable implementation: low-level linear algebra, numerical methods, kernel methods, and manifold learning need to be studied, as well as higher-level load distribution, such as MapReduce. The constraints are as follows:
  • Text processing is typically a data-intensive task, and several distributed algorithms have been proposed to deal with large-scale collections on a grid or in a cloud computing environment. MapReduce was originally developed to this end, and mature libraries, such as Cloud9, are readily available. Other libraries, such as Mahout, facilitate the development of complex language technology applications.

  • General-purpose computing on the GPU requires considerable effort from developers. Initial results in text processing, however, indicate substantial improvements in execution time.

  • Quantum methods, on the other hand, rely on linear algebra and other numerical libraries, many of which have already been optimized to utilize the power of GPUs. Graphics processors are ideally suited for computations that can be run in parallel on numerous data elements. This typically involves arithmetic on large data sets (such as matrices) where the same operation can be performed across thousands of elements at the same time.
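As a minimal illustration of the MapReduce model referred to in the first point, the three phases of a word count can be sketched in plain Python (this is an illustrative sketch, not Hadoop or Cloud9 code; all function names are invented for the example):

```python
from collections import defaultdict

def map_phase(text):
    # Mapper: emit a (term, 1) pair for every token in one document.
    for token in text.lower().split():
        yield token, 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key; in Hadoop the framework
    # performs this step between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: merge all counts emitted for one term.
    return key, sum(values)

def word_count(corpus):
    # Run the three phases sequentially on a list of documents.
    mapped = [pair for text in corpus for pair in map_phase(text)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["to be or not to be", "to see or not to see"])
# counts["to"] == 4, counts["be"] == 2, counts["see"] == 2
```

In a real framework the mappers and reducers run on different machines and the shuffle moves data over the network; the sequential sketch above only shows the contract each phase must satisfy.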
SQUALAR intends to bring together the best of both worlds. By bridging data-intensive text processing with sophisticated quantum modelling of language, we expect to see major advances in language technology.

The challenges, however, are far from trivial. The major frameworks for GPGPU programming, CUDA and OpenCL, require wrapping in Java, the environment of Hadoop, the most mature open-source MapReduce implementation.
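The divide, distribute, and merge overhead of CPU-based parallelism described earlier can be sketched in Python (a simplified illustration; a thread pool stands in for CPU cores to keep the example self-contained, although in CPython real CPU scaling would require processes or native code):

```python
from concurrent.futures import ThreadPoolExecutor

def count_tokens(chunk):
    # Work assigned to one core: count tokens in its share of the corpus.
    return sum(len(line.split()) for line in chunk)

def parallel_token_count(lines, workers=4):
    # Overhead 1: divide the problem into roughly equal chunks.
    size = max(1, len(lines) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Overhead 2: distribute the chunks across a small pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(count_tokens, chunks))
    # Overhead 3: collect and merge the partial results.
    return sum(partial)

total = parallel_token_count(["the cat sat", "on the mat", "and purred"] * 4)
# total == 32 tokens across 12 lines
```

The three numbered steps are exactly the overhead the text describes: none of them do useful work on the data itself, yet each must be coded explicitly when parallelising an existing algorithm.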