Cross Language Translation in a Web-based Environment
One of the major problems with cross-language translation between languages that are rarely used together (e.g. Finnish-Lithuanian) is that dictionaries for such pairs are unavailable or extremely difficult to find. The main reason is that too few people need them to create a market, so no one invests in creating dictionaries of this kind. English-based dictionaries, however, are abundant. This project tackled this problem in cross-language translation by using English as the base dictionary.
Artificial intelligence through neural networks appeared well suited to problems of this nature. For this reason, it was investigated as a potential tool to improve translation accuracy, but its implementation was left as a possibility for future work. WordNet® was also investigated as a source of definitions for English words and as a possible tool for achieving greater accuracy in cross-language translations.
ACKNOWLEDGMENTS
I would like to take this opportunity to thank all those who have contributed in any way, shape or form to the completion of this project report, in particular those at Zodynai.org and Anglia Ruskin University for their advice and support. Thanks to my friends and family, who (once again) tolerated my lack of time and sometimes grumpy demeanour. More thanks to friends at Anglia Ruskin University for their ideas and criticism. Your support, direct and indirect, is greatly appreciated.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
INTRODUCTION
OBJECTIVES
METHODOLOGIES
DESIGN AND IMPLEMENTATION
RESULTS AND DISCUSSION
CONCLUSION AND EVALUATION
BIBLIOGRAPHY
APPENDICES
A Project Specification
B User/Software Installation Guide A
C Software Installation Guide B
D Poster and Visual Presentations
E Source Code
F Abbreviations and Acronyms
INTRODUCTION
The multilingual world and the barriers it entails.
A large number of people across the world converse in English; it therefore serves as the primary lingua franca for developments in the research world.
Most publications and journals are published in English. This leaves publications in other languages inaccessible and, equally, information in English is withheld from the millions who do not speak English (Diekema 2003). Recent trends promote the construction of a far-reaching, complex infrastructure for transporting information across boundaries. Language accounts for a large share of the hindrances presented by national borders.
Although English remains the most widely spoken language in the world, and although the spread of ‘World English’ can promote cooperation and equity, longstanding linguistic competition threatens to be even more divisive in a globalizing world (Maurais et al n.d.). Much work is currently under way to overcome these linguistic barriers. The most efficient approach is cross-language translation, and this report concentrates mostly on the web-based online dictionary aspect of that approach.
English has traditionally been the main focus of information retrieval. Many retrieval algorithms and heuristics stem from English-speaking countries and are therefore based on English. Over the years, these retrieval methods have been adopted by other language communities, creating a wide selection of language-specific monolingual retrieval systems. However, to ensure complete information exchange, information retrieval systems need to be multilingual or cross-lingual (Diekema 2003).
There are many ways to address the hindrance of living in this multilingual world, a world divided into English-speaking and non-English-speaking territories. As presented, the most researched approach is cross-language translation.
OBJECTIVES
The main problem we face when translating with an English-based dictionary is connecting the various language databases. One solution to this problem is to create a base dictionary (as seen in Figure 1).
Figure 1
The other major problem is that performing translation in this way creates a lot of noise, as seen in Figure 2.
Figure 2
We can examine this in greater detail in Figure 3. The word “autobusas” is translated from Lithuanian to Russian via English. Two possible translations occur (bus, omnibus) when translating Lithuanian -> English. The first word, “bus”, translated from English to Russian, has three meanings: “автобус”, “омнибус” and “шина”. The first two are synonyms, but the third has a totally different meaning, “topology bus”. As can be seen in the reverse translation, “шина” yields four different meanings when translated into Lithuanian.
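The noise described above can be reproduced with a small sketch. The dictionaries below are toy data: the entries for "autobusas" and "bus" follow the example in the text, while the Russian entry for "omnibus" is an assumption made only for illustration. Counting how many English pivot words lead to each candidate is one simple heuristic for spotting noise; it is not presented as the method used in this project.

```python
from collections import defaultdict

# Toy dictionaries. "autobusas" and "bus" follow the example in the text;
# the "omnibus" entry is an illustrative assumption.
LT_TO_EN = {
    "autobusas": {"bus", "omnibus"},
}
EN_TO_RU = {
    "bus": {"автобус", "омнибус", "шина"},
    "omnibus": {"автобус", "омнибус"},  # assumed entry
}


def pivot_translate(word, source_to_pivot, pivot_to_target):
    """Translate via the pivot language.

    Returns a dict mapping each target candidate to the set of pivot
    words that produced it.
    """
    candidates = defaultdict(set)
    for pivot_word in source_to_pivot.get(word, ()):
        for target_word in pivot_to_target.get(pivot_word, ()):
            candidates[target_word].add(pivot_word)
    return dict(candidates)


def rank_by_support(candidates):
    """Order candidates by the number of pivot words supporting them."""
    return sorted(candidates, key=lambda w: (-len(candidates[w]), w))


result = pivot_translate("autobusas", LT_TO_EN, EN_TO_RU)
ranked = rank_by_support(result)

for candidate in ranked:
    print(candidate, "<-", sorted(result[candidate]))
```

In this toy data, "шина" is reachable only through "bus", whereas the two correct candidates are reachable through both English words, which is why it ends up last in the ranking.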
Figure 3
We are going to investigate WordNet and the neural network approach as possible solutions to this problem.
METHODOLOGIES
Cross-Language Information Retrieval, its promise.
Information retrieval entails an individual querying about something of interest to them. Since we are ever-inquisitive life forms, we perform information retrieval in every aspect of our lives. It happens in many situations and is perhaps best illustrated in a library, when a student picks a book of choice.
Formally, let us define Information Retrieval (IR) as the process in which users with an information need query a collection of documents to find those documents that satisfy that need (Diekema 2003). In the electronic realm, the user queries by typing in related words; the system then processes these keywords to create a representation it can understand. In the course of this procedure, the system usually strips off fragments of the query that carry no content, such as determiners, prepositions and pronouns. The document collection undergoes the same process, resulting in a list of document representations, or a catalogue.
To find documents that are similar to the query, the stripped query representation is then matched against the catalogue. Once a certain degree of similarity between the catalogue and the stripped query has been established, the documents with the highest similarity scores (depending on the settings, say the top 10) are shown to the user as results. This typically occurs when browsing the internet, and Google.com is a good example. A development of IR is CLIR, Cross-Language Information Retrieval, which, as the name implies, is information retrieval in a multilingual environment.
CLIR techniques simplify searching by multilingual users and allow monolingual searchers to judge relevance based on machine-translated results and/or to allocate expensive translation resources to the most promising foreign-language documents (Diekema 2003). A simple IR system consists only of a Query, an Input Cleanser, a Matcher, the Document database and the Output, in logical order. The addition of Language Translators makes this system a Cross-Language Information Retrieval system.
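The components listed above (Query, Input Cleanser, optional Language Translator, Matcher, Document database, Output) can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of the implemented system: the stop-word list, the Lithuanian-English entries, the sample documents and the scoring rule (counting shared terms) are all hypothetical.

```python
import re
from collections import Counter

# Assumed stop-word list and dictionary, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "it"}
LT_TO_EN = {
    "autobusas": ["bus", "omnibus"],
    "tvarkaraštis": ["timetable", "schedule"],
}

# Hypothetical document database.
DOCUMENTS = {
    "doc1": "The bus timetable for the city centre",
    "doc2": "A history of the omnibus in London",
    "doc3": "Network topology and the system bus",
    "doc4": "Cooking recipes for the weekend",
}


def cleanse(text):
    """Input Cleanser: lower-case, tokenise and drop non-content words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def translate(tokens, dictionary):
    """Language Translator: expand each token into all its translations."""
    translated = []
    for token in tokens:
        translated.extend(dictionary.get(token, [token]))
    return translated


def score(query_tokens, doc_tokens):
    """Matcher: number of distinct query terms found in the document."""
    doc_counts = Counter(doc_tokens)
    return sum(1 for t in set(query_tokens) if doc_counts[t] > 0)


def search(query, documents, dictionary=None, top_k=10):
    """Query -> Cleanser -> (Translator) -> Matcher -> Output."""
    tokens = cleanse(query)
    if dictionary is not None:
        tokens = translate(tokens, dictionary)
    catalogue = {doc_id: cleanse(text) for doc_id, text in documents.items()}
    scored = [(score(tokens, rep), doc_id) for doc_id, rep in catalogue.items()]
    hits = [(s, d) for s, d in scored if s > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits[:top_k]]


results = search("autobusas tvarkaraštis", DOCUMENTS, dictionary=LT_TO_EN)
print(results)
```

Calling `search` without a dictionary gives the plain monolingual IR pipeline; passing one adds the translation step. Note that the sketch also retrieves the document about a "system bus", which is the same kind of noise discussed in the Objectives. Translating the results back into the language of the query is omitted here.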
Of course, the Document database would now contain multilingual entries as well, and the output is to be presented in the language in which the query was placed. Figure 4 shows the Cross-Language Information Retrieval system schematically. Cross-language retrieval systems promise users the ability to state their queries in their native language and retrieve documents in all the languages supported by the system (Diekema 2003).
Artificial Intelligence and Machine Learning.
Artificial intelligence (AI) is the simulation of intellectual practices such as comprehension, rationalization and learning symbolic information in context.
In AI, the automation or programming of all aspects of human cognition is considered, from its foundations in cognitive science through approaches to symbolic and sub-symbolic AI, natural language processing, computer vision, and evolutionary or adaptive systems (Neumann n.d.). AI is considered an extremely intricate problem domain: during the preliminary stages of problem solving, the problem itself may be poorly understood. A precise picture of the problem emerges only through interactive and incremental refinement, after an initial attempt to solve it has been made.
AI always comes hand in hand with machinery. How else could a mind act appropriately but with a body? In this case, a machine takes the part of the body. This report will shortly discuss AI implemented through neural networks. The author deems it necessary, though, to discuss machine learning first, hence the succeeding paragraphs. Machine learning is primarily concerned with designing and developing algorithms and procedures that allow machines to “learn”, either inductively or deductively, which are, in general, its two types.
From this point on, we will refer to machines as computers since, nowadays, the latter are the most widely used for control. Hence, we refine our definition of machine learning as the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software (Dietterich n.d.). Machine learning techniques are grouped into different categories based on the expected outcome.
Common types include supervised, unsupervised, semi-supervised and reinforcement learning. There are also the transduction method and the ‘learning to learn’ scheme. Computational learning theory, a branch of theoretical computer science, investigates machine learning algorithms, including their efficiency. Research on machine learning focuses mainly on the automatic extraction of information from data through computational and statistical methods. It is closely related not only to theoretical computer science but also to data mining and statistics.
Supervised learning is the simplest learning task. The algorithm learns a function that maps inputs to expected outputs. The task of supervised learning is to construct a classifier given a set of classified training examples (Dietterich n.d.). The main challenge for supervised learning is generalization: the machine is expected to approximate the behaviour of a function that maps inputs to a number of classes by examining input-output (IO) samples of that function.
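As a minimal sketch of constructing a classifier from classified training examples, the code below learns a one-level decision tree (a "decision stump") on a single numeric feature. The training data are hypothetical: each example pairs the number of English pivot words supporting a candidate translation with a label saying whether the candidate was correct. Neither the feature nor the data come from this project.

```python
def train_stump(examples):
    """Learn a one-level decision tree on one numeric feature.

    examples: list of (feature_value, label) pairs, labels 0 or 1.
    Returns (threshold, label_below, label_above) with the fewest
    training errors.
    """
    values = sorted({x for x, _ in examples})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate thresholds: midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in thresholds:
        for below, above in ((0, 1), (1, 0)):
            errors = sum(
                1 for x, y in examples
                if (below if x <= threshold else above) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above


def stump_predict(stump, x):
    """Classify a new feature value with a trained stump."""
    threshold, below, above = stump
    return below if x <= threshold else above


# Hypothetical labelled examples: (supporting pivot words, is_correct).
TRAINING = [(1, 0), (1, 0), (2, 1), (3, 1), (2, 1)]

stump = train_stump(TRAINING)
print(stump)
print(stump_predict(stump, 4))  # a value not seen during training
```

The last call illustrates generalization: the stump assigns a class to a feature value that never appeared in the training examples.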
When many input-output vector pairs are interrelated, a decision tree can be derived, which helps in viewing how the machine behaves with the function it currently holds. One advantage of decision trees is that, if they are not too large, they can be interpreted by humans. This can be useful both for gaining insight into the data and also for validating the reasonableness of the learned tree (Dietterich n.d.). In unsupervised learning, no manual matching of inputs to outputs is used. It is most often distinguished from supervised learning in that the desired output is unknown.
This makes it very hard to decide what counts as success and suggests that the central problem is to find a suitable objective function that can replace the goal of agreeing with the teacher (Hinton & Sejnowski 1999). Simple classic examples of unsupervised learning include clustering and dimensionality reduction (Ghahramani 2004). Semi-supervised learning covers situations where only a small amount of labelled data is available compared with the unlabelled data. These are very natural situations, especially in domains where collecting data can be cheap (e.g. the internet) but labelling can be very expensive or time-consuming. Many of the approaches to this problem attempt to infer a manifold, graph structure, or tree structure from the unlabelled data and use spread in this structure to determine how labels will generalize to new unlabelled points (Ghahramani 2004). Transduction is comparable to supervised learning in that it predicts new outputs from training inputs and outputs, but it also uses the test inputs, which are available during training, as a basis, instead of behaving in accordance with some learned function.
All these types of machine learning techniques can be used to implement artificial intelligence for robust cross-language translation. This report has yet to discuss the machine learning approach this research plans to employ, namely neural networks. There is no precise, agreed definition of an artificial neural network, though many researchers would agree that it is a network of simple processing elements, otherwise known as neurons, which exhibits complex behaviour determined by the connections between the processing elements and by the element parameters.
The main inspiration for the development of this technique came from the study of our central nervous system and the neurons (including their axons, dendrites and synapses) that make up its most important information-processing elements. In a neural network model, simple nodes are connected to form a network of nodes, hence the term “neural network”. A neural network operates in two different modes: learning and testing.
In the former, the system learns how it is supposed to behave; the latter applies once rigorous repetition of training has resulted in a stable system, defined by its consistently satisfactory outputs. Most “abstract reasoning” in artificial neural networks is implemented through three learning types: supervised, unsupervised and reinforcement learning, as introduced in the preceding paragraphs. Supervised learning entails a functional relationship between the input and the output. The system has to learn every IO pair it is expected to handle.
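The learning and testing modes can be illustrated with the simplest possible network, a single artificial neuron trained with the classic perceptron rule. This is only a sketch under stated assumptions: the IO pairs (the logical AND function), the learning rate and the number of epochs are chosen for illustration and are not taken from this project.

```python
def neuron_output(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(pairs, epochs=20, learning_rate=1.0):
    """Learning mode: adjust weights until every IO pair is reproduced.

    pairs: list of (inputs, target) with targets 0 or 1.
    Stops early once a full pass over the pairs produces no mistakes.
    """
    weights = [0.0] * len(pairs[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in pairs:
            error = target - neuron_output(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                # Perceptron rule: move the weights towards the target.
                weights = [
                    w + learning_rate * error * x
                    for w, x in zip(weights, inputs)
                ]
                bias += learning_rate * error
        if mistakes == 0:
            break
    return weights, bias


# Illustrative IO pairs: the logical AND of two binary inputs.
AND_PAIRS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_PAIRS)

# Testing mode: check the trained neuron against the desired outputs.
predictions = [neuron_output(weights, bias, inputs) for inputs, _ in AND_PAIRS]
print(predictions)
```

A single neuron can only learn linearly separable IO pairs such as AND; larger networks are needed for more complex mappings.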
If a pair is missed, it is simply added to the memory of the system so that, when it resurfaces, the system knows how to handle it. Basically, the goal is to ‘teach’ the network to identify the given input with the desired output (Sordo 2002). This is usually best achieved when a function f has already been derived to represent the behaviour of the neural network system. For unsupervised learning, we feed an input and a function to the system and record how the system behaves with that input and function.
In contrast to supervised learning, there are no IO pairs with which to begin the learning process. Ultimately, the main goal of reaching a stable state is attained through rigorous repetition of tests with different sets of inputs. Systems that employ unsupervised learning as their learning method are best exemplified by statistical modelling and the like. Reinforcement learning has its roots in a related psychological theory that was conceived even before AI.
Dynamically, in this type of learning, the machine interacts with its environment by producing actions a1, a2, … These actions affect the state of the environment, which in turn results in the machine receiving some scalar rewards (or punishments) r1, r2, … The goal of the machine is to learn to act in a way that maximizes the future rewards it receives (or minimises the punishments) over its lifetime. Reinforcement learning is closely related to the fields of decision theory (in statistics and management science), and control theory (in engineering).
The fundamental problems studied in these fields are often formally equivalent, and the solutions are the same, although different aspects of problem and solution are usually emphasised (Ghahramani 2004).
Advantages of investing in a system built on Neural Networks.
Neural networks have the outstanding characteristic of deriving intelligence from the usually complicated and often fuzzy data they are given. These systems are often convenient tools for extracting patterns and detecting trends that are too difficult to be noticed by either human observation or conventional computer techniques.
A trained neural network is regarded as an “expert” in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer “what if” questions (Chung et al 2007). A neural network is capable of adaptive learning, that is, learning how to handle tasks based on the input provided for training or on preliminary experience. It is self-organizing: it builds its own representation of the data it receives during learning. Another feature of neural networks is real-time operation, in which calculations may be performed in parallel.
Fault tolerance via redundant information coding is another aspect of neural systems: partial destruction of a network leads to a corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.
The platform to a successful implementation.
Given the many rapid developments in computer technology, several environments can be used to implement a cross-language translator. In the succeeding paragraphs we discuss those on the author’s shortlist.
Microsoft .NET Framework.
This framework forms part of the Microsoft Windows operating systems. It contains a vast number of pre-coded solutions to general program requirements and governs the execution of programs written specifically for the framework. The framework is a key Microsoft offering and is intended to be used by most new applications created for the Windows platform. The pre-coded solutions that make up the framework’s Base Class Library (the third layer from the operating system in the .NET Framework) cover a wide range of software requirements that a cross-language translation tool can draw on, in areas including user interface, database connectivity, cryptography, data access, web application development, network communications and numeric algorithms. This layer contains classes, value types and interfaces that are used often in the development process. Most notable within the .NET Framework base classes is ADO.NET, which provides access to and management of data. Programs written for the .NET Framework run in a software environment that manages their runtime requirements.
This runtime environment, which is also part of the .NET Framework, is known as the Common Language Runtime (CLR). The CLR provides the appearance of an application virtual machine, so that programmers need not consider the capabilities of the particular CPU that will execute the program. The CLR also provides other significant services such as security mechanisms, memory management and exception handling. The class library and the CLR together compose the .NET Framework. The .NET Framework is included with Windows Server 2003, Windows Server 2008 and Windows Vista, and can be installed on some older versions of Windows.
.NET Framework 1.1
This is the first major .NET Framework upgrade. It has been available on its own as a redistributable package or in a software development kit since its publication on April 3, 2003. It also forms part of the second release of Microsoft Visual Studio .NET, Visual Studio .NET 2003, and is the first version of the .NET Framework to be included as part of the Windows operating system, shipping with Windows Server 2003.
.NET Framework 3.5
This version was officially released to manufacturing (RTM) on November 19, 2007. As with .NET Framework 3.0, this version uses version 2.0 of the CLR. It also installs .NET Framework 2.0 SP1, which adds some methods and properties to the BCL classes in version 2.0 that are required for version 3.5 features such as Language Integrated Query (LINQ). These changes, however, do not affect applications written for version 2.0. Separately, a new .NET Compact Framework 3.5 was released in tandem with this revision to support additional features on Windows Mobile and Windows Embedded CE devices. The source code of the Base Class Library in this version has been partially released under the Microsoft Reference License.
.NET Framework 3.5 builds incrementally on the new features added in .NET Framework 3.0, for example the feature sets in Windows Workflow Foundation (WF), Windows Communication Foundation (WCF), Windows Presentation Foundation (WPF) and Windows CardSpace. This version also contains a number of new features in several technology areas, which have been added as new assemblies to avoid breaking changes. They are:
(a) deep integration of Language Integrated Query (LINQ) and data awareness, which lets you write code in LINQ-enabled languages to filter, enumerate and create projections of several types of SQL data, collections, XML and datasets using the same syntax;
(b) ASP.NET AJAX 3.5, which lets you create more efficient, more interactive and highly personalized Web experiences that work across almost all the most popular browsers;
(c) new Web protocol support for building WCF services, including AJAX, JSON, REST, POX, RSS, ATOM and several new WS-* standards;
(d) full tooling support in Visual Studio 2008 for WF, WCF and WPF, including the new workflow-enabled services technology; and
(e) new classes in the .NET Framework 3.5 base class library (BCL) that address many common customer requests.
Visual Studio 2008 and the .NET Framework 3.5.
The Microsoft Visual Studio development system is a development tool designed to help developers tackle complex problems and create innovative solutions. Its role is to improve the development process so that achieving breakthroughs is easier and more satisfying. Using the Microsoft Visual Studio development system should be productive for this project, since it supports developing cross-language translation with less effort and integrates easily with other software.
It offers efficient code editors, IntelliSense, wizards and multiple coding languages in one integrated development environment (IDE), through to advanced applications in life-cycle management. New versions of Visual Studio keep bringing innovative tools to help developers focus on solving problems without wasting time. With this development system, software developers benefit from an integrated product experience that spans tools, servers and services.
Visual Studio products work well together with other Microsoft software, such as Microsoft server products and the Microsoft Office system. Visual Studio offers a comprehensive choice of tools for all phases of software development, testing, deployment, integration and management. Every kind of software developer, from novice to skilled professional, can use Visual Studio because it is engineered to support development across all types of devices, such as PCs, servers, the Web and mobile devices.
Visual Studio is engineered and tested to be dependable, secure, interoperable and compatible. It offers a combination of security features, scalability and interoperability. Although Visual Studio incorporates forward-thinking features, it is designed to ensure backward compatibility wherever possible.
As a set of technology capabilities, core products and best-practice guidance, the Microsoft Application Platform (MAP) focuses on helping IT and development business partners to maximize opportunity. As one of its core products, Visual Studio continues to help drive the right customer connections, business efficiencies and value-added services by providing a single, fully integrated development environment for all types of development, including Microsoft Windows, Microsoft Office, Web and mobile applications.