LinguaNet: A Neural Network Based Language Identification System

LinguaNet is a language identification model built on deep neural networks (DNNs) using Python and TensorFlow. It uses character n-grams to classify the language of a text and reaches 89.2% accuracy. The repository includes model details, visualizations of the learned features, and the implementation code.

Project Overview

LinguaNet is a language identification model built on Deep Neural Networks (DNNs) using Python and TensorFlow. It uses character n-grams to classify the language of a given text input. The system takes text entered by the user and returns the identified language.

The project also visualizes the learned features with t-SNE and PCA to give insight into how the network differentiates between languages. The model reached a final accuracy of 89.2%.

Dependencies

Python (>=3.7)
TensorFlow (>=2.4.0)
scikit-learn (>=0.24.0)
NumPy (>=1.19.5)
Matplotlib (>=3.3.2)
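
A quick way to confirm that an environment meets these minimums is sketched below. This is not part of the repository: the helper names (`version_tuple`, `check_dependencies`) are hypothetical, the PyPI distribution names used as keys are an assumption, and `importlib.metadata` needs Python 3.8 or newer, so on Python 3.7 the `importlib_metadata` backport would be required instead.

```python
import re
from importlib import metadata  # standard library in Python 3.8+

# Minimum versions copied from the dependency list above.
# The keys are assumed PyPI distribution names.
MINIMUM_VERSIONS = {
    "tensorflow": "2.4.0",
    "scikit-learn": "0.24.0",
    "numpy": "1.19.5",
    "matplotlib": "3.3.2",
}


def version_tuple(version):
    """Turn '2.4.0' or '2.16.0rc1' into a tuple of at least three integers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return (0, 0, 0)
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad short versions such as '1.19' so that tuple comparison is fair.
    return parts + (0,) * max(0, 3 - len(parts))


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return a dict mapping each package to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        is_ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if is_ok else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_dependencies().items():
        print(f"{package}: {status}")
```

The comparison only looks at the leading numeric part of each version string, so pre-release suffixes such as `rc1` are ignored.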

Installation

Clone the repo

git clone https://github.com/mohamzamir/LinguaNet-A-Neural-Network-Based-Language-Identification-System

Example

1. When the page loads, it looks like this:

Screenshot 2023-07-08 at 10 24 53 PM

2. When you click on the languages, the page shows examples of the languages it can recognize.

Screenshot 2023-07-08 at 10 26 04 PM

3. In this example, a Spanish sentence has been entered in the Language box.

Screenshot 2023-07-08 at 10 27 57 PM

4. When the 'Submit' button is clicked, the page shows the identified language.

Screenshot 2023-07-08 at 10 28 56 PM

Model

The model is a Deep Neural Network (DNN) implemented using Python and TensorFlow. It turns each text input into character n-gram features, which differ in distribution from one language to another, and uses those features to classify the language of the input.
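
To make the idea of character n-gram features concrete, here is a minimal sketch using only the standard library. It is an illustration, not the repository's code: the function names, the choice of n = 3, the lowercasing, the space padding, and the fixed vocabulary are all assumptions.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return every overlapping character n-gram in text, lowercased."""
    # Pad with spaces so n-grams at word boundaries are captured at the edges.
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_vector(text, vocabulary, n=3):
    """Map text to relative n-gram frequencies over a fixed vocabulary."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return [counts[gram] / total for gram in vocabulary]


print(char_ngrams("Hola"))  # [' ho', 'hol', 'ola', 'la ']

# A tiny hypothetical vocabulary; a real one would be built from training data.
vocabulary = ["hol", "the", "que"]
print(ngram_vector("hola", vocabulary))  # [0.25, 0.0, 0.0]
```

A vector like this, computed over a much larger vocabulary, is the kind of fixed-length input a dense network can consume.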

To gain insights into the feature representation, we visualized the learned features of the model using techniques such as t-SNE and PCA.
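
As a rough illustration of the PCA step, the sketch below projects feature vectors onto their first two principal components using NumPy's SVD. It is an assumption about how such a projection could be done, not the repository's code (which may rely on scikit-learn instead), and the random array stands in for real learned features.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project the rows of features onto their top principal components."""
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on zero-mean columns
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Stand-in data: 6 random 10-dimensional vectors instead of learned features.
rng = np.random.default_rng(0)
fake_features = rng.normal(size=(6, 10))
points = pca_project(fake_features)
print(points.shape)  # (6, 2)
```

The resulting 2-D points can be passed to a Matplotlib scatter plot and colored by language label to see whether languages form separate clusters.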

Performance

The LinguaNet model achieved a final accuracy of 89.2%, meaning it assigned the correct language to 89.2% of the evaluated text inputs.
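
For reference, accuracy is the number of correct predictions divided by the total number of predictions. The sketch below shows the calculation on made-up labels; the function name and the data are illustrative only and are not the project's evaluation set.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels purely for illustration.
y_true = ["es", "en", "fr", "de", "es"]
y_pred = ["es", "en", "fr", "en", "es"]
print(accuracy(y_true, y_pred))  # 0.8
```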

Future Work

We plan to refine the model further, aiming to increase its accuracy and expand its scope to include more languages. Contributions and feedback are welcome.

License

Copyright 2023 Amir Hamza

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at:

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.