(edited by rafat 11/09/2024 12:28:43 pm)

Topic: Translation using googletranslate

I have always been interested in online translation, so I asked my friend ChatGPT to help me translate empty.po into Arabic, just to see how it behaves as an AI guru. It failed, saying the best option for me was Google Translate, and offered to help with a Python script. The script that used googletranslate failed too, so it suggested deep-translator instead. After a lot of trial and error I came up with the script below. I use Zorin OS, a Linux distribution based on Ubuntu, so for the script to run you need to install python3 and the extra modules it uses.
So here we go.

To install Python 3 (on Ubuntu, pip comes in the separate python3-pip package):

 sudo apt install python3 python3-pip 

After Python 3 is installed:

 pip install deep-translator 

And

 pip install polib 
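Before running the script, you can check that both modules are importable by the same python3 that will run it. This is only a sketch: `missing_modules` is a helper name I made up, and it relies on the standard library's importlib.util.find_spec, which returns None for a module that cannot be found.

```python
import importlib.util

# Import names of the modules translate.py needs (pip names: deep-translator, polib)
REQUIRED = ["deep_translator", "polib"]


def missing_modules(names):
    """Return the module names this Python interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing), "(install them with pip)")
    else:
        print("All required modules are installed.")
```

Save it as, for example, check_modules.py and run it with python3 check_modules.py.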

And here is the script, which I called translate.py:

import time
import re
from deep_translator import GoogleTranslator
from polib import pofile

# Load the .po file
po_file = pofile('empty.po')

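# Update the Content-Type in the header to UTF-8
po_file.metadata['Content-Type'] = 'text/plain; charset=UTF-8'
# Also make polib write the file as UTF-8: save() uses the encoding it
# detected when loading, which the header change above does not update
po_file.encoding = 'utf-8'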

# Initialize the translator
translator = GoogleTranslator(source='auto', target='ar')

# Define a regex pattern to match `&` before or after an alphanumeric character
mnemonic_pattern = re.compile(r"(?<=\w)&|&(?=\w)")

# Counters for statistics
total_entries = len(po_file)
translated_count = 0
failed_count = 0

# Start the timer
start_time = time.time()

# Process each entry in the .po file
for entry in po_file:
    # Translate if msgstr is empty
    if not entry.msgstr:
        try:
            print(f"Translating: {entry.msgid}")  # Debug print
            translated_text = translator.translate(entry.msgid)
            
            # After translating, remove mnemonics from the translated text
            entry.msgstr = re.sub(mnemonic_pattern, "", translated_text)
            
            translated_count += 1
            print(f"Translated: {entry.msgstr}")  # Debug print
            
        except Exception as e:
            print(f"Warning: Failed to translate '{entry.msgid}' due to {e}")
            entry.msgstr = "Translation Failed"
            failed_count += 1
    else:
        # If already translated, remove mnemonics from the existing translation
        entry.msgstr = re.sub(mnemonic_pattern, "", entry.msgstr)

# Ensure all msgstr values are not None before saving
for entry in po_file:
    if entry.msgstr is None:
        entry.msgstr = ""

# Calculate the elapsed time
end_time = time.time()
elapsed_time = end_time - start_time

# Convert elapsed time to hours, minutes, and seconds
hours, rem = divmod(elapsed_time, 3600)
minutes, seconds = divmod(rem, 60)

# Save the translated .po file
output_filename = 'translated_arabic.po'
try:
    po_file.save(output_filename)
    print(f"\nTranslation complete. The file is saved as '{output_filename}'")
except Exception as e:
    print(f"Error saving the file: {e}")

# Print translation statistics
print("\n--- Translation Statistics ---")
print(f"Total entries: {total_entries}")
print(f"Successfully translated: {translated_count}")
print(f"Failed to translate: {failed_count}")
print(f"Elapsed time: {int(hours)}h {int(minutes)}m {int(seconds)}s")

In the script I had to get rid of the & keyboard mnemonic, as it does not survive any translation and should be handled in the code instead. For example, &Items cannot be translated into something that still does what it should once one presses Alt+I.
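To see what the pattern (?<=\w)&|&(?=\w) from the script actually removes, here is a small standalone check. It shows that the & in &Items is removed, that a free-standing & between spaces is kept, and, as a side effect worth knowing, that an & inside a word such as R&D is removed too. The sample strings are made up for illustration.

```python
import re

# Same pattern as in translate.py: an & preceded or followed by a word character
mnemonic_pattern = re.compile(r"(?<=\w)&|&(?=\w)")


def remove_mnemonics(text):
    """Strip keyboard-mnemonic ampersands such as the one in '&Items'."""
    return mnemonic_pattern.sub("", text)


samples = ["&Items", "Sales & Purchases", "R&D Costs", "E&xit"]
for s in samples:
    print(f"{s!r} -> {remove_mnemonics(s)!r}")
```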

The above script is for Arabic, as target='ar'.
It works for most languages just by changing the target language code; for example, I tried it for Tamil with target='ta' and it works. Do not forget to change output_filename = 'translated_arabic.po' to whatever you like.
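Instead of editing target='ar' and output_filename by hand for every language, one option is to pass the language code on the command line and derive the output name from it. The sketch below covers only the argument handling; the argument names and the translated_<code>.po naming are my own choices, not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Read the target language code and file names from the command line."""
    parser = argparse.ArgumentParser(description="Machine-translate a .po file")
    parser.add_argument("target", help="target language code, e.g. ar or ta")
    parser.add_argument("--input", default="empty.po", help="source .po file")
    parser.add_argument("--output", help="output file (default: translated_<target>.po)")
    args = parser.parse_args(argv)  # argv=None means: use sys.argv
    if args.output is None:
        args.output = f"translated_{args.target}.po"
    return args
```

In translate.py you would call args = parse_args() near the top, pass args.input to pofile(), use target=args.target when creating GoogleTranslator, and save to args.output. Running python3 translate.py ta would then write translated_ta.po.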


To run the script:

 python3 translate.py 

As with all online translation, it is roughly 70% OK. The rest is out of context and needs to be edited manually; that's where one uses Poedit to correct the remaining 30%.
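One way to make that review in Poedit easier is to flag every machine-translated entry as fuzzy, since Poedit shows fuzzy entries as needing work. This is a suggestion, not something the script does: the helper mark_for_review is hypothetical and relies only on polib entries keeping their flags in a list (entry.flags). Keep in mind that fuzzy entries are left out of the compiled .mo file by default, so each one must be reviewed and un-flagged before it shows up in the application.

```python
def mark_for_review(entry):
    """Flag a machine-translated entry as fuzzy so it stands out for review.

    Works with any object that has a `flags` list, such as a polib POEntry.
    Calling it twice does not add a duplicate flag.
    """
    if "fuzzy" not in entry.flags:
        entry.flags.append("fuzzy")
    return entry
```

In translate.py it would be called right after entry.msgstr is set inside the try block.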

Just note: the Arabic translation took 40 minutes with 6 errors, and the Tamil translation took 28 minutes with no errors. Total strings translated: 3498.
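The post does not say what caused the 6 failed Arabic strings. If they were transient network or rate-limit errors (an assumption), retrying a few times with a growing pause before writing 'Translation Failed' might recover some of them. Below is a generic retry helper; translate_with_retry and its parameters are my own names, and it takes the translation function as an argument so it can wrap translator.translate.

```python
import time


def translate_with_retry(translate, text, attempts=3, delay=1.0, sleep=time.sleep):
    """Call translate(text), retrying on any exception.

    The pause doubles after each failed attempt (1 s, 2 s, ...). If every
    attempt fails, the last exception is re-raised so the caller can still
    count the entry as failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return translate(text)
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

Inside the existing try block this would become translated_text = translate_with_retry(translator.translate, entry.msgid). Note that retries make a slow run even longer when many strings fail.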

Re: Translation using googletranslate

This is a fantastic improvement for translators. Thanks, Rafat.

Joe

Re: Translation using googletranslate

The removal of & in &Items can be automated as well, as long as & is not used anywhere else.
It could also be a PHP script shipped as part of FA, just like:
https://github.com/apmuthu/frontac24/blob/master/FA24Mods/make_coa.php
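The caveat about & being used elsewhere matters because the script's pattern also removes an & that sits inside a word. If mnemonics only ever appear at the start of a word, a stricter pattern that requires no word character before the & leaves such cases alone, and a negative lookahead can skip HTML entities like &amp;. This is an alternative I am suggesting, written in Python rather than PHP; its trade-off is that it no longer removes mnemonics placed mid-word, such as E&xit.

```python
import re

# Stricter alternative: remove & only when it starts a word (no word character
# before it, a word character after it) and does not begin an HTML entity
strict_mnemonic = re.compile(r"(?<!\w)&(?!\w+;)(?=\w)")

# Pattern used in translate.py, for comparison
loose_mnemonic = re.compile(r"(?<=\w)&|&(?=\w)")

for s in ["&Items", "R&D Costs", "Tom &amp; Jerry", "E&xit"]:
    print(f"{s!r}: strict={strict_mnemonic.sub('', s)!r} loose={loose_mnemonic.sub('', s)!r}")
```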

Re: Translation using googletranslate

Great work

Re: Translation using googletranslate

Thanks for this wonderful tool. I tried it, and once it completed it gave this error:

Error saving the file: 'latin-1' codec can't encode characters in position 657-659: ordinal not in range(256)

So I used the following for saving, and it worked:

# Save the translated .po file with UTF-8 encoding
output_filename = 'translated_arabic.po'
try:
    with open(output_filename, 'w', encoding='utf-8') as f:
        f.write(po_file.__unicode__())  # Ensures the file is saved in UTF-8
    print(f"\nTranslation complete. The file is saved as '{output_filename}'")
except Exception as e:
    print(f"Error saving the file: {e}")
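The error comes from writing Arabic characters with the Latin-1 codec, which can only represent 256 characters. As far as I can tell from polib's source, save() opens the output file with the encoding polib picked up from the original header when it loaded the file, so changing the Content-Type line alone does not change how the file is written. Setting po_file.encoding = 'utf-8' before calling po_file.save(...) should have the same effect as the workaround above. The standard-library snippet below only reproduces the underlying error and the UTF-8 fix; the sample word is my own.

```python
# Reproduce the "ordinal not in range(256)" error and the UTF-8 fix
arabic = "المبيعات"  # sample Arabic text ("the sales")

try:
    arabic.encode("latin-1")
except UnicodeEncodeError as e:
    print("latin-1 failed:", e)

data = arabic.encode("utf-8")  # UTF-8 can represent every Unicode character
print(len(arabic), "characters ->", len(data), "bytes in UTF-8")
print("round-trip OK:", data.decode("utf-8") == arabic)
```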

Re: Translation using googletranslate

That's a really cool script you've put together! Using deep-translator and polib to handle the .po file translation is a smart move. I've had similar experiences with online translation: it's usually good for a first pass, but manual editing is often needed for context-specific phrases. I've also used Poedit for manual corrections, and it's definitely a lifesaver for fine-tuning translations. The fact that your script works for multiple languages just by changing the target language is super handy. One thing I might suggest is adding some logging or more detailed error handling. That way, if something goes wrong, you have more information to troubleshoot. Overall, great job on the script!
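Following the logging suggestion, here is one way it could look with Python's standard logging module: each failure is written with its msgid and the exception type and text, both to the console and to a UTF-8 log file, so Arabic or Tamil text is logged correctly. The logger name, the default file name translate.log and the helpers make_logger and log_failure are my own choices for this sketch.

```python
import logging


def make_logger(logfile="translate.log"):
    """Return a logger that writes INFO and above to the console and a file."""
    logger = logging.getLogger("po_translate")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers if called twice
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        for handler in (logging.StreamHandler(),
                        logging.FileHandler(logfile, encoding="utf-8")):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger


def log_failure(logger, msgid, error):
    """Record a failed translation together with the exception that caused it."""
    logger.error("Failed to translate %r: %s: %s", msgid, type(error).__name__, error)
```

In the script's except branch you would call log_failure(logger, entry.msgid, e) instead of, or alongside, the print.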

Re: Translation using googletranslate

Hi guys. I tried to make some enhancements to the script. I removed the need to hardcode the .po file and the target language; instead, the script prompts for both the input .po file and the target language, and shows a list of supported languages. It also gives the option to overwrite the translated file or save it under a different name. Most importantly, it lists the line number in the original file of each string whose translation failed, while still writing 'Translation Failed' into the resulting file.
As Linux is my main system, it works perfectly there; it still needs to be tested in the Windows command line (CMD).
I do have a sexy GUI version of the script, but it's a little more complicated to set up the environment for it to work, so I will keep it simple for the public and hard for myself.

Here we go. This is the script.

import os
import re
import time
from deep_translator import GoogleTranslator
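from polib import pofile, unescape

# Define a regex pattern to match `&` only if it's adjacent to a letter or number
mnemonic_pattern = re.compile(r"(?<=\w)&|&(?=\w)")

# Function to remove mnemonics
def remove_mnemonics(text):
    """Remove `&` if it is adjacent to a letter or number."""
    return re.sub(mnemonic_pattern, "", text)

# Function to fetch supported languages from Google Translator
# (deep-translator returns a {language name: code} dict, e.g. {'arabic': 'ar'})
def get_supported_languages():
    try:
        translator = GoogleTranslator()
        return translator.get_supported_languages(as_dict=True)
    except Exception as e:
        print(f"Error fetching languages: {e}")
        return {}

# Function to map msgid to line numbers in the original file
def get_msgid_line_numbers(file_path):
    """Map each msgid to the line number of its `msgid` keyword.

    Handles msgids split over several quoted lines and undoes .po escaping
    (\\" \\n ...) so the keys match polib's entry.msgid.
    """
    line_number_mapping = {}
    with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
        lines = f.readlines()

    current_msgid = None
    start_line = None
    for line_num, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith('msgid "'):
            # Remove `msgid "` prefix and trailing `"`, then unescape
            current_msgid = unescape(stripped[7:-1])
            start_line = line_num
        elif current_msgid is not None and stripped.startswith('"'):
            # Continuation line of a multi-line msgid
            current_msgid += unescape(stripped[1:-1])
        elif current_msgid is not None:
            # Any other line (msgstr, msgid_plural, ...) ends the msgid
            line_number_mapping[current_msgid] = start_line
            current_msgid = None
    return line_number_mapping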

# Function to translate a .po file
def translate_po_file(file_path, target_language):
    try:
        po_file = pofile(file_path)
    except Exception as e:
        print(f"Error loading file: {e}")
        return

    po_file.metadata['Content-Type'] = 'text/plain; charset=UTF-8'
    po_file.metadata['Language'] = target_language

    try:
        translator = GoogleTranslator(source='auto', target=target_language)
    except Exception as e:
        print(f"Error initializing translator: {e}")
        return

    line_number_mapping = get_msgid_line_numbers(file_path)
    total_entries = len(po_file)
    translated_count = 0
    failed_count = 0
    failed_translations = []
    start_time = time.time()

    for i, entry in enumerate(po_file):
        print(f"Translating {i + 1}/{total_entries}: {entry.msgid}")

        if not entry.msgstr:
            try:
                translated_text = translator.translate(entry.msgid)
                entry.msgstr = remove_mnemonics(translated_text)
                print(f"Translated: {entry.msgstr}")
                translated_count += 1
            except Exception as e:
                entry.msgstr = "Translation Failed"
                failed_count += 1
                line_number = line_number_mapping.get(entry.msgid, 'Unknown')
                failed_translations.append((entry.msgid, line_number))
                print(f"Failed to translate: {entry.msgid} (Line {line_number}) - Error: {e}")

    #    progress = int((i + 1) / total_entries * 100)
    #    print(f"Progress: {progress}%")

    # Output file logic
    output_filename = f"translated_{target_language}.po"
    output_filename = os.path.abspath(output_filename)  # Normalize for cross-platform compatibility

    if os.path.exists(output_filename):
        choice = input(f"The file '{output_filename}' already exists. Overwrite? (y/n): ").strip().lower()
        if choice != 'y':
            new_filename = input("Enter a new filename: ").strip()
            if not new_filename:
                print("Translation Cancelled.")
                return
            output_filename = os.path.abspath(new_filename)

    try:
        po_content = po_file.__unicode__()
        with open(output_filename, 'w', encoding='utf-8') as f:
            f.write(po_content)
    except Exception as e:
        print(f"Error saving file: {e}")
        return

    end_time = time.time()
    elapsed_time = end_time - start_time
    hours, rem = divmod(elapsed_time, 3600)
    minutes, seconds = divmod(rem, 60)

    print("\nTranslation Complete!")
    print(f"Total entries: {total_entries}")
    print(f"Successfully translated: {translated_count}")
    print(f"Failed translations: {failed_count}")
    print(f"Elapsed time: {int(hours)}h {int(minutes)}m {int(seconds)}s")
    print(f"File saved as: {output_filename}")
    if failed_translations:
        print("\nFailed Translations:")
        for msgid, line_number in failed_translations:
            print(f"  Line {line_number}: {msgid}")

# Function to validate and get file path
def get_file_path():
    while True:
        file_path = input("Enter the path to the .po file (or type 'cancel' to exit): ").strip()
        if file_path.lower() == 'cancel':
            print("Operation cancelled by user.")
            return None
        if os.path.exists(file_path):
            return file_path
        print("Error: File does not exist. Please try again.")

# Function to validate and get target language
# languages_dict maps language names to codes, e.g. {'arabic': 'ar', 'tamil': 'ta'}
def get_target_language(languages_dict):
    names = {name.lower(): code for name, code in languages_dict.items()}
    codes = {code.lower(): code for code in languages_dict.values()}
    while True:
        target_input = input("\nEnter the target language name or code: ").strip().lower()
        
        # Check if it's a valid language code (keep the code's original case)
        if target_input in codes:
            return codes[target_input]
        
        # Check if it's a valid language name; always return the code
        if target_input in names:
            return names[target_input]
        
        print("Invalid language input. Please enter a valid language name or code.")

# Main function for CLI
def main():
    print("PO File Translator (CLI Version)")
    
    # Get file path
    file_path = get_file_path()
    if not file_path:
        return  # User chose to cancel

    # Get supported languages
    languages_dict = get_supported_languages()
    if not languages_dict:
        print("Error: Could not fetch supported languages.")
        return

    # Display supported languages (the dict maps name -> code)
    print("\nSupported Languages:")
    for name, code in languages_dict.items():
        print(f"{code}: {name}")
    
    # Get target language
    target_language = get_target_language(languages_dict)

    # Translate the file
    translate_po_file(file_path, target_language)

if __name__ == "__main__":
    main()