Topic: Yet another problem with utf-8

Hi, I have read the myriad of posts concerning encoding issues, and from my understanding all this has been resolved in FA2.4

However I have the following issue:

All Spanish accents display correctly on screen. However, the company name 'Psicopedagógico' (note the accented 'ó') is causing me problems when sending an invoice by email. It's being rejected with the following error:-
    SMTP error from remote mail server after end of data:
    550 From contains invalid characters.

After changing the accented 'ó' to a normal unaccented 'o', emails with invoices work perfectly, so the problem is clearly the accented 'ó'.

I have tried changing in class.mail.inc line 41
from
var $charset = 'iso-8859-1';
to
var $charset = 'utf-8'; //also tried utf8

But the same problem occurs.
Have logged in and out after changes and cleared cache on server and browser.

Setup info:-

This is on a clean install of V2.4.1
Theme default
Database 10.0.27-MariaDB-cll-lve
Database encoding - utf8_unicode_ci
Language encoding in po file - utf-8
Language encoding in installed_languages.inc - 'encoding' => 'utf-8',
Server - Linux
Shared hosting.

Am I doing something wrong or is this a bug?

Re: Yet another problem with utf-8

Make sure that the collation of the db is the same as that of the specific field(s) used here and that the said locale is installed in the OS (Linux).

Re: Yet another problem with utf-8

I ran this command in phpmyadmin:-

SHOW FULL COLUMNS FROM 0_sys_prefs

Which gave the following results

Field     Type         Collation        Null  Key  Default  Extra  Privileges                       Comment
name      varchar(35)  utf8_unicode_ci  NO    PRI                  select,insert,update,references
category  varchar(30)  utf8_unicode_ci  YES   MUL  NULL            select,insert,update,references
type      varchar(20)  utf8_unicode_ci  NO                         select,insert,update,references
length    smallint(6)  NULL             YES        NULL            select,insert,update,references
value     text         utf8_unicode_ci  NO         NULL            select,insert,update,references

So all seems fine there.

However, my hosting company lists their only locale as:-

LANG=en_US.UTF-8

And state that all languages will run without a problem.......hmmmmmm

I need a technical argument to refute this - if this is the problem?

Re: Yet another problem with utf-8

Try:

SHOW VARIABLES LIKE 'collation%';
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

If they are different, look in your /etc/[mysql/]my.cnf, find the contents below near collation_server:

[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci 
skip-character-set-client-handshake

Re: Yet another problem with utf-8

The error suggests that it is your 'from' email address of your company that has issues, not the name or email address that you are sending to.  Do you have the full headers of the email that bounced that you can post?  Possibilities are:

- FA is not correctly encoding the from email address in sent mail.
- The remote email server does not support utf8 (very unlikely).

Cambell https://github.com/cambell-prince

Re: Yet another problem with utf-8

I have run the following:-
SHOW VARIABLES LIKE 'collation%'
Variable_name     Value    
collation_connection     utf8mb4_unicode_ci
collation_database     utf8_unicode_ci
collation_server     utf8_unicode_ci


SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'
Variable_name     Value    
character_set_client     utf8mb4
character_set_connection     utf8mb4
character_set_database     utf8
character_set_filesystem     binary
character_set_results     utf8mb4
character_set_server     utf8
character_set_system     utf8
collation_connection     utf8mb4_unicode_ci
collation_database     utf8_unicode_ci
collation_server     utf8_unicode_ci

So all seems ok.
I am unable to create a my.cnf file as I do not have access to /etc/

Re: Yet another problem with utf-8

Hi Cambell, thanks for replying,

Please note that simply removing the accent from the 'o' in the company name makes the invoice by email function work perfectly. I can change the email addresses to either server-defined or remote addresses and all works correctly. Therefore the problem must be the accented 'o' in the company name.

Would you agree? Any help very welcome.

Re: Yet another problem with utf-8

The db collation (utf8_unicode_ci) is different from the connection collation (utf8mb4_unicode_ci).

Try some url encoding.

Use the quoted string format:
"psicopedagógico"@gmail.com

Re: Yet another problem with utf-8

I have tried setting class.mail.inc line 41
from
var $charset = 'iso-8859-1';
to
var $charset = 'utf-8'; //also tried utf8
and
var $charset = 'utf8_unicode_ci';
and
var $charset = 'utf8mb4_unicode_ci';


Not sure what you mean about url encoding?

The problem is in the company name in the company configuration, not in the email address. I have tried the quoted format; this returns '&quot' in the email company name. I've also tried psicopedag&oacute;gico.

I assume:-
HTML Entity (decimal)     &#243;
HTML Entity (hex)     &#xF3;
will have the same result as &oacute; unless you can tell me otherwise?

Last edited by poncho1234 (07/07/2017 06:10:09 pm)

Re: Yet another problem with utf-8

To eliminate the database encoding issue (if this is the problem) I hard coded the company name in class.mail.inc line 49 from:-

$this->header = "From: $name <$mail>\n";

to

$this->header = "From: 'Psicopedagógico' <$mail>\n";

I first tried with a normal 'o' and all emails were sent perfectly.

When I replaced the normal 'o' with an accented 'ó', emails failed to send with the same fault code:-
    SMTP error from remote mail server after end of data:
    550 From contains invalid characters.


For me this says that the problem is in the encoding of the email header not with the database.

I re-tried:-

var $charset = 'iso-8859-1';
to
var $charset = 'utf-8'; //also tried utf8
and
var $charset = 'utf8_unicode_ci';
and
var $charset = 'utf8mb4_unicode_ci';

All failed. I've since read that all SMTP headers should be ASCII encoded. (I did try ASCII and even 8BITMIME, but maybe unsurprisingly they failed too, as I believe they don't include an accented 'ó'.)

I'm now looking for where the encoding from charset happens, and I can't seem to find it.
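
A quick way to confirm that the header value is what the remote server objects to is to test it for bytes outside printable ASCII before sending. This is only a sketch; is_ascii_header_value() is a hypothetical helper, not something that exists in FA:

```php
<?php
// Hypothetical helper (not part of FA): returns true when a header value
// contains nothing but printable 7-bit ASCII characters.
function is_ascii_header_value(string $value): bool
{
    // No /u flag, so the pattern is applied byte by byte; any byte of a
    // multi-byte UTF-8 character such as 'ó' falls outside 0x20-0x7E.
    return preg_match('/[^\x20-\x7E]/', $value) === 0;
}

var_dump(is_ascii_header_value('Psicopedagogico'));
var_dump(is_ascii_header_value('Psicopedagógico'));
```

If the check fails for $name, the name has to be encoded before it goes into the From: line, whatever $charset is set to.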

Last edited by poncho1234 (07/07/2017 11:52:46 pm)

Re: Yet another problem with utf-8

Ok, I have a partial solution which looks promising:-

From my understanding $name is not encoded, so accented characters will never work. This post at stackexchange pointed me in the right direction, so changing class.mail.inc line 49 from:-

$this->header = "From: $name <$mail>\n";

to

$this->header = "From: =?UTF-8?B?" .base64_encode($name)."?=";

does work, BUT:-

does NOT include the newline \n so the from $name is followed by the Bcc: bccemailaddress@example.com

So I tried to introduce a new line with the following code:-

$this->header = "From: =?UTF-8?B?" .base64_encode($name)."?=", "\n";
and
$this->header = "From: =?UTF-8?B?" .base64_encode($name)."?=". "\n";

but neither of these works, as the from is now the base64 code.

If anyone who knows a little more PHP than me can help add a new line, please do. I will sleep on it.

Last edited by poncho1234 (07/08/2017 02:01:25 am)

Re: Yet another problem with utf-8

From the same link:

You might want to use mb_send_mail(). It uses mail() internally, but encodes subject and body of the message automatically. Again, use with care.

The mb_encode_mimeheader() method is also listed.
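
A minimal sketch of how mb_encode_mimeheader() might be applied, assuming the mbstring extension is loaded. The idea is to pass only the display name through the function and append the address unencoded. build_from_header() and the example address are hypothetical, not FA code:

```php
<?php
// Hypothetical helper, not part of FA; requires the mbstring extension.
function build_from_header(string $name, string $mail): string
{
    // mb_encode_mimeheader() interprets its input using the internal
    // encoding, so make that match the charset named in the encoded word.
    mb_internal_encoding('UTF-8');

    // Encode the display name only; the address itself stays plain ASCII.
    $encodedName = mb_encode_mimeheader($name, 'UTF-8', 'B');

    return "From: $encodedName <$mail>\n";
}

echo build_from_header('Psicopedagógico', 'accounts@example.com');
```

Because only $name is encoded, the address placed in the header is still the one passed in, not the server's default.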

Re: Yet another problem with utf-8

Working solution:-

Replace class.mail.inc line 49 from:-

$this->header = "From: $name <$mail>\n";

with either:-

$this->header = "From: =?UTF-8?B?" .base64_encode($name) ."?=". "<$mail>\n";

OR

$this->header = "From: =?UTF-8?Q?" .imap_8bit($name) ."?=". "<$mail>\n"; //you will need the imap module enabled

depending on whether you want base64 or quoted-printable (via imap) encoding.

You should now be able to use any diacritical in your company name.

@apmuthu I'm struggling with mb_send_mail() and mb_encode_mimeheader(). From my understanding this encodes the whole header. When I got it to work it actually used my server's email address as the from, whereas in the above solution the actual server's email address is hidden and the company email address is shown as per FA functionality. Can you help?

Last edited by poncho1234 (07/08/2017 02:47:14 pm)

Re: Yet another problem with utf-8

@joe?

Re: Yet another problem with utf-8

We will have to consult Janusz with this. I will.

/Joe

Re: Yet another problem with utf-8

The other item that needs to be sorted out is the subject; the following works for me; replace class.mail.inc line 77 from:-

function subject($subject)
    {
        $this->subject = $subject;
    }

to:-

function subject($subject)
    {
        $this->subject = "=?UTF-8?B?" . base64_encode($subject) ."?=";
    }

or

function subject($subject)
    {
        $this->subject = "=?UTF-8?Q?" .imap_8bit($subject) ."?=";//you will need the imap module enabled
    }
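
One caveat on the imap_8bit() variant: it produces quoted-printable text, which is close to, but not identical with, the "Q" encoding that RFC 2047 defines for headers (in a header a space should become "_", and "?" and "_" themselves must be escaped). A minimal sketch of a Q encoder that does not need the imap module follows; q_encode_word() is a hypothetical helper, and it does not split long values into several encoded words of at most 75 characters as the RFC asks:

```php
<?php
// Hypothetical helper, not part of FA or of PHP itself.
function q_encode_word(string $text): string
{
    // Escape every byte that is not a letter, a digit or one of ! * + - /
    // (the characters RFC 2047 allows unescaped in any header position).
    // A space is written as "_"; everything else becomes =XX in hex.
    $encoded = preg_replace_callback(
        '/[^A-Za-z0-9!*+\/-]/',
        function (array $m): string {
            return $m[0] === ' ' ? '_' : sprintf('=%02X', ord($m[0]));
        },
        $text
    );

    return '=?UTF-8?Q?' . $encoded . '?=';
}

echo q_encode_word('Factura de Psicopedagógico'), "\n";
// prints: =?UTF-8?Q?Factura_de_Psicopedag=C3=B3gico?=
```

For short subjects and company names the base64 variant above is the simpler choice; the Q form mainly has the advantage of staying readable in raw headers.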

I believe this only leaves the email addresses, which 'generally' are not allowed to include diacriticals. (though there is some debate about this - Google 'Are accents allowed in email addresses' - for starters)


Very interested in your comments.


What is the history / background? ie Did company name and email subject function with accents before the change to utf-8?

Re: Yet another problem with utf-8

RFC 5336 - SMTP Extension for Internationalized Email Addresses.

IMAP
====

if (extension_loaded('imap')) {

....
....

}

Re: Yet another problem with utf-8

@apmuthu it says it's experimental? Are there FA users with unusable email addresses? If yes, can you point me to the relevant post or give an example?

I will try to do an if / else tomorrow. Trying to think how I will test it... any ideas?

Last edited by poncho1234 (07/11/2017 01:27:37 am)

Re: Yet another problem with utf-8

Looking at this Stack Overflow answer I would suggest adding the 'Content-Type: text/plain; charset=UTF-8' header to the 'body' parameter in the call to the mail function.

It seems from this post that support for Unicode email addresses in the local part is relatively new, e.g. Postfix supported it in 2014.  It very much depends on what email software your server (or ISP) has installed to support the sending of email.  I suspect that if you use Gmail, or Google Apps for business, then you wouldn't have any problems (assuming the content header is set correctly).  Using Postfix, for example, you would need to ensure that version 3.0 or greater is installed.  In the Debian world this would be Debian Stretch.  Wheezy and Jessie are both on Postfix 2.x.  You can find further information on SMTPUTF8 support and links to the relevant RFCs on the Postfix SMTPUTF8 page.

To be safe, I would suggest that all email addresses in *your* organization use only characters from the 128-character ASCII set (conventionally lowercase), as required under the old standard.
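
A small sketch of how such a rule could be checked in PHP before an address is used as a sender. is_plain_ascii_email() is a hypothetical helper, not an FA function; it combines a byte-level ASCII test with PHP's built-in FILTER_VALIDATE_EMAIL syntax check:

```php
<?php
// Hypothetical helper, not part of FA: true when the address is
// syntactically valid and contains only visible 7-bit ASCII characters.
function is_plain_ascii_email(string $mail): bool
{
    // Reject anything outside 0x21-0x7E first (accents, spaces, controls).
    if (preg_match('/[^\x21-\x7E]/', $mail) === 1) {
        return false;
    }
    // Then let PHP validate the overall address syntax.
    return filter_var($mail, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_plain_ascii_email('accounts@example.com'));
var_dump(is_plain_ascii_email('psicopedagógico@example.com'));
```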

Last edited by cambell (07/11/2017 02:09:00 am)

Cambell https://github.com/cambell-prince

Re: Yet another problem with utf-8

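function email($name, $mail)
    {
        $this->boundary = md5(uniqid(time()));

        // Header values must be ASCII, so the display name is wrapped in an
        // RFC 2047 encoded word; the address itself is left as it is.
        if (extension_loaded('imap')) {
            // imap module available: quoted-printable ("Q") form
            $this->header = "From: =?UTF-8?Q?" . imap_8bit($name) . "?=" . "<$mail>\n";
        } else {
            // fallback that needs no extra module: base64 ("B") form
            $this->header = "From: =?UTF-8?B?" . base64_encode($name) . "?=" . "<$mail>\n";
        }

        $bcc = get_company_pref('bcc_email');
        if ($bcc)
            $this->bcc[] = $bcc;
    }

and

function subject($subject)
    {
        // Same approach as for the From header: encode the whole subject.
        if (extension_loaded('imap')) {
            $this->subject = "=?UTF-8?Q?" . imap_8bit($subject) . "?=";
        } else {
            $this->subject = "=?UTF-8?B?" . base64_encode($subject) . "?=";
        }
    }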

Functional and tested.

@Cambell, Hi, the problem is not with the body, it's with the header and the subject.

Re: Yet another problem with utf-8

Sorry, I mean the additional_headers parameter, not the body.  From the php manual:

"Note: When sending mail, the mail must contain a From header. This can be set with the additional_headers parameter, or a default can be set in php.ini. Failing to do this will result in an error message similar to Warning: mail(): "sendmail_from" not set in php.ini or custom "From:" header missing. The From header sets also Return-Path under Windows."

The content type header should be honored if the mail server supports it.
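
A rough sketch of what that could look like, assuming plain PHP mail() is used for sending. build_additional_headers() and the example From value are hypothetical, and the actual mail() call is left commented out:

```php
<?php
// Hypothetical helper, not part of FA: builds the string passed as the
// additional_headers parameter of mail().
function build_additional_headers(string $fromHeader): string
{
    $headers = [
        rtrim($fromHeader),                        // From line, already ASCII-safe
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=UTF-8', // declares the body charset
        'Content-Transfer-Encoding: 8bit',
    ];

    // The PHP manual asks for CRLF between multiple extra headers.
    return implode("\r\n", $headers);
}

$headers = build_additional_headers(
    'From: =?UTF-8?B?UHNpY29wZWRhZ8OzZ2ljbw==?= <accounts@example.com>'
);
// mail($to, $subject, $body, $headers);  // the actual send, not executed here
echo $headers, "\n";
```

Note that the Content-Type header only describes the message body; non-ASCII text in the From and Subject headers still needs its own encoding.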

Cambell https://github.com/cambell-prince

Re: Yet another problem with utf-8

@Cambell, I tried for most of the day to get content type working, but thanks for the suggestion.