Kutter Martin
2004-09-08 07:34:03 UTC
Hi * !
I ran into a small internationalization issue today.
I'm running an OI2 site with SPOPS::LDAP as backend. When storing non-ASCII
characters, the LDAP directory server complains that properties containing
non-ASCII characters have an "invalid syntax".
I've been able to track this down to a charset problem.
LDAP expects directoryString attributes to be in UTF-8 encoding. The
perl-ldap interface (Net::LDAP) does not provide UTF-8 conversions by
default, so the application using Net::LDAP has to do them. This is
no big deal - just a
    use Encode;
    $value = decode($charset, $value);
for all the fields to set - but one needs to know the request's charset.
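A minimal sketch of that conversion, assuming the value arrives as raw bytes in the request's charset and that UTF-8 encoded bytes are what gets handed onward. The helper name to_utf8_bytes is hypothetical, and whether the explicit encode step is needed may depend on the Net::LDAP version in use:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn a byte string received in $charset into
# UTF-8 encoded bytes, as expected for directoryString values.
sub to_utf8_bytes {
    my ($charset, $bytes) = @_;
    my $chars = decode($charset, $bytes);   # bytes -> Perl character string
    return encode('UTF-8', $chars);         # characters -> UTF-8 bytes
}

my $latin1 = "M\xFCller";                   # "Mueller" with u-umlaut, iso-8859-1
my $utf8   = to_utf8_bytes('iso-8859-1', $latin1);
printf "%d bytes in, %d bytes out\n", length $latin1, length $utf8;
```

The u-umlaut takes one byte in iso-8859-1 and two bytes in UTF-8, so the output is one byte longer than the input.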
The charset used in the HTTP request is specified by the "charset" attribute
in the Content-Type header.
Example:
    Content-Type: multipart/form-data; boundary="--------------12345";
        charset="EUC-JP"
The default is "iso-8859-1" if no charset is supplied.
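Extracting the charset from such a header value could look roughly like this. It is a simplified sketch: charset_from_content_type is a hypothetical helper, and the regular expression does not handle every corner of the header grammar (for instance a quoted boundary that itself contains "; charset="):

```perl
use strict;
use warnings;

# Hypothetical helper: pull the charset parameter out of a Content-Type
# header value, falling back to iso-8859-1 when none is given.
sub charset_from_content_type {
    my ($header) = @_;
    return 'iso-8859-1' unless defined $header;
    if ($header =~ /;\s*charset\s*=\s*"?([^\s";]+)"?/i) {
        return $1;
    }
    return 'iso-8859-1';
}

print charset_from_content_type(
    'multipart/form-data; boundary="--------------12345"; charset="EUC-JP"'
), "\n";
print charset_from_content_type('text/plain'), "\n";
```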
The problem is that the only way to get the charset used in
the request is to grab it from the underlying Apache::Request or
CGI::Request handle - not really easy and not really portable:
    my $contentHeader = CTX->request->apache->headers_in()->{'Content-Type'};
Since different charsets in HTTP requests are very likely in i18n'ed
environments, and the problem is very likely to occur in non-LDAP
environments too, I would suggest extending the
OpenInteract2::Request class so that it provides access to the Content-Type
HTTP header, as it already does for some other header fields.
Maybe an even more general approach - exposing all HTTP headers in the
request object - would be suitable: it would remove the need for code
changes whenever another HTTP header has to be handled.
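To illustrate the general approach, here is a rough sketch of a request object that exposes every incoming header through one accessor. My::Request and header_in are hypothetical names used only for illustration; this is not existing OpenInteract2 code:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a request object; not OpenInteract2 code.
package My::Request;

sub new {
    my ($class, %headers) = @_;
    # Header names are case-insensitive, so store them lower-cased.
    my %store = map { lc($_) => $headers{$_} } keys %headers;
    return bless { headers => \%store }, $class;
}

# Return the value of any incoming header, or undef if it is absent.
sub header_in {
    my ($self, $name) = @_;
    return $self->{headers}{ lc($name) };
}

package main;

my $request = My::Request->new(
    'Content-Type'    => 'text/html; charset="EUC-JP"',
    'Accept-Language' => 'ja',
);
print $request->header_in('content-type'), "\n";
```

With an accessor like this, the backend-specific request classes would only have to copy their headers into the object once, and application code would not need to touch the Apache or CGI handle.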
Regards,
Martin Kutter