View Issue Details

ID: 0002515
Project: Composr
Category: core
View Status: public
Last Update: 2018-02-09 23:21
Reporter: Chris Graham
Assigned To: Chris Graham
Severity: Feature-request
Status: closed
Resolution: won't fix
Product Version:
Fixed in Version:
Summary: 0002515: Drop non-utf8 (non-unicode) support
Description: ocPortal v9 was ISO-8859-1 by default, as was HTML4. PHP has no core unicode support and we were waiting for PHP6's unicode support, but that was cancelled.

So we went with utf-8 anyway with v10, and at this point our mbstring/iconv unicode support is stable enough to not think about ever going back. mbstring/iconv has been added to the checks run during the installer and during phpinfo. HTML5 is utf-8 by default, and pretty much everyone has adopted it as "the standard".
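A minimal sketch of what such an installer-style check could look like. This is not Composr's actual installer code: the `missing_unicode_extensions()` function is a hypothetical name used for illustration, and only PHP's built-in `extension_loaded()` is assumed to exist.

```php
<?php
// Hypothetical sketch of an installer-style check; not Composr's real installer code.

/**
 * Return the names of required extensions that are not loaded.
 * $is_loaded is injectable so the check can be exercised without changing the PHP build.
 */
function missing_unicode_extensions(array $required = array('mbstring', 'iconv'), $is_loaded = 'extension_loaded'): array
{
    $missing = array();
    foreach ($required as $extension) {
        if (!call_user_func($is_loaded, $extension)) {
            $missing[] = $extension;
        }
    }
    return $missing;
}

// Report anything missing on the current PHP build.
$missing = missing_unicode_extensions();
if (count($missing) > 0) {
    echo 'Missing PHP extensions for UTF-8 support: ' . implode(', ', $missing) . "\n";
}
```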

I can't think of any reason for this to even be configurable anymore. It adds cognitive load, and checking the character set and keeping non-utf8 code paths around has a performance cost, so why not just drop non-utf8 entirely? At some point something becomes appropriate as a universal standard and we can just shake off the historic legacy of it all.

If we don't do this, we can at least replace the get_charset() function with a constant and remove the charset language string, making the charset just a _config.php setting that gets copied through to a constant. That would improve performance.
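As a rough illustration of this fallback idea, the charset could be resolved once from the configuration and exposed as a constant. This is only a sketch under assumptions: the `charset` config key, the `resolve_charset()` helper and the `SITE_CHARSET` constant are hypothetical names, not existing Composr code.

```php
<?php
// Hypothetical sketch: the names below are assumptions, not Composr's actual API.

/**
 * Pick the charset from a config array (as _config.php might populate it),
 * falling back to utf-8 when nothing is set.
 */
function resolve_charset(array $site_info): string
{
    $charset = isset($site_info['charset']) ? trim($site_info['charset']) : '';
    return ($charset === '') ? 'utf-8' : strtolower($charset);
}

// Stand-in for the values _config.php would define.
$SITE_INFO = array('charset' => 'UTF-8');

// Resolved once per request; later code reads the constant instead of calling a function.
if (!defined('SITE_CHARSET')) {
    define('SITE_CHARSET', resolve_charset($SITE_INFO));
}
```

Reading a constant avoids a function call and a language string lookup on every use, which is where the small performance gain would come from.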
Tags: Risk: Deprecates functionality, Type: Performance
Time estimation (hours): 2
Sponsorship: open

Activities

Chris Graham

2016-04-26 13:48

administrator   ~0003764

The performance impact might only be around 0.2%. But if, as a principle, we remove 100 useless things that each cost 0.2%, that adds up to roughly 20%, so it's a policy of simplicity as much as anything else.

Chris Graham

2016-04-26 13:53

administrator   ~0003765

We could also drop this from HTML_HEAD.tpl:
<meta http-equiv="Content-Type" content="text/html; charset={$CHARSET*}" />

Chris Graham

2016-04-26 19:40

administrator   ~0003768

Discussion on http://compo.sr/forum/topicview/browse/deploying/dropping-non-utf8.htm

Chris Graham

2016-05-03 00:51

administrator   ~0003840

If we drop non-UTF-8 support we can drop some of the textcode symbols that Comcode supports, like the way to write a euro symbol in ISO-8859-1.

Chris Graham

2016-07-30 15:13

administrator   ~0004187

If we assume UTF-8 we can drop Comcode symbol shortcut syntax...

$shortcuts = array('(EUR-)' => '€', '{f.}' => 'ƒ', '-|-' => '†', '=|=' => '‡', '{%o}' => '‰', '{~S}' => 'Š', '{~Z}' => '&#x17D;', '(TM)' => '™', '{~s}' => 'š', '{~z}' => '&#x17E;', '{.Y.}' => 'Ÿ', '(c)' => '©', '(r)' => '®', '---' => '—', '-->' => '→', '<--' => '←', '--' => '–', '...' => '…');
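To make clear what this syntax does, here is a small sketch that expands a subset of the table above with PHP's built-in `strtr()`. The `expand_shortcuts()` helper is a hypothetical name, and this is not necessarily how Comcode applies the table internally. With UTF-8 assumed, an author can type the target characters directly, which is why the shortcuts become redundant.

```php
<?php
// Illustrative only: a subset of the shortcut table from the note above.
$shortcuts = array('(c)' => '©', '(TM)' => '™', '---' => '—', '--' => '–', '...' => '…');

/**
 * Expand shortcut sequences in a piece of text.
 * strtr() with an array tries the longest keys first, so '---' wins over '--'.
 */
function expand_shortcuts(string $text, array $shortcuts): string
{
    return strtr($text, $shortcuts);
}

$result = expand_shortcuts('(c) Acme(TM) --- since 2016 -- ...', $shortcuts);
```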

Chris Graham

2016-10-19 16:08

administrator   ~0004441

We'll keep the non-utf-8 support for now, but it'll be added as non-maintained in the maintenance system.

Some people may be integrating with legacy systems where iso-8859-1 is baked really deep into the content being managed.

Maybe it can be dropped in 20 years ;-).

Issue History

Date Modified Username Field Change
2016-04-26 13:45 Chris Graham New Issue
2016-04-26 13:46 Chris Graham Tag Attached: Type: Performance
2016-04-26 13:48 Chris Graham Note Added: 0003764
2016-04-26 13:53 Chris Graham Note Added: 0003765
2016-04-26 13:59 Chris Graham Description Updated
2016-04-26 19:40 Chris Graham Note Added: 0003768
2016-05-03 00:51 Chris Graham Note Added: 0003840
2016-05-03 00:51 Chris Graham Tag Attached: Deprecates functionality
2016-06-08 00:15 Chris Graham Tag Renamed Deprecates functionality => Risk: Deprecates functionality
2016-07-30 15:13 Chris Graham Note Added: 0004187
2016-10-19 16:08 Chris Graham Note Added: 0004441
2018-02-09 23:21 Chris Graham Status non-assigned => closed
2018-02-09 23:21 Chris Graham Assigned To => Chris Graham
2018-02-09 23:21 Chris Graham Resolution open => won't fix