View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002515 | Composr | core | public | 2016-04-26 13:45 | 2018-02-09 23:21 |
Reporter | Chris Graham | Assigned To | Chris Graham | ||
Severity | Feature-request | ||||
Status | closed | Resolution | won't fix | ||
Product Version | |||||
Fixed in Version | |||||
Summary | 0002515: Drop non-utf8 (non-unicode) support | ||||
Description | ocPortal v9 was ISO-8859-1 by default, as was HTML4. PHP has no core Unicode support, and we were waiting for PHP6's Unicode support, but that was cancelled. So we went with UTF-8 anyway in v10, and at this point our mbstring/iconv Unicode support is stable enough that we need not think about ever going back. mbstring/iconv availability has been added to the checks run by the installer and in phpinfo. HTML5 effectively standardises on UTF-8, and pretty much everyone has adopted it as "the standard". I can't think of any reason for this to still be configurable. It adds cognitive load, and there is a performance cost to checking the character set and keeping non-UTF-8 code paths around, so why not just drop non-UTF-8 entirely? At some point something becomes appropriate as a universal standard, and we can just shake off the historic legacy of it all. If we don't do this, we can at least replace the get_charset() function with a constant and remove the charset language string, making the charset just a _config.php setting that gets copied through to a constant. That would improve performance. | ||||
Tags | Risk: Deprecates functionality, Type: Performance | ||||
Time estimation (hours) | 2 | ||||
Sponsorship open | |||||
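The fallback suggested in the description (a _config.php setting copied through to a constant, rather than calling get_charset() on every use) could look roughly like this. It is a sketch under assumptions: the `$SITE_INFO['charset']` key, the `SITE_CHARSET` constant and the `charset_meta_tag()` / `site_is_utf8()` helpers are hypothetical names chosen for illustration, not Composr's actual API.

```php
<?php
// Sketch only: $SITE_INFO['charset'], SITE_CHARSET, charset_meta_tag()
// and site_is_utf8() are hypothetical names used for illustration.

// What a _config.php entry for the character set might look like.
$SITE_INFO = array();
$SITE_INFO['charset'] = 'utf-8';

// Bootstrap: copy the setting into a constant once per request,
// defaulting to UTF-8 when the key is missing.
define('SITE_CHARSET', strtolower(isset($SITE_INFO['charset']) ? $SITE_INFO['charset'] : 'utf-8'));

// Code that previously called get_charset() reads the constant instead.
function charset_meta_tag()
{
    $charset = htmlspecialchars(SITE_CHARSET, ENT_QUOTES);
    return '<meta http-equiv="Content-Type" content="text/html; charset=' . $charset . '" />';
}

// Branches on the character set become a plain constant comparison.
function site_is_utf8()
{
    return SITE_CHARSET === 'utf-8';
}

echo charset_meta_tag(), "\n";
```

Because the constant is defined once during bootstrap, each use is a constant lookup rather than a function call plus a language-string lookup, which is where the (small) saving would come from.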
The performance impact might only be around 0.2%. But if, as a principle, we remove 100 useless things that each cost 0.2%, that's 20%, so this is a policy of simplicity as much as anything else.

We could also drop this from HTML_HEAD.tpl: `<meta http-equiv="Content-Type" content="text/html; charset={$CHARSET*}" />`

Discussion on http://compo.sr/forum/topicview/browse/deploying/dropping-non-utf8.htm

If we drop non-UTF-8 support, we can drop some of the textcode symbols that Comcode supports, such as the way to write a euro symbol under ISO-8859-1 (which has no euro character).

If we assume UTF-8, we can drop the Comcode symbol shortcut syntax:

```php
$shortcuts = array(
    '(EUR-)' => '€', '{f.}' => 'ƒ', '-|-' => '†', '=|=' => '‡',
    '{%o}' => '‰', '{~S}' => 'Š', '{~Z}' => 'Ž', '(TM)' => '™',
    '{~s}' => 'š', '{~z}' => 'ž', '{.Y.}' => 'Ÿ', '(c)' => '©',
    '(r)' => '®', '---' => '—', '-->' => '→', '<--' => '←',
    '--' => '–', '...' => '…'
);
```

We'll keep non-UTF-8 support for now, but it will be marked as non-maintained in the maintenance system. Some people may be integrating with legacy systems where ISO-8859-1 is baked deep into the content being managed. Maybe it can be dropped in 20 years ;-).
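The Comcode shortcut table quoted in the notes maps ASCII sequences to symbols, and several of its keys overlap ('--', '---', '-->'). One way a table like that can be applied in a single pass is PHP's strtr() with an array, which tries the longest matching key first and does not rescan text it has already replaced. This is a sketch only: `apply_symbol_shortcuts()` is a hypothetical helper, only a subset of the table is used, and it is not necessarily how Comcode applies the shortcuts internally.

```php
<?php
// Sketch only: apply_symbol_shortcuts() is a hypothetical helper.
// Save this file as UTF-8 so the replacement strings are UTF-8 bytes.

// A subset of the Comcode table, deliberately listing '--' first
// to show that key order does not matter for strtr().
$shortcuts = array(
    '--' => '–', '---' => '—', '-->' => '→', '<--' => '←',
    '(EUR-)' => '€', '(TM)' => '™', '(c)' => '©', '...' => '…',
);

// strtr() tries the longest key at each position, so '-->' and '---'
// win over '--', and replaced output is never searched again.
function apply_symbol_shortcuts($text, array $shortcuts)
{
    return strtr($text, $shortcuts);
}

echo apply_symbol_shortcuts('A --> B -- C --- D...', $shortcuts), "\n";
// prints: A → B – C — D…
```

By contrast, str_replace() with arrays applies the pairs one after another, so with it the result would depend on the longer keys being listed before the shorter ones they contain.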
Date Modified | Username | Field | Change |
---|---|---|---|
2016-04-26 13:45 | Chris Graham | New Issue | |
2016-04-26 13:46 | Chris Graham | Tag Attached: Type: Performance | |
2016-04-26 13:48 | Chris Graham | Note Added: 0003764 | |
2016-04-26 13:53 | Chris Graham | Note Added: 0003765 | |
2016-04-26 13:59 | Chris Graham | Description Updated | |
2016-04-26 19:40 | Chris Graham | Note Added: 0003768 | |
2016-05-03 00:51 | Chris Graham | Note Added: 0003840 | |
2016-05-03 00:51 | Chris Graham | Tag Attached: Deprecates functionality | |
2016-06-08 00:15 | Chris Graham | Tag Renamed | Deprecates functionality => Risk: Deprecates functionality |
2016-07-30 15:13 | Chris Graham | Note Added: 0004187 | |
2016-10-19 16:08 | Chris Graham | Note Added: 0004441 | |
2018-02-09 23:21 | Chris Graham | Status | non-assigned => closed |
2018-02-09 23:21 | Chris Graham | Assigned To | => Chris Graham |
2018-02-09 23:21 | Chris Graham | Resolution | open => won't fix |