View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0003147||Composr||core||public||2017-03-21 21:58||2019-12-08 03:43|
|Reporter||Chris Graham||Assigned To|
|Fixed in Version|
|Summary||0003147: Review of cloud filesystem support|
|Description||There are a few possible approaches to automatic synching of the filesystem on the cloud:|
1) Mount the entire install on shared storage
2) Implement Composr's sync_file function, automatically detecting what change was done to a file then synching it out
3) Using a different subpath for all custom folders, mounting it under a path that is a shared storage mount (i.e. at the operating system level)
4) Using a different subpath for all custom folders, mounting it under a path that is a PHP file wrapper, and setting up so URLs under there are picked up by the Apache configuration too
5) Synching files between servers using rsync
6) Moving everything into the database
7) Use of an internal CDN transfer API instead of direct filesystem writing, with URLs generated according to that API (i.e. no direct correspondence between a URL and any particular file path)
It's tricky to know what to do, but we want something very architecturally clean and maintainable, not lots of different approaches needing expert configuration. If we define some design goals we can eliminate some approaches.
a) Files should be hostable on a CDN so that they may be served geographically close to the user. This will improve page load times.
b) Our CDN may not be able to host every kind of media (e.g. Cloudinary could not host non-images).
c) We need to be able to delete files.
d) It has to be reliable.
e) It has to be scalable.
f) It has to be easy to set up.
g) It can't bloat our code-base too much.
h) It has to be hard for a newbie Composr developer to forget to implement the functionality.
i) It cannot place unreasonable limitations on hardware architecture.
j) It has to have a wide compatibility with actual services people use.
k) It has to have a wide compatibility with actual web hosting people use.
We can therefore eliminate:
1 - This violates 'e' because it is a single bottleneck, and also 'i' because servers would need to be on the same cluster with a very high-performance I/O channel
2 - This violates 'h': developers can easily forget to call sync_file (they can't if they're running ocProducts PHP, but they're probably not); it also violates 'f'
3 - This violates 'a', 'f', 'h', 'j' and 'k'
4 - This violates 'a' and 'h'
5 - This probably wouldn't work at all, as rsync would not know the difference between a delete and a new file appearing on one particular server
6 - This violates 'a', 'i' and 'k' -- putting potentially GB of data into the database is not something we can reasonably expect the majority of users to accept
7 - This works, although it will be a lot of work.
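The objection in point 5 can be made concrete with a small sketch. It compares file listings from two peer servers and shows that, without a master or a change log, a file present on only one side has two equally plausible explanations. The function name and the sample paths are hypothetical, and this is only an illustration of the reasoning, not a description of how rsync works internally.

```php
<?php
// Hypothetical illustration: compare file listings from two peer servers.
// With no master and no change history, a file that exists on only one side
// could be a new file on that side or a deletion on the other side.
function find_ambiguous_files(array $files_on_a, array $files_on_b): array
{
    $ambiguous = [];
    foreach (array_diff($files_on_a, $files_on_b) as $file) {
        $ambiguous[$file] = 'created on A, or deleted on B';
    }
    foreach (array_diff($files_on_b, $files_on_a) as $file) {
        $ambiguous[$file] = 'created on B, or deleted on A';
    }
    ksort($ambiguous); // Stable, readable ordering by filename
    return $ambiguous;
}
```

Any file reported here needs extra information (a designated master, or a log of operations) before it can be synced safely.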
I think we should remove the concept of 'sync_file'. Nobody ever used it.
Then I think we need to implement '7', combined with '4'. That is, we extend our current CDN transfer system so that CDN transfer hooks can accept control of any path/file-type combinations -- with a native PHP file access API using the PHP file wrappers functionality. CDN transfer hooks would sit behind our file wrapper. URLs would be converted via conversion functions that go each way.
|Tags||ocProducts client-work (likely), Roadmap: v12, Type: Cross-cutting feature, Type: External dependency, Type: Performance|
|Time estimation (hours)||64|
|related to||0001392||closed||Chris Graham||Composr||Adding images from Photobucket, Flickr and Perhaps Facebook|
|related to||0002020||non-assigned||Composr non-bundled addons||Big syndication expansion and unification (idea staging issue)|
|related to||0002962||non-assigned||Composr||Allow backup to use cdn_transfer mechanism|
|related to||0002980||resolved||Chris Graham||Composr documentation||Social media integration review and documentation|
|related to||0003549||non-assigned||Composr||Don't rely on ID sorting for these tables|
|related to||0003856||non-assigned||Composr||Addon isolation via virtual subtrees|
|related to||0003792||non-assigned||Composr website (compo.sr)||Host on geo-distributed ARM cluster|
||A simple default implementation of a CDN transfer hook (with associated config options) would allow just mapping of files onto a particular directory path and base URL combination.|
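As a rough illustration of the simple default hook described above, the sketch below maps a subpath onto one directory path and one base URL, with conversion in both directions. The class name, method names, and example paths are all hypothetical; none of them come from Composr's existing cdn_transfer code.

```php
<?php
// Hypothetical default hook: maps subpaths onto one directory and one base URL.
class SimpleMappingHook
{
    private $base_dir;
    private $base_url;

    public function __construct(string $base_dir, string $base_url)
    {
        // Normalise so that the joins below never produce a double slash
        $this->base_dir = rtrim($base_dir, '/');
        $this->base_url = rtrim($base_url, '/');
    }

    // Subpath (e.g. 'uploads/a b.png') to filesystem path
    public function path_for(string $subpath): string
    {
        return $this->base_dir . '/' . ltrim($subpath, '/');
    }

    // Subpath to public URL, encoding each path segment separately
    public function url_for(string $subpath): string
    {
        $segments = array_map('rawurlencode', explode('/', ltrim($subpath, '/')));
        return $this->base_url . '/' . implode('/', $segments);
    }

    // URL back to subpath, or null if the URL is not under this hook's base URL
    public function subpath_from_url(string $url): ?string
    {
        $prefix = $this->base_url . '/';
        if (strncmp($url, $prefix, strlen($prefix)) !== 0) {
            return null;
        }
        $segments = array_map('rawurldecode', explode('/', substr($url, strlen($prefix))));
        return implode('/', $segments);
    }
}
```

With a hook built as `new SimpleMappingHook('/mnt/shared/', 'https://cdn.example.com/site/')`, the subpath `uploads/a b.png` maps to the path `/mnt/shared/uploads/a b.png` and the URL `https://cdn.example.com/site/uploads/a%20b.png`, and the URL converts back to the same subpath.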
A peripheral thing I'd like to solve with this work is case-insensitive filenames. If you develop on Mac or Windows there's a chance you'll mess up with case-mismatches but not notice until it goes to a Linux server.
The filesystem wrapper would have an option to force case-sensitivity, even if just mapping to a case-insensitive filesystem.
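One way such a forced case-sensitivity check could work is sketched below: each path segment is compared byte-for-byte against the directory listing, so a case mismatch fails even where the underlying filesystem would accept it. The function name is hypothetical and this is only an assumption about a possible approach.

```php
<?php
// Hypothetical helper: true only if every segment of $subpath exists under
// $base_dir with exactly the given case, even on a case-insensitive filesystem.
function file_exists_case_sensitive(string $base_dir, string $subpath): bool
{
    $current = rtrim($base_dir, '/');
    foreach (explode('/', trim($subpath, '/')) as $segment) {
        if (!is_dir($current)) {
            return false;
        }
        $entries = scandir($current);
        // scandir() returns names as stored on disk; compare them strictly
        if ($entries === false || !in_array($segment, $entries, true)) {
            return false;
        }
        $current .= '/' . $segment;
    }
    return true;
}
```

Scanning a directory per segment is slow, which fits an option that is only switched on during development.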
I gave this a lot more thought.
$GLOBALS['DYN_MANAGER'] = new DynFileManager();
$this->hook_obs = find_all_hook_obs('systems', 'dyn_file_manager');
function find_path($type, $data_class)
// May return a path that is a filesystem wrapper path; normal file operations can then be done
function find_file_path($type, $subpath, $data_class_filter = null)
function find_url($type, $subpath, $relative = false, $data_class_filter = null)
function find_unique_filename($type, $subpath)
function copy_to($tmp_path, $type, $subpath)
Instead of just "non-custom" and "custom", we now have "system", "system custom" and "user data" - and this is an override chain. Some things are probably only user data, e.g. uploads.
We move everything that changes during run-time and is shared between installs under a '_user_data' directory.
data/data_custom currently conflates too much. Have data/data_custom, resources/resources_custom, scripts/scripts_custom, logs.
uploads/website_specific will have to change, as this is not uploaded. Merge to resources_custom
Move caches and logs under a '_volatile' directory. Actually whether logs should be volatile or not should perhaps be configurable.
Document what '_volatile' and '_user_data' are (i.e. _volatile is not to be synced, _user_data is). Document the whole override chain system.
Our API will allow hooks to override the functionality
Options for specifying which directories are 'system data' vs 'user data' (so you can for example decide all theme files and Comcode pages are 'system data')
Other directories are hard-coded as one or the other
Warnings if editing anything that would edit 'system data' and therefore should be done on a development level - but only if an option is enabled for these warnings
Both kinds of data would be managed via the same API
Search both locations for data, but in priority order (even for ones hard-coded - as shared installs may be using stuff as system data).
Themes and translations should definitely be system data, as otherwise it would complicate distribution of them as addons.
What about new data (e.g. a new download) that is being added at the staging stage? Programmer will have to deal with this manually.
Is this the time to drop non-suexec support? Put a check in the installer that all files are owned by the web user. The quick installer will now not extract using FTP, just the filesystem - and will complain if there is no write access. Remove all chmodding references. Remove the abstract file manager. Change the written minimum requirements. Remove fix_permissions.
Auto-create missing directories.
Ability to store all user data in the DB. Controlled via a hidden option, with a documented function to switch between the two storage modes.
Merge in cdn_transfer hook functionality (broadly these will become dyn_file_manager hooks)
Another kind of hook that just listens to changes (dyn_file_manager_sync). Needs to be called by filesystem wrappers and DynFileManager functions.
Re-write the tut_optimisation tutorial. Maybe rename to tut_performance.
Document that '_user_data' can be placed under shared storage. Or you can put in DB. Or you can have an addon that puts it elsewhere (or multiple places). Or you can implement a dyn_file_manager_sync hook. Document advantages - DB may be best because it is synced across machines automatically, so minimises bottleneck.
Document to NOT try and use rsync, as there is no 'master' and thus deletes would be messed up.
get_custom_file_base and get_custom_base_url can go, as 'user data' is now the same as custom data. Each install gets its own '_user_data' directory.
In dev-mode host the entire Composr filesystem under a filesystem wrapper and give errors if file-ops are done on things that should not be.
A peripheral thing I'd like to solve with this work is case-insensitive filenames. If you develop on Mac or Windows there's a chance you'll mess up with case-mismatches but not notice until it goes to a Linux server. The filesystem wrapper would have an option to force case-sensitivity, even if just mapping to a case-insensitive filesystem.
Implement "Allow backup to use cdn_transfer mechanism", https://compo.sr/tracker/view.php?id=2962 (almost done automatically by this work).
Kill upload_syndication hooks. They are overly complex and not really user friendly.
This is now covered on this spreadsheet: https://docs.google.com/spreadsheets/d/1_yaJeGzDIsxq33I7Wg9I-lTBDk3YS22WPBwJ971v5tI
||" The filesystem wrapper would have an option to force case-sensitivity, even if just mapping to a case-insensitive filesystem. " - this is now done in debug_fs.php, although it is only implemented as a specific debug option, and not related to the rest of the functionality discussed here.|
A fresh look at all this, which I think is both simpler and more powerful...
We have many different kinds of "content across servers" scenarios we need to support well:
1) Content Delivery Networks [CDN] (locate asset files geographically close to users to minimise site download time)
2) Server farms (spread load across multiple servers)
3) Staging servers (pushing content from a staging server to a live server)
4) Multi-site (having content on a Demonstratr-style master site available to satellite sites - e.g. site-builder scenario)
5) Git (implementing content inside a Git repository then pulling it live)
And here's how we approach them...
1) Content Delivery Networks -- promote smart CDNs that will automatically pull assets from the master site on-demand, with proper cache management (avoiding the need for the server to ever proactively push anything)
2) Server farms -- support mounting network storage onto the new smart filesystem feature or implement hooks on it
3) Staging servers -- new Sync UI feature
4) Multi-site -- new smart filesystem feature
5) Git -- no special approach needed
Have a new UI for synching between a staging site and a live site.
The details of the live site would need configuration, some kind of API-key system.
It would be laid out something like...
<th colspan="3">Repository object [sort]</th>
<th colspan="2">Modification date [sort]</th>
<th rowspan="2">Sync action</th>
<td>3rd Dec 2018 2:03 pm</td>
<td>(only on staging)</td>
<td>(only on staging)</td>
<option disabled>Leave live-only</option>
<option>Copy from staging to live (includes revision history)</option>
<option disabled>Copy from live to staging (includes revision history)</option>
<option disabled>Delete from live</option>
<option>Delete from staging</option>
<option disabled>Delete from both staging and live</option>
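The enabled and disabled options in the mock-up could be derived from where the object exists. The sketch below is a guess at that rule: it reproduces the staging-only case shown above, while the results for objects that exist on live (or on both) are assumptions. The function name and the shortened option labels are hypothetical.

```php
<?php
// Hypothetical rule for which sync actions are selectable for one object.
// Returns a map of action label => whether the option is enabled.
function sync_actions(bool $on_staging, bool $on_live): array
{
    return [
        'Leave live-only' => !$on_staging && $on_live,
        'Copy from staging to live' => $on_staging,
        'Copy from live to staging' => $on_live,
        'Delete from live' => $on_live,
        'Delete from staging' => $on_staging,
        'Delete from both staging and live' => $on_staging && $on_live,
    ];
}
```

For an object that is only on staging, `sync_actions(true, false)` enables just "Copy from staging to live" and "Delete from staging", matching the mock-up.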
For this synching system to work well, we'd ideally want to completely remove IDs from Composr and replace with GUIDs. We currently do have GUIDs as an optional feature, but they're not usually used.
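GUIDs suit this because two sites can each create records without coordinating, which auto-increment IDs cannot guarantee. For illustration only, a random (version 4) GUID can be generated as below. The function name is hypothetical, and this is not a description of Composr's existing optional GUID feature.

```php
<?php
// Generate a random (version 4) UUID in the layout defined by RFC 4122.
function generate_guid(): string
{
    $bytes = random_bytes(16);
    // Set the four version bits to 0100 (version 4, random)
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the two variant bits to 10 (RFC 4122 variant)
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    // 32 hex characters split into 8 groups of 4, joined as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```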
- (much of what is considered in this issue with having a custom class to solve it)
... have all file I/O go through a custom PHP stream-wrapper
Allow configuring in _config.php how other paths mount onto the default base directory (which may themselves be PHP stream-wrappers, FUSE mounts, or whatever).
Allow mounting multiple paths in the same position, with precedence. This allows a multi-site scenario to work well.
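A minimal sketch of mounting with precedence follows. It assumes a mount table that maps a mount point to an ordered list of target directories, where the first target containing the file wins. The table layout and the function name are hypothetical, and no such _config.php setting exists yet.

```php
<?php
// Hypothetical lookup: $mounts maps a mount point (relative to the base
// directory) to an ordered list of target directories, highest precedence first.
function resolve_mounted_path(array $mounts, string $subpath): ?string
{
    foreach ($mounts as $mount_point => $targets) {
        $prefix = rtrim($mount_point, '/') . '/';
        if (strncmp($subpath, $prefix, strlen($prefix)) !== 0) {
            continue; // This mount point does not cover the subpath
        }
        $rest = substr($subpath, strlen($prefix));
        foreach ($targets as $target) {
            $candidate = rtrim($target, '/') . '/' . $rest;
            if (file_exists($candidate)) {
                return $candidate; // First target holding the file wins
            }
        }
    }
    return null; // Not found under any mount
}
```

In a multi-site setup the satellite site's own directory would be listed before the master site's directory, so local files override shared ones and everything else falls through to the master.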
All I/O operations would support hooks, so you can for example write custom sync code to sync onto different servers.
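The sketch below shows roughly how a PHP stream-wrapper with I/O hooks could fit together, using PHP's standard `stream_wrapper_register()` mechanism. The `smartfs://` scheme, the class name, and the listener signature are all hypothetical. It handles only plain reads, writes, and deletes against a single base directory, and it does no path sanitising.

```php
<?php
// Hypothetical sketch: serves smartfs://<subpath> from one base directory and
// notifies listeners after writes and deletes. Not Composr code.
class SmartFsWrapper
{
    public static $base_dir = '';  // Directory that smartfs:// maps onto
    public static $listeners = []; // Callables taking ($event, $subpath)

    public $context;               // Populated by PHP when a context is passed

    private $handle = null;
    private $subpath = '';
    private $written = false;

    private static function subpath_of($uri)
    {
        // 'smartfs://uploads/a.txt' => 'uploads/a.txt'
        return ltrim(substr($uri, strlen('smartfs://')), '/');
    }

    private static function real_path($uri)
    {
        return rtrim(self::$base_dir, '/') . '/' . self::subpath_of($uri);
    }

    private static function notify($event, $subpath)
    {
        foreach (self::$listeners as $listener) {
            $listener($event, $subpath);
        }
    }

    public function stream_open($uri, $mode, $options, &$opened_path)
    {
        $this->subpath = self::subpath_of($uri);
        $handle = @fopen(self::real_path($uri), $mode);
        if ($handle === false) {
            return false;
        }
        $this->handle = $handle;
        return true;
    }

    public function stream_read($count)
    {
        return fread($this->handle, $count);
    }

    public function stream_write($data)
    {
        $this->written = true;
        return (int) fwrite($this->handle, $data);
    }

    public function stream_flush()
    {
        return fflush($this->handle);
    }

    public function stream_eof()
    {
        return feof($this->handle);
    }

    public function stream_stat()
    {
        return fstat($this->handle);
    }

    public function stream_close()
    {
        fclose($this->handle);
        if ($this->written) {
            self::notify('write', $this->subpath); // Hook point for sync code
        }
    }

    public function url_stat($uri, $flags)
    {
        return @stat(self::real_path($uri));
    }

    public function unlink($uri)
    {
        $ok = @unlink(self::real_path($uri));
        if ($ok) {
            self::notify('delete', self::subpath_of($uri)); // Hook point for sync code
        }
        return $ok;
    }
}

stream_wrapper_register('smartfs', 'SmartFsWrapper');
```

After setting `SmartFsWrapper::$base_dir` and adding a listener to `SmartFsWrapper::$listeners`, ordinary calls such as `file_put_contents('smartfs://uploads/a.txt', 'hello')` and `unlink('smartfs://uploads/a.txt')` go through the wrapper, and the listener receives a 'write' or 'delete' event that custom sync code could act on.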
||Also we need to consider that a sync may result in a moniker conflict, and we have to somehow ask how to resolve that.|
|2017-03-21 21:58||Chris Graham||New Issue|
|2017-03-21 21:59||Chris Graham||Relationship added||related to 0001392|
|2017-03-21 21:59||Chris Graham||Tag Attached: Type: Performance|
|2017-03-21 21:59||Chris Graham||Relationship added||related to 0002020|
|2017-03-21 22:01||Chris Graham||Note Added: 0004887|
|2017-04-03 16:15||Chris Graham||Note Added: 0004947|
|2017-05-01 16:04||Chris Graham||Tag Attached: Type: Cross-cutting feature|
|2017-05-01 17:08||Chris Graham||Relationship added||related to 0002962|
|2017-05-11 11:49||Chris Graham||Tag Attached: ocProducts client-work (likely)|
|2017-06-13 13:18||Chris Graham||Note Added: 0005139|
|2017-06-13 13:41||Chris Graham||Note Edited: 0005139|
|2018-03-06 04:45||Chris Graham||Relationship added||related to 0002980|
|2018-03-06 05:19||Chris Graham||Note Added: 0005550|
|2018-03-06 05:53||Chris Graham||Note Edited: 0005139|
|2018-06-22 18:39||Chris Graham||Note Added: 0005744|
|2018-12-24 02:54||Chris Graham||Note Added: 0005892|
|2019-01-08 16:29||Chris Graham||Note Added: 0005896|
|2019-06-27 19:01||Chris Graham||Tag Attached: Roadmap: v12|
|2019-06-27 19:47||Chris Graham||Tag Attached: Type: External dependency|
|2019-07-20 02:46||Chris Graham||Relationship added||related to 0003549|
|2019-07-22 19:25||Chris Graham||Relationship added||related to 0003856|
|2019-12-08 03:43||Chris Graham||Relationship added||related to 0003792|