auth: move CSRF checks from the optional LoginRequired to the more basic BaseController._before
_before is not called for the CSRF-immune JSON-API controller and is thus a good place to check CSRF. This also apply CSRF protection to the login controller.
The flag for needing CSRF checking is stored in the thread global request object when passed from __call__ to _before for regular controllers. It is thus also set for requests to the JSON-RPC controller, but not used.
auth: simplify API key auth - move it out of _determine_auth_user
This gives less of the special handling of API key auth in LoginRequired ... but we still need to disable the LoginRequired CSRF protection for API key auth.
tests: prepare for adding CSRF protection on login forms
CSRF is about avoiding abuse of credentials by doing things in existing sessions. The login form does not have any previous credentials, so there is nothing to abuse and no real need for CSRF protection. But there is still an unauth session, so we *can* have CSRF protection.
CSRF protection is currently in LoginRequired (which obviously isn't applied to the login form), but let's prepare for changing that.
tests: make test_admin_users user_and_repo_group_fail() fixture more stable
When adding authentication_token() to log_user(), database session lifetime will in some cases change:
test_admin_users test_delete_repo_group_err() use the user_and_repo_group_fail() fixture.
Before, it got ObjectDeletedError when trying to delete a deleted RepoGroup and moved on.
After changing log_user(), py.test would emit a warning:
kallithea/tests/functional/test_admin_users.py::TestAdminUsersController::()::test_delete_repo_group_err .../site-packages/sqlalchemy/orm/persistence.py:1340: SAWarning: DELETE statement on table 'groups' expected to delete 1 row(s); 0 were matched. Please set confirm_deleted_rows=False within the mapper configuration to prevent this warning. % (table.description, expected, rows_matched)
Instead, use RepoGroup.get_by_group_name to verify the group exists before trying to delete it.
It seems like other ways of tracking authentication state are better. AuthUser is a *potentially* authenticated user. We prefer to keep it as that, without modifying the AuthUser object if the user actually should be authenticated.
The primariy indicator that a user is authenticated is when the AuthUser is set as request.authuser .
(Alternatively, we could create an AuthenticatedUser sub-class and move things like access control checks there. That would help ensuring it is used correctly, without having to check an is_authenticated flag.)
auth: drop "multiple_counter" from computing permissions
This seems to have been something about having some permissions override existing permissions. It is not clear to me why anybody should want that.
test_user_group_permissions_on_repo_groups.py seems to have been testing for something we don't want. The new behaviour seems more reasonable. The test user is inhering access from the default user, and thus in this case getting read access (except when private).
auth: minor refactoring of computation of admin access for repo owners
Make the flow slightly simpler ... and now when permissions are merged, we only have to set repo owner access once.
BUT: because multiple_counter, we actually don't merge permissions in all cases. This will thus introduce a regression that will be fixed in next changeset.
auth: explicit user permission should not blindly overrule permissions through user groups
Before, explicit permissions of a user could shadow higher permissions that would otherwise be obtained through a group the user is member of. That was confusing and fragile: *removing* a permission could then suddenly give a user *more* permissions.
Instead, change the flag for controlling internal permission computation to *not* use "explicit". Permissions will then add up, no matter if they are explicit or through groups.
The change in auth.py is small, but read the body of __get_perms to see the actual impact ... and also the clean-up changeset that will come next.
This might in some cases be a behaviour change and give users more access ... but it will probably only give the user that was intended. This change can thus be seen as a bugfix.
Some tests assumed the old behaviour. Not for good reasons, but just because that is how they were written. These tests are updated to expect the new behaviour, and it has been reviewed that it makes sense.
Note that this 'explicit' flag mostly is for repo permissions and independent of the 'user_inherit_default_permissions' that just was removed and is about global permissions.
auth: global permissions given to the default user are the bare minimum and should apply to *all* other users too
Drop the "subtractive permission" config option "inherit_from_default" that when set to false would give users less global permissions than the default unauthenticated user.
Instead, think positive and merge all positive permissions.
At the end, filter the global permissions to make sure we for each kind of permissions only keep the one with most weight.
The core of the functionality is to process a list of "raw id"s, log them, and update / invalidate caches.
handle_git_post_receive and scm _handle_push already provide that list directly. Things get much simpler when introducing a new function (process_pushed_raw_ids) just for processing pushed raw ids. That also makes it clear that scm _handle_push doesn't need any repo.
log_push_action remains the native entry point for the Mercurial hook. It was not entirely correct using 'node:tip' - after Mercurial 3.7 and d6d3cf5fda6f, it should be 'node:node_last'.
After several trivial refactorings, it turns out that the logic for creating the hash list for Mercurial actually is very simple ...
locking: drop the pull-to-lock / push-to-unlock functionality
The feature is not worth the maintenance cost. The locking is too coarse and unflexible with insufficient UI and UX. The implementation is also quite invasive in tricky areas of the code, and thus high maintenance. Dropping this will enable other cleanup ... or at least make it easier.
Sometimes, test_delete_ip would fail because UserIpMap entries left behind. It was perhaps because a commit was missing and sessions thus sometimes were leaked?
kallithea/tests/functional/test_forks.py:35: in teardown_method Session().delete(self.u1) data/env/lib/python2.7/site-packages/sqlalchemy/orm/session.py:1871: in delete self._delete_impl(state, instance, head=True) data/env/lib/python2.7/site-packages/sqlalchemy/orm/session.py:1888: in _delete_impl self.identity_map.add(state) data/env/lib/python2.7/site-packages/sqlalchemy/orm/identity.py:149: in add orm_util.state_str(state), state.key)) E InvalidRequestError: Can't attach instance <User at 0x7f93d2f81a10>; another instance with key (<class 'kallithea.model.db.User'>, (10,), None) is already present in this session.
changeset: fix XSS vulnerability in parent-child navigation
The 'Parent Rev.' - 'Child Rev.' links on changesets and in the file browser normally immediately jump to the correct revision upon click. But, if there are multiple candidates, e.g. two children of a commit, then a list of revisions is shown as hyperlinks instead.
These hyperlinks have a 'title' attribute containing the full commit message of the corresponding commit. When this commit message contains characters special to HTML, like ", >, etc. they were added literally to the HTML code.
This can lead to a cross-site scripting (XSS) vulnerability when an attacker has write access to a repository. They could craft a special commit message that would introduce HTML and/or JavaScript code when the commit is listed in such 'parent-child' navigation links.
Escape the commit message before using it further.
The search feature did not correctly escape all arguments when displaying search matches and linking to the corresponding files.
An attacker that can control the contents of a repository could thus cause a cross-site scripting (XSS) vulnerability.
Fix the problem by removing the overall h.literal call that is only needed for the HTML entity » and splitting the link instead.
We take the opportunity to improving the destination of the part before » which is the path to the repository. Instead of pointing to the search result, point to the repository itself. The part after » remains linked to the file containing the search match.
Reported by Bob Hogg <wombat@rwhogg.site> (thanks!).
docs: tweak documentation of Apache+mod_wsgi further
Make the last bullet not only about WSGIScriptAlias but about the entire set of WSGI* directives in the Apache Virtual Host configuration file. This fits better with the already existing explanation of changing user/group settings, and with the upcoming hints about locale settings.
docs: move Apache+mod_wsgi example code to the corresponding bullets
The documentation about Apache+mod_wsgi has bullet points with inline snippets, yet the example WSGI dispatch script is placed at the bottom of the section instead of near its corresponding bullet.
It seems more readable and more according to the logical setup flow to move the code next to its bullet.
Due to the additional indentation required to 'attach' the code to the bullet, this commit is best viewed with the 'ignore whitespace changes' setting.
templates: narrow down scope of webhelpers.html.literal for HTML injection
When using webhelpers.html.literal to inject some explicit HTML code with some variable data, there are two approaches: h.literal('some <html> code with %s data' % foobar) or h.literal('some <html> code with %s data') % foobar
In the first case, the literal also applies to the contents of variable 'foobar' which may be influenceable by users and thus potentially malicious. In the second case, this term will be escaped by webhelpers.
In files_browser.html, the correction of this scope of literal() also means that explicit escaping of node.name can be removed. The escaping is now done automatically by webhelpers as mentioned above.
templates/files: narrow down scope of webhelpers.html.literal
In the 'Show Authors' functionality on a file of a repository, the following construct: h.literal(ungettext('..A..') % (..B..))
can be simplified. Here, literal was used to cater for explicit HTML tags in the (..B..) part only. There is no need to apply literal on the '..A..' part.
A better structure of this code is: h.HTML(ungettext('..A..')) % h.literal(..B..)
Note that we still need to wrap the '..A..' part in webhelpers.html.HTML to make sure the '%' operator will preserve the 'literal' property.
See also the documentation: (the text below for 'literal' also applies to 'HTML') https://docs.pylonsproject.org/projects/webhelpers/en/latest/modules/html/builder.html " When literal is used in a mixed expression containing both literals and ordinary strings, it tries hard to escape the strings and return a literal. However, this depends on which value has “control” of the expression. literal seems to be able to take control with all combinations of the + operator, but with % and join it must be on the left side of the expression. So these all work:
templates, controllers: replace webhelpers.html.literal() with webhelpers.html.HTML() where possible
Usage of webhelpers.literal (h.literal) can be a problem when variables are not correctly escaped. Luckily, this function can be avoided in several cases.
Several users of the construct: h.literal(_('..A..') % (..B..))
can be simplified if (..B..) just contains a call to h.link_to. In this case, there is actually no need to use h.literal, because the object returned by link_to is already a literal. It is sufficient to use webhelpers.html.HTML() like so: h.HTML(_('..A..')) % (..B..)
which is better because it will escape the '..A..' part instead of passing it literally.
The need to wrap the '..A..' part in HTML() is to make sure the (escaped) end result is not a plain string but a 'literal' to avoid double escaping later.
See also the documentation: https://docs.pylonsproject.org/projects/webhelpers/en/latest/modules/html/builder.html " When literal is used in a mixed expression containing both literals and ordinary strings, it tries hard to escape the strings and return a literal. However, this depends on which value has “control” of the expression. literal seems to be able to take control with all combinations of the + operator, but with % and join it must be on the left side of the expression. So these all work:
This same escaping with 'HTML()' was already done by default in mako templates for constructs like ${_("something")} that do not contain format specifiers. When the translated string _does_ contain format specifiers, we want to use the same escaping, but we have to do it explicit and earlier so the escaping happens already when strings are inserted into the template string.
The following code is unnecessarily complex: h.literal(_('There are no files yet. %s') % add_new)
First of all, the '%s' part in the translatable string is a whole new sentence, independent of the first. There is no reason it needs to be part of the same translatable string.
Secondly, the only reason for h.literal is to preserve the link in 'add_new' (which contains the result of 'h.link_to'). But, h.link_to actually already is a 'literal' object. The problem is that the special 'literal' property is lost due to the coercion into a plain string via the '%' operator.
The following code would be a possible solution for the second issue: h.HTML(_('There are no files yet. %s')) % add_new i.e. make sure that the format string is not a plain string but itself a literal object (after its contents being escaped), before applying the '%' operator.
To handle the first issue, this would become: h.HTML(_('There are no files yet.')) + ' ' + h.HTML(add_new) but, here h.HTML is unnecessary on the first string because there is nothing special about it, and equally unnecessary on the 'add_new' variable because h.link_to already returns a literal object.
So, the final code becomes: _('There are no files yet.') + ' ' + add_new
The call to ugettext (_) is there to obtain a translated string. It may contain format specifiers like '%s'. If the code is as follows:
_('User-facing string with %s data' % data)
then the string will never be translated in the application, because the variable 'data' is replaced in the string _before_ being passed to the translation function '_' (ugettext).
The correct code is:
_('User-facing string with %s data') % data
so that we first get the translated string and _then_ substitute the variable 'data'.
Note: the string extraction logic (used to extract all user-facing strings to pass to translators) happily accepted the bad code, ignoring the string substitution. It would have been better if it had warned loudly about this problem.
files: fix ignored navigation back to initial page
The popstate event for going back to the pageload page has no state and was ignored. Instead, in that case, use a "fake" initial state based on template values.
Sometimes, when reusing cached data, the DOM part of select2 activation would be reused, but not the actual activation. Repeated select2 activation would thus give duplicated DOM entries.
Select2 doeesn't seem to have a good way to unload or redo, so instead just try to remove the old DOM parts.
settings: rework logic for flash message after repository scan
Make the code more readable by extracting added_msg and removed_msg away from the h.flash() call and reindenting the logic. There are no functional changes here.
These changes serve as preparatory work for a subsequent commit that will change the logic.
Commit d66201a7ce6e ("files: change "callbacks" function to the more descriptive name "post_load_state" and let it take an actual state data object") broke the 'Show Authors' button when visiting a file in the files browser. The button normally shows a count of authors and their avatars, but after the aforementioned commit it did nothing.
Following patch would have fixed the problems in commit d66201a7ce6e:
diff --git a/kallithea/templates/files/files_source.html b/kallithea/templates/files/files_source.html --- a/kallithea/templates/files/files_source.html +++ b/kallithea/templates/files/files_source.html @@ -81,11 +81,13 @@ <script> $(document).ready(function(){ var state = { + data: { node_list_url: node_list_url.replace('__REV__',${h.js(c.changeset.raw_id)}).replace('__FPATH__', ${h.js(h.safe_unicode(c.file.path))}), url_base: url_base.replace('__REV__',${h.js(c.changeset.raw_id)}), rev: ${h.js(c.changeset.raw_id)}, f_path: ${h.js(h.safe_unicode(c.file.path))} + } } - post_load_state(State.data); // defined in files.html + post_load_state(state.data); // defined in files.html }); </script>
But, later the code got refactored more, and commit 006d68c4d7b9 ("files: use the web browsers built-in js history instead of native.history.js") broke the feature further: the click handler for the button no longer got installed on the 'document-ready' event, but only when a new 'state' is loaded. And it seems there is never a situation where a new state preserves the button, so it makes no sense installing the click handler at that moment.
Instead, move the click handler back to the 'document-ready' event.
cli: fill in git_hook_interpreter at 'config-create' time to really fix potentially invalid interpreter in git hooks (Issue #333)
When generating a configuration file using 'kallithea-cli config-create', it is probably using the right Python interpreter, so fill in the current Python executable as 'git_hook_interpreter' in the generated file.
There should thus rarely be any need to configure this manually, and issue #333 will *really* be fixed.
As this causes an absolute path to be encoded inside the ini file, moving the virtualenv will require updating this path.
For development.ini we do not want to hardcode any path and are happy to leave it using the old heuristics at runtime.
hooks: make the Python interpreter for Git hooks configurable as 'git_hook_interpreter' (Issue #333)
Commit 5e501b6ee639 introduced the use of 'sys.executable' as interpreter for git hooks instead of 'python2' with the following argument:
"Windows doesn't necessarily have "python2" available in $PATH, but we still want to make sure we don't end up invoking a python3. Using the absolute path seems more safe."
But, sys.executable does not necessarily point to Python. When Kallithea is started under uWSGI, sys.executable points to the uwsgi executable. As a result, the interpreter encoded in the git hooks on the server repositories would be:
#!/path/to/uwsgi
And pushing to such repo would result in following client errors:
$ git push Password for 'http://user@localhost:5050': Enumerating objects: 3, done. Counting objects: 100% (3/3), done. Writing objects: 100% (3/3), 241 bytes | 241.00 KiB/s, done. Total 3 (delta 0), reused 0 (delta 0) remote: unable to load configuration from hooks/pre-receive To http://localhost:5050/gitrepo-new ! [remote rejected] master -> master (pre-receive hook declined) error: failed to push some refs to 'http://user@localhost:5050/gitrepo-new'
Fix this problem by introducing a configuration setting 'git_hook_interpreter' that allow administrators to specify which Python interpreter to use.
A subsequent commit will cause its value to be filled in automatically when generating a new ini file, but an administrator can always override it.
The use of /usr/bin/env is only needed for relative arguments (or to pass variables in the environment, which we don't do). It is thus not needed in case the Python interpreter for Git hooks is known with absolute path, as introduced in 5e501b6ee639.
hooks: add intermediate function _get_git_hook_interpreter
The logic to determine the right interpreter for Git hooks is about to change and possibly become more complex. Split it off in a separate function so such changes do not require extra code duplication and preserve the readability of the code.
In TurboGears2 2.3.12, the latest version in the 2.3.x range, the WebOb dependency requirement is [1] WebOb >= 1.2, < 1.8.0
In TurboGears2 2.4.0 (which is in pre-release state at the time of this commit), this becomes [2]: WebOb >= 1.8.0, < 1.10.0
In the Kallithea dependency list, we have matched our WebOb version requirements to that of TurboGears2 and use: WebOb >= 1.7, < 1.8
while our TurboGears2 requirement was liberal and accepted anything in the 2.x range: TurboGears2 >= 2.3.10, < 3
To avoid new Kallithea installations failing with conflicting WebOb version requirements after TurboGears2 2.4.0 is released, restrict the version of TurboGears2 to 2.3.x on the stable branch.
For the default branch, the update to TurboGears2 2.4.0 can be considered once it's released.
Commit f2f7a8c1281e changed the i18n-related ini settings to match TurboGears2. Even though the commit message correctly refers to 'i18n.enabled', the code incorrectly used 'i18n.enable'.
.hgtags: remove accidental unused double tag of 0.4.0rc1
Initially, when tagging 0.4.0rc1 I made a mistake locally, fixed it, then stripped the tagging commit, and retagged. However, it seems something went wrong in this procedure and the original commit also is shown in .hgtags.
This commit never got pushed, is hidden (obsolete) in my local repository, so remove the corresponding line in .hgtags to avoid any confusion.
git: fix handling of submodules that are not in the repo root (Issue #337)
GitChangeset.get_nodes() did not handle submodules correctly if they were not located in the repository root. The file .gitmodules was searched in the 'tree' object being handled, which would be a subdirectory, while the real .gitmodules file is in the root tree of the repository.
Instead of using 'name', the 'path' should be used.
This problem was noticed during indexing of such repositories.
sphinx 1.7.9 has requirement babel!=2.0,>=1.3, but you'll have babel 0.9.6 which is incompatible. sphinx 1.7.9 has requirement docutils>=0.11, but you'll have docutils 0.8.1 which is incompatible. sphinx 1.7.9 has requirement Pygments>=2.0, but you'll have pygments 1.5 which is incompatible.
Fix these by bumping the minimum versions of these dependencies.
this duplication replaced useful diagnostic message from pip with less useful ones. For example, the following message was displayed when the Babel dependency duplication is present:
kallithea 0.4.0rc1 has requirement Babel<2.7,==0.9.6, but you'll have babel 2.6.0 which is incompatible.
When removing the duplication in dev_requirements.txt, this becomes:
sphinx 1.7.9 has requirement babel!=2.0,>=1.3, but you'll have babel 0.9.6 which is incompatible.
which makes it clear that to solve this problem, we need to bump the minimum dependency for Babel in setup.py from 0.9.6 to 1.3.
Use less spacing after comment panels inside comment sections (10px from @kallithea-panel-margin instead of 20px) to hint that the comments are related ... and also to avoid using too much vertical space.
Also show Comment/Cancel buttoms closer to the comment form ("well") they are related to.