It has "always" been wrong, and we have thus been using the default of "pickle" - not "json" as we thought.
I doubt this change has any immediate visible impact. I guess it only means that it will use json for results instead of pickle. That might be more stable and debuggable.
Note: celery_config will uppercase the config settings and replace '.' with '_', so 'celery.result.serializer' turns into 'CELERY_TASK_SERIALIZER'.
mails: make error reporting by mail work with secure mail servers
Even with Kallithea mails working, TurboGears / backlash error reporting would fail like:
Error while reporting exception with <backlash.tracing.reporters.mail.EmailReporter object at 0x7f8f986f8710> Traceback (most recent call last): File ".../env/lib/python3.7/site-packages/backlash/tracing/reporters/mail.py", line 49, in report result = server.sendmail(self.from_address, self.error_email, msg.as_string()) File "/usr/lib64/python3.7/smtplib.py", line 867, in sendmail raise SMTPSenderRefused(code, resp, from_addr) smtplib.SMTPSenderRefused: (530, b'5.7.0 Must issue a STARTTLS command first.', 'kallithea@example.com')
tests: remove race condition in test_forgot_password
One in so many times, test_forgot_password failed with:
kallithea/tests/functional/test_login.py:427: in test_forgot_password assert '\n%s\n' % token in body E assert ('\n%s\n' % 'd71ad3ed3c6ca637ad00b7098828d33c56579201') in "Password Reset Request\n\nHello passwd reset,\n\nWe have received a request to reset the password for your account.\n\nTo s...7e89326ca372ade1d424dafb106d824cddb\n\nIf it weren't you who requested the password reset, just disregard this message.\n"
i.e. the expected token is not the one in the email.
The token is calculated based on a timestamp (among others). And the token is calculated twice: once in the real code and once in the test, each time on a slightly different timestamp. Even though there is flooring of the timestamp to a second resolution, there will always be a race condition where the two timestamps floor to a different second, e.g. 4.99 vs 5.01.
The problem can be reproduced reliably by adding a sleep of e.g. 2 seconds before generating the password reset mail (after the test has already calculated the expected token).
Solve this problem by mocking the time.time() used to generate the timestamp, so that the timestamp used for the real token is the same as the one used for the expected token in the test.
scripts: handle "Python 2.7 reached the end of its life" message
The script failed with:
Error: pip detected following problems: DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Thus, clarify the use of i18n.lang (refining f2f7a8c1281e and 8931078f70db) and set 'en' as default value on app startup.
TurboGears requires an (empty) translation for the source language which is default for i18n.lang . The empty .mo for en is created as the 4 magic .mo bytes followed by lengths of 0: printf '\x95\x04\x12\xde\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0' > kallithea/i18n/en/LC_MESSAGES/kallithea.mo
login: fix incorrect CSRF rejection of "Reset Your Password" form (Issue #350)
htmlfill would remove the CSRF token from the form when substituting the query parameters, causing password reset to break.
By default, htmlfill will clear all input fields that doesn't have a new "default" value provided. It could be fixed by setting force_defaults to False - see http://www.formencode.org/en/1.2-branch/modules/htmlfill.html . It could also be fixed by providing the CSRF token in the defaults to be substituted in the form.
Instead, refactor password_reset_confirmation to have more explicitly safe handling of query parameters. Replace htmlfill with the usual template variables.
The URLs are generated in kallithea/model/user.py send_reset_password_email() and should only contain email, timestamp (integer as digit string) and a hex token from get_reset_password_token() .
logging: always invoke fileConfig with '__file__' and 'here'
WSGI servers tend to provide '__file__' and 'here' as 'defaults' when invoking fileConfig, so '%(here)s' string interpolation also can be used in logging configuration.
Make sure we do the same when we initialize logging without using a WSGI server.
It is annoying to have to do this, and it will only in rare cases make any difference ... but it seems like the best option.
logging: drop fileConfig initialization in make_app - backout 0d4dd9380a45
0d4dd9380a45 was a bit harmful, as it might overwrite existing good logging configuration.
0d4dd9380a45 no longer seems relevant: Testing shows that logging for `gearbox serve` *is* activated anyway. gearbox/commands/serve.py will invoke "setup_logging" right before "loadapp".
We must and can assume that logging has been initialized before make_app.
Reported and based on analysis by Wolfgang Scherer.
Essentially a backout of d2a97f73fa1f and the 4851d15bc437_db_migration_step_after_95c01895c006_ alembic step.
We can't reliably have full index on fields with unbounded length. The upgrade step has been reported to fail on MySQL [1]:
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1170, "BLOB/TEXT column 'public_key' used in key specification without a key length") [SQL: u'CREATE INDEX usk_public_key_idx ON user_ssh_keys (public_key)'] (Background on this error at: http://sqlalche.me/e/e3q8)
And we really don't need this index ... especially now when we use fingerprints for key deletion instead of looking up by the full public key.
ssh: make it clear that SshKeyModel.delete has user as mandatory parameter
It is already provided in the two uses: kallithea/controllers/admin/my_account.py: SshKeyModel().delete(fingerprint, request.authuser.user_id) kallithea/controllers/admin/users.py: SshKeyModel().delete(fingerprint, c.user.user_id)
When attempting to use ed25519 SSH keys, parse_pub_key() failed with: SshKeyParseError: Incorrect SSH key - base64 part is not 'ssh-ed25519' as claimed but 'ssh-ed25519'
The problem was the hardcoding of the string length of the key type -- 7 or '\x07' -- which fits ssh-rsa and ssh-dss but not ssh-ed25519.
scripts/make-release: install ldap and pam to fix isort instabilities
isort (triggered by scripts/whitespacecleanup.sh) needs to know which modules are local and which are not, to separate them with a newline. If a module cannot be found, it will be treated as local, apparently.
When ldap is not installed in the current environment, a difference was created by isort in kallithea/bin/ldap_sync.py.
Commit 04dee6fdfdff fixed an apparent problem detected by isort, but as Mads Kiilerich pointed out, it was caused by an incomplete virtualenv that did not include 'python-ldap'. As a result, isort would not consider 'ldap' as a standard module and treated it as a local module.
A subsequent commit will fix the 'make-release' script to install all expected dependencies.
admin: fix 'Settings > Visual' form validation after commit 574218777086
Commit 574218777086 introduced a setting for 'SSH Clone URL' in 'Admin > Settings > Visual' and placed it under a check 'if c.ssh_enabled', which means that the corresponding form field is not present when SSH is not enabled. In this case, when trying to save the form (changing any or none setting), form validation reports an error 'Missing value' without much detail. At the top of the HTML document, even before the opening HTML tag, we can see:
changeset: fix XSS vulnerability in parent-child navigation
The 'Parent Rev.' - 'Child Rev.' links on changesets and in the file browser normally immediately jump to the correct revision upon click. But, if there are multiple candidates, e.g. two children of a commit, then a list of revisions is shown as hyperlinks instead.
These hyperlinks have a 'title' attribute containing the full commit message of the corresponding commit. When this commit message contains characters special to HTML, like ", >, etc. they were added literally to the HTML code.
This can lead to a cross-site scripting (XSS) vulnerability when an attacker has write access to a repository. They could craft a special commit message that would introduce HTML and/or JavaScript code when the commit is listed in such 'parent-child' navigation links.
Escape the commit message before using it further.
The search feature did not correctly escape all arguments when displaying search matches and linking to the corresponding files.
An attacker that can control the contents of a repository could thus cause a cross-site scripting (XSS) vulnerability.
Fix the problem by removing the overall h.literal call that is only needed for the HTML entity » and splitting the link instead.
We take the opportunity to improving the destination of the part before » which is the path to the repository. Instead of pointing to the search result, point to the repository itself. The part after » remains linked to the file containing the search match.
Reported by Bob Hogg <wombat@rwhogg.site> (thanks!).
docs: tweak documentation of Apache+mod_wsgi further
Make the last bullet not only about WSGIScriptAlias but about the entire set of WSGI* directives in the Apache Virtual Host configuration file. This fits better with the already existing explanation of changing user/group settings, and with the upcoming hints about locale settings.
docs: move Apache+mod_wsgi example code to the corresponding bullets
The documentation about Apache+mod_wsgi has bullet points with inline snippets, yet the example WSGI dispatch script is placed at the bottom of the section instead of near its corresponding bullet.
It seems more readable and more according to the logical setup flow to move the code next to its bullet.
Due to the additional indentation required to 'attach' the code to the bullet, this commit is best viewed with the 'ignore whitespace changes' setting.
templates: narrow down scope of webhelpers.html.literal for HTML injection
When using webhelpers.html.literal to inject some explicit HTML code with some variable data, there are two approaches: h.literal('some <html> code with %s data' % foobar) or h.literal('some <html> code with %s data') % foobar
In the first case, the literal also applies to the contents of variable 'foobar' which may be influenceable by users and thus potentially malicious. In the second case, this term will be escaped by webhelpers.
In files_browser.html, the correction of this scope of literal() also means that explicit escaping of node.name can be removed. The escaping is now done automatically by webhelpers as mentioned above.
templates/files: narrow down scope of webhelpers.html.literal
In the 'Show Authors' functionality on a file of a repository, the following construct: h.literal(ungettext('..A..') % (..B..))
can be simplified. Here, literal was used to cater for explicit HTML tags in the (..B..) part only. There is no need to apply literal on the '..A..' part.
A better structure of this code is: h.HTML(ungettext('..A..')) % h.literal(..B..)
Note that we still need to wrap the '..A..' part in webhelpers.html.HTML to make sure the '%' operator will preserve the 'literal' property.
See also the documentation: (the text below for 'literal' also applies to 'HTML') https://docs.pylonsproject.org/projects/webhelpers/en/latest/modules/html/builder.html " When literal is used in a mixed expression containing both literals and ordinary strings, it tries hard to escape the strings and return a literal. However, this depends on which value has “control” of the expression. literal seems to be able to take control with all combinations of the + operator, but with % and join it must be on the left side of the expression. So these all work:
templates, controllers: replace webhelpers.html.literal() with webhelpers.html.HTML() where possible
Usage of webhelpers.literal (h.literal) can be a problem when variables are not correctly escaped. Luckily, this function can be avoided in several cases.
Several users of the construct: h.literal(_('..A..') % (..B..))
can be simplified if (..B..) just contains a call to h.link_to. In this case, there is actually no need to use h.literal, because the object returned by link_to is already a literal. It is sufficient to use webhelpers.html.HTML() like so: h.HTML(_('..A..')) % (..B..)
which is better because it will escape the '..A..' part instead of passing it literally.
The need to wrap the '..A..' part in HTML() is to make sure the (escaped) end result is not a plain string but a 'literal' to avoid double escaping later.
See also the documentation: https://docs.pylonsproject.org/projects/webhelpers/en/latest/modules/html/builder.html " When literal is used in a mixed expression containing both literals and ordinary strings, it tries hard to escape the strings and return a literal. However, this depends on which value has “control” of the expression. literal seems to be able to take control with all combinations of the + operator, but with % and join it must be on the left side of the expression. So these all work:
This same escaping with 'HTML()' was already done by default in mako templates for constructs like ${_("something")} that do not contain format specifiers. When the translated string _does_ contain format specifiers, we want to use the same escaping, but we have to do it explicit and earlier so the escaping happens already when strings are inserted into the template string.
The following code is unnecessarily complex: h.literal(_('There are no files yet. %s') % add_new)
First of all, the '%s' part in the translatable string is a whole new sentence, independent of the first. There is no reason it needs to be part of the same translatable string.
Secondly, the only reason for h.literal is to preserve the link in 'add_new' (which contains the result of 'h.link_to'). But, h.link_to actually already is a 'literal' object. The problem is that the special 'literal' property is lost due to the coercion into a plain string via the '%' operator.
The following code would be a possible solution for the second issue: h.HTML(_('There are no files yet. %s')) % add_new i.e. make sure that the format string is not a plain string but itself a literal object (after its contents being escaped), before applying the '%' operator.
To handle the first issue, this would become: h.HTML(_('There are no files yet.')) + ' ' + h.HTML(add_new) but, here h.HTML is unnecessary on the first string because there is nothing special about it, and equally unnecessary on the 'add_new' variable because h.link_to already returns a literal object.
So, the final code becomes: _('There are no files yet.') + ' ' + add_new
The call to ugettext (_) is there to obtain a translated string. It may contain format specifiers like '%s'. If the code is as follows:
_('User-facing string with %s data' % data)
then the string will never be translated in the application, because the variable 'data' is replaced in the string _before_ being passed to the translation function '_' (ugettext).
The correct code is:
_('User-facing string with %s data') % data
so that we first get the translated string and _then_ substitute the variable 'data'.
Note: the string extraction logic (used to extract all user-facing strings to pass to translators) happily accepted the bad code, ignoring the string substitution. It would have been better if it had warned loudly about this problem.
files: fix ignored navigation back to initial page
The popstate event for going back to the pageload page has no state and was ignored. Instead, in that case, use a "fake" initial state based on template values.
Sometimes, when reusing cached data, the DOM part of select2 activation would be reused, but not the actual activation. Repeated select2 activation would thus give duplicated DOM entries.
Select2 doeesn't seem to have a good way to unload or redo, so instead just try to remove the old DOM parts.
settings: rework logic for flash message after repository scan
Make the code more readable by extracting added_msg and removed_msg away from the h.flash() call and reindenting the logic. There are no functional changes here.
These changes serve as preparatory work for a subsequent commit that will change the logic.
Commit d66201a7ce6e ("files: change "callbacks" function to the more descriptive name "post_load_state" and let it take an actual state data object") broke the 'Show Authors' button when visiting a file in the files browser. The button normally shows a count of authors and their avatars, but after the aforementioned commit it did nothing.
Following patch would have fixed the problems in commit d66201a7ce6e:
diff --git a/kallithea/templates/files/files_source.html b/kallithea/templates/files/files_source.html --- a/kallithea/templates/files/files_source.html +++ b/kallithea/templates/files/files_source.html @@ -81,11 +81,13 @@ <script> $(document).ready(function(){ var state = { + data: { node_list_url: node_list_url.replace('__REV__',${h.js(c.changeset.raw_id)}).replace('__FPATH__', ${h.js(h.safe_unicode(c.file.path))}), url_base: url_base.replace('__REV__',${h.js(c.changeset.raw_id)}), rev: ${h.js(c.changeset.raw_id)}, f_path: ${h.js(h.safe_unicode(c.file.path))} + } } - post_load_state(State.data); // defined in files.html + post_load_state(state.data); // defined in files.html }); </script>
But, later the code got refactored more, and commit 006d68c4d7b9 ("files: use the web browsers built-in js history instead of native.history.js") broke the feature further: the click handler for the button no longer got installed on the 'document-ready' event, but only when a new 'state' is loaded. And it seems there is never a situation where a new state preserves the button, so it makes no sense installing the click handler at that moment.
Instead, move the click handler back to the 'document-ready' event.
cli: fill in git_hook_interpreter at 'config-create' time to really fix potentially invalid interpreter in git hooks (Issue #333)
When generating a configuration file using 'kallithea-cli config-create', it is probably using the right Python interpreter, so fill in the current Python executable as 'git_hook_interpreter' in the generated file.
There should thus rarely be any need to configure this manually, and issue #333 will *really* be fixed.
As this causes an absolute path to be encoded inside the ini file, moving the virtualenv will require updating this path.
For development.ini we do not want to hardcode any path and are happy to leave it using the old heuristics at runtime.
hooks: make the Python interpreter for Git hooks configurable as 'git_hook_interpreter' (Issue #333)
Commit 5e501b6ee639 introduced the use of 'sys.executable' as interpreter for git hooks instead of 'python2' with the following argument:
"Windows doesn't necessarily have "python2" available in $PATH, but we still want to make sure we don't end up invoking a python3. Using the absolute path seems more safe."
But, sys.executable does not necessarily point to Python. When Kallithea is started under uWSGI, sys.executable points to the uwsgi executable. As a result, the interpreter encoded in the git hooks on the server repositories would be:
#!/path/to/uwsgi
And pushing to such repo would result in following client errors:
$ git push Password for 'http://user@localhost:5050': Enumerating objects: 3, done. Counting objects: 100% (3/3), done. Writing objects: 100% (3/3), 241 bytes | 241.00 KiB/s, done. Total 3 (delta 0), reused 0 (delta 0) remote: unable to load configuration from hooks/pre-receive To http://localhost:5050/gitrepo-new ! [remote rejected] master -> master (pre-receive hook declined) error: failed to push some refs to 'http://user@localhost:5050/gitrepo-new'
Fix this problem by introducing a configuration setting 'git_hook_interpreter' that allow administrators to specify which Python interpreter to use.
A subsequent commit will cause its value to be filled in automatically when generating a new ini file, but an administrator can always override it.
The use of /usr/bin/env is only needed for relative arguments (or to pass variables in the environment, which we don't do). It is thus not needed in case the Python interpreter for Git hooks is known with absolute path, as introduced in 5e501b6ee639.
hooks: add intermediate function _get_git_hook_interpreter
The logic to determine the right interpreter for Git hooks is about to change and possibly become more complex. Split it off in a separate function so such changes do not require extra code duplication and preserve the readability of the code.
In TurboGears2 2.3.12, the latest version in the 2.3.x range, the WebOb dependency requirement is [1] WebOb >= 1.2, < 1.8.0
In TurboGears2 2.4.0 (which is in pre-release state at the time of this commit), this becomes [2]: WebOb >= 1.8.0, < 1.10.0
In the Kallithea dependency list, we have matched our WebOb version requirements to that of TurboGears2 and use: WebOb >= 1.7, < 1.8
while our TurboGears2 requirement was liberal and accepted anything in the 2.x range: TurboGears2 >= 2.3.10, < 3
To avoid new Kallithea installations failing with conflicting WebOb version requirements after TurboGears2 2.4.0 is released, restrict the version of TurboGears2 to 2.3.x on the stable branch.
For the default branch, the update to TurboGears2 2.4.0 can be considered once it's released.
Commit f2f7a8c1281e changed the i18n-related ini settings to match TurboGears2. Even though the commit message correctly refers to 'i18n.enabled', the code incorrectly used 'i18n.enable'.
compare: correct display of special branch names in initial placeholder
When a branch name contains special characters like '<' or '>', and a 'compare' operation is performed with such branch as one of the two compare sides, then the special branch name will be part of the URL, e.g.
The encoded branch name is then used at page load as placeholders for the branch selection dropdowns. But, the special characters, were escaped too much, causing '<' to become < in the display of the dropdown.
The placeholder was escaped via the default mako escape filter, before being passed to make_revision_dropdown, thus too early. We want the raw value. h.js() (copied from the default branch) gives us that, while still formatting and escaping the string so it is safe inside the script tag.
compare: prevent XSS due to unescaped branch/tag/bookmark names
In the revision selection dropdown of the 'Compare' functionality, the branch/tag/bookmark names were not correctly escaped.
This means that if an attacker is able to push a branch/tag/bookmark containing HTML/JavaScript in its name, then that code would be evaluated. This is a cross-site scripting (XSS) vulnerability.
Fix the problem by correctly escaping the branch/tag/bookmarks.
templates/summary: escape branch/tag/bookmark names in 'Download as zip' links to prevent XSS
On a repository summary page, in the 'Download' section where you can download an archive of the repository at a given revision, the branch/tag names were not correctly escaped.
This means that if an attacker is able to push a branch/tag/bookmark containing HTML/JavaScript in its name, then that code would be evaluated. This is a cross-site scripting (XSS) vulnerability.
Fix the problem by correctly escaping the branch/tag/bookmarks.
Reported by Bob Hogg <wombat@rwhogg.site> (thanks!).
lib: sanitize HTML for all types of README rendering, not only markdown
The repository summary page will display a rendered version of the repository 'readme' based on its file extension. In commit 5746cc3b3fa5, the rendered output was already sanitized when the input was markdown. However, also readmes written in other formats, like ReStructuredText (RST) or plain text could have content that we want sanitized.
Therefore, move the sanitizing one level up so it covers all renderers, for now and the future.
This fixes an XSS issue when a repository readme contains javascript code, which would be executed when the repository summary page is visited by a user.
Reported by Bob Hogg <wombat@rwhogg.site> (thanks!).
cleanup: remove unnecessary (and potentially problematic) use of 'literal'
webhelpers.html.literal (kallithea.lib.helpers.literal) is only needed when the passed string may contain HTML that needs to be interpreted literally. It is unnecessary for plain strings.
Incorrect usage of literal can lead to XSS issues, via a malicious user controlling data which will be rendered in other users' browsers. The data could either be stored previously in the system or be part of a forged URL the victim clicks on.
For example, when a user browses to a forged URL where a repository changeset or branch name contains a javascript snippet, the snippet was executed when printed on the page using 'literal'.
Remaining uses of 'literal' have been reviewed with no apparent problems found.
Reported by Bob Hogg <wombat@rwhogg.site> (thanks!).
pullrequests: prevent XSS in 'Potential Reviewers' list when first and last names cannot be trusted
If a user first or last name contains javascript, these fields need proper escaping to avoid XSS attacks.
An example scenario is: - the malicious user creates a repository. This will cause this user to be listed automatically under 'Potential Reviewers' in pull requests. - another user creates a pull request on that repository and selects the suggested reviewer from the 'Potential Reviewers' list.
Reported by Bob Hogg <wombat@rwhogg.site> (thanks!).
Technical note: the other caller of addReviewMember in base.js itself does _not_ need to be adapted to escape the input values, because the input values (oData) are _already_ escaped (by the YUI framework).