scripts/i18n: also normalize casing of UTF-8 in Content-Type
f626260a376c introduced invariant msgmerge casing. Do the same when normalizing, to ensure consistency even when msgmerge is not used, and to avoid unnecessary conflicts.
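A minimal sketch of such normalization, assuming the invariant form emitted by msgmerge is 'UTF-8' (the function name and regex are illustrative, not the actual script code):

```python
import re

def normalize_charset_casing(header: str) -> str:
    # Force the charset in a .po Content-Type header line to invariant
    # 'UTF-8' casing, regardless of how it was cased before.
    return re.sub(r'charset=utf-8', 'charset=UTF-8', header, flags=re.IGNORECASE)
```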
scripts/i18n: introduce --merge-pot-file to control normalization
There are actually *two* kinds of normalization:
- in main branches, where we just want the translations - not any trivially derived information or temporary or unstructured data.
- in i18n branches, where we want the trivially derived information, and also want to preserve any other information there might be in the .po files.
If no pot file is specified, do it as on the main branches and strip everything but actual translations. This mode will primarily be used when grafting or rebasing changes from i18n branches.
When a pot file is specified, run GNU msgmerge with it on the po files. The pot file should ideally be fully updated (as done by extract_messages). That will establish a common baseline, leaving only the essential changes as needing merge.
When merging from default branches to i18n branches, it is better to skip .po and .pot files in the first 'hg merge' pass, while resolving everything else. Then, with the uncommitted merge, run 'extract_messages', and merge the .po files using --merge-pot-file kallithea/i18n/kallithea.pot .
(Actually, these two different modes could perhaps be auto detected ...)
scripts/i18n: add command 'normalized-merge' for use with Mercurial's 'merge-tool' option
Add a 'normalized-merge' command to scripts/i18n that will first normalize the i18n files contributing to the merge, then perform a standard merge. If that merge fails (e.g. due to real conflicts) the normalized files are left behind, and the user needs to run another merge tool manually and resolve the merge of these.
Use it by putting the following snippet in your .hgrc file:
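The snippet itself is not reproduced here; a hypothetical configuration along these lines, following Mercurial's standard merge-tools conventions (the tool name, arguments, and pattern are assumptions, not the actual documented snippet), could look like:

```ini
[merge-tools]
i18n.executable = scripts/i18n
i18n.args = normalized-merge $base $local $other $output

[merge-patterns]
kallithea/i18n/**.po = i18n
```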
The translation files in the Kallithea repository contained references to the location(s) of each string in the repository. This is useful to translators, but is not needed by other users.
The big problem with that information is that it changes very commonly as a result of normal development in Kallithea, causing a lot of unimportant delta in the Kallithea repository, thus causing unnecessary repository growth.
In this commit, a basic version of the script is added, only containing the code to normalize the translation files by removing generated and outdated data.
This can be used to check or ensure internal consistency between code and translations, by extracting and merging and then removing most of it again with normalize-po-files:
./setup.py extract_messages
for po in kallithea/i18n/*/LC_MESSAGES/kallithea.po; do msgmerge --width=76 --backup=none --previous --update $po kallithea/i18n/kallithea.pot ; done
scripts/i18n: introduce new i18n maintenance script
The translation files in the Kallithea repository contained references to the location(s) of each string in the repository. This is useful to translators, but is not needed by other users.
The big problem with that information is that it changes very commonly as a result of normal development in Kallithea, causing a lot of unimportant delta in the Kallithea repository, thus causing unnecessary repository growth.
A script 'i18n' is added to help maintain the i18n files. Functionality will be added later.
Traceback (most recent call last):
  File "scripts/update-copyrights.py", line 45, in <module>
    from . import contributor_data
ImportError: attempted relative import with no known parent package
Fixed by backing out changeset 2786730e56e0. The pytype problem can be solved differently.
user: make get_by_username_or_email default to treat username case insensitive
get_by_username_or_email is a flexible function, intended to find users in multiple ways, suitable for login prompts. The function was sometimes used with case sensitive user lookup, sometimes without. Instead, be consistent and just default to case insensitive lookup.
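A minimal plain-Python sketch of the default behaviour (not the actual SQLAlchemy query; the User class here is a stand-in for illustration):

```python
class User:
    def __init__(self, username, email):
        self.username = username
        self.email = email

def get_by_username_or_email(users, key, case_insensitive=True):
    """Find a user by username or email; case insensitive by default."""
    for u in users:
        if case_insensitive:
            if key.lower() in (u.username.lower(), u.email.lower()):
                return u
        elif key in (u.username, u.email):
            return u
    return None
```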
auth: show a clear "Authentication failed" message if login fails after passing form validation
log_in_user will only set a session cookie after verifying that the user is valid (for example based on IP). The code is thus safe, but no hint was given to the user if login failed for that reason.
login: assert that the validated user actually is found
Due to another bug, it was possible that authentication succeeded but the user object couldn't be obtained. This was for example noticed when the LDAP auth module did not correctly parse the email attribute, and a login via email was attempted. In this case, the user was retrieved from email address and LDAP found the user, but the email attribute in the Kallithea database was then changed incorrectly and a subsequent retrieval based on the same original email address would not find the user.
Such a problem would lead to an assert in Kallithea:
  File ".../kallithea/controllers/login.py", line 104, in index
    auth_user = log_in_user(user, c.form_result['remember'], is_external_auth=False, ip_addr=request.ip_addr)
  File ".../kallithea/lib/base.py", line 122, in log_in_user
    assert not user.is_default_user, user
AttributeError: 'NoneType' object has no attribute 'is_default_user'
This assert caught the problem but is not a spot-on indicator of the real problem. Instead, we can catch this problem sooner by adding an assert already in the login controller.
Viewing the two-way diff of an added file gives the following exception:
File "_base_root_html", line 211, in render_body
File "_base_base_html", line 42, in render_body
File "files_diff_2way_html", line 197, in render_main
File ".../kallithea/lib/vcs/nodes.py", line 411, in is_binary
    return b'\0' in self.content
TypeError: 'in <string>' requires string as left operand, not bytes
At this point, self.content was '' (empty string).
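The type mismatch can be illustrated as follows. The sketched fix (ensuring the content is bytes before the NUL check) is an assumption for illustration, not necessarily the exact change made:

```python
def is_binary(content) -> bool:
    # 'b"\0" in content' raises TypeError when content is a str, because the
    # 'in' operator requires both operands to be bytes. Normalize first.
    if isinstance(content, str):
        content = content.encode('utf-8')
    return b'\0' in content
```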
Due to a missing conversion from bytes to unicode for the attribute values obtained from LDAP, storing the values in a unicode field in the database would fail. It would apparently either store a repr of the bytes or store them in some other way.
Upon user login, SQLAlchemy warned about this:
.../sqlalchemy/sql/sqltypes.py:269: SAWarning: Unicode type received non-unicode bind param value b'John'. (this warning may be suppressed after 10 occurrences)
.../sqlalchemy/sql/sqltypes.py:269: SAWarning: Unicode type received non-unicode bind param value b'Doe'. (this warning may be suppressed after 10 occurrences)
In PostgreSQL, this would result in 'weird' values for first name, last name, and email fields, both in the database and the web UI, e.g.:

firstname: \x4a6f686e
lastname: \x446f65
email: \x6a6f686e406578616d706c652e636f6d

These values represent the actual values in hexadecimal, e.g. \x4a6f686e = 0x4a 0x6f 0x68 0x6e = J o h n
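The hexadecimal escapes decode straight back to the original strings, which can be verified with a quick Python snippet:

```python
# The stored hex strings are just the UTF-8/ASCII bytes of the values:
assert bytes.fromhex('4a6f686e').decode('ascii') == 'John'
assert bytes.fromhex('446f65').decode('ascii') == 'Doe'
assert bytes.fromhex('6a6f686e406578616d706c652e636f6d').decode('ascii') == 'john@example.com'
```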
In SQLite, the problem initially shows differently, as an exception in gravatar_url():
File "_base_root_html", line 207, in render_body
File "_index_html", line 78, in render_header_menu
File "_base_base_html", line 479, in render_menu
File ".../kallithea/lib/helpers.py", line 908, in gravatar_div
    gravatar(email_address, cls=cls, size=size)))
File ".../kallithea/lib/helpers.py", line 923, in gravatar
    src = gravatar_url(email_address, size * 2)
File ".../kallithea/lib/helpers.py", line 956, in gravatar_url
    .replace('{email}', email_address) \
TypeError: replace() argument 2 must be str, not bytes
but nevertheless the root cause of the problem is the same.
Fix the problem by converting the LDAP attributes from bytes to strings.
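A sketch of such a conversion (the helper name is illustrative; LDAP libraries typically return attribute values as bytes):

```python
def ldap_attr_to_str(value):
    """Decode a single LDAP attribute value from bytes to str (sketch)."""
    if isinstance(value, bytes):
        return value.decode('utf-8')
    return value
```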
hg: handle Mercurial RepoError correctly in is_valid_repo_uri
RepoError would be leaked by is_valid_repo_uri.
Now, when for example validating an ssh URL without having the ssh client binary, it will be shown/logged as:

remote: /bin/sh: ssh: command not found
2020-03-17 17:28:53.907 WARNI [kallithea.model.validators] validation of clone URL 'ssh://no-ssh.com/' failed: Mercurial RepoError: no suitable response from remote hg

and shown in the UI as 'Invalid repository URL'.
repos: separate repo creation from form validation
The broad catching of Exception in the repo creation controller is conceptually bad. It also caused misleading "Error creating repository None" when form validation failed with anything but formencode.Invalid . For now, just constrain the broad exception handling to only cover repo creation. It is a bug if form validation fails in unexpected ways, and we want it reported as a crash that we can fix.
hg: always show and run Mercurial hooks in alphabetical order (Issue #246)
Mercurial will generally run hooks in the order they are found in the configuration. For entries found in the database, there is no such order. Instead, always use alphabetical order for these.
Since we now want to order things explicitly in the db query, we want an index with a composite key. We do that even though we don't really need it for the few entries in this table, and even though it might/could use the same index as the existing unique constraint. This composite UniqueConstraint was added in b9f4b444a172 where it replaced a wrong UniqueConstraint that could/should have been removed in c25191aadf92. Fix that while touching this area and running a migration script.
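The essence of the ordering can be sketched in plain Python (the hook names below are hypothetical examples, and this is not the actual db query):

```python
# Hooks loaded from the database come back in no particular order; sorting
# the (name, command) pairs alphabetically by name makes the execution and
# display order deterministic.
def sorted_hooks(hook_dict):
    return sorted(hook_dict.items())

hooks = {
    'outgoing.pull_logger': 'python:kallithea.lib.hooks.log_pull_action',
    'changegroup.push_logger': 'python:kallithea.lib.hooks.log_push_action',
}
ordered_names = [name for name, _cmd in sorted_hooks(hooks)]
```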
hg: set HGPLAIN globally to minimize potential impact from reading hgrc configuration when creating ui
See `hg help scripting` for HGPLAIN description ... but Kallithea will invoke Mercurial internals directly, so HGPLAIN shouldn't really impact anything.
Traceback (most recent call last):
  File "data/env/bin/isort", line 10, in <module>
    sys.exit(main())
  File ".../env/lib64/python3.7/site-packages/isort/main.py", line 379, in main
    for sort_attempt in attempt_iterator:
  File ".../env/lib64/python3.7/site-packages/isort/main.py", line 377, in <genexpr>
    attempt_iterator = (sort_imports(file_name, **arguments) for file_name in file_names)
  File ".../env/lib64/python3.7/site-packages/isort/main.py", line 88, in sort_imports
    result = SortImports(file_name, **arguments)
  File ".../env/lib64/python3.7/site-packages/isort/isort.py", line 207, in __init__
    self._add_formatted_imports()
  File ".../env/lib64/python3.7/site-packages/isort/isort.py", line 606, in _add_formatted_imports
    self._add_from_imports(from_modules, section, section_output, sort_ignore_case)
  File ".../env/lib64/python3.7/site-packages/isort/isort.py", line 526, in _add_from_imports
    import_statement = self._multi_line_reformat(import_start, from_import_section, comments)
  File ".../env/lib64/python3.7/site-packages/isort/isort.py", line 552, in _multi_line_reformat
    dynamic_indent, indent, line_length, comments)
  File ".../env/lib64/python3.7/site-packages/isort/isort.py", line 705, in _output_grid
    if len(next_statement.split(self.line_separator)[-1]) + 1 > line_length:
TypeError: '>' not supported between instances of 'int' and 'str'
validators: don't catch all Exceptions as invalid clone URIs, be specific
When adding a new repository with a remote clone URI, the URI will be validated in some way. Several exceptions could occur during that validation.
Previously, the code would catch based on 'Exception', which means that _any_ exception would cause the URI to be found invalid. This means that errors in the code (e.g. related to Python 3 conversion) were also categorized as 'invalid clone URI'. And thus, the tests that test an actually invalid URI would pass, even though there was a bug.
Now, things have been refactored so it only is relevant to catch InvalidCloneUriException. Any other exception will now yield a 500 Internal Server Error, as expected.
validators: introduce InvalidCloneUriException instead of throwing bare Exceptions for invalid clone URIs
When adding a new repository with a remote clone URI, the URI will be validated in some way. In case of errors, it would raise 'Exception' to report invalid URLs. Code calling the validation would thus have to catch 'Exception', which means that _any_ exception would cause the URI to be found invalid.
Instead, create a special exception type intended to be used for all exceptions we know can occur during normal URL validation.
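A minimal sketch of the dedicated exception type and its intended use (the validation logic shown is a placeholder, not Kallithea's actual URL checks):

```python
class InvalidCloneUriException(Exception):
    """Raised when a remote clone URI fails validation."""

def is_valid_repo_uri(url):
    # Placeholder validation: wrap known failure modes in the dedicated
    # exception so callers no longer have to catch bare Exception.
    if not url.startswith(('http://', 'https://', 'ssh://')):
        raise InvalidCloneUriException('unsupported URL scheme: %s' % url)
    return True
```

Callers then catch only InvalidCloneUriException, letting any other exception surface as a real error.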
vcs: fix repo creation with a remote clone uri under Python 3
When trying to add a new repository based on a remote clone, it fails in Python 3 as follows, for git:
ERROR kallithea.model.validators:validators.py:413 URL validation failed
Traceback (most recent call last):
  File ".../kallithea/model/validators.py", line 411, in _validate_python
    is_valid_repo_uri(repo_type, url, make_ui())
  File ".../kallithea/lib/utils.py", line 250, in is_valid_repo_uri
    GitRepository._check_url(url)
  File ".../kallithea/lib/vcs/backends/git/repository.py", line 174, in _check_url
    if not test_uri.endswith('info/refs'):
TypeError: endswith first arg must be bytes or a tuple of bytes, not str
and for hg, the other way around:
ERROR kallithea.model.validators:validators.py:413 URL validation failed
Traceback (most recent call last):
  File ".../kallithea/model/validators.py", line 411, in _validate_python
    is_valid_repo_uri(repo_type, url, make_ui())
  File ".../kallithea/lib/utils.py", line 233, in is_valid_repo_uri
    MercurialRepository._check_url(url, ui)
  File ".../kallithea/lib/vcs/backends/hg/repository.py", line 297, in _check_url
    if os.path.isdir(url) or url.startswith(b'file:'):
TypeError: startswith first arg must be str or a tuple of str, not bytes
In both cases, the code seems to actually expect a bytes url.
celery: fix send_email to work with JSON encoding (Issue #363)
A long time ago, c935bcaf7086 introduced an optional User object parameter to the send_email task and used the computed full_name_or_username property. Due to the magic of pickle, that also worked when using Celery to run the task async.
Now, Celery 4 changed the default encoding from Pickle to JSON, which we anticipated in e539db6cc0da. That broke send_email in some cases, for example when a user comments on another user's changeset.
Fixed by passing the "From" name as string instead of passing the whole User object.
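The essence of the fix can be sketched as follows (send_email_args is a hypothetical helper for illustration, not Kallithea's actual code):

```python
import json

# Pickle can serialize arbitrary objects like User; JSON cannot. Passing the
# precomputed display name as a plain string keeps the task arguments JSON safe.
def send_email_args(recipients, subject, body, from_name):
    args = [recipients, subject, body, from_name]
    json.dumps(args)  # raises TypeError if anything is not JSON serializable
    return args
```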
front-end: add eslint-plugin-html as dependency as introduced in .eslintrc.js in 4d36432bf705
Usage example:

hg up -cr.
sed -i -e 's/\${[^{}]*\({[^{}]*}[^{}]*\)*}/""/g' -e 's/%\(if\|else\|endif\|for\|endfor\)\>.*//g' -e 's/##.*//g' $(hg loc 'kallithea/templates/**.html')
vim kallithea/templates/pullrequests/pullrequest.html +139 # blank out the multi line 'var url = ${}'
( cd kallithea/front-end/ && node_modules/.bin/eslint $(hg loc 'kallithea/templates/**.html')) | tee l
hg up -Cr.
( sed -n 's/^function \([^(]*\).*/\1/gp' kallithea/public/js/base.js;
  sed -n 's/.* var \([^ ]*\) =.*/\1/p' kallithea/templates/base/root.html;
  echo pyroutes; echo CodeMirror; echo Select2; echo BranchRenderer;
) | while read x; do echo $x; sed -i "/error .$x. is not defined /d" l; done
cat l
auth: also use safe password hashing on Windows using bcrypt
For unknown reasons, Kallithea used different hashing algorithms on Windows and Posix, perhaps because of problems with bcrypt on Windows in the past. That should no longer be an issue, and it doesn't make sense to have different security properties on the platforms.
While changing to bcrypt everywhere, also remain backwards compatible and accept existing passwords hashed with sha256 - both on Windows (where it used to be used), and elsewhere (in case a system has been migrated from Windows to Unix).
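A sketch of such backwards compatible checking. The '$2' prefix detection and the bcrypt usage are assumptions for illustration, not necessarily Kallithea's exact code:

```python
import hashlib

def check_password(password: str, hashed: str) -> bool:
    # bcrypt hashes start with '$2...'; legacy hashes were plain sha256 hex
    # digests. Dispatch on the stored hash format so old passwords still work.
    if hashed.startswith('$2'):
        import bcrypt  # third-party dependency, assumed available
        return bcrypt.checkpw(password.encode('utf-8'), hashed.encode('utf-8'))
    return hashlib.sha256(password.encode('utf-8')).hexdigest() == hashed
```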
The "ROUTES" prefix might refer to "CELERY_ROUTES" ... but it doesn't take a simple string list anyway, so there is no point in treating it as a list value.
Kallithea only uses Celery results when repos are created or forked and user browsers are reloading pages to poll for completion. amqp seems like unnecessary complexity for that use case.
SQLite does however seem like a minimal but fine solution for the Kallithea use case in most setups.
celery: set default config values in code and remove them from the generated .ini
It is hard to imagine any reason the user should change celery.imports . And if it ever should change, we want it controlled in code - not left stale in user controlled config files.
Everybody should just use json and there is no reason anybody should specify that in the .ini ... and it will be the default in Celery 4.
- but it is quite noisy. Some problems do however stand out as relevant to fix.
Script sections in HTML files can also be checked after removing mako markup:
hg up -cr.
sed -i -e 's/\${[^{}]*\({[^{}]*}[^{}]*\)*}/""/g' -e 's/%\(if\|else\|endif\|for\|endfor\)\>.*//g' -e 's/##.*//g' $(hg loc 'kallithea/templates/**.html')
vim kallithea/templates/pullrequests/pullrequest.html +139 # blank out the multi line 'var url = ${}'
./node_modules/.bin/eslint $(hg loc 'kallithea/templates/**.html')
hg up -Cr.
- but that is even more noisy.
The noise is mainly due to eslint not knowing that everything runs together, with kallithea/templates/base/root.html defining global variables, kallithea/public/js/base.js using these and defining functions, which then is used "everywhere". There might be solutions to that - this is a starting point.
ui: show toggleable "Follow" status in repo groups' repo list
It makes sense to show Follow status next to repo names in the repo list, and it is a meaningful and efficient bulk operation to toggle Follow status there.
Clicking on the (Un)Follow 'heart' will toggle the caller's follow status for that repo.
The repo model already has layering violations - expand on them to compute the follow status of the current user.
(Changeset was cherry picked and modified by Mads Kiilerich.)
cleanup: run pyflakes as a part of scripts/run-all-cleanup
pyflakes has no usable configuration options, so create a small wrapper script. Instead of having two wrapper scripts (with one being almost nothing and the other containing configuration), just keep it simple and use one combined.
The reasoning in hg _fix_path is no longer correct.
We only need it for stripping the trailing '/' ... but we would rather be explicit about that. (But it is also questionable if we actually want it at all - we should just pass the right data and fail on wrong data.)
.../site-packages/_pytest/doctest.py:381: in _mock_aware_unwrap
    return real_unwrap(obj, stop=_is_mocked)
/usr/lib64/python3.7/inspect.py:511: in unwrap
    while _is_wrapper(func):
/usr/lib64/python3.7/inspect.py:505: in _is_wrapper
    return hasattr(f, '__wrapped__') and not stop(f)
.../site-packages/tg/support/objectproxy.py:19: in __getattr__
    return getattr(self._current_obj(), attr)
.../site-packages/tg/request_local.py:240: in _current_obj
    return getattr(context, self.name)
.../site-packages/tg/support/objectproxy.py:19: in __getattr__
    return getattr(self._current_obj(), attr)
.../site-packages/tg/support/registry.py:72: in _current_obj
    'thread' % self.____name__)
E   TypeError: No object (name: context) has been registered for this thread
pytest's doctest support is (in _mock_aware_unwrap) using py3 inspect.
Inside inspect, _is_wrapper will do an innocent looking: hasattr(f, '__wrapped__')
But if the code under test has an (unused) import of a tg context (such as tg.request), it is no longer so innocent. tg will throw: TypeError: No object (name: context) has been registered for this thread (which in py2 would have been caught by hasattr, but not in py3.)
pytest will thus fail already in the "collecting ..." phase.
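The py2/py3 difference can be demonstrated in isolation: under Python 2, hasattr() swallowed all exceptions, while under Python 3 it only swallows AttributeError. A proxy raising TypeError (like tg's context proxy when no object is registered) therefore breaks hasattr-based introspection:

```python
class UnregisteredProxy:
    # Mimics tg's StackedObjectProxy when no context object is registered:
    # attribute access raises TypeError instead of AttributeError.
    def __getattr__(self, name):
        raise TypeError('No object (name: context) has been registered for this thread')

p = UnregisteredProxy()
try:
    hasattr(p, '__wrapped__')
    result = 'swallowed'
except TypeError:
    result = 'propagated'  # py3: only AttributeError is swallowed by hasattr
assert result == 'propagated'
```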
To work around that, use the hack of pushing a tg context in the top level pytest_configure.