The benefit of this functionality is questionable, especially in bigger setups with multiple front-end instances all serving the same multitude of repositories, where the hit rate is very low. And the overhead of storing cache invalidation data *in* the database is non-trivial.
We preserve a small cache in Repository SA records, but in general we should just know what we are doing and not ask for the same information multiple times in each request.
setup: install pip in virtualenv to make sure we have the latest version
Older versions are good enough for bootstrapping (and might also be good for everything else) but give:
WARNING: You are using pip version 19.3.1; however, version 20.0.2 is available. You should consider upgrading via the 'pip install --upgrade pip' command.
The user might still get this warning initially, but then it goes away ...
config: set base_path config in set_app_settings using Ui.get_repos_location() instead of in app_cfg using make_ui()
Only hit the database once (when starting the application) to get this.
It would perhaps be more elegant to set it directly, for example in kallithea.base_path ... but it seems like a setting that really belongs in the .ini file ...
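In sketch form (function and helper names as given above; the surrounding Kallithea code is omitted), the change amounts to resolving the path once while the settings dict is populated at startup:

  def set_app_settings(config):
      # ... existing code copying the settings table into the config dict ...
      # one extra value, resolved with a single query at application start:
      config['base_path'] = Ui.get_repos_location()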
auth: for default permissions, use existing explicit query result values instead of following dot references in ORM result objects
There have been reports of spurious crashes on resolving references like .repository from Permissions:
  File ".../kallithea/lib/auth.py", line 678, in __wrapper
    if self.check_permissions(user):
  File ".../kallithea/lib/auth.py", line 718, in check_permissions
    return user.has_repository_permission_level(repo_name, self.required_perm)
  File ".../kallithea/lib/auth.py", line 450, in has_repository_permission_level
    actual_perm = self.permissions['repositories'].get(repo_name)
  File ".../kallithea/lib/vcs/utils/lazy.py", line 41, in __get__
    value = self._func(obj)
  File ".../kallithea/lib/auth.py", line 442, in permissions
    return self.__get_perms(user=self, cache=False)
  File ".../kallithea/lib/auth.py", line 498, in __get_perms
    return compute(user_id, user_is_admin)
  File ".../kallithea/lib/auth.py", line 190, in _cached_perms_data
    r_k = perm.UserRepoToPerm.repository.repo_name
  File ".../sqlalchemy/orm/attributes.py", line 285, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File ".../sqlalchemy/orm/attributes.py", line 721, in get
    value = self.callable_(state, passive)
  File ".../sqlalchemy/orm/strategies.py", line 710, in _load_for_state
    % (orm_util.state_str(state), self.key)
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <UserRepoToPerm at ...> is not bound to a Session; lazy load operation of attribute 'repository' cannot proceed (Background on this error at: http://sqlalche.me/e/bhk3)
Permissions are cached between requests: SA result records are stored in beaker.cache.sql_cache_short and reused in following requests after the initial session has been removed. References in Permission objects would usually give lazy lookup ... but not outside the original session, where we get an error like this.
Permissions are indeed implemented/used incorrectly, which might explain part of the problem. Even if this change does not fully explain or fix the reported crashes, it is still worth doing:
Permissions are fetched from the database using Session().query with multiple class/table names (joined together in a way that happens to match the references specified in the table definitions) - including Repository. The results are thus "structs" with the selected objects. If the repository always were retrieved from this selected object, everything would be fine, and in some places that was what the code did.
But in other places, the code did what was more intuitive: just use .repository and rely on "lazy" resolving. SA was not aware that this object already was present in the result struct and would try to fetch it again. In the best case, that was just inefficient; in the worst case, it failed as seen here.
Fix this by only querying one table, but using the "joinedload" option to also fetch the other referenced tables in the same select. (This might inefficiently return the main record multiple times ... but that was already the case with the previous approach.)
This change thus does multiple things with circular dependencies that can't be split into smaller parts without taking detours:
The existing repository join like:
  .join((Repository, UserGroupRepoToPerm.repository_id == Repository.repo_id))
is thus replaced by:
  .options(joinedload(UserGroupRepoToPerm.repository))
Since we now only do Session.query() on one table, the results are of that type instead of "structs" with multiple objects. If only querying for UserRepoToPerm, this means:
- perm.UserRepoToPerm.repository becomes perm.repository
- perm.Permission.permission_name used to look at the explicitly queried Permission in the result struct - instead it should follow the loaded reference as perm.permission.permission_name
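For illustration, a minimal sketch of the query change described above (class and relationship names follow the commit message and Kallithea's model; the real queries select more and filter on the user):

  from sqlalchemy.orm import joinedload
  from kallithea.model.db import Permission, Repository, UserRepoToPerm
  from kallithea.model.meta import Session

  # before: explicit joins, each result row is a "struct" of several mapped objects
  for perm in Session().query(UserRepoToPerm, Repository, Permission) \
          .join((Repository, UserRepoToPerm.repository_id == Repository.repo_id)) \
          .join((Permission, UserRepoToPerm.permission_id == Permission.permission_id)) \
          .all():
      r_k = perm.Repository.repo_name       # fine: uses the explicitly queried object
      p = perm.Permission.permission_name   # fine
      # r_k = perm.UserRepoToPerm.repository.repo_name  # lazy load - breaks outside the session

  # after: query one table and let joinedload fetch the references in the same select
  for perm in Session().query(UserRepoToPerm) \
          .options(joinedload(UserRepoToPerm.repository),
                   joinedload(UserRepoToPerm.permission)) \
          .all():
      r_k = perm.repository.repo_name       # already loaded, safe after the session is gone
      p = perm.permission.permission_name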
scripts/i18n: also normalize casing of UTF-8 in Content-Type
f626260a376c introduced invariant msgmerge casing. Do the same when normalizing, to ensure consistency also when msgmerge is not run, and to avoid unnecessary conflicts.
scripts/i18n: introduce --merge-pot-file to control normalization
There are actually *two* kinds of normalization:
- in main branches, where we just want the translations - not any trivially derived information or temporary or unstructured data.
- in i18n branches, where we want the trivially derived information, and also want to preserve any other information there might be in the .po files.
If no pot file is specified, do it as on the main branches and strip everything but actual translations. This mode will primarily be used when grafting or rebasing changes from i18n branches.
When a pot file is specified, run GNU msgmerge with it on the po files. The pot file should ideally be fully updated (as done by extract_messages). That will establish a common baseline, leaving only the essential changes as needing merge.
If merging from default branches to i18n, it is better to skip .po and .pot files in the first 'hg merge' pass, while resolving everything else. Then, with the uncommitted merge, run 'extract_messages', and then merge the .po files using --merge-pot-file kallithea/i18n/kallithea.pot.
(Actually, these two different modes could perhaps be auto detected ...)
scripts/i18n: add command 'normalized-merge' for use with Mercurial's 'merge-tool' option
Add a 'normalized-merge' command to scripts/i18n that will first normalize the i18n files contributing to the merge, then perform a standard merge. If that merge fails (e.g. due to real conflicts) the normalized files are left behind, and the user needs to run another merge tool manually and resolve the merge of these.
Use it by putting the following snippet in your .hgrc file:
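A hedged example of such an .hgrc snippet, assuming 'normalized-merge' takes the usual local/base/other/output merge-tool arguments (the exact arguments and pattern may differ from what scripts/i18n actually expects):

  [merge-tools]
  i18n.executable = scripts/i18n
  i18n.args = normalized-merge $local $base $other $output

  [merge-patterns]
  kallithea/i18n/**.po = i18n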
The translation files in the Kallithea repository contained references to the location(s) of each string in the repository. This is useful to translators, but is not needed for all other users.
The big problem with that information is that it changes very commonly as a result of normal development in Kallithea, causing a lot of unimportant delta in the Kallithea repository, thus causing unnecessary repository growth.
In this commit, a basic version of the script is added, only containing the code to normalize the translation files by removing generated and outdated data.
This can be used to check or ensure internal consistency between code and translations, by extracting and merging and then removing most of it again with normalize-po-files:
./setup.py extract_messages
for po in kallithea/i18n/*/LC_MESSAGES/kallithea.po; do msgmerge --width=76 --backup=none --previous --update $po kallithea/i18n/kallithea.pot ; done
scripts/i18n: introduce new i18n maintenance script
The translation files in the Kallithea repository contained references to the location(s) of each string in the repository. This is useful to translators, but is not needed for all other users.
The big problem with that information is that it changes very commonly as a result of normal development in Kallithea, causing a lot of unimportant delta in the Kallithea repository, thus causing unnecessary repository growth.
A script 'i18n' is added to help maintain the i18n files. Functionality will be added later.
Traceback (most recent call last):
  File "scripts/update-copyrights.py", line 45, in <module>
    from . import contributor_data
ImportError: attempted relative import with no known parent package
Fixed by backing out changeset 2786730e56e0. The pytype problem can be solved differently.
user: make get_by_username_or_email default to treat username case insensitive
get_by_username_or_email is a flexible function, intended to find users in multiple ways, suitable for login prompts. The function was sometimes used with case sensitive user lookup, sometimes without. Instead, be consistent and just default to case insensitive lookup.
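A minimal sketch of the kind of case insensitive lookup meant here (the real method takes more options and also falls back to lookup by email address):

  from sqlalchemy import func

  def get_by_username_or_email(username_or_email, case_insensitive=True):
      q = Session().query(User)
      if case_insensitive:
          q = q.filter(func.lower(User.username) == func.lower(username_or_email))
      else:
          q = q.filter(User.username == username_or_email)
      user = q.scalar()
      if user is None:
          # fall back to lookup by email address
          user = Session().query(User).filter(User.email == username_or_email).scalar()
      return user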
auth: show a clear "Authentication failed" message if login fails after passing form validation
log_in_user will only set a session cookie after verifying that the user is valid (for example based on IP). The code is thus safe, but no hint was given to the user if login failed for that reason.
login: assert that the validated user actually is found
Due to another bug, it was possible that authentication succeeded but the user object couldn't be obtained. This was for example noticed when the LDAP auth module did not correctly parse the email attribute and a login via email was attempted. In this case, the user was retrieved from the email address and LDAP found the user, but the email attribute in the Kallithea database was then changed incorrectly, and a subsequent retrieval based on the same original email address would not find the user.
Such a problem would lead to a crash at an assert in Kallithea:
  File ".../kallithea/controllers/login.py", line 104, in index
    auth_user = log_in_user(user, c.form_result['remember'], is_external_auth=False, ip_addr=request.ip_addr)
  File ".../kallithea/lib/base.py", line 122, in log_in_user
    assert not user.is_default_user, user
AttributeError: 'NoneType' object has no attribute 'is_default_user'
This assert caught the problem but is not a spot-on indicator of the real problem. Instead, we can catch the problem sooner by adding an assert already in the login controller.
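Roughly, the added check amounts to something like this in the login controller (sketch only; 'username' as form key is an assumption, the other names follow the traceback above):

  user = User.get_by_username_or_email(c.form_result['username'])
  assert user is not None, 'no user found although form validation passed'
  auth_user = log_in_user(user, c.form_result['remember'],
                          is_external_auth=False, ip_addr=request.ip_addr)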
Viewing the two-way diff of an added file gives the following exception:
File "_base_root_html", line 211, in render_body
File "_base_base_html", line 42, in render_body
File "files_diff_2way_html", line 197, in render_main
  File ".../kallithea/lib/vcs/nodes.py", line 411, in is_binary
    return b'\0' in self.content
TypeError: 'in <string>' requires string as left operand, not bytes
At this point, self.content was '' (empty string).
Due to a missing conversion from bytes to unicode for the attribute values obtained from LDAP, storing the values in a unicode field in the database would fail. It would apparently either store a repr of the bytes or store them in some other way.
Upon user login, SQLAlchemy warned about this:
  .../sqlalchemy/sql/sqltypes.py:269: SAWarning: Unicode type received non-unicode bind param value b'John'. (this warning may be suppressed after 10 occurrences)
  .../sqlalchemy/sql/sqltypes.py:269: SAWarning: Unicode type received non-unicode bind param value b'Doe'. (this warning may be suppressed after 10 occurrences)
In PostgreSQL, this would result in 'weird' values for the first name, last name, and email fields, both in the database and the web UI, e.g.:
  firstname: \x4a6f686e
  lastname: \x446f65
  email: \x6a6f686e406578616d706c652e636f6d
These values represent the actual values in hexadecimal, e.g. \x4a6f686e = 0x4a 0x6f 0x68 0x6e = J o h n.
In SQLite, the problem initially shows differently, as an exception in gravatar_url():
File "_base_root_html", line 207, in render_body
File "_index_html", line 78, in render_header_menu
File "_base_base_html", line 479, in render_menu
  File ".../kallithea/lib/helpers.py", line 908, in gravatar_div
    gravatar(email_address, cls=cls, size=size)))
  File ".../kallithea/lib/helpers.py", line 923, in gravatar
    src = gravatar_url(email_address, size * 2)
  File ".../kallithea/lib/helpers.py", line 956, in gravatar_url
    .replace('{email}', email_address) \
TypeError: replace() argument 2 must be str, not bytes
but nevertheless the root cause of the problem is the same.
Fix the problem by converting the LDAP attributes from bytes to strings.
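The fix boils down to decoding the attribute values before they are stored, along these lines (sketch, assuming python-ldap's convention of returning attribute values as lists of bytes; the attribute names are only examples - in Kallithea they come from the LDAP plugin settings):

  def ldap_attr_to_str(attrs, name, default=''):
      # python-ldap returns attribute values as lists of bytes
      values = attrs.get(name)
      return values[0].decode('utf-8') if values else default

  firstname = ldap_attr_to_str(user_attrs, 'givenName')
  lastname = ldap_attr_to_str(user_attrs, 'sn')
  email = ldap_attr_to_str(user_attrs, 'mail')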
hg: handle Mercurial RepoError correctly in is_valid_repo_uri
RepoError would be leaked by is_valid_repo_uri.
Now, when for example validating an ssh URL without having the ssh client binary, it will be shown/logged as:
  remote: /bin/sh: ssh: command not found
  2020-03-17 17:28:53.907 WARNI [kallithea.model.validators] validation of clone URL 'ssh://no-ssh.com/' failed: Mercurial RepoError: no suitable response from remote hg
and shown in the UI as 'Invalid repository URL'.
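A hedged sketch of the handling (function and exception names as elsewhere in this series; the real is_valid_repo_uri also handles git and other cases):

  import mercurial.error

  def is_valid_repo_uri(repo_type, url, ui):
      if repo_type == 'hg':
          try:
              MercurialRepository._check_url(url, ui)
          except mercurial.error.RepoError as e:
              # translate instead of leaking the Mercurial-internal exception
              raise InvalidCloneUriException('Mercurial RepoError: %s' % e)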
repos: separate repo creation from form validation
The broad catching of Exception in the repo creation controller is conceptually bad. It also caused a misleading "Error creating repository None" when form validation failed with anything but formencode.Invalid. For now, just constrain the broad exception handling to only cover repo creation. It is a bug if form validation fails in unexpected ways, and we want it reported as a crash that we can fix.
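Schematically, the controller goes from one big try/except around both steps to something like this (sketch, not the actual Kallithea controller code):

  # form validation: let anything but formencode.Invalid crash loudly,
  # so coding errors show up as 500 errors instead of flash messages
  form_result = _form.to_python(dict(request.POST))

  # only the actual repository creation keeps the broad exception handling
  try:
      RepoModel().create(form_result, request.authuser.user_id)
  except Exception:
      log.error('error occurred during creation of repository %s',
                form_result.get('repo_name'), exc_info=True)
      h.flash(_('Error creating repository %s') % form_result.get('repo_name'),
              category='error')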
hg: always show and run Mercurial hooks in alphabetical order (Issue #246)
Mercurial will generally run hooks in the order they are found in the configuration. For entries found in the database, there is no such order. Instead, always use alphabetical order for these.
Since we now want to order things explicitly in the db query, we want an index with a composite key. We do that even though we don't really need it for the few entries in this table, and even though it might/could use the same index as the existing unique constraint. This composite UniqueConstraint was added in b9f4b444a172 where it replaced a wrong UniqueConstraint that could/should have been removed in c25191aadf92. Fix that while touching this area and running a migration script.
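The ordering itself is then just an explicit order_by on the query for hook entries, roughly (assuming Kallithea's Ui model with ui_section and ui_key columns):

  # fetch hook entries from the database in a stable, alphabetical order
  hooks = Ui.query() \
      .filter(Ui.ui_section == 'hooks') \
      .order_by(Ui.ui_section, Ui.ui_key) \
      .all()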
hg: set HGPLAIN globally to minimize potential impact from reading hgrc configuration when creating ui
See `hg help scripting` for HGPLAIN description ... but Kallithea will invoke Mercurial internals directly, so HGPLAIN shouldn't really impact anything.
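Setting it globally is a one-liner early in application startup (sketch; the exact value assigned here is just an example):

  import os
  os.environ['HGPLAIN'] = 'True'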
Traceback (most recent call last):
  File "data/env/bin/isort", line 10, in <module>
    sys.exit(main())
  File ".../env/lib64/python3.7/site-packages/isort/main.py", line 379, in main
    for sort_attempt in attempt_iterator:
  File ".../env/lib64/python3.7/site-packages/isort/main.py", line 377, in <genexpr>
    attempt_iterator = (sort_imports(file_name, **arguments) for file_name in file_names)
  File ".../env/lib64/python3.7/site-packages/isort/main.py", line 88, in sort_imports
    result = SortImports(file_name, **arguments)
  File ".../env/lib64/python3.7/site-packages/isort/isort.py", line 207, in __init__
    self._add_formatted_imports()
  File ".../env/lib64/python3.7/site-packages/isort/isort.py", line 606, in _add_formatted_imports
    self._add_from_imports(from_modules, section, section_output, sort_ignore_case)
  File ".../env/lib64/python3.7/site-packages/isort/isort.py", line 526, in _add_from_imports
    import_statement = self._multi_line_reformat(import_start, from_import_section, comments)
  File ".../env/lib64/python3.7/site-packages/isort/isort.py", line 552, in _multi_line_reformat
    dynamic_indent, indent, line_length, comments)
  File ".../env/lib64/python3.7/site-packages/isort/isort.py", line 705, in _output_grid
    if len(next_statement.split(self.line_separator)[-1]) + 1 > line_length:
TypeError: '>' not supported between instances of 'int' and 'str'
validators: don't catch all Exceptions as invalid clone URIs, be specific
When adding a new repository with a remote clone URI, the URI will be validated in some way. Several exceptions could occur during that validation.
Previously, the code would catch based on 'Exception', which means that _any_ exception would cause the URI to be found invalid. This means that errors in the code (e.g. related to Python 3 conversion) were also categorized as 'invalid clone URI'. And thus, the tests that test an actually invalid URI would pass, even though there was a bug.
Now, things have been refactored so that only InvalidCloneUriException needs to be caught. Any other exception will now yield a 500 Internal Server Error, as expected.
validators: introduce InvalidCloneUriException instead of throwing bare Exceptions for invalid clone URIs
When adding a new repository with a remote clone URI, the URI will be validated in some way. In case of errors, it would raise 'Exception' to report invalid URLs. Code calling the validation would thus have to catch 'Exception', which means that _any_ exception would cause the URI to be found invalid.
Instead, create a special exception type intended to be used for all exceptions we know can occur during normal URL validation.
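A minimal sketch of what that looks like (where the class lives in Kallithea and what extra context it carries is left out here):

  class InvalidCloneUriException(Exception):
      """Raised for the known ways a remote clone URI can fail validation."""

  # the known failure points raise it, e.g.:
  #     raise InvalidCloneUriException('invalid clone URL %r' % url)
  # and callers only catch this specific type:
  try:
      is_valid_repo_uri(repo_type, url, make_ui())
  except InvalidCloneUriException as e:
      log.warning('validation of clone URL %r failed: %s', url, e)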
vcs: fix repo creation with a remote clone uri under Python 3
When trying to add a new repository based on a remote clone, it fails in Python 3 as follows, for git:
ERROR kallithea.model.validators:validators.py:413 URL validation failed
Traceback (most recent call last):
  File ".../kallithea/model/validators.py", line 411, in _validate_python
    is_valid_repo_uri(repo_type, url, make_ui())
  File ".../kallithea/lib/utils.py", line 250, in is_valid_repo_uri
    GitRepository._check_url(url)
  File ".../kallithea/lib/vcs/backends/git/repository.py", line 174, in _check_url
    if not test_uri.endswith('info/refs'):
TypeError: endswith first arg must be bytes or a tuple of bytes, not str
and for hg, the other way around:
ERROR kallithea.model.validators:validators.py:413 URL validation failed
Traceback (most recent call last):
  File ".../kallithea/model/validators.py", line 411, in _validate_python
    is_valid_repo_uri(repo_type, url, make_ui())
  File ".../kallithea/lib/utils.py", line 233, in is_valid_repo_uri
    MercurialRepository._check_url(url, ui)
  File ".../kallithea/lib/vcs/backends/hg/repository.py", line 297, in _check_url
    if os.path.isdir(url) or url.startswith(b'file:'):
TypeError: startswith first arg must be str or a tuple of str, not bytes
In both cases, the code seems to actually expect a bytes url.
celery: fix send_email to work with JSON encoding (Issue #363)
A long time ago, c935bcaf7086 introduced an optional User object parameter to the send_email task and used its computed full_name_or_username property. Due to the magic of pickle, that also worked when using Celery to run the task async.
Now, Celery 4 changed the default encoding from Pickle to JSON, which we anticipated in e539db6cc0da. That broke send_email in some cases, for example when a user comments on another user's changeset.
Fixed by passing the "From" name as string instead of passing the whole User object.
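In sketch form, the task boundary changes along these lines (parameter names are illustrative, not necessarily the exact ones used in Kallithea):

  # before: a full User object crossed the Celery task boundary (fine with
  # pickle, but not JSON serializable):
  #   send_email(recipients, subject, body, html_body, author=user)

  # after: only a precomputed plain string crosses the boundary:
  send_email(recipients, subject, body, html_body,
             from_name=user.full_name_or_username)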
front-end: add eslint-plugin-html as dependency as introduced in .eslintrc.js in 4d36432bf705
Usage example:
  hg up -cr.
  sed -i -e 's/\${[^{}]*\({[^{}]*}[^{}]*\)*}/""/g' -e 's/%\(if\|else\|endif\|for\|endfor\)\>.*//g' -e 's/##.*//g' $(hg loc 'kallithea/templates/**.html')
  vim kallithea/templates/pullrequests/pullrequest.html +139  # blank out the multi line 'var url = ${}'
  ( cd kallithea/front-end/ && node_modules/.bin/eslint $(hg loc 'kallithea/templates/**.html')) | tee l
  hg up -Cr.
  ( sed -n 's/^function \([^(]*\).*/\1/gp' kallithea/public/js/base.js
    sed -n 's/.* var \([^ ]*\) =.*/\1/p' kallithea/templates/base/root.html
    echo pyroutes; echo CodeMirror; echo Select2; echo BranchRenderer
  ) | while read x; do echo $x; sed -i "/error .$x. is not defined /d" l; done
  cat l
auth: also use safe password hashing on Windows using bcrypt
For unknown reasons, Kallithea used different hashing algorithms on Windows and POSIX, perhaps because of problems with bcrypt on Windows in the past. That should no longer be an issue, and it doesn't make sense to have different security properties on the two platforms.
While changing to bcrypt everywhere, also remain backwards compatible and accept existing passwords hashed with sha256 - both on Windows (where it used to be used), and elsewhere (in case a system has been migrated from Windows to Unix).
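For illustration, the combination can look like this (sketch only, not Kallithea's actual helper; it assumes the legacy Windows hashes were plain unsalted sha256 hex digests):

  import hashlib
  import bcrypt

  def hash_password(password):
      # bcrypt for all newly set passwords
      return bcrypt.hashpw(password.encode('utf-8'), bcrypt.gensalt()).decode('ascii')

  def check_password(password, hashed):
      if hashed.startswith('$2'):  # bcrypt hashes start with $2a$ / $2b$
          return bcrypt.checkpw(password.encode('utf-8'), hashed.encode('ascii'))
      # fall back to accepting the legacy sha256 hashes
      return hashlib.sha256(password.encode('utf-8')).hexdigest() == hashed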
The "ROUTES" prefix might refer to "CELERY_ROUTES" ... but it doesn't take a simple string list anyway, so there is no point in treating it as a list value.
Kallithea only uses Celery results when repos are created or forked and user browsers are reloading pages to poll for completion. amqp seems like unnecessary complexity for that use case.
SQLite does, however, seem like a minimal but fine solution for the Kallithea use case in most setups.
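As a concrete example, Celery's SQLAlchemy result backend pointed at a local SQLite file is sufficient (shown here as plain Celery configuration; the corresponding Kallithea .ini spelling is not shown):

  # celeryconfig-style setting: store task results in a local SQLite file
  result_backend = 'db+sqlite:///celery-results.sqlite'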
celery: set default config values in code and remove them from the generated .ini
It is hard to imagine any reason the user should change celery.imports. And if it ever should change, we want it controlled in code - not left stale in user-controlled config files.
Everybody should just use .json and there is no reason anybody should specify that in the .ini ... and it will be the default in Celery 4.
- but it is quite noisy. Some problems do however stand out as relevant to fix.
Script sections in HTML files can also be checked after removing mako markup:
  hg up -cr.
  sed -i -e 's/\${[^{}]*\({[^{}]*}[^{}]*\)*}/""/g' -e 's/%\(if\|else\|endif\|for\|endfor\)\>.*//g' -e 's/##.*//g' $(hg loc 'kallithea/templates/**.html')
  vim kallithea/templates/pullrequests/pullrequest.html +139  # blank out the multi line 'var url = ${}'
  ./node_modules/.bin/eslint $(hg loc 'kallithea/templates/**.html')
  hg up -Cr.
- but that is even more noisy.
The noise is mainly due to eslint not knowing that everything runs together, with kallithea/templates/base/root.html defining global variables and kallithea/public/js/base.js using these and defining functions, which are then used "everywhere". There might be solutions to that - this is a starting point.