model: handle redundant reviewers in add_reviewers
Don't attempt to add reviewers that are already a reviewer for the specified PR (redundant reviewers).
Return the list of added and redundant reviewers, for the controller to handle.
Under normal circumstances, the pullrequest controller already processes the list of reviewers and only calls add_reviewers for new reviewers. But, there could be ways were this checking fails, for example due to a race condition between two simultaneous requests for the same pullrequest, or due to a bug in the web server framework that causes the same request to be handled again.
There are cases where the firstname/lastname known to Kallithea, is not complete, perhaps only an initial. But users that try to add a reviewer, may start typing a full name and expect autocompletion.
For example:
firstname: John lastname: D email: john.doe@example.com
When a review starts typing 'Doe' they will not get any matches if autocompletion is purely based on names.
This situation sometimes arises with Indian developers, where the names known to companies may be abbreviated as initials. For example "C K Deepak" where Deepak is the first name and C K are the initials of the last name.
autocomplete: also query 'firstname lastname' and 'lastname firstname' combinations
The autocomplete functionality for user names, e.g. in pull request reviewer lists, @mentions, etc. would match the input term only on firstname, lastname or username, but not a combination of firstname lastname.
This is a problem when there are many matches on the same firstname or lastname, in particular with Chinese names like 'Wang', 'Cheng', etc. If you know the full name and type it, you would not get any matches.
Instead, adapt the queries to also match on 'firstname lastname' and 'lastname firstname'.
This means that simple matching on only username or only lastname, can be removed.
Ed Wong reported problems with a SSH key that accidentally was copy-pasted with extra newlines. This truncation wasn't detected, so the truncated key was added to authorized_keys where it obviously didn't work for sshd.
The base64 decoding would sometimes catch truncated keys - but not always. We seem to have to look inside the key, parse it according to the RFCs, and verify they contain the right amount of data for the key type.
It is an additional burden to have to parse SSH key internals just to validate them. We could consider using some external method for validation. But the explicit validation introduced here might be more spot-on for our needs.
git: fix pull request deletion - don't crash on deleting refs to PR heads
The refs name was passed as unicode string, and that would cause failure like: File ".../site-packages/dulwich/repo.py", line 720, in __delitem__ if name.startswith(b"refs/") or name == b"HEAD": TypeError: startswith first arg must be str or a tuple of str, not bytes
Fixed by correctly passing the ref name as bytes, as we do when creating the PR refs.
Git is so different than Mercurial that it is easier to use separate tests. Some of the tests that are relevant for Mercurial doesn't apply to the Git support.
Also fix crash an odd corner case of creating PRs from a repo that has no branches at all.
Commit eca0cb56a822 attempted to fix a type inconsistency, which caused failure in the 'kallithea-api' tool when using '--save-config', but this unfortunately did not fix the problem completely.
Following error still appeared:
Traceback (most recent call last): File ".../bin/kallithea-api", line 33, in <module> sys.exit(load_entry_point('Kallithea', 'console_scripts', 'kallithea-api')()) File ".../bin/kallithea_api.py", line 84, in main 'apihost': args.apihost}) File ".../bin/base.py", line 104, in __init__ self.make_config(config) File ".../bin/base.py", line 132, in make_config ext_json.dump(config, f, indent=4) File "/usr/lib/python3.7/json/__init__.py", line 180, in dump fp.write(chunk) TypeError: a bytes-like object is required, not 'str'
Also, there is not much essential code in the old sentry support, and it seems like it would be easier to reimplement from scratch. There is thus not much lost by dropping it.
According to documentation, hmac.new is the way to create a HMAC object ... and the first argument is mandaatory and we don't want to name it.
This has no functional change but will address a pytype warning:
File "kallithea/model/user.py", line 304, in get_reset_password_token: Invalid keyword arguments (digestmod, key, msg) to function HMAC.__init__ [wrong-keyword-args]
rcmail: pass smtplib.SMTP.sendmail to_addrs as list
Passing it as a set worked ... but is apparently wrong. The documentation states it has to be a "list of addresses".
Pytype warned:
File "kallithea/lib/rcmail/smtp_mailer.py", line 99, in send: Function SMTP.sendmail was called with the wrong arguments [wrong-arg-types] Expected: (self, from_addr, to_addrs: Union[Sequence[str], str], ...) Actually passed: (self, from_addr, to_addrs: set, ...)
Prepare for computing each kind of permissions separately so they can be skipped in the common case.
This will undo a part of 1ecd6c0e2787 where we took the detour of assigning permissions in __init__ - now we move it back to a LazyProperty, but in a better way.
config: only assign kallitha.CONFIG once and keep it as a plain dict
Mute pytype warnings and make code more readable for developers:
File "kallithea/lib/hooks.py", line 318, in _hook_environment: No attribute 'global_conf' on Dict[nothing, nothing] [attribute-error] File "kallithea/lib/hooks.py", line 318, in _hook_environment: No attribute 'local_conf' on Dict[nothing, nothing] [attribute-error] File "kallithea/bin/kallithea_cli_base.py", line 82, in runtime_wrapper: No attribute 'global_conf' on Dict[nothing, nothing] [attribute-error] File "kallithea/bin/kallithea_cli_base.py", line 82, in runtime_wrapper: No attribute 'local_conf' on Dict[nothing, nothing] [attribute-error]
cli: refactor db_create to avoid app initialization in multiple places
Make sure kallithea.config.application.make_app only is invoked at a high level ... and avoid references to kallithea.CONFIG.global_conf right before kallithea.CONFIG is set to a plain dict again.
Click commands that have both needs_config_file=True and config_file_initialize_app=True will now be called twice - first time before setting kallithea.CONFIG but with the config dict as parameter.
That seems kind of intuitive and will simplify the code and allow cleanup of config handling.
Mute pytype warning:
File "kallithea/bin/kallithea_cli_db.py", line 73, in db_create: No attribute 'global_conf' on Dict[nothing, nothing] [attribute-error] File "kallithea/bin/kallithea_cli_db.py", line 73, in db_create: No attribute 'local_conf' on Dict[nothing, nothing] [attribute-error]
ssh: update authorized_keys after deleting a user with ssh keys
This fixes a minor problem with minimal impact: The authorized_keys entries would fail anyway when the referenced user wasn't found ... and the entries would go away next time authorized_keys for some reason were updated.
File "kallithea/bin/base.py", line 133, in make_config: Function BinaryIO.write was called with the wrong arguments [wrong-arg-types] Expected: (self, s: Union[bytearray, bytes, memoryview]) Actually passed: (self, s: str)
... E File ".../kallithea/lib/utils.py", line 256, in is_valid_repo_uri E GitRepository._check_url(url) E File ".../kallithea/lib/vcs/backends/git/repository.py", line 183, in _check_url E passmgr.add_password(*authinfo) E File "/usr/lib/python3.7/urllib/request.py", line 848, in add_password E self.reduce_uri(u, default_port) for u in uri) E File "/usr/lib/python3.7/urllib/request.py", line 848, in <genexpr> E self.reduce_uri(u, default_port) for u in uri) E File "/usr/lib/python3.7/urllib/request.py", line 875, in reduce_uri E host, port = splitport(authority) E File "/usr/lib/python3.7/urllib/parse.py", line 1022, in splitport E match = _portprog.fullmatch(host) E TypeError: cannot use a string pattern on a bytes-like object
The authinfo tuple is obtained via mercurial.util.url, which unfortunately returns a tuple of bytes whereas urllib expects strings. It seems that mercurial internally has some more hacking around urllib as urllibcompat.py, which we don't use.
Therefore, transform the bytes into strings before passing authinfo to urllib. As the realm can be None, we need to check it specifically otherwise safe_str would return a string 'None'.
A basic test that catches the mentioned problem is added, even though it does not actually test that cloning with auth info will actually work (it only tests that it fails cleanly if the URI is not reachable).
Additionally, one use of 'test_uri' in hg/repository.py still needed to be transformed from bytes to string. For git this was already ok.
No release notes yet, but testing shows it working with a few changes.
A few tests need updating because error messages from remote ssh like our sys.stderr.write('abort: %s\n' % error) like for remote: abort: Access to %r denied will now appear on stderr instead of stdout - a very reasonable bug fix.
mysql: bump sqlalchemy.url MariaDB/MySQL charset to to 'utf8mb4' to get full UTF-8 support
The change in 210e76d69b62 only changed character_set_database, as shown by output after:
--- a/kallithea/model/base.py +++ b/kallithea/model/base.py @@ -46,3 +46,8 @@ def init_model(engine): engine_str = obfuscate_url_pw(str(engine.url)) log.info("initializing db for %s", engine_str) meta.Base.metadata.bind = engine + + meta.Session.configure(bind=engine) + for a, b in meta.Session().execute('''show variables''').fetchall(): + if 'character_set_' in a: + print(a, repr(b))
Before, with charset=utf8, the utf8mb3 charset was used all the way through the stack:
[kallithea.model.base] initializing db for mysql://kallithea-test:XXXXX@localhost/kallithea-test?charset=utf8 character_set_client 'utf8' character_set_connection 'utf8' character_set_database 'utf8mb4' character_set_filesystem 'binary' character_set_results 'utf8' character_set_server 'latin1' character_set_system 'utf8'
With explicit charset=utf8mb4:
[kallithea.model.base] initializing db for mysql://kallithea-test:XXXXX@localhost/kallithea-test?charset=utf8mb4 character_set_client 'utf8mb4' character_set_connection 'utf8mb4' character_set_database 'utf8mb4' character_set_filesystem 'binary' character_set_results 'utf8mb4' character_set_server 'latin1' character_set_system 'utf8'
tests: introduce REUSE_TEST_DB as an alternative to TEST_DB
With TEST_DB, the specified database would be dropped (if existing) and created from scratch. That would fail if the user only had been granted access to that database without general database creation permissions.
With REUSE_TEST_DB, the database must exist, and the tables in the existing database will be deleted before creating the new ones.
This is for example useful for testing on PostgreSQL without full database privileges. (With MariaDB/MySQL, it is possible to grant create permissions to a database before it exists, so it is easier to use TEST_DB.)
Support use of an existing database so the Kallithea database user doesn't have to be granted permissions to create databases.
The existing database must of course have been created "correctly", for example using the right charset and collation on MariaDB/MySQL. Creating the database manually also provides a way to avoid the hardcoded defaults.
db: refactor to clarify that we always invoke SA create_all with checkfirst=False
create_all(checkfirst=True) would skip creating tables that already exist, potentially leaving them with wrong schema.
We are strict and want a successful create_all to leave all the tables in a fully known state. We thus want it to fail if any tables already are present. Existing tables should not silently be accepted.
In MySQL, the character sets for server, database, tables, and connection are set independently. Ideally, they should all use UTF-8, but systems tend to use latin1 as default encoding, for example:
To make things work consistently anyway, we have so far specified the utf8mb4 charset explicitly when creating tables, but there is no corresponding simple option for specifying the collation for tables. We need a better solution.
and there is thus no longer any need for specifying the charset when creating tables.
To be reasonably resilient across all systems without relying on system defaults, we will now start specifying the charset and collation when creating the database, but drop the specification of charset when creating tables.
For investigation of these issues, consider the output from: show variables like '%char%'; show variables like '%collation%'; show create database `KALLITHEA_DB_NAME`; SELECT * FROM information_schema.SCHEMATA WHERE schema_name = "KALLITHEA_DB_NAME"; SELECT * FROM information_schema.TABLES T, information_schema.COLLATION_CHARACTER_SET_APPLICABILITY CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = "KALLITHEA_DB_NAME";
mysql: bump charset to to 'utf8mb4' to get full UTF-8 support
We used to use 'utf8', but in MySQL, this is just an alias for 'utf8mb3' which isn't full unicode. 'utf8mb4' has less surprises (especially if used with the 'utf8mb4_unicode_ci' collation).
MySQL character sets for server, database, tables, and connection are set independently. Until now, we have specified 'utf8' when creating tables to overrule the database charset and in the default MySQL connection URL.
There is no good reason it only should be possible to comment on content lines. Other lines might not have an obvious locator, but we can live with that as long as each comment only apply in one place.
With this, we actually want the comment bubble on all lines with bubble markup, so we can loosen the css selector.
diff: drop per file ignore-whitespace and context - it didn't work and had conceptual issue (Issue #344)
Diffs are currently generated at the low level as one big diff between two vcs resisions, provided global values for diff context size and flag for ignoring whitespace. All files use the same flags. There is no way to actually compute the full diff using these use per file flags, and no simple and efficient way to add it.
The best option is thus to drop the failed attempt at making it per file, and just rely on the simple global flags in the URL.
The links for changing whitespace and context is sometimes shown for the whole "page", and sometimes next to the diff for one file. For now, keep showing the link in these places, but make sure it navigates back to the FID of the section where the link was clicked.
The implementation is completely rewritten and moved to a more appropriate location in helpers.
With a more clean implementation, we also consistently use the simple getters to extract values from the URL.
diff: fix ignorews/context link to use the right target as anchor
The value in url_fid might not be a valid anchor.
For changesets, url_fid would be like 'C--9c390eb52cd6' even though the actual target included the changeset hash and were like 'C-1536d03b4869-9c390eb52cd6'.
For pullrequests and compare, it wouldn't link to anything at all, even though there was a target like 'C--56535da5df40'.
Instead, pass id_fid as anchor value as a separate argument. That one is a valid anchor.
jQuery has weak "friendly" semantics for data attributes and do for example interpret the data attribute name 'id_for' as 'idFor'. We could thus not retrieve it as 'id_for'.
Instead, avoid the special semantics of '_' by just naming the data attribute 'for'.
db: better support for databases with "odd" characters in the name, such as "-"
Add missing quoting, using the quoting flavour preferred by the DB.
Tested with
echo "CREATE USER 'kallithea-test'@'localhost' IDENTIFIED BY 'password'"|sudo -u mysql mysql echo "GRANT ALL PRIVILEGES ON \`kallithea-test\`.* TO 'kallithea-test'@'localhost'" | sudo -u mysql mysql TEST_DB='mysql://kallithea-test:password@localhost/kallithea-test?charset=utf8mb4' py.test
Column widths did in some cases break with 32757d5e9d0b. The markup was too magic to find and fix the exact root cause, but we fix it by refactoring to a slightly different approach:
Use fixed table-layout to be able to control columns width from the first row, without too much auto sizing.
Instead of using padding, put the line number in a centered block-inline with right-align and min-width. The numbers will thus generally be right aligned, but will expand to use less margin for big line numbers.
lib: use sha1 instead of md5 in a couple of places
md5 is dead and should be avoided. In the places changed here, we want to keep using hashes without trivial collisions, but do not expect strong crypto security. sha1 seems like a trivial step up from md5 and without obvious alternatives. It is more expensive than md5, but we can live with that in these places.
The remaining few uses of md5() cannot be changed without breaking backwards compatibility or external API.