controllers: rename __before__ to _before in preparation of TurboGears2
__before__ in Pylons is called _before in TurboGears2. We can prepare this rename already in Pylons-based Kallithea, so that the real TG2 migration commit just changes the BaseController.
Since TurboGears2 _before can pass extra arguments, we add *args and **kwargs parameters as well.
Move the existing app_test_context_fixture from test_pullrequests.py to conftest.py to make it available to all test modules.
It is useful in two cases:
1. there is test setup code (xUnit style) that needs to execute in the same test context as the actual test.
2. even without test setup code, an entire test needs to be executed in a test context. In this case, the fixture just reduces code complexity by not requiring changes in the test code (compared to standard 'with test_context').
It is possible to apply this (or any) fixture to an entire test class using the class decorator @pytest.mark.usefixtures("...") This is similar to 'autouse=True' but can be used even if the fixture is defined elsewhere.
gearbox: make a make-config sub-command available again
Drop the old kallithea-config command line tool and rework it to a replacement for the make-config paster command. Make-config was experimental in paste, it doesn't exist with gearbox, it was tied to pylons, and we already have the luxury problem of having two ways of doing almost the same thing. This is a good opportunity to get rid of one of them.
This replacement tool is Kallithea specific and doesn't need to be told that it is Kallithea it has to create a config file for. The command should thus perhaps be given a Kallithea specific name instead of the very generic name ...
gearbox: replace paster with something TurboGears2-ish that still works with the Pylons stack
This is a step towards moving away from the Pylons stack to TurboGears2, but still independent of it.
Some notes from the porting - it could perhaps be the missing(?) documentation for migrating from paster to gearbox:
Note: 'gearbox' without parameters will crash - specify '-h' to get started testing.
Replace paster summary = 'yada yada' with the first line of the docstring of the Command class ... or override get_description.
Note: All newlines in the docstring will be collapsed and mangle the long help text.
Grouping of commands is not possible. Standard commands (for development) can't be customized under the same name or hidden. (Like for paster, the conceptual model also assumes that the sub-command naming is namespaced so commands from other packages won't conflict.)
The usage help is fully automated from the declared options.
For all deprecated Commands, replace paster hidden = True with gearbox deprecated = True
Note: config_file, takes_config_file, min_args and max_args are not available / relevant.
The gearbox parser is customized by overriding get_parser - there is nothing like paster update_parser.
Gearbox is using argparse instead of optparse ... but argparse add_argument is mostly backwards compatible with optparse add_option.
Instead of overriding command or run as in paster, override take_action in gearbox. The parsed arguments are passed to take_action, not available on the command instance.
Paster BadCommand is not available and must be handled manually, terminating with sys.exit(1).
There is no standard make-config command in gearbox.
Paster appinstall has been replaced by the somewhat different setup_app module in gearbox. There is still no clean way to pass parameters to SetupAppCommand and it relies on websetup and other apparently unnecessary complexity. Instead, implement setup-db from scratch.
Minor change by Thomas De Schampheleire: add gearbox logging configuration. Because we use logging.config.fileConfig(.inifile) during gearbox command execution, the logging settings need to be correct and contain a block for gearbox logging itself. Otherwise, errors in command processing are not even visible and the command exits silently.
jenkinsfile: create venv in special folder instead of jenkins workspace
The path to a jenkins workspace tends to get very long. But the maximum for a shebang line is usually 127 characters. This can cause problems with git hook scripts or when calling pip. To avoid these problems create the virtualenv in the a folder under $JENKINS_HOME which is usually much shorter.
Obviously the git hooks and the web app need to use the same database. Therefore, if the web app uses the TEST_DB environment variable, the git hooks needs to do the same.
The gzip header contains a MTIME part (Modification TIME). http://www.zlib.org/rfc-gzip.html Because is takes a little bit of time, from one fill_archive call to the next, this part may differ and fail the test from time to time. So we should not include that part in the comparison.
Add a warning about API key implications on the actual My Accounts/ API keys page where users are likely to see it.
No warning is added to the admin page equivalent, under the assumptions that admins can be trusted to either know what API keys are (or at least not mess around with them when editing other users), and thus don't need the admonishment.
In 33b71a130b16, the addPermAction template was incorrectly escaped via h.jshtml, where it should've been plain h.js.
Instead of merely fixing the escaping, refactor the code to completely remove the need for escaping anything, by avoiding the template variable expansion inside the JavaScript.
admin: apply LOWER() on journal filtering term for suffix/infix matching
Before this revision, LOWER() (via func.lower() of sqlalchemy) isn't applied on journal filtering term for suffix and infix matching, even though it is applied for prefix or immediate value matching.
This causes unexpected result with DBMS, which compares strings case- sensitively by default.
This revision applies LOWER() on journal filtering term for suffix/infix matching, to filter journal case-insensitively.
tests: add Jenkinsfile for automatic creation of Jenkins projects for testing Kallithea itself
This is the new way of how to build and run tests with jenkins. It runs pylint and py.test with 4 different settings: 1. default 2. with German language settings 3. with MySQL DB 4. with PostgreSQL DB
A major advantage of this new Jenkinsfile approach is that Jenkins will be able to automatically create new projects for each branch (or pull request) that contains a jenkinsfile. There is thus no need for maintaining multiple jenkins projects.
pullrequests: introduce "action objects" for PR creation
Inspired by the command design pattern, this attempts the following:
* move policy and business logic from controllers into the model, * move validation, authorization and execution logic closer together in the code, * establish a reusable pattern for modelling higher-level concepts (like "create a new PR iteration"), * make error handling more well-defined, and independent of the controller layer, and * provide clear separation between, one one hand, the validation and authorization of a request, and on the other, the actual execution.
TLDR: Kallithea has issues with escaping values for use in inline JS. Despite judicious poking of the code, no actual security vulnerabilities have been found, just lots of corner-case bugs. This patch fixes those, and hardens the code against actual security issues.
The long version:
To embed a Python value (typically a 'unicode' plain-text value) in a larger file, it must be escaped in a context specific manner. Example:
>>> s = u'<script>alert("It\'s a trap!");</script>'
1) Escaped for insertion into HTML element context
>>> print cgi.escape(s) <script>alert("It's a trap!");</script>
2) Escaped for insertion into HTML element or attribute context
>>> print h.escape(s) <script>alert("It's a trap!");</script>
This is the default Mako escaping, as usually used by Kallithea.
3) Encoded as JSON
>>> print json.dumps(s) "<script>alert(\"It's a trap!\");</script>"
4) Escaped for insertion into a JavaScript file
>>> print '(' + json.dumps(s) + ')' ("<script>alert(\"It's a trap!\");</script>")
The parentheses are not actually required for strings, but may be needed to avoid syntax errors if the value is a number or dict (object).
5) Escaped for insertion into a HTML inline <script> element
>>> print h.js(s) ("\x3cscript\x3ealert(\"It's a trap!\");\x3c/script\x3e")
Here, we need to combine JS and HTML escaping, further complicated by the fact that "<script>" tag contents can either be parsed in XHTML mode (in which case '<', '>' and '&' must additionally be XML escaped) or HTML mode (in which case '</script>' must be escaped, but not using HTML escaping, which is not available in HTML "<script>" tags). Therefore, the XML special characters (which can only occur in string literals) are escaped using JavaScript string literal escape sequences.
(This, incidentally, is why modern web security best practices ban all use of inline JavaScript...)
Unsurprisingly, Kallithea does not do (5) correctly. In most cases, Kallithea might slap a pair of single quotes around the HTML escaped Python value. A typical benign example:
$('#child_link').html('${_('No revisions')}');
This works in English, but if a localized version of the string contains an apostrophe, the result will be broken JavaScript. In the more severe cases, where the text is user controllable, it leaves the door open to injections. In this example, the script inserts the string as HTML, so Mako's implicit HTML escaping makes sense; but in many other cases, HTML escaping is actually an error, because the value is not used by the script in an HTML context.
The good news is that the HTML escaping thwarts attempts at XSS, since it's impossible to inject syntactically valid JavaScript of any useful complexity. It does allow JavaScript errors and gibberish to appear on the page, though.
In these cases, the escaping has been fixed to use either the new 'h.js' helper, which does JavaScript escaping (but not HTML escaping), OR the new 'h.jshtml' helper (which does both), in those cases where it was unclear if the value might be used (by the script) in an HTML context. Some of these can probably be "relaxed" from h.jshtml to h.js later, but for now, using h.jshtml fixes escaping and doesn't introduce new errors.
In a few places, Kallithea JSON encodes values in the controller, then inserts the JSON (without any further escaping) into <script> tags. This is also wrong, and carries actual risk of XSS vulnerabilities. However, in all cases, security vulnerabilities were narrowly avoided due to other filtering in Kallithea. (E.g. many special characters are banned from appearing in usernames.) In these cases, the escaping has been fixed and moved to the template, making it immediately visible that proper escaping has been performed.
Mini-FAQ (frequently anticipated questions):
Q: Why do everything in one big, hard to review patch? Q: Why add escaping in specific case FOO, it doesn't seem needed?
Because the goal here is to have "escape everywhere" as the default policy, rather than identifying individual bugs and fixing them one by one by adding escaping where needed. As such, this patch surely introduces a lot of needless escaping. This is no different from how Mako/Pylons HTML escape everything by default, even when not needed: it's errs on the side of needless work, to prevent erring on the side of skipping required (and security critical) work.
As for reviewability, the most important thing to notice is not where escaping has been introduced, but any places where it might have been missed (or where h.jshtml is needed, but h.js is used).
Q: The added escaping is kinda verbose/ugly.
That is not a question, but yes, I agree. Hopefully it'll encourage us to move away from inline JavaScript altogether. That's a significantly larger job, though; with luck this patch will keep us safe and secure until such a time as we can implement the real fix.
Q: Why not use Mako filter syntax ("${val|h.js}")?
Because of long-standing Mako bug #140, preventing use of 'h' in filters.
Q: Why not work around bug #140, or even use straight "${val|js}"?
Because Mako still applies the default h.escape filter before the explicitly specified filters.
Q: Where do we go from here?
Longer term, we should stop doing variable expansions in script blocks, and instead pass data to JS via e.g. data attributes, or asynchronously using AJAX calls. Once we've done that, we can remove inline JavaScript altogether in favor of separate script files, and set a strict Content Security Policy explicitly blocking inline scripting, and thus also the most common kind of cross-site scripting attack.
The only remaining purpose of this class was to provide the "sa" field, allowing a custom SQLAlchemy session to be used for model operations. However, this never actually worked, nor was it used anywhere.
There's always a global SQLAlchemy session associated with the current thread; using another session for a single function call does not make any sense (as sessions cannot be mixed), unless the code works carefully to ensure the two sessions (and all objects loaded from them) are kept completely separate. Suffice to say that Kallithea does no such thing, thus there's no need to pretend to support multiple concurrent sessions.
Session.add should only be called on newly created database objects.
Per the Kallithea contribution guidelines:
When getting an object from the session (via Session().query or any of the utility functions that look up objects in the database), it's already part of the session, and should not be added again.
cleanup: do Session.add directly inside _create_default_perms
Let the various _create_default_perms functions add the created object to the SQLAlchemy database session, instead of having the caller do it.
This is in accordance with the Kallithea contribution guidelines:
When creating an object using a factory function (like create_repo), the returned object has already (by convention) been added to the session, and should not be added again.
cleanup: remove SQLAlchemy session argument to action_logger
There's always a global SQLAlchemy session associated with the current thread; using another session for a single function call does not make any sense (as sessions cannot be mixed), unless the code works carefully to ensure the two sessions (and all objects loaded from them) are kept completely separate. Suffice to say that Kallithea does no such thing, thus there's no need to pretend to support multiple concurrent sessions.
The fix in 56bce741176f was incorrect, as it robbed the login dialog of its panel styling. Using on .panel-primary instead of #main solves this.
Additionally, while the rules deleted in a90acf3a1422 were indeed unused, it didn't actually solve the problem of select2 dropdowns being white on white, due to a separate selector also causing this effect. This should be fixed now.
In practice, Kallithea has the 'usergroup.admin' permission imply the 'usergroup.write' permission, which again implies 'usergroup.read'.
This codifies this practice by replacing the HasUserGroupPermissionAny "perm function" with the new HasUserGroupLevel function, reducing the risk of errors and saving quite a lot of typing.
In practice, Kallithea has the 'group.admin' permission imply the 'group.write' permission, which again implies 'group.read'.
This codifies this practice by replacing HasRepoGroupPermissionAny "perm function" with the new HasRepoGroupLevel function, reducing the risk of errors and saving quite a lot of typing.
In practice, Kallithea has the 'repository.admin' permission imply the 'repository.write' permission, which again implies 'repository.read'.
This codifies/enforces this practice by replacing HasRepoPermissionAny "perm function" with the new HasRepositoryLevel function, reducing the risk of errors and saving quite a lot of typing.
The purpose of this check is to ensure that we don't recursively assign "default" user perms for a repo with the "private" flag set (because in that case, the "default" user perms should always be "no access").
(The check, and this fix, is of course only applicable to Kallithea instances that have anonymous access enabled to begin with.)
However, the check was only functional if the user was specified as a username. This is apparently always the case when Kallithea is running, but was not e.g. the case in the test suite, which consistently passed a user ID instead of a username.
This commit ensures that the user is always resolved before the check is made. There's no significant overhead to this, as the code immediately calls RepoModel().grant_user_permission, which resolved the user anyway. This change just moves the database lookup a bit earlier.
Fixing this revealed the matching test case to be broken, so it has been fixed as well.
Down the road, we should eliminate Kallithea's bizarre practice of passing around usernames and user IDs, in favor of passing actual User objects. That'll get rid of mistakes like these, as well as repeated needless database lookups.
comments: for PR comments, link directly to comment on PR page
Comments with approval changes made in the context of PRs also show up on the individual PR changesets, and currently link to the (top of the) PR page. If the user wants to see the surrounding comments, it can be difficult to find the comment.
This change makes the link go to the comment anchor on the PR page, so the user immediately sees the relevant comment thread. The PR metadata is still readily available (by scrolling up or pressing "Home").
Checkboxes are a bit off-alignment throughout Kallithea, but floating them left makes it noticably worse, and also seems like an odd thing to do (checkboxes are usually rendered inline with the label).
Remove rule coloring "#content div.panel div.panel-heading .pull-left a" (and ditto .pull-right) white, as it caused the branch filter drop-down box on the changelog page to become white-on-white.
I have not found anywhere this rule actually serves a purpose.
On several pages, the rules matched our standard grey buttons, but was overruled by an !important style for .btn elements.
On repository group pages, it matches the repo group parent path link, and in /_admin/journal, it matches the "Watched Repositories" and "My Repositories" links; but all of these are already white from the "#content div.panel div.panel-heading a" rule.
auth: when a auth plugin can't be imported try the next one instead of breaking completly
Some authentication modules depend on external services. This may cause the import to fail. Or another scenario is that a (third party) authentication module has been removed and can't be imported anymore.
vcs: remove confusing and unnecessary local variable
This removes the "username" local variable, substituting the equivalent "user.username". The function also has another, unrelated use of the same local variable name; this OTHER "username" variable is untouched.
The relevant code is duplicated between Hg and Git... and thus also this change.
__get_action can only return "pull" or "push"; any other value is an error on our end. And indeed, if not, the old code would crash further down in the function (under "CHECK LOCKING") because the "user" local variable was not assigned.
This also corrects the comment that incorrectly suggests the perm check is only relevant for anonymous access.
The relevant code is duplicated between Hg and Git... and thus also this change.
(Diff becomes clearer if whitespace changes are ignored.)
css: fix vertical alignment in repo and PR summaries etc.
In several table-like layouts (not actual HTML tables) with separate "label" and "value" columns, this 10px margin-top (which only applies to the "value") caused the "cells" to be out of alignment:
- The repository summary (clone URL, description, etc.) - The pull request "summary" (description, origin, target, etc.) - Settings pages of all sorts
'<span class="repotag">git' does not actually appear in JS, so move it above the comment saying so. For the rest, tighten the checks to actually ensure they match JS, and not e.g. the same text in plain HTML elsewhere on the page.
indexers: use correct full repository name, which contains group name, at indexing
Before this revision, searching under the specific repository could cause unexpected result, because repository names used for indexing didn't contain the group name.
This issue was introduced by 8b7c0ef62427, which uses repo.name_unicode as repository name instead of safe_unicode(repo_name) to reduce unicode conversion cost while repetition at indexing.
To use correct repository name at indexing, this revision replaces repo.name_unicode by safe_unicode(repo_name). Reducing cost of repeated unicode conversion cost while will (and should) be addressed in the future.
This revision also adds a comment to BaseRepository.name property, to avoid similar misunderstandings in the future.
Therefore, username related conditions with "this", "a", "you", and so on are completely ignored, even if they are valid username components.
To prevent username related conditions from removing "stop words", this revision explicitly specifies "analyzer" for username related fields of SCHEMA and CHGSETS_SCHEMA.
Difference between EMAILADDRANALYZER and default analyzer of TEXT is whether "stop words" are preserved or not. Tokenization is still applied on usernames.
For future changing, this revision doesn't make EMAILADDRANALYZER share analyzer definition with PATHANALYZER, even though their definitions are identical with each other at this revision.
This revision requires full re-building index tables, because indexing schemas are changed.
Original patch has been modified by Mads Kiilerich - tests of 'owner' will be addressed separately.
Therefore, pathname related conditions with "this", "a", "you", and so on are completely ignored, even if they are valid pathname components.
To prevent pathname related conditions from removing "stop words", this revision explicitly specifies "analyzer" for pathname related fields of SCHEMA and CHGSETS_SCHEMA.
Difference between PATHANALYZER and default analyzer of TEXT is whether "stop words" are preserved or not. Tokenization is still applied on pathnames.
This revision requires full re-building index tables, because indexing schemas are changed.
search: make "repository:" condition work case-insensitively as expected
Before this revision, "repository:" condition at searching for "Commit messages" never shows revisions in a repository, of which name uses upper case letter.
Using ID for "repository" of CHGSETS_SCHEMA preserves case of repository name at indexing. On the other hand, search condition itself is forcibly lowered before parsing.
- files in repository "FOO" is indexed as "FOO" in "repository" field - "repository:FOO" condition is treated as "repository:foo:
Then, indexing search itself is executed case-sensitively. Therefore, "repository:FOO" condition never show revisions in repository "FOO".
But just making "repository" of CHGSETS_SCHEMA case-insensitive isn't reasonable enough, because it breaks assumptions below, if there is case-insensitive name collision between repositories, even though Kallithea itself can manage such repositories at same time.
- combination of "raw_id" (= revision hash ID) and "repository" is unique between all known revisions under Kallithea
CHGSETS_SCHEMA assumes this.
This unique-ness is required by Whoosh library to determine whether index table should be updated or not for that repository.
- searching in a repository shows only revisions in that repository
Before this revision, this filtering is achieve by "repository:" condition with case-preserved repository name from requested URL.
To make "repository:" search condition work case-insensitively as expected (without any violation of assumptions above), this revision does:
- make "repository" of CHGSETS_SCHEMA case-insensitive by "analyzer=ICASEIDANALYZER"
- introduce "repository_rawname" into SCHEMA and CHGSETS_SCHEMA, to ensure assumptions described above, by preserving case of repository name
"repository_rawname" of SCHEMA uses not ID but TEXT, because the former disable "positions" feature, which is required for highlight-ing file content (see previous revision for detail).
This revision requires full re-building index tables, because indexing schemas are changed.
search: make "repository:" condition work as expected
Before this revision, "repository:foo" condition at searching for "File contents" or "File names" shows files in repositories below.
- foo - foo/bar - foo-bar - and so on ...
Whoosh library, which is used to parse text for indexing and seaching, does:
- treat almost all non-alphanumeric characters as delimiter both at indexing search items and at parsing search condition - make each fields for a search item be indexed by multiple values
For example, files in "foo/bar" repository are indexed by "foo" and "bar" in "repository" field. This tokenization make "repository:foo" search condition match against files in "foo/bar" repository, too.
In addition to it, using plain TEXT also causes unintentional ignorance of "stop words" in search conditions. For example, "this", "a", "you", and so on are ignored at indexing and parsing, because these are too generic words (from point of view of generic "text search").
This issue can't be resolved by using ID instead of TEXT for "repository" of SCHEMA, like as previous revisions for JOURNAL_SCHEMA, because:
- highlight-ing file content requires SCHEMA to support "positions" feature, but using ID instead of TEXT disables it - using ID violates current case-insensitive search policy, because it preserves case of text
To make "repository:" condition work as expected, this revision explicitly specifies "analyzer", which does:
- avoid tokenization - match case-insensitively - avoid removing "stop words" from text
This revision requires full re-building index tables, because indexing schema is changed.
BTW, "repository:" condition at searching for "Commit messages" uses CHGSETS_SCHEMA instead of SCHEMA. The former uses ID for "repository", and it does:
- avoid issues by tokenization and removing "stop words"
- disable "positions" feature of CHGSETS_SCHEMA
But highlight-ing file content isn't needed at searching for "Commit messages". Therefore, this can be ignored.
- preserve case of text
This violates current case-insensitive search policy, This issue will be fixed by subsequent revision, because fixing it isn't so simple.
journal: make "repository:" filtering condition work as expected (Issue #261)
Before this revision, journal filtering conditions like as below never match against any entry, even if there are corresponded repositories.
- repository:foo/bar - repository:foo-bar
Whoosh library, which is used to parse filtering condition, does:
- treat almost all non-alphanumeric characters as delimiter at parsing condition - join each conditions at filtering by "AND", by default
For example, filtering condition "repository:foo/bar" is translated as "repository:foo AND repository:bar". This combined condition never matches against any entry, because it is impossible that "repository" field in DBMS table "user_logs" has both "foo" and "bar" values at same time.
Using TEXT for "repository" of JOURNAL_SCHEMA causes this issue, because TEXT assumes tokenization at parsing.
In addition to it, using TEXT also causes unintentional ignorance of "stop words" in filtering conditions. For example, "this", "a", "you", and so on are ignored at parsing, because these are too generic words (from point of view of generic "text search").
To make "repository:" filtering condition work as expected, this revision uses ID instead of TEST for "repository" of JOURNAL_COLUMN. ID avoids both tokenization and removing "stop words".
This replacement should be safe with already existing DBMS instance, because:
- JOURNAL_SCHEMA is used only to parse filtering condition - DBMS table "user_logs" itself is defined by UserLog class (in kallithea/model/db.py)
BTW, using ID also avoids normalization by lowercase-ing. But this doesn't violate current case-insensitive search policy, because LOWER-ing in actual SQL query is achieved by get_filterion() or so in kallithea/controllers/admin/admin.py.
tests: introduce more test coverage of whoosh filtering
This has been extracted from other changesets by Mads Kiilerich to establish a test baseline so we clearly can see what the following fixes are fixing.
Some of these tests will thus demonstrate bad behaviour - that will be fixed later.
* drop support for 'cache.' prefix - it is only 'beaker.config.' that ever has been documented or used in examples * make prefix stripping more safe (if the prefix should appear twice in the key) * trust exactly what has been configured - drop stripping of strings * drop int conversion of 'expire' - beaker documentation suggests strings are fine
The second argument to the guess_instance classmethod is in practice fixed for Gist, Permission, RepoGroup, Repository, User, UserGroup; so might as well move this logic into per-class specializations of guess_instance.
Since the test repositories are created in the database by scanning the file system, their IDs depends on the (non-deterministic) order in which files are listed by the file system.
The solution, of course, is to ask the database for the ID instead of assuming it will be ID 1.
This refactoring simply emphasizes that the origin reference used to generate the default title is not always the same as the one actually used to create the pull request; for PRs not from a branch head, the branch name is stored so it can be used for creating new iterations.