journal: make "repository:" filtering condition work as expected (Issue #261)
Before this revision, journal filtering conditions like as below never match against any entry, even if there are corresponded repositories.
- repository:foo/bar - repository:foo-bar
Whoosh library, which is used to parse filtering condition, does:
- treat almost all non-alphanumeric characters as delimiter at parsing condition - join each conditions at filtering by "AND", by default
For example, filtering condition "repository:foo/bar" is translated as "repository:foo AND repository:bar". This combined condition never matches against any entry, because it is impossible that "repository" field in DBMS table "user_logs" has both "foo" and "bar" values at same time.
Using TEXT for "repository" of JOURNAL_SCHEMA causes this issue, because TEXT assumes tokenization at parsing.
In addition to it, using TEXT also causes unintentional ignorance of "stop words" in filtering conditions. For example, "this", "a", "you", and so on are ignored at parsing, because these are too generic words (from point of view of generic "text search").
To make "repository:" filtering condition work as expected, this revision uses ID instead of TEST for "repository" of JOURNAL_COLUMN. ID avoids both tokenization and removing "stop words".
This replacement should be safe with already existing DBMS instance, because:
- JOURNAL_SCHEMA is used only to parse filtering condition - DBMS table "user_logs" itself is defined by UserLog class (in kallithea/model/db.py)
BTW, using ID also avoids normalization by lowercase-ing. But this doesn't violate current case-insensitive search policy, because LOWER-ing in actual SQL query is achieved by get_filterion() or so in kallithea/controllers/admin/admin.py.
tests: introduce more test coverage of whoosh filtering
This has been extracted from other changesets by Mads Kiilerich to establish a test baseline so we clearly can see what the following fixes are fixing.
Some of these tests will thus demonstrate bad behaviour - that will be fixed later.
* drop support for 'cache.' prefix - it is only 'beaker.config.' that ever has been documented or used in examples * make prefix stripping more safe (if the prefix should appear twice in the key) * trust exactly what has been configured - drop stripping of strings * drop int conversion of 'expire' - beaker documentation suggests strings are fine
The second argument to the guess_instance classmethod is in practice fixed for Gist, Permission, RepoGroup, Repository, User, UserGroup; so might as well move this logic into per-class specializations of guess_instance.
Since the test repositories are created in the database by scanning the file system, their IDs depends on the (non-deterministic) order in which files are listed by the file system.
The solution, of course, is to ask the database for the ID instead of assuming it will be ID 1.
This refactoring simply emphasizes that the origin reference used to generate the default title is not always the same as the one actually used to create the pull request; for PRs not from a branch head, the branch name is stored so it can be used for creating new iterations.
auth: simplify the double invoked auth classes used for permission checking
Also avoid storing request state in the Has Permission instances. The instances er temporary and only used once and there is thus not real problem, but it is simpler and cleaner this way.
controllers: avoid setting request state in controller instances - set it in the thread global request variable
In TurboGears, controllers are singletons and we should avoid using instance variables for any volatile data. Instead, use the "global thread local" request context.
With everything in request, some use of c is dropped.
Note: kallithea/controllers/api/__init__.py still use instance variables that will cause problems with TurboGears.
controllers: avoid setting constants as controller instance variables in __before__
Setting constants in __before__ is a weird pattern and we can do fine without doing it. That makes it more clear that there is no state in the controller instances.
Real constants can just be set at the module level.
Some values depend on configuration and can thus probably not be set as constants at module import time but could perhaps be set in __init__. But reading configuration directly when needed will probably be just as good.
This type of code examples are not present in other files. It is unclear whether this example is still accurate. And it is equally unclear whether they provide real value compared to 'real' code.
The markup in base.html is a bit misleading. The "quick login" drop down menu is not just the quick login box but also the "current user info" box.
When the user is logged in, the "real" quick login box isn't shown and the drop menu's aria-describedby="quick_login_h" thus becomes an invalid reference.
To fix the reference, also call the user info for quick_login_h - that is approximately correct.
hg: return 400 Bad Request for hg commands that not are commands
Avoid throwing bare Exceptions which requires framework specific testing. Instead, return a reasonable http error code and make the test more framework independent.
The "helpful" message will just be a description of the http exception and not sent to the client.
auth: add support for "Bearer" auth scheme (API key variant)
This allows the API key to be passed in a header instead of the query string, reducing the risk of accidental API key leaks:
Authorization: Bearer <api key>
The Bearer authorization scheme is standardized in RFC 6750, though used here outside the full OAuth 2.0 authorization framework. (Full OAuth can still be added later without breaking existing users.)
auth: track API key used for authentication in AuthUser
This allows us to define only once how an API key is passed to the app. We might e.g. allow API keys to be passed in an HTTP header; with this change, we only need to update the code in one place.
Also change the code to verify up front that the API key resolved to a valid and active user, so LoginRequired doesn't need to do that.
Also return plain 403 Forbidden for bad API keys instead of redirecting to the login form, which makes more sense for non-interactive clients (the typical users of API keys).
auth: add AuthUser.is_anonymous, along with some exposition
This reveals the name of the NotAnonymous decorator to be misleading, an unfortunate detail only documented here, but which must be properly resolved in a later changeset.
Note that NotAnonymous behaves as advertised as long as it is used together with LoginRequired, which is always the case in the current code, so there's no actual security issue here, the code is just weird, hard to read and fragile.
---
Some thoughts on cleaning this up in a future changeset: As it turns out, every controller (except the login page!) should be LoginRequired decorated (since it doesn't actually block anonymous users, as long as anonymous access is enabled in the Kallithea config). Thus the most obvious solution would be to move the LoginRequired functionality into BaseController (with an override for LoginController), and delete the decorator entirely. However, LoginRequired does one other thing: it carries information about whether API access is enabled for individual controller methods ("@LoginRequired(api_key=True)"), and also performs the check for this, something which is not easily moved into the base controller class, since the base controller doesn't know which method is about to be called. Possibly that can be determined by poking Pylons, but such code is likely to break with the upcoming TurboGears 2 move. Thus such cleanup is probably better revisited after the switch to TG2.
auth: perform basic HTTP security checks already in BaseController
There's no reason to postpone these to a LoginRequired decorated controller function. This way, they run unconditionally for all subclasses of BaseController (so everything except JSON-RPC and VCS controllers).