Visiting /statistics with py3 would fail with:
    ... in statistics
        sorted(lang_stats, reverse=True, key=lambda k: k[1])[:10]
    TypeError: '<' not supported between instances of 'dict' and 'dict'
The "summary" computation didn't have that problem. It also put '?' as the description for unknown extensions, and it had stable output because it also sorted on the file extension as a secondary key. Just use that.
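A minimal sketch of the failure and of sorting with an explicit key. The shape of lang_stats and the exact sort order (count descending, then extension ascending) are assumptions made for illustration, not the actual Kallithea data or code:

```python
# Hypothetical data shape: (extension, {"count": n, "desc": description}).
lang_stats = [
    ("js", {"count": 3, "desc": "Javascript"}),
    ("xyz", {"count": 3, "desc": "?"}),
    ("py", {"count": 12, "desc": "Python"}),
]


def sort_on_dict(stats):
    # The sort key is a dict. Python 3 cannot order dicts, so this raises
    # TypeError as soon as two elements have to be compared.
    return sorted(stats, reverse=True, key=lambda k: k[1])[:10]


def sort_on_count_and_ext(stats):
    # Sort on count (descending) and use the extension as secondary key,
    # so entries with equal counts always come out in the same order.
    return sorted(stats, key=lambda k: (-k[1]["count"], k[0]))[:10]


try:
    sort_on_dict(lang_stats)
    dict_sort_failed = False
except TypeError:
    dict_sort_failed = True

top = sort_on_count_and_ext(lang_stats)
top_extensions = [ext for ext, _info in top]
```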
lib: clean up ext_json and how it is used - avoid monkey patching
Note that py3 json.dumps returns a str that only contains ASCII (with all non-ASCII characters escaped). But we generally want JSON as bytes (which json.loads can also read), so also wrap the result with ascii_bytes in many places.
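A small sketch of that behaviour. The ascii_bytes function below is a hypothetical stand-in with an assumed implementation - only the name comes from this change:

```python
import json


def ascii_bytes(s):
    """Assumed stand-in: encode a str that is known to be pure ASCII."""
    return s.encode("ascii")


payload = {"name": "bl\u00e5b\u00e6r"}

# With the default ensure_ascii=True, json.dumps returns a str where all
# non-ASCII characters have been replaced with \uXXXX escapes.
text = json.dumps(payload)

# The str is pure ASCII, so it can safely be encoded to bytes.
data = ascii_bytes(text)

# json.loads accepts bytes as well as str.
roundtrip = json.loads(data)
```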
vcs: fix get_file_annotate - consistently bind sha so it has the right value when executing
The Git implementation did *not* save the sha value in the lambda expression for the "changeset lazy loader". Thus, if the generator had moved on and assigned a different value to sha when the expression was executed, it would use the "wrong" sha.
Fixed by doing as the Hg implementation: bind the sha value as value of a default parameter when defining the lambda expression.
The Hg implementation did however also save the line - it is not used, and there is no need for that.
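The difference can be sketched like this. The generator and loader names are made up for illustration and are not the actual vcs code:

```python
def annotate_late(shas):
    for sha in shas:
        # sha is looked up when the lambda is *called*, so it sees
        # whatever value the loop variable has at that time.
        yield sha, lambda: "changeset " + sha


def annotate_bound(shas):
    for sha in shas:
        # The default parameter is evaluated when the lambda is *defined*,
        # so each lambda keeps its own sha.
        yield sha, lambda sha=sha: "changeset " + sha


shas = ["a1", "b2", "c3"]

# Exhaust the generators first, then invoke the lazy loaders.
late = [loader() for _sha, loader in list(annotate_late(shas))]
bound = [loader() for _sha, loader in list(annotate_bound(shas))]
```

Here late ends up with the last sha three times, while bound has one entry for each sha.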
vcs: tweak how revisions and repo names are shown in error messages
Decode bytes to str, and show repo name instead of repr or full server file system path. In some places, it will only report the "basename" of the repository, without any "group names" that also would be nice to have. The easy alternative would be to show the full file system path ... but it would be unfortunate to leak absolute server side paths to end users.
logging: always invoke fileConfig with '__file__' and 'here'
WSGI servers tend to provide '__file__' and 'here' as 'defaults' when invoking fileConfig, so '%(here)s' string interpolation also can be used in logging configuration.
Make sure we do the same when we initialize logging without using a WSGI server.
It is annoying to have to do this, and it will only in rare cases make any difference ... but it seems like the best option.
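A minimal sketch of such an initialization, assuming a helper named init_logging (a made-up name) and a throwaway demo configuration that uses '%(here)s' in the handler arguments:

```python
import logging
import logging.config
import os
import tempfile

# Demo configuration. The file handler path relies on %(here)s.
INI = """\
[loggers]
keys = root

[handlers]
keys = file

[formatters]
keys = plain

[logger_root]
level = INFO
handlers = file

[handler_file]
class = FileHandler
args = ('%(here)s/demo.log',)
formatter = plain

[formatter_plain]
format = %(levelname)s %(message)s
"""


def init_logging(config_file):
    """Hypothetical helper: pass '__file__' and 'here' as defaults."""
    config_file = os.path.abspath(config_file)
    logging.config.fileConfig(
        config_file,
        defaults={
            "__file__": config_file,
            "here": os.path.dirname(config_file),
        },
        disable_existing_loggers=False,
    )


workdir = tempfile.mkdtemp()
ini_path = os.path.join(workdir, "demo.ini")
with open(ini_path, "w") as f:
    f.write(INI)

init_logging(ini_path)
logging.getLogger("demo").info("hello")
log_path = os.path.join(workdir, "demo.log")
```

Without the defaults, the interpolation of '%(here)s' would fail when the handler section is read.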
logging: drop fileConfig initialization in make_app - backout 0d4dd9380a45
0d4dd9380a45 was a bit harmful, as it might overwrite existing good logging configuration.
0d4dd9380a45 no longer seems relevant: Testing shows that logging for `gearbox serve` *is* activated anyway. gearbox/commands/serve.py will invoke "setup_logging" right before "loadapp".
We must and can assume that logging has been initialized before make_app.
Reported and based on analysis by Wolfgang Scherer.
Essentially a backout of d2a97f73fa1f and the 4851d15bc437_db_migration_step_after_95c01895c006_ alembic step.
We can't reliably have full index on fields with unbounded length. The upgrade step has been reported to fail on MySQL [1]:
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1170, "BLOB/TEXT column 'public_key' used in key specification without a key length") [SQL: u'CREATE INDEX usk_public_key_idx ON user_ssh_keys (public_key)'] (Background on this error at: http://sqlalche.me/e/e3q8)
And we really don't need this index ... especially now when we use fingerprints for key deletion instead of looking up by the full public key.
ssh: make it clear that SshKeyModel.delete has user as mandatory parameter
It is already provided in the two uses:
    kallithea/controllers/admin/my_account.py: SshKeyModel().delete(fingerprint, request.authuser.user_id)
    kallithea/controllers/admin/users.py: SshKeyModel().delete(fingerprint, c.user.user_id)
validator: fix ASCII password check to verify if it can be *encoded* in ascii
In Python 2, unicode strings have a .decode method (which really doesn't make sense). Python 3 has stricter typing by design, and unicode strings don't have a .decode method.
A Unicode string "is ASCII" if it can be encoded as ASCII. The check should thus *encode* to ASCII - not decode.
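A minimal sketch of such a check. The function name is made up and this is not the actual validator code:

```python
def is_ascii(text):
    """Return True if the (unicode) str can be encoded as ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


ok = is_ascii("secret123")
not_ok = is_ascii("p\u00e4ssword")

# In Python 3, str has no .decode method at all.
str_has_decode = hasattr("secret123", "decode")
```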
gist: make it a bit more clear how gist_access_id is used ... and how it is different from gist_id
A gist has a gist_access_id which gives access to it. For private gists, it is a multi-letter secure random string.
gist_id is the primary key in the database and thus an automatically incrementing integer. It is also used as the not-so-secret gist_access_id for public gists.
This gets rid of one odd safe_unicode applied to an int.
lib: handle both HTML, unsafe strings, and exceptions passed to helpers.flash()
Before, h.flash would trust any input to contain html ... and callers would convert exceptions to string, often with a simple str() or unicode() ... which really didn't deserve to be trusted.
Instead, only trust messages that have a __html__ and escape anything else ... but also apply str/unicode on the parameter so the caller doesn't have to but *can* pass an exception directly.
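The rule can be sketched as follows. The Literal class and render_flash_message are hypothetical stand-ins built on the standard library, not the actual helpers code:

```python
import html


class Literal(str):
    """Assumed stand-in for a string type that is already safe HTML."""

    def __html__(self):
        return self


def render_flash_message(message):
    # Only trust objects that explicitly declare themselves as HTML.
    if hasattr(message, "__html__"):
        return str(message.__html__())
    # Anything else (plain strings, exceptions, ...) is converted to str
    # and escaped, so the caller can pass an exception directly.
    return html.escape(str(message))


trusted = render_flash_message(Literal("<a href='/x'>ok</a>"))
escaped = render_flash_message(ValueError("<b>bad</b>"))
```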
lib: let get_git_version invoke git as all other commands do, without special options
There is no need for _bare or _safe. It is fine to have '-c core.quotepath=false' before '--version', and it is perfectly fine to get a RepositoryError if things go terribly wrong.
lib: establish py3 compatible strategy for string handling: introducing safe_bytes and deprecating safe_str
The meaning of safe_str will change when moving to py3. All use of safe_str is thus tech debt that we have to chop off, mostly by moving to either safe_unicode or safe_bytes ... or dropping because we know what we are doing and rely on the improved type safety in py3.
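A rough sketch of the intended split between the two directions. The implementations below are assumptions for illustration (UTF-8 only, no fallback encodings) and not the actual Kallithea functions:

```python
def safe_bytes(value, encoding="utf-8"):
    """Assumed sketch: return bytes, encoding str if necessary."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %r" % type(value))


def safe_unicode(value, encoding="utf-8"):
    """Assumed sketch: return str, decoding bytes if necessary."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        # Replace undecodable bytes instead of failing.
        return value.decode(encoding, errors="replace")
    raise TypeError("expected str or bytes, got %r" % type(value))
```

With explicit functions for each direction, it is clear at every call site whether bytes or str is wanted, which is what py3 requires anyway.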
We will soon move to Python 3 which only will support 5.1 or later.
Remove old hacks and tech debt.
Also avoids future warning: DeprecationWarning: inspect.getargspec() is deprecated since Python 3.0, use inspect.signature() or inspect.getfullargspec()
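For reference, a small sketch of the two replacements named in the warning. The inspected function is made up for illustration:

```python
import inspect


def handler(repo_name, revision="tip", *args, **kwargs):
    """Hypothetical function to inspect."""


# Closest drop-in replacement for the deprecated inspect.getargspec().
spec = inspect.getfullargspec(handler)

# The more general API.
sig = inspect.signature(handler)
param_names = list(sig.parameters)
```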
lib: only maintain one copy of safe_str / safe_unicode
The standalone-ish nature of vcs gets a bit in the way. It already depends on some very generic Kallithea functionality. But for now, avoid code duplication, and let Kallithea use vcs functionality instead of duplicating it.
feeds: replace webhelpers.feedgenerator with simple mako templating for rendering RSS/Atom XML
Most of the complexity in RSS libraries is in dynamically supporting all kinds of attributes. For our use, we have a small static set of attributes we use, and it is simpler to just use mako.
Also, webhelpers is dead, and the alternatives seem quite heavy.
repo: don't just report user name and email in one field - separate things properly
In the repo RSS feed, report author as
    <author>name@example.com (User Name)</author>
instead of using
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">User Name <name@example.com></dc:creator>
And in the ATOM feed with name and email separate:
    <author>
      <name>User Name</name>
      <email>name@example.com</email>
    </author>
instead of
    <name>User Name <name@example.com></name>
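A minimal sketch of splitting a combined author string and rendering the two forms. It uses email.utils.parseaddr from the standard library as a stand-in for the actual author helpers, and the function names are made up:

```python
from email.utils import parseaddr
from xml.sax.saxutils import escape


def rss_author(author):
    """Render 'User Name <email>' as an RSS <author> element."""
    name, email = parseaddr(author)
    return "<author>%s (%s)</author>" % (escape(email), escape(name))


def atom_author(author):
    """Render 'User Name <email>' with name and email as separate elements."""
    name, email = parseaddr(author)
    return "<author><name>%s</name><email>%s</email></author>" % (
        escape(name),
        escape(email),
    )


rss = rss_author("User Name <name@example.com>")
atom = atom_author("User Name <name@example.com>")
```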
journal: don't include email in author name - avoid double data
In the journal RSS feed, report author as:
    <author>name@example.com (User Name)</author>
instead of double email due to:
    <author>name@example.com (User Name <name@example.com>)</author>
In the journal ATOM feed, report author as:
    <author>
      <name>User Name</name>
      <email>test_admin@example.com</email>
    </author>
instead of using double email due to:
    <name>name@example.com (User Name <name@example.com>)</name>