
* prepare codebase to create scheduled tasks
there is some prep work involved with this. the scheduler would be happy
if this work was done. simply, we extract out the `created_utc`
interface from *everything* that uses it such that we don't have to
repeat ourselves a bunch. all fun stuff.
next commit is the meat of it.
* cron: basic backend work for scheduler
* avoid import loop
* attempt 2 at fixing import loops
* parenthesize because operator precedence
* delete file that came back for some reason.
* does NOPing the oauth apps work?
* import late and undo clients.py change
* stringify column names.
* reorder imports.
* remove task reference
* fix missing mapper object
* make coupled to repeatabletask i guess
* sanitize: fix sanitize imports
* import shadowing crap
* re-shadow shadowed variable
* fix regexes
* use the correct not operator
* readd missing commit
* scheduler: SQLA only allows concrete relations
* implement submission scheduler
* fix import loop with db_session
* get rid of import loop in submission.py and comment.py
* remove import loops by deferring import until function call
* i give up.
* awful.
* ...
* fix another app import loop
* fix missing import in route handler
* fix import error in wrappers.py
* fix wrapper error
* call update wrapper in the admin_level_required case
* :marseyshrug:
* fix issue with wrapper
* some cleanup and some fixes
* some more cleanup
let's avoid polluting scopes where we can.
* ...
* add SCHEDULED_POSTS permission.
* move const.py into config like the other files.
* style fixes.
* lock table for concurrency improvements
* don't attempt to commit on errors
* Refactor code, create `TaskRunContext`, create python callable task type.
* use import contextlib
* testing stuff i guess.
* handle repeatable tasks properly.
* Attempt another fix at fighting the mapper
* do it right ig
* SQLA1.4 doesn't support nested polymorphism ig
* fix erroneous class import
* fix mapper errors
* import app in wrappers.py
* fix import failures and stuff like that.
* embed and import fixes
* minor formatting changes.
* Add running state enum and don't attempt to check for currently running tasks.
* isort
* documentation, style, and commit after each task.
* Add completion time and more docs, rename, etc
* document `CRON_SLEEP_SECONDS` better.
* add note about making LiteralString
* filter out tasks that have been run in the future
* reference RepeatableTask's `__tablename__` directly
* use a master/slave configuration for tasks
the master periodically checks to see if the slave is alive, healthy,
and not taking too many resources, and if applicable kills its
child and restarts it.
only one relation is supported at the moment.
* don't duplicate process unnecessarily
* note impl detail, add comments
* fix imports.
* getting imports to stop being stupid.
* environment notes.
* syntax derp
* *sigh*
* stupid environment stuff
* add UI for submitting a scheduled post
* stupid things i need to fix the user class
* ...
* fix template
* add formkey
* pass v
* add hour and minute field
* bleh
* remove concrete
* the sqlalchemy docs are wrong
* fix me being dumb and not understanding error messages
* missing author attribute for display
* author_name property
* it's a property
* with_polymorphic i think fixes this
* dsfavgnhmjk
* *sigh*
* okay try this again
* try getting rid of the comment section
* include -> extends
* put the div outside of the thing.
* fix user page listings :/
* mhm
* i hate this why isn't this working
* this should fix it
* Fix posts being set as disabled by default
* form UI improvements
* label
* <textarea>s should have their closing tag
* UI fixes.
* and fix erroneous spinner thing.
* don't abort(415) when browsers send 0 length files for some reason
* UI improvements
* line break.
* CSS :S
* better explainer
* don't show moderation buttons for scheduled posts
* ...
* meh
* add edit form
* include forms on default page.
* fix hour minute selection.
* improve ui i guess and add api
* Show previous postings on scheduled task page
* create task id
* sqla
* posts -> submissions
* fix OTM relationship
* edit URL
* use common formkey control
* Idk why this isn't working
* Revert "Idk why this isn't working"
This reverts commit 3b93f741df.
* does removing viewonly fix it?
* don't import routes on db migrations
* apparently this has to be a string
* UI improvements redux
* margins and stuff
* add cron to supervisord
* remove stupid duplication
* typo fix
* postgres syntax error
* better lock and error handling
* add relationship between task and runs
* fix some ui stuff
* fix incorrect timestamp comparison
* ...
* Fix logic errors blocking scheduled posts
Two bugs here:
- RepeatableTask.run_time_last <= now: run_time_last is NULL by
default. NULL is not greater than, less than, or equal to any
value. We use NULL to signify a never-run task; check for that
condition when building the task list.
- `6 <= weekday <= 0`: there is no integer that is both gte 6 and
lte 0. This was always false.
* pass through worker process STDOUT and STDERR
* Add scheduler to admin panel
* scheduler
* fix listing and admin home
* date formatting fixes
* fix ages
* task user interface
* fix some more import crap i have to deal with
* fix typing
* avoid import loop
* UI fixes
* fix incorrect type
* task type
* Scheduled task UI improvements (add runs and stuff)
* make the width a lil bit smaller
* task runs.
* fix submit page
* add alembic migration
* log on startup
* Fix showing edit button
* Fix logic for `can_edit` (accidentally did `author_id` instead of `id`)
* Broad review pass
Review:
- Call `invalidate_cache` with `is_html=` explicitly for clarity,
rather than a bare boolean in the call args.
- Remove `marseys_const*` and associated stateful const system:
the implementation was good if we needed them, but TheMotte
doesn't use emoji, and a greenfield emoji system would likely
not keep those darned lists floating in thread-local scope.
Also they were only needed for goldens and random emoji, which
are fairly non-central features.
- Get `os.environ` fully out of the templates by using the new
constants we already have in files.helpers.config.environment.
- Given files.routes.posts cleanup, get rid of shop discount dict.
It's already a mapping of badge IDs to discounts for badges that
likely won't continue to exist (if they even do at present).
- RepeatableTaskRun.exception: use `@property.setter` instead of
overriding `__setattr__`.
Fix:
- Welcome message literal contained an indented Markdown code block.
- Condition to show "View source" button changed to show source to
logged-out users. This may well be a desirable change, but it's not
clearly intended here.
* Fix couple of routing issues
* fix 400 with post body editing
* Add error handler for HTTP 415
* fix router giving wrong arg name to handler
* Use supervisord to monitor memory rather than DIY
Also means we're using pip for getting supervisord now, so we don't rely
on the Debian image base for any packages.
* fix task run elapsed time display
* formatting and removing redundant code
* Fix missing ModAction import
* dates and times fixes
* Having to modify imports here anyway, might as
well change it.
* correct documentation.
* don't use urlunparse
* validators: import sanitize instead of from syntax
* cron: prevent races on task running
RepeatableTask.run_state_enum acts as the mutex on repeatable tasks.
Previously, the list of tasks to run was acquired before individually
locking each task. However, between those points there was a window in
which the table was unlocked and the tasks were still in state WAITING.
This could potentially have led to two 'cron' processes each running the
same task simultaneously. Instead, we check for runnability both when
building the preliminary list and when mutexing the task via run state
in the database.
Also:
- g.db and the cron db object are both instances of `Session`, not
`scoped_session` because they are obtained from
`scoped_session.__call__`, which acts as a `Session` factory.
Propagate this to the type hints.
- Sort order of task run submissions so /tasks/scheduled_posts/<id>
"Previous Task Runs" listings are useful.
* Notify followers on post publication
This was old behavior lost in the refactoring of the submit endpoint.
Also fix an AttributeError in `Follow.__repr__` which carried over
from all the repr copypasta.
* Fix image attachment
Any check for `file.content_length` relies on browsers sending
Content-Length headers with the request. It seems that few actually do.
The pre-refactor approach was to check for truthiness, which excludes
both None and the strange empty strings that we seem to get in absence
of a file upload. We return to doing so.
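
The truthiness check described above can be illustrated without Flask. This is a minimal sketch, not the project's code: `FakeUpload` is a hypothetical stand-in class, and it assumes an upload object is truthy only when it carries a filename (which is how Werkzeug's `FileStorage` appears to behave, but confirm that against its documentation).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeUpload:
    """Hypothetical stand-in for an uploaded file part (not the Werkzeug class)."""
    filename: str
    content_type: str = "application/octet-stream"
    content_length: int = 0  # assumption: often 0 because no length was reported

    def __bool__(self) -> bool:
        # Assumption: the part counts as present only if it has a filename.
        return bool(self.filename)


def has_attachment(file: Optional[FakeUpload]) -> bool:
    # Truthiness rejects both None and the empty placeholder part.
    return bool(file)


def has_attachment_by_length(file: Optional[FakeUpload]) -> bool:
    # Fragile variant: depends on the client reporting a content length.
    return file is not None and file.content_length > 0


real = FakeUpload("cat.png", "image/png")  # real upload, length not reported
empty = FakeUpload("")                     # what an empty file input produces
```

With these stand-ins, `has_attachment` accepts `real` and rejects both `empty` and `None`, while the length-based variant wrongly rejects `real`.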
---------
Co-authored-by: TLSM <duolsm@outlook.com>
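
The two bugs listed under "Fix logic errors blocking scheduled posts" can be reproduced in isolation. The sketch below is illustrative only: it uses an in-memory SQLite database instead of the project's Postgres and SQLAlchemy setup, the `tasks` table is hypothetical, and integer timestamps are an assumption. Only the `run_time_last` column name and the `6 <= weekday <= 0` expression come from the commit message.

```python
import sqlite3

# Bug 2: a chained comparison `6 <= wd <= 0` means `6 <= wd and wd <= 0`,
# which no integer satisfies, so nothing is ever selected.
always_false = [wd for wd in range(7) if 6 <= wd <= 0]

# Bug 1: comparing NULL with `<=` never yields true in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, run_time_last INTEGER)")
conn.executemany(
    "INSERT INTO tasks (id, run_time_last) VALUES (?, ?)",
    [(1, None), (2, 100), (3, 500)],  # task 1 has never run
)
now = 200  # assumed integer timestamp

# Naive filter: silently skips the never-run task.
naive = [row[0] for row in conn.execute(
    "SELECT id FROM tasks WHERE run_time_last <= ? ORDER BY id", (now,))]

# Filter that treats NULL as "never run" and therefore eligible.
fixed = [row[0] for row in conn.execute(
    "SELECT id FROM tasks "
    "WHERE run_time_last IS NULL OR run_time_last <= ? ORDER BY id", (now,))]
```

Here `naive` misses task 1, while `fixed` returns it together with task 2. The sketch does not guess what the weekday condition was meant to be. It only shows that the original expression could never be true.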
192 lines
6.5 KiB
Python
import shutil
import time
import urllib.parse
from dataclasses import dataclass
from typing import Optional

from flask import Request, abort, request
from werkzeug.datastructures import FileStorage

import files.helpers.embeds as embeds
import files.helpers.sanitize as sanitize
from files.helpers.config.environment import SITE_FULL, YOUTUBE_KEY
from files.helpers.config.const import (SUBMISSION_BODY_LENGTH_MAXIMUM,
                                        SUBMISSION_TITLE_LENGTH_MAXIMUM,
                                        SUBMISSION_URL_LENGTH_MAXIMUM)
from files.helpers.content import canonicalize_url2
from files.helpers.media import process_image


def guarded_value(val:str, min_len:int, max_len:int) -> str:
    '''
    Get request value `val` and ensure it is within length constraints
    Requires a request context and either aborts early or returns a good value
    '''
    raw = request.values.get(val, '').strip()
    raw = raw.replace('\u200e', '')

    if len(raw) < min_len: abort(400, f"Minimum length for {val} is {min_len}")
    if len(raw) > max_len: abort(400, f"Maximum length for {val} is {max_len}")
    # TODO: it may make sense to do more sanitisation here
    return raw


def int_ranged(val:str, min:int, max:int) -> int:
    raw:Optional[int] = request.values.get(val, default=None, type=int)
    if raw is None or raw < min or raw > max:
        abort(400,
            f"Invalid input ('{val}' must be an integer and be between {min} and {max})")
    return raw

@dataclass(frozen=True, kw_only=True, slots=True)
class ValidatedSubmissionLike:
    title: str
    title_html: str
    body: str
    body_raw: Optional[str]
    body_html: str
    url: Optional[str]
    thumburl: Optional[str]

    @property
    def embed_slow(self) -> Optional[str]:
        url:Optional[str] = self.url
        url_canonical: Optional[urllib.parse.ParseResult] = self.url_canonical
        if not url or not url_canonical: return None

        embed:Optional[str] = None
        domain:str = url_canonical.netloc

        if domain == "twitter.com":
            embed = embeds.twitter(url)

        if url.startswith('https://youtube.com/watch?v=') and YOUTUBE_KEY:
            embed = embeds.youtube(url)

        if SITE_FULL in domain and "/post/" in url and "context" not in url:
            id = url.split("/post/")[1]
            if "/" in id: id = id.split("/")[0]
            embed = str(int(id))

        return embed if embed and len(embed) <= 1500 else None

    @property
    def repost_search_url(self) -> Optional[str]:
        search_url = self.url_canonical_str
        if not search_url: return None

        if search_url.endswith('/'):
            search_url = search_url[:-1]
        return search_url

    @property
    def url_canonical(self) -> Optional[urllib.parse.ParseResult]:
        if not self.url: return None
        return canonicalize_url2(self.url, httpsify=True)

    @property
    def url_canonical_str(self) -> Optional[str]:
        url_canonical:Optional[urllib.parse.ParseResult] = self.url_canonical
        if not url_canonical: return None
        return url_canonical.geturl()

    @classmethod
    def from_flask_request(cls,
            request:Request,
            *,
            allow_embedding:bool,
            allow_media_url_upload:bool=True,
            embed_url_file_key:str="file2",
            edit:bool=False) -> "ValidatedSubmissionLike":
        '''
        Creates the basic structure for a submission and validates it. The
        normal submission API has a lot of duplicate code and while this is not
        a pretty solution, this essentially forces all submission-likes through
        a central interface.

        :param request: The Flask Request object.
        :param allow_embedding: Whether to allow embedding. This should usually
        be the value from the environment.
        :param allow_media_url_upload: Whether to allow media URL upload. This
        should generally be `True` for submission submitting if file uploads
        are allowed and `False` in other contexts (such as editing)
        :param embed_url_file_key: The key to use for inline file uploads.
        :param edit: The value of `edit` to pass to `sanitize`
        '''

        def _process_media(file:Optional[FileStorage]) -> tuple[bool, Optional[str], Optional[str]]:
            if request.headers.get("cf-ipcountry") == "T1": # forbid Tor uploads
                return False, None, None
            elif not file:
                # We actually care about falseyness, not just `is not None` because
                # no attachment is <FileStorage: '' ('application/octet-stream')>
                # (at least from Firefox 111).
                return False, None, None
            elif not file.content_type.startswith('image/'):
                abort(415, "Image files only")

            name = f'/images/{time.time()}'.replace('.','') + '.webp'
            file.save(name)
            url:Optional[str] = process_image(name)
            if not url: return False, None, None

            name2 = name.replace('.webp', 'r.webp')
            shutil.copyfile(name, name2)
            thumburl:Optional[str] = process_image(name2, resize=100)
            return True, url, thumburl

        def _process_media2(body:str, file2:Optional[list[FileStorage]]) -> tuple[bool, str]:
            if request.headers.get("cf-ipcountry") == "T1": # forbid Tor uploads
                return False, body
            elif not file2: # empty list or None
                return False, body
            file2 = file2[:4]
            if not all(file for file in file2):
                # Falseyness check to handle <'' ('application/octet-stream')>
                return False, body

            for file in file2:
                if not file.content_type.startswith('image/'):
                    abort(415, "Image files only")

                name = f'/images/{time.time()}'.replace('.','') + '.webp'
                file.save(name)
                image = process_image(name)
                if allow_embedding:
                    body += f"\n\n![]({image})"
                else:
                    body += f'\n\n<a href="{image}">{image}</a>'
            return True, body

        title = guarded_value("title", 1, SUBMISSION_TITLE_LENGTH_MAXIMUM)
        title = sanitize.sanitize_raw(title, allow_newlines=False, length_limit=SUBMISSION_TITLE_LENGTH_MAXIMUM)

        url = guarded_value("url", 0, SUBMISSION_URL_LENGTH_MAXIMUM)

        body_raw = guarded_value("body", 0, SUBMISSION_BODY_LENGTH_MAXIMUM)
        body_raw = sanitize.sanitize_raw(body_raw, allow_newlines=True, length_limit=SUBMISSION_BODY_LENGTH_MAXIMUM)

        if not url and allow_media_url_upload:
            has_file, url, thumburl = _process_media(request.files.get("file"))
        else:
            has_file = False
            thumburl = None

        has_file2, body = _process_media2(body_raw, request.files.getlist(embed_url_file_key))

        if not body_raw and not url and not has_file and not has_file2:
            raise ValueError("Please enter a URL or some text")

        title_html = sanitize.filter_emojis_only(title, graceful=True)
        if len(title_html) > 1500:
            raise ValueError("Rendered title is too big!")

        return ValidatedSubmissionLike(
            title=title,
            title_html=sanitize.filter_emojis_only(title, graceful=True),
            body=body,
            body_raw=body_raw,
            body_html=sanitize.sanitize(body, edit=edit),
            url=url,
            thumburl=thumburl,
        )