rDrama/files/helpers/validators.py
justcool393 be952c2771
post scheduling (#554)
* prepare codebase to create scheduled tasks

there is some prep work involved with this. the scheduler would be happy
 if this work was done. simply, we extract out the `created_utc`
 interface from *everything* that uses it such that we don't have to
 repeat ourselves a bunch. all fun stuff.

next commit is the meat of it.

* cron: basic backend work for scheduler

* avoid import loop

* attempt 2 at fixing import loops

* parenthesize because operator precedence

* delete file that came back for some reason.

* does NOPing the oauth apps work?

* import late and undo clients.py change

* stringify column names.

* reorder imports.

* remove task reference

* fix missing mapper object

* make coupled to repeatabletask i guess

* sanitize: fix sanitize imports

* import shadowing crap

* re-shadow shadowed variable

* fix regexes

* use the correct not operator

* readd missing commit

* scheduler: SQLA only allows concrete relations

* implement submission scheduler

* fix import loop with db_session

* get rid of import loop in submission.py and comment.py

* remove import loops by deferring import until function call

* i give up.

* awful.

* ...

* fix another app import loop

* fix missing import in route handler

* fix import error in wrappers.py

* fix wrapper error

* call update wrapper in the admin_level_required case

* :marseyshrug:

* fix issue with wrapper

* some cleanup and some fixes

* some more cleanup

let's avoid polluting scopes where we can.

* ...

* add SCHEDULED_POSTS permission.

* move const.py into config like the other files.

* style fixes.

* lock table for concurrency improvements

* don't attempt to commit on errors

* Refactor code, create `TaskRunContext`, create python callable task type.

* use import contextlib

* testing stuff i guess.

* handle repeatable tasks properly.

* Attempt another fix at fighting the mapper

* do it right ig

* SQLA1.4 doesn't support nested polymorphism ig

* fix erroneous class import

* fix mapper errors

* import app in wrappers.py

* fix import failures and stuff like that.

* embed and import fixes

* minor formatting changes.

* Add running state enum and don't attempt to check for currently running tasks.

* isort

* documentation, style, and commit after each task.

* Add completion time and more docs, rename, etc

* document `CRON_SLEEP_SECONDS` better.

* add note about making LiteralString

* filter out tasks that have been run in the future

* reference RepeatableTask's `__tablename__` directly

* use a master/slave configuration for tasks

the master periodically checks to see if the slave is alive, healthy,
and not taking too many resources, and if applicable kills its
child and restarts it.

only one relation is supported at the moment.

* don't duplicate process unnecessarily

* note impl detail, add comments

* fix imports.

* getting imports to stop being stupid.

* environment notes.

* syntax derp

* *sigh*

* stupid environment stuff

* add UI for submitting a scheduled post

* stupid things i need to fix the user class

* ...

* fix template

* add formkey

* pass v

* add hour and minute field

* bleh

* remove concrete

* the sqlalchemy docs are wrong

* fix me being dumb and not understanding error messages

* missing author attribute for display

* author_name property

* it's a property

* with_polymorphic i think fixes this

* dsfavgnhmjk

* *sigh*

* okay try this again

* try getting rid of the comment section

* include -> extends

* put the div outside of the thing.

* fix user page listings :/

* mhm

* i hate this why isn't this working

* this should fix it

* Fix posts being set as disabled by default

* form UI improvements

* label

* <textarea>s should have their closing tag

* UI fixes.

* and fix erroneous spinner thing.

* don't abort(415) when browsers send 0 length files for some reason

* UI improvements

* line break.

* CSS :S

* better explainer

* don't show moderation buttons for scheduled posts

* ...

* meh

* add edit form

* include forms on default page.

* fix hour minute selection.

* improve ui i guess and add api

* Show previous postings on scheduled task page

* create task id

* sqla

* posts -> submissions

* fix OTM relationship

* edit URL

* use common formkey control

* Idk why this isn't working

* Revert "Idk why this isn't working"

This reverts commit 3b93f741df.

* does removing viewonly fix it?

* don't import routes on db migrations

* apparently this has to be a string

* UI improvements redux

* margins and stuff

* add cron to supervisord

* remove stupid duplication

* typo fix

* postgres syntax error

* better lock and error handling

* add relationship between task and runs

* fix some ui stuff

* fix incorrect timestamp comparison

* ...

* Fix logic errors blocking scheduled posts

Two bugs here:
  - RepeatableTask.run_time_last <= now: run_time_last is NULL by
    default. NULL is not greater than, less than, or equal to any
    value. We use NULL to signify a never-run task; check for that
    condition when building the task list.
  - `6 <= weekday <= 0`: there is no integer that is both gte 6 and
    lte 0. This was always false.
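
The two conditions above can be reproduced in isolation. This is a minimal
sketch, not the scheduler's query: it uses an in-memory SQLite database as a
stand-in for Postgres, and `null_comparison_result` and `is_runnable` are
hypothetical helpers made up for illustration.

```python
import sqlite3


def null_comparison_result():
    """Return what SQL yields for `NULL <= 5` (Python None means SQL NULL)."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT NULL <= 5").fetchone()[0]
    finally:
        conn.close()


def is_runnable(run_time_last, now):
    # Hypothetical check: None stands in for a NULL run_time_last,
    # i.e. a task that has never run, which must be handled explicitly.
    return run_time_last is None or run_time_last <= now


# No weekday in 0..6 is both >= 6 and <= 0, so this is never true.
weekday_check_ever_true = any(6 <= weekday <= 0 for weekday in range(7))
```

The comparison against NULL comes back as NULL rather than true, so a plain
`run_time_last <= now` filter silently drops never-run tasks.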

* passthrough worker process STDOUT and STDERR

* Add scheduler to admin panel

* scheduler

* fix listing and admin home

* date formatting fixes

* fix ages

* task user interface

* fix some more import crap i have to deal with

* fix typing

* avoid import loop

* UI fixes

* fix incorrect type

* task type

* Scheduled task UI improvements (add runs and stuff)

* make the width a lil bit smaller

* task runs.

* fix submit page

* add alembic migration

* log on startup

* Fix showing edit button

* Fix logic for `can_edit` (accidentally did `author_id` instead of `id`)

* Broad review pass

Review:
  - Call `invalidate_cache` with `is_html=` explicitly for clarity,
    rather than a bare boolean in the call args.
  - Remove `marseys_const*` and associated stateful const system:
    the implementation was good if we needed them, but TheMotte
    doesn't use emoji, and a greenfield emoji system would likely
    not keep those darned lists floating in thread-local scope.
    Also they were only needed for goldens and random emoji, which
    are fairly non-central features.
  - Get `os.environ` fully out of the templates by using the new
    constants we already have in files.helpers.config.environment.
  - Given files.routes.posts cleanup, get rid of shop discount dict.
    It's already a mapping of badge IDs to discounts for badges that
    likely won't continue to exist (if they even do at present).
  - RepeatableTaskRun.exception: use `@property.setter` instead of
    overriding `__setattr__`.
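
For the last review item, a property setter might look roughly like this.
It is only a sketch: `TaskRunSketch`, the `_exception_text` field, and the
choice to store a formatted string are assumptions, not the actual
`RepeatableTaskRun` model.

```python
import traceback
from typing import Optional


class TaskRunSketch:
    """Hypothetical stand-in for a task run record; not the real model."""

    def __init__(self) -> None:
        self._exception_text: Optional[str] = None

    @property
    def exception(self) -> Optional[str]:
        return self._exception_text

    @exception.setter
    def exception(self, value: Optional[BaseException]) -> None:
        # Only assignments to `exception` pass through here; an overridden
        # `__setattr__` would instead intercept every attribute assignment.
        if value is None:
            self._exception_text = None
        else:
            self._exception_text = "".join(
                traceback.format_exception_only(type(value), value)
            ).strip()


run = TaskRunSketch()
run.exception = ValueError("boom")
```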

Fix:
  - Welcome message literal contained an indented Markdown code block.
  - Condition to show "View source" button had been changed to show source
    to logged-out users. This may well be a desirable change, but it's not
    clearly intended here.

* Fix couple of routing issues

* fix 400 with post body editing

* Add error handler for HTTP 415

* fix router giving wrong arg name to handler

* Use supervisord to monitor memory rather than DIY

Also means we're using pip for getting supervisord now, so we don't rely
on the Debian image base for any packages.

* fix task run elapsed time display

* formatting and removing redundant code

* Fix missing ModAction import

* dates and times fixes

* Having to modify imports here anyway, might as
well change it.

* correct documentation.

* don't use urlunparse

* validators: import sanitize instead of from syntax

* cron: prevent races on task running

RepeatableTask.run_state_enum acts as the mutex on repeatable tasks.
Previously, the list of tasks to run was acquired before individually
locking each task. However, there was a period where the table is both
unlocked and the tasks are in state WAITING between those points.
This could potentially have led to two 'cron' processes each running the
same task simultaneously. Instead, we check for runnability both when
building the preliminary list and when mutexing the task via run state
in the database.
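
One way to picture the run-state mutex is a conditional UPDATE that only
succeeds while the task is still WAITING. This is a sketch under assumptions:
it uses SQLite rather than Postgres, the `tasks` table, its columns, and the
`RUNNING` state name are hypothetical, and the real code goes through
SQLAlchemy and table locks rather than raw SQL.

```python
import sqlite3


def try_claim(conn, task_id):
    """Move a task from WAITING to RUNNING; return True only if we won."""
    cur = conn.execute(
        "UPDATE tasks SET run_state = 'RUNNING' "
        "WHERE id = ? AND run_state = 'WAITING'",
        (task_id,),
    )
    conn.commit()
    # rowcount is 0 when another worker already changed the state.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, run_state TEXT)")
conn.execute("INSERT INTO tasks VALUES (1, 'WAITING')")
conn.commit()

first_claim = try_claim(conn, 1)   # state was WAITING, so this claim succeeds
second_claim = try_claim(conn, 1)  # state is now RUNNING, so this one fails
```

Because the runnability check and the state change happen in one statement,
a second process that built its task list earlier cannot claim the same task.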

Also:
  - g.db and the cron db object are both instances of `Session`, not
    `scoped_session` because they are obtained from
    `scoped_session.__call__`, which acts as a `Session` factory.
    Propagate this to the type hints.
  - Sort order of task run submissions so /tasks/scheduled_posts/<id>
    "Previous Task Runs" listings are useful.

* Notify followers on post publication

This was old behavior lost in the refactoring of the submit endpoint.

Also fix an AttributeError in `Follow.__repr__` which carried over
from all the repr copypasta.

* Fix image attachment

Any check for `file.content_length` relies on browsers sending
Content-Length headers with the request. It seems that few actually do.

The pre-refactor approach was to check for truthiness, which excludes
both None and the strange empty strings that we seem to get in absence
of a file upload. We return to doing so.
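
The difference between the two checks can be sketched without Flask.
`FakeUpload` below is a hypothetical stand-in whose truthiness follows its
filename, which is an assumption about how the upload object behaves; it is
not werkzeug's `FileStorage` itself.

```python
from typing import Optional


class FakeUpload:
    """Hypothetical upload object: falsy when no file was actually chosen."""

    def __init__(self, filename: str, content_length: int = 0) -> None:
        self.filename = filename
        # Often 0 in practice, since browsers may not send the header.
        self.content_length = content_length

    def __bool__(self) -> bool:
        return bool(self.filename)


def has_attachment(file: Optional[FakeUpload]) -> bool:
    # Truthiness rejects both None and the empty placeholder upload,
    # without depending on content_length being set.
    return bool(file)


real_upload = FakeUpload("cat.png", content_length=0)
empty_upload = FakeUpload("")
```

A `content_length` check would have rejected `real_upload` here, while the
truthiness check accepts it and still rejects `empty_upload` and `None`.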

---------

Co-authored-by: TLSM <duolsm@outlook.com>
2023-03-29 16:32:48 -05:00

192 lines
6.5 KiB
Python

import shutil
import time
import urllib.parse
from dataclasses import dataclass
from typing import Optional
from flask import Request, abort, request
from werkzeug.datastructures import FileStorage
import files.helpers.embeds as embeds
import files.helpers.sanitize as sanitize
from files.helpers.config.environment import SITE_FULL, YOUTUBE_KEY
from files.helpers.config.const import (SUBMISSION_BODY_LENGTH_MAXIMUM,
                                        SUBMISSION_TITLE_LENGTH_MAXIMUM,
                                        SUBMISSION_URL_LENGTH_MAXIMUM)
from files.helpers.content import canonicalize_url2
from files.helpers.media import process_image


def guarded_value(val:str, min_len:int, max_len:int) -> str:
    '''
    Get request value `val` and ensure it is within length constraints
    Requires a request context and either aborts early or returns a good value
    '''
    raw = request.values.get(val, '').strip()
    raw = raw.replace('\u200e', '')

    if len(raw) < min_len: abort(400, f"Minimum length for {val} is {min_len}")
    if len(raw) > max_len: abort(400, f"Maximum length for {val} is {max_len}")
    # TODO: it may make sense to do more sanitisation here
    return raw


def int_ranged(val:str, min:int, max:int) -> int:
    raw:Optional[int] = request.values.get(val, default=None, type=int)
    if raw is None or raw < min or raw > max:
        abort(400,
            f"Invalid input ('{val}' must be an integer and be between {min} and {max})")
    return raw


@dataclass(frozen=True, kw_only=True, slots=True)
class ValidatedSubmissionLike:
    title: str
    title_html: str
    body: str
    body_raw: Optional[str]
    body_html: str
    url: Optional[str]
    thumburl: Optional[str]

    @property
    def embed_slow(self) -> Optional[str]:
        url:Optional[str] = self.url
        url_canonical: Optional[urllib.parse.ParseResult] = self.url_canonical
        if not url or not url_canonical: return None

        embed:Optional[str] = None
        domain:str = url_canonical.netloc

        if domain == "twitter.com":
            embed = embeds.twitter(url)

        if url.startswith('https://youtube.com/watch?v=') and YOUTUBE_KEY:
            embed = embeds.youtube(url)

        if SITE_FULL in domain and "/post/" in url and "context" not in url:
            id = url.split("/post/")[1]
            if "/" in id: id = id.split("/")[0]
            embed = str(int(id))

        return embed if embed and len(embed) <= 1500 else None

    @property
    def repost_search_url(self) -> Optional[str]:
        search_url = self.url_canonical_str
        if not search_url: return None

        if search_url.endswith('/'):
            search_url = search_url[:-1]
        return search_url

    @property
    def url_canonical(self) -> Optional[urllib.parse.ParseResult]:
        if not self.url: return None
        return canonicalize_url2(self.url, httpsify=True)

    @property
    def url_canonical_str(self) -> Optional[str]:
        url_canonical:Optional[urllib.parse.ParseResult] = self.url_canonical
        if not url_canonical: return None
        return url_canonical.geturl()

    @classmethod
    def from_flask_request(cls,
            request:Request,
            *,
            allow_embedding:bool,
            allow_media_url_upload:bool=True,
            embed_url_file_key:str="file2",
            edit:bool=False) -> "ValidatedSubmissionLike":
        '''
        Creates the basic structure for a submission and validates it. The
        normal submission API has a lot of duplicate code and while this is not
        a pretty solution, this essentially forces all submission-likes through
        a central interface.

        :param request: The Flask Request object.
        :param allow_embedding: Whether to allow embedding. This should usually
        be the value from the environment.
        :param allow_media_url_upload: Whether to allow media URL upload. This
        should generally be `True` for submission submitting if file uploads
        are allowed and `False` in other contexts (such as editing)
        :param embed_url_file_key: The key to use for inline file uploads.
        :param edit: The value of `edit` to pass to `sanitize`
        '''

        def _process_media(file:Optional[FileStorage]) -> tuple[bool, Optional[str], Optional[str]]:
            if request.headers.get("cf-ipcountry") == "T1": # forbid Tor uploads
                return False, None, None
            elif not file:
                # We actually care about falseyness, not just `is not None` because
                # no attachment is <FileStorage: '' ('application/octet-stream')>
                # (at least from Firefox 111).
                return False, None, None
            elif not file.content_type.startswith('image/'):
                abort(415, "Image files only")

            name = f'/images/{time.time()}'.replace('.','') + '.webp'
            file.save(name)
            url:Optional[str] = process_image(name)
            if not url: return False, None, None

            name2 = name.replace('.webp', 'r.webp')
            shutil.copyfile(name, name2)
            thumburl:Optional[str] = process_image(name2, resize=100)
            return True, url, thumburl

        def _process_media2(body:str, file2:Optional[list[FileStorage]]) -> tuple[bool, str]:
            if request.headers.get("cf-ipcountry") == "T1": # forbid Tor uploads
                return False, body
            elif not file2: # empty list or None
                return False, body

            file2 = file2[:4]
            if not all(file for file in file2):
                # Falseyness check to handle <'' ('application/octet-stream')>
                return False, body

            for file in file2:
                if not file.content_type.startswith('image/'):
                    abort(415, "Image files only")

                name = f'/images/{time.time()}'.replace('.','') + '.webp'
                file.save(name)
                image = process_image(name)
                if allow_embedding:
                    body += f"\n\n![]({image})"
                else:
                    body += f'\n\n<a href="{image}">{image}</a>'
            return True, body

        title = guarded_value("title", 1, SUBMISSION_TITLE_LENGTH_MAXIMUM)
        title = sanitize.sanitize_raw(title, allow_newlines=False, length_limit=SUBMISSION_TITLE_LENGTH_MAXIMUM)

        url = guarded_value("url", 0, SUBMISSION_URL_LENGTH_MAXIMUM)

        body_raw = guarded_value("body", 0, SUBMISSION_BODY_LENGTH_MAXIMUM)
        body_raw = sanitize.sanitize_raw(body_raw, allow_newlines=True, length_limit=SUBMISSION_BODY_LENGTH_MAXIMUM)

        if not url and allow_media_url_upload:
            has_file, url, thumburl = _process_media(request.files.get("file"))
        else:
            has_file = False
            thumburl = None

        has_file2, body = _process_media2(body_raw, request.files.getlist(embed_url_file_key))

        if not body_raw and not url and not has_file and not has_file2:
            raise ValueError("Please enter a URL or some text")

        title_html = sanitize.filter_emojis_only(title, graceful=True)
        if len(title_html) > 1500:
            raise ValueError("Rendered title is too big!")

        return ValidatedSubmissionLike(
            title=title,
            title_html=sanitize.filter_emojis_only(title, graceful=True),
            body=body,
            body_raw=body_raw,
            body_html=sanitize.sanitize(body, edit=edit),
            url=url,
            thumburl=thumburl,
        )