post scheduling (#554)

* prepare codebase to create scheduled tasks

there is some prep work involved with this. the scheduler will be happier
 if this work is done first. simply put, we extract the `created_utc`
 interface out of *everything* that uses it so that we don't have to
 repeat ourselves a bunch. all fun stuff.

next commit is the meat of it.
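The `created_utc` extraction described above is, in spirit, a shared mixin; a minimal sketch (class and property names here are made up for illustration, not taken from the PR):

```python
import time

class CreatedMixin:
    # Shared `created_utc` behavior factored out of the models that
    # previously each reimplemented it (sketch; the real interface
    # in the codebase differs).
    def __init__(self, created_utc=None):
        self.created_utc = created_utc if created_utc is not None else int(time.time())

    @property
    def age_seconds(self) -> int:
        # Derived values like "age" live in one place instead of being
        # repeated on every model that has a creation timestamp.
        return int(time.time()) - self.created_utc
```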

* cron: basic backend work for scheduler

* avoid import loop

* attempt 2 at fixing import loops

* parenthesize because of operator precedence

* delete file that came back for some reason.

* does NOPing the oauth apps work?

* import late and undo clients.py change

* stringify column names.

* reorder imports.

* remove task reference

* fix missing mapper object

* make coupled to repeatabletask i guess

* sanitize: fix sanitize imports

* import shadowing crap

* re-shadow shadowed variable

* fix regexes

* use the correct not operator

* readd missing commit

* scheduler: SQLA only allows concrete relations

* implement submission scheduler

* fix import loop with db_session

* get rid of import loop in submission.py and comment.py

* remove import loops by deferring import until function call

* i give up.

* awful.

* ...

* fix another app import loop

* fix missing import in route handler

* fix import error in wrappers.py

* fix wrapper error

* call update wrapper in the admin_level_required case

* :marseyshrug:

* fix issue with wrapper

* some cleanup and some fixes

* some more cleanup

let's avoid polluting scopes where we can.

* ...

* add SCHEDULED_POSTS permission.

* move const.py into config like the other files.

* style fixes.

* lock table for concurrency improvements

* don't attempt to commit on errors

* Refactor code, create `TaskRunContext`, create python callable task type.

* use import contextlib

* testing stuff i guess.

* handle repeatable tasks properly.

* Attempt another fix at fighting the mapper

* do it right ig

* SQLA 1.4 doesn't support nested polymorphism ig

* fix erroneous class import

* fix mapper errors

* import app in wrappers.py

* fix import failures and stuff like that.

* embed and import fixes

* minor formatting changes.

* Add running state enum and don't attempt to check for currently running tasks.

* isort

* documentation, style, and commit after each task.

* Add completion time and more docs, rename, etc

* document `CRON_SLEEP_SECONDS` better.

* add note about making LiteralString

* filter out tasks that have been run in the future

* reference RepeatableTask's `__tablename__` directly

* use a master/slave configuration for tasks

the master periodically checks whether the slave is alive, healthy,
and not consuming too many resources, and if necessary kills its
child and restarts it.

only one relation is supported at the moment.
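The check-and-restart loop described above boils down to a per-tick decision; a hedged sketch with hypothetical names and thresholds (none of these come from the PR itself):

```python
def should_restart_worker(alive: bool, seconds_since_heartbeat: float,
                          rss_bytes: int, *,
                          heartbeat_timeout: float = 60.0,
                          rss_limit: int = 512 * 1024 * 1024) -> bool:
    # Restart the child if it has died, stopped responding, or is
    # consuming too much memory; otherwise leave it alone.
    if not alive:
        return True
    if seconds_since_heartbeat > heartbeat_timeout:
        return True
    if rss_bytes > rss_limit:
        return True
    return False
```

The master would call something like this on each tick and, on `True`, terminate and respawn its child process.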

* don't duplicate process unnecessarily

* note impl detail, add comments

* fix imports.

* getting imports to stop being stupid.

* environment notes.

* syntax derp

* *sigh*

* stupid environment stuff

* add UI for submitting a scheduled post

* stupid things i need to fix in the user class

* ...

* fix template

* add formkey

* pass v

* add hour and minute field

* bleh

* remove concrete

* the sqlalchemy docs are wrong

* fix me being dumb and not understanding error messages

* missing author attribute for display

* author_name property

* it's a property

* with_polymorphic i think fixes this

* dsfavgnhmjk

* *sigh*

* okay try this again

* try getting rid of the comment section

* include -> extends

* put the div outside of the thing.

* fix user page listings :/

* mhm

* i hate this why isn't this working

* this should fix it

* Fix posts being set as disabled by default

* form UI improvements

* label

* <textarea>s should have their closing tag

* UI fixes.

* and fix erroneous spinner thing.

* don't abort(415) when browsers send 0 length files for some reason

* UI improvements

* line break.

* CSS :S

* better explainer

* don't show moderation buttons for scheduled posts

* ...

* meh

* add edit form

* include forms on default page.

* fix hour/minute selection.

* improve ui i guess and add api

* Show previous postings on scheduled task page

* create task id

* sqla

* posts -> submissions

* fix OTM relationship

* edit URL

* use common formkey control

* Idk why this isn't working

* Revert "Idk why this isn't working"

This reverts commit 3b93f741df.

* does removing viewonly fix it?

* don't import routes on db migrations

* apparently this has to be a string

* UI improvements redux

* margins and stuff

* add cron to supervisord

* remove stupid duplication

* typo fix

* postgres syntax error

* better lock and error handling

* add relationship between task and runs

* fix some ui stuff

* fix incorrect timestamp comparison

* ...

* Fix logic errors blocking scheduled posts

Two bugs here:
  - RepeatableTask.run_time_last <= now: run_time_last is NULL by
    default. NULL is not greater than, less than, or equal to any
    value. We use NULL to signify a never-run task; check for that
    condition when building the task list.
  - `6 <= weekday <= 0`: there is no integer that is both gte 6 and
    lte 0. This was always false.
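A minimal sketch of the corrected predicates (helper names invented for illustration):

```python
def is_due(run_time_last, now):
    # run_time_last is None (NULL) for a never-run task. In SQL,
    # `NULL <= now` is never true, so the never-run case must be
    # checked explicitly instead of relying on the comparison.
    return run_time_last is None or run_time_last <= now

def runs_on_weekday(weekday: int) -> bool:
    # The buggy check was `6 <= weekday <= 0`, which no integer can
    # satisfy; the intended range is 0 through 6.
    return 0 <= weekday <= 6
```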

* pass through worker process STDOUT and STDERR

* Add scheduler to admin panel

* scheduler

* fix listing and admin home

* date formatting fixes

* fix ages

* task user interface

* fix some more import crap i have to deal with

* fix typing

* avoid import loop

* UI fixes

* fix incorrect type

* task type

* Scheduled task UI improvements (add runs and stuff)

* make the width a lil bit smaller

* task runs.

* fix submit page

* add alembic migration

* log on startup

* Fix showing edit button

* Fix logic for `can_edit` (accidentally did `author_id` instead of `id`)

* Broad review pass

Review:
  - Call `invalidate_cache` with `is_html=` explicitly for clarity,
    rather than a bare boolean in the call args.
  - Remove `marseys_const*` and associated stateful const system:
    the implementation was good if we needed them, but TheMotte
    doesn't use emoji, and a greenfield emoji system would likely
    not keep those darned lists floating in thread-local scope.
    Also they were only needed for goldens and random emoji, which
    are fairly non-central features.
  - Get `os.environ` fully out of the templates by using the new
    constants we already have in files.helpers.config.environment.
  - Given the files.routes.posts cleanup, get rid of the shop discount dict.
    It's already a mapping of badge IDs to discounts for badges that
    likely won't continue to exist (if they even do at present).
  - RepeatableTaskRun.exception: use `@property.setter` instead of
    overriding `__setattr__`.

Fix:
  - Welcome message literal contained an indented Markdown code block.
  - Condition to show "View source" button changed to show source to
    logged out. This may well be a desirable change, but it's not
    clearly intended here.

* Fix couple of routing issues

* fix 400 with post body editing

* Add error handler for HTTP 415

* fix router giving wrong arg name to handler

* Use supervisord to monitor memory rather than DIY

Also means we're now installing supervisord via pip, so we don't rely
on the Debian base image for any packages.
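One plausible shape for this setup, assuming the `memmon` event listener from the superlance package handles the memory cap (the program name, command, and limit below are placeholders, not taken from the PR):

```ini
; supervisord restarts dead processes itself; memmon restarts any
; supervised program that exceeds a memory threshold.
[program:cron]
command=python3 -m files.cron
autorestart=true

[eventlistener:memmon]
command=memmon -p cron=512MB
events=TICK_60
```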

* fix task run elapsed time display

* formatting and removing redundant code

* Fix missing ModAction import

* dates and times fixes

* Having to modify imports here anyway, might as well change it.

* correct documentation.

* don't use urlunparse

* validators: import sanitize instead of from syntax

* cron: prevent races on task running

RepeatableTask.run_state_enum acts as the mutex on repeatable tasks.
Previously, the list of tasks to run was acquired before individually
locking each task. However, there was a period where the table is both
unlocked and the tasks are in state WAITING between those points.
This could potentially have led to two 'cron' processes each running the
same task simultaneously. Instead, we check for runnability both when
building the preliminary list and when mutexing the task via run state
in the database.

Also:
  - g.db and the cron db object are both instances of `Session`, not
    `scoped_session` because they are obtained from
    `scoped_session.__call__`, which acts as a `Session` factory.
    Propagate this to the type hints.
  - Sort order of task run submissions so /tasks/scheduled_posts/<id>
    "Previous Task Runs" listings are useful.

* Notify followers on post publication

This was old behavior lost in the refactoring of the submit endpoint.

Also fix an AttributeError in `Follow.__repr__` which carried over
from all the repr copypasta.

* Fix image attachment

Any check for `file.content_length` relies on browsers sending
Content-Length headers with the request. It seems that few actually do.

The pre-refactor approach was to check for truthiness, which excludes
both None and the strange empty strings that we seem to get in absence
of a file upload. We return to doing so.
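A sketch of the truthiness approach (the helper name is invented):

```python
def upload_present(file) -> bool:
    # A missing upload can arrive as None or as a strange empty value
    # rather than as a file object with a Content-Length, so a plain
    # truthiness check is the robust test (werkzeug's FileStorage is
    # falsy when its filename is empty).
    return bool(file)
```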

---------

Co-authored-by: TLSM <duolsm@outlook.com>
justcool393 2023-03-29 14:32:48 -07:00 committed by GitHub
parent 9133d35e6f
commit be952c2771
121 changed files with 3284 additions and 1808 deletions


@ -1,7 +1,8 @@
from files.classes import *
from flask import g
from .sanitize import *
from .const import *
from .config.const import *
def create_comment(text_html, autojanny=False):
if autojanny: author_id = AUTOJANNY_ID
@ -19,7 +20,6 @@ def create_comment(text_html, autojanny=False):
return new_comment.id
def send_repeatable_notification(uid, text, autojanny=False):
if autojanny: author_id = AUTOJANNY_ID
else: author_id = NOTIFICATIONS_ID
@ -38,13 +38,11 @@ def send_repeatable_notification(uid, text, autojanny=False):
def send_notification(uid, text, autojanny=False):
cid = notif_comment(text, autojanny)
add_notif(cid, uid)
def notif_comment(text, autojanny=False):
if autojanny:
author_id = AUTOJANNY_ID
alert = True
@ -61,7 +59,6 @@ def notif_comment(text, autojanny=False):
def notif_comment2(p):
search_html = f'%</a> has mentioned you: <a href="/post/{p.id}">%'
existing = g.db.query(Comment.id).filter(Comment.author_id == NOTIFICATIONS_ID, Comment.parent_submission == None, Comment.body_html.like(search_html)).first()
@ -81,11 +78,12 @@ def add_notif(cid, uid):
g.db.add(notif)
def NOTIFY_USERS(text, v):
def NOTIFY_USERS(text, v) -> set[int]:
notify_users = set()
for word, id in NOTIFIED_USERS.items():
if id == 0 or v.id == id: continue
if word in text.lower() and id not in notify_users: notify_users.add(id)
if word in text.lower() and id not in notify_users:
notify_users.add(id)
captured = []
for i in mention_regex.finditer(text):
@ -95,6 +93,29 @@ def NOTIFY_USERS(text, v):
captured.append(i.group(0))
user = get_user(i.group(2), graceful=True)
if user and v.id != user.id and not v.any_block_exists(user): notify_users.add(user.id)
if user and v.id != user.id and not v.any_block_exists(user):
notify_users.add(user.id)
return notify_users
def notify_submission_publish(target: Submission):
    # Username mentions in title & body
    text: str = f'{target.title} {target.body}'
    notify_users = NOTIFY_USERS(text, target.author)
    if notify_users:
        comment_id = notif_comment2(target)
        for user_id in notify_users:
            add_notif(comment_id, user_id)

    # Submission author followers
    if target.author.followers:
        message: str = (
            f"@{target.author.username} has made a new post: "
            f"[{target.title}]({target.shortlink})"
        )
        if target.sub:
            message += f" in <a href='/h/{target.sub}'>/h/{target.sub}"
        cid = notif_comment(message, autojanny=True)
        for follow in target.author.followers:
            add_notif(cid, follow.user_id)

files/helpers/caching.py (new file, 25 additions)

@ -0,0 +1,25 @@
from files.__main__ import cache
import files.helpers.listing as listing
# i hate this.
#
# we should probably come up with a better way for this in the future.
# flask_caching is kinda weird in that it requires you to use a function
# reference to delete a memoized function, which basically means fitting your
# code to flask_caching's worldview. it's very much not ideal and ideally would
# be less coupled in the future.
#
# the question is whether it's worth it.
def invalidate_cache(*, frontlist=False, userpagelisting=False, changeloglist=False):
    '''
    Invalidates the caches for the front page listing, user page listings,
    and optionally, the changelog listing.

    :param frontlist: Whether to invalidate the `frontlist` cache.
    :param userpagelisting: Whether to invalidate the `userpagelisting` cache.
    :param changeloglist: Whether to invalidate the `changeloglist` cache.
    '''
    if frontlist: cache.delete_memoized(listing.frontlist)
    if userpagelisting: cache.delete_memoized(listing.userpagelisting)
    if changeloglist: cache.delete_memoized(listing.changeloglist)


@ -1,15 +1,18 @@
from sys import stdout
from typing import Optional
import gevent
from flask import g, request
from pusher_push_notifications import PushNotifications
from sqlalchemy import select, update
from sqlalchemy.orm import Query, aliased
from sqlalchemy.sql.expression import alias, func, text
from files.classes import Comment, Notification, Subscription, User
from files.helpers.alerts import NOTIFY_USERS
from files.helpers.const import PUSHER_ID, PUSHER_KEY, SITE_ID, SITE_FULL
from files.helpers.assetcache import assetcache_path
from flask import g
from sqlalchemy import select, update
from sqlalchemy.sql.expression import func, text, alias
from sqlalchemy.orm import Query, aliased
from sys import stdout
import gevent
from typing import Optional
from files.helpers.config.environment import (PUSHER_ID, PUSHER_KEY, SITE_FULL,
SITE_ID)
if PUSHER_ID != 'blahblahblah':
beams_client = PushNotifications(instance_id=PUSHER_ID, secret_key=PUSHER_KEY)


@ -1,19 +1,47 @@
import re
import sys
from copy import deepcopy
from os import environ
from enum import IntEnum
from typing import Final
from flask import request
from files.__main__ import db_session
from files.classes.sub import Sub
from files.classes.marsey import Marsey
SITE = environ.get("DOMAIN", '').strip()
SITE_ID = environ.get("SITE_ID", '').strip()
SITE_TITLE = environ.get("SITE_TITLE", '').strip()
SCHEME = environ.get('SCHEME', 'http' if 'localhost' in SITE else 'https')
SITE_FULL = SCHEME + '://' + SITE
class Service(IntEnum):
'''
An enumeration of services provided by this application
'''
THEMOTTE = 0
'''
TheMotte web application. Handles most routes and tasks performed,
including all non-chat web requests.
'''
CRON = 1
'''
Runs tasks periodically on a set schedule
'''
CHAT = 2
'''
Chat application.
'''
MIGRATION = 3
'''
Migration mode. Used for performing database migrations
'''
@classmethod
def from_argv(cls):
if "db" in sys.argv:
return cls.MIGRATION
if "cron" in sys.argv:
return cls.CRON
if "load_chat" in sys.argv:
return cls.CHAT
return cls.THEMOTTE
@property
def enable_services(self) -> bool:
return self not in {self.CRON, self.MIGRATION}
CC = "COUNTRY CLUB"
CC_TITLE = CC.title()
@ -32,7 +60,6 @@ BASEDBOT_ID = 0
GIFT_NOTIF_ID = 9
OWNER_ID = 9
BUG_THREAD = 0
WELCOME_MSG = f"Welcome to {SITE_TITLE}! Please read [the rules](/rules) first. Then [read some of our current conversations](/) and feel free to comment or post!\n\nWe encourage people to comment even if they aren't sure they fit in; as long as your comment follows [community rules](/rules), we are happy to have posters from all backgrounds, education levels, and specialties."
ROLES={}
LEADERBOARD_LIMIT: Final[int] = 25
@ -54,11 +81,16 @@ SORTS_POSTS = {
SORTS_POSTS.update(SORTS_COMMON)
SORTS_COMMENTS = SORTS_COMMON
PUSHER_ID = environ.get("PUSHER_ID", "").strip()
PUSHER_KEY = environ.get("PUSHER_KEY", "").strip()
DEFAULT_COLOR = environ.get("DEFAULT_COLOR", "fff").strip()
COLORS = {'ff66ac','805ad5','62ca56','38a169','80ffff','2a96f3','eb4963','ff0000','f39731','30409f','3e98a7','e4432d','7b9ae4','ec72de','7f8fa6', 'f8db58','8cdbe6', DEFAULT_COLOR}
MAX_CONTENT_LENGTH = 16 * 1024 * 1024
SESSION_COOKIE_SAMESITE = "Lax"
PERMANENT_SESSION_LIFETIME = 60 * 60 * 24 * 365
DEFAULT_THEME = "TheMotte"
FORCE_HTTPS = 1
COLORS = {'ff66ac','805ad5','62ca56','38a169','80ffff','2a96f3','eb4963','ff0000','f39731','30409f','3e98a7','e4432d','7b9ae4','ec72de','7f8fa6', 'f8db58','8cdbe6', 'fff'}
SUBMISSION_FLAIR_LENGTH_MAXIMUM: Final[int] = 350
SUBMISSION_TITLE_LENGTH_MAXIMUM: Final[int] = 500
SUBMISSION_URL_LENGTH_MAXIMUM: Final[int] = 2048
SUBMISSION_BODY_LENGTH_MAXIMUM: Final[int] = 20000
COMMENT_BODY_LENGTH_MAXIMUM: Final[int] = 10000
MESSAGE_BODY_LENGTH_MAXIMUM: Final[int] = 10000
@ -72,6 +104,7 @@ ERROR_MESSAGES = {
405: "Something went wrong and it's probably my fault. If you can do it reliably, or it's causing problems for you, please report it!",
409: "There's a conflict between what you're trying to do and what you or someone else has done and because of that you can't do what you're trying to do.",
413: "Max file size is 8 MB",
415: "That file type isn't allowed to be uploaded here",
422: "Something is wrong about your request. If you keep getting this unexpectedly, please report it!",
429: "Are you hammering the site? Stop that, yo.",
500: "Something went wrong and it's probably my fault. If you can do it reliably, or it's causing problems for you, please report it!",
@ -98,6 +131,7 @@ WERKZEUG_ERROR_DESCRIPTIONS = {
}
IMAGE_FORMATS = ['png','gif','jpg','jpeg','webp']
IMAGE_URL_ENDINGS = IMAGE_FORMATS + ['.webp', '.jpg', '.png', '.gif', '.jpeg', '?maxwidth=9999', '&fidelity=high']
VIDEO_FORMATS = ['mp4','webm','mov','avi','mkv','flv','m4v','3gp']
AUDIO_FORMATS = ['mp3','wav','ogg','aac','m4a','flac']
NO_TITLE_EXTENSIONS = IMAGE_FORMATS + VIDEO_FORMATS + AUDIO_FORMATS
@ -108,11 +142,15 @@ FEATURES = {
PERMS = {
"DEBUG_LOGIN_TO_OTHERS": 3,
'PERFORMANCE_KILL_PROCESS': 3,
'PERFORMANCE_SCALE_UP_DOWN': 3,
'PERFORMANCE_RELOAD': 3,
'PERFORMANCE_STATS': 3,
"PERFORMANCE_KILL_PROCESS": 3,
"PERFORMANCE_SCALE_UP_DOWN": 3,
"PERFORMANCE_RELOAD": 3,
"PERFORMANCE_STATS": 3,
"POST_COMMENT_MODERATION": 2,
"POST_EDITING": 3,
"SCHEDULER": 2,
"SCHEDULER_POSTS": 2,
"SCHEDULER_TASK_TRACEBACK": 3,
"USER_SHADOWBAN": 2,
}
@ -137,82 +175,6 @@ NOTIFIED_USERS = {
patron = 'Patron'
discounts = {
69: 0.02,
70: 0.04,
71: 0.06,
72: 0.08,
73: 0.10,
}
CF_KEY = environ.get("CF_KEY", "").strip()
CF_ZONE = environ.get("CF_ZONE", "").strip()
CF_HEADERS = {"Authorization": f"Bearer {CF_KEY}", "Content-Type": "application/json"}
dues = int(environ.get("DUES").strip())
christian_emojis = (':#marseyjesus:',':#marseyimmaculate:',':#marseymothermary:',':#marseyfatherjoseph:',':#gigachadorthodox:',':#marseyorthodox:',':#marseyorthodoxpat:')
db = db_session()
marseys_const = [x[0] for x in db.query(Marsey.name).filter(Marsey.name!='chudsey').all()]
marseys_const2 = marseys_const + ['chudsey','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','exclamationpoint','period','questionmark']
db.close()
valid_username_chars = 'a-zA-Z0-9_\\-'
valid_username_regex = re.compile("^[a-zA-Z0-9_\\-]{3,25}$", flags=re.A)
mention_regex = re.compile('(^|\\s|<p>)@(([a-zA-Z0-9_\\-]){1,25})', flags=re.A)
mention_regex2 = re.compile('<p>@(([a-zA-Z0-9_\\-]){1,25})', flags=re.A)
valid_password_regex = re.compile("^.{8,100}$", flags=re.A)
marsey_regex = re.compile("[a-z0-9]{1,30}", flags=re.A)
tags_regex = re.compile("[a-z0-9: ]{1,200}", flags=re.A)
valid_sub_regex = re.compile("^[a-zA-Z0-9_\\-]{3,20}$", flags=re.A)
query_regex = re.compile("(\\w+):(\\S+)", flags=re.A)
title_regex = re.compile("[^\\w ]", flags=re.A)
based_regex = re.compile("based and (.{1,20}?)(-| )pilled", flags=re.I|re.A)
controversial_regex = re.compile('["> ](https:\\/\\/old\\.reddit\\.com/r/[a-zA-Z0-9_]{3,20}\\/comments\\/[\\w\\-.#&/=\\?@%+]{5,250})["< ]', flags=re.A)
fishylinks_regex = re.compile("https?://\\S+", flags=re.A)
spoiler_regex = re.compile('''\\|\\|(.+)\\|\\|''', flags=re.A)
reddit_regex = re.compile('(^|\\s|<p>)\\/?((r|u)\\/(\\w|-){3,25})(?![^<]*<\\/(code|pre|a)>)', flags=re.A)
sub_regex = re.compile('(^|\\s|<p>)\\/?(h\\/(\\w|-){3,25})', flags=re.A)
# Bytes that shouldn't be allowed in user-submitted text
# U+200E is LTR toggle, U+200F is RTL toggle, U+200B and U+FEFF are Zero-Width Spaces,
# and U+1242A is a massive and terrifying cuneiform numeral
unwanted_bytes_regex = re.compile("\u200e|\u200f|\u200b|\ufeff|\U0001242a")
whitespace_regex = re.compile('\\s+')
strikethrough_regex = re.compile('''~{1,2}([^~]+)~{1,2}''', flags=re.A)
mute_regex = re.compile("/mute @([a-z0-9_\\-]{3,25}) ([0-9])+", flags=re.A)
emoji_regex = re.compile(f"[^a]>\\s*(:[!#@]{{0,3}}[{valid_username_chars}]+:\\s*)+<\\/", flags=re.A)
emoji_regex2 = re.compile(f"(?<!\"):([!#@{valid_username_chars}]{{1,31}}?):", flags=re.A)
emoji_regex3 = re.compile(f"(?<!\"):([!@{valid_username_chars}]{{1,31}}?):", flags=re.A)
snappy_url_regex = re.compile('<a href=\\"(https?:\\/\\/[a-z]{1,20}\\.[\\w:~,()\\-.#&\\/=?@%;+]{5,250})\\" rel=\\"nofollow noopener noreferrer\\" target=\\"_blank\\">([\\w:~,()\\-.#&\\/=?@%;+]{5,250})<\\/a>', flags=re.A)
# Technically this allows stuff that is not a valid email address, but realistically
# we care "does this email go to the correct person" rather than "is this email
# address syntactically valid", so if we care we should be sending a confirmation
# link, and otherwise should be pretty liberal in what we accept here.
email_regex = re.compile('[^@]+@[^@]+\\.[^@]+', flags=re.A)
utm_regex = re.compile('utm_[a-z]+=[a-z0-9_]+&', flags=re.A)
utm_regex2 = re.compile('[?&]utm_[a-z]+=[a-z0-9_]+', flags=re.A)
YOUTUBE_KEY = environ.get("YOUTUBE_KEY", "").strip()
proxies = {}
approved_embed_hosts = [
@ -276,6 +238,6 @@ video_sub_regex = re.compile(f'(<p>[^<]*)(https:\\/\\/([a-z0-9-]+\\.)*({hosts})\
procoins_li = (0,2500,5000,10000,25000,50000,125000,250000)
from files.helpers.regex import *
from files.helpers.config.regex import *
def make_name(*args, **kwargs): return request.base_url


@ -0,0 +1,109 @@
'''
Environment data. Please don't use `files.helpers.config.const` for things that
aren't constants. If it's an environment configuration, it should go in here.
'''
from os import environ
from files.helpers.strings import bool_from_string
SITE = environ.get("DOMAIN", '').strip()
SITE_ID = environ.get("SITE_ID").strip()
SITE_TITLE = environ.get("SITE_TITLE").strip()
SCHEME = environ.get('SCHEME', 'http' if 'localhost' in SITE else 'https')
if "localhost" in SITE:
SITE_FULL = 'http://' + SITE
else:
SITE_FULL = 'https://' + SITE
WELCOME_MSG = (
f"Welcome to {SITE_TITLE}! Please read [the rules](/rules) first. "
"Then [read some of our current conversations](/) and feel free to comment "
"or post!\n"
"We encourage people to comment even if they aren't sure they fit in; as "
"long as your comment follows [community rules](/rules), we are happy to "
"have posters from all backgrounds, education levels, and specialties."
)
SQLALCHEMY_TRACK_MODIFICATIONS = False
DATABASE_URL = environ.get("DATABASE_URL", "postgresql://postgres@localhost:5432")
SECRET_KEY = environ.get('MASTER_KEY', '')
SERVER_NAME = environ.get("DOMAIN").strip()
SESSION_COOKIE_SECURE = "localhost" not in SERVER_NAME
DEFAULT_COLOR = environ.get("DEFAULT_COLOR", "fff").strip()
DEFAULT_TIME_FILTER = environ.get("DEFAULT_TIME_FILTER", "all").strip()
HCAPTCHA_SITEKEY = environ.get("HCAPTCHA_SITEKEY","").strip()
HCAPTCHA_SECRET = environ.get("HCAPTCHA_SECRET","").strip()
if not SECRET_KEY:
raise Exception("Secret key not set!")
# spam filter
SPAM_SIMILARITY_THRESHOLD = float(environ.get("SPAM_SIMILARITY_THRESHOLD", 0.5))
''' Spam filter similarity threshold (posts) '''
SPAM_URL_SIMILARITY_THRESHOLD = float(environ.get("SPAM_URL_SIMILARITY_THRESHOLD", 0.1))
''' Spam filter similarity threshold for URLs (posts) '''
SPAM_SIMILAR_COUNT_THRESHOLD = int(environ.get("SPAM_SIMILAR_COUNT_THRESHOLD", 10))
''' Spam filter similarity count (posts) '''
COMMENT_SPAM_SIMILAR_THRESHOLD = float(environ.get("COMMENT_SPAM_SIMILAR_THRESHOLD", 0.5))
''' Spam filter similarity threshold (comments)'''
COMMENT_SPAM_COUNT_THRESHOLD = int(environ.get("COMMENT_SPAM_COUNT_THRESHOLD", 10))
''' Spam filter similarity count (comments) '''
CACHE_REDIS_URL = environ.get("REDIS_URL", "redis://localhost")
MAIL_SERVER = environ.get("MAIL_SERVER", "").strip()
MAIL_PORT = 587
MAIL_USE_TLS = True
MAIL_USERNAME = environ.get("MAIL_USERNAME", "").strip()
MAIL_PASSWORD = environ.get("MAIL_PASSWORD", "").strip()
DESCRIPTION = environ.get("DESCRIPTION", "DESCRIPTION GOES HERE").strip()
SQLALCHEMY_DATABASE_URI = DATABASE_URL
MENTION_LIMIT = int(environ.get('MENTION_LIMIT', 100))
''' Maximum amount of username mentions '''
MULTIMEDIA_EMBEDDING_ENABLED = bool_from_string(environ.get('MULTIMEDIA_EMBEDDING_ENABLED', False))
'''
Whether multimedia will be embedded into a page. Note that this does not
affect posts or comments retroactively.
'''
RESULTS_PER_PAGE_COMMENTS = int(environ.get('RESULTS_PER_PAGE_COMMENTS', 50))
SCORE_HIDING_TIME_HOURS = int(environ.get('SCORE_HIDING_TIME_HOURS', 0))
ENABLE_SERVICES = bool_from_string(environ.get('ENABLE_SERVICES', False))
'''
Whether to start up deferred tasks. Usually `True` when running as an app and
`False` when running as a script (for example to perform migrations).
See https://github.com/themotte/rDrama/pull/427 for more info.
'''
DBG_VOLUNTEER_PERMISSIVE = bool_from_string(environ.get('DBG_VOLUNTEER_PERMISSIVE', False))
VOLUNTEER_JANITOR_ENABLE = bool_from_string(environ.get('VOLUNTEER_JANITOR_ENABLE', True))
RATE_LIMITER_ENABLED = not bool_from_string(environ.get('DBG_LIMITER_DISABLED', False))
ENABLE_DOWNVOTES = not bool_from_string(environ.get('DISABLE_DOWNVOTES', False))
CARD_VIEW = bool_from_string(environ.get("CARD_VIEW", True))
FINGERPRINT_TOKEN = environ.get("FP", None)
# other settings from const.py that aren't constants
CLUB_TRUESCORE_MINIMUM = int(environ.get("DUES").strip())
IMGUR_KEY = environ.get("IMGUR_KEY", "").strip()
PUSHER_ID = environ.get("PUSHER_ID", "").strip()
PUSHER_KEY = environ.get("PUSHER_KEY", "").strip()
YOUTUBE_KEY = environ.get("YOUTUBE_KEY", "").strip()
CF_KEY = environ.get("CF_KEY", "").strip()
CF_ZONE = environ.get("CF_ZONE", "").strip()
CF_HEADERS = {
"Authorization": f"Bearer {CF_KEY}",
"Content-Type": "application/json"
}


@ -0,0 +1,80 @@
import re
# usernames
valid_username_chars = 'a-zA-Z0-9_\\-'
valid_username_regex = re.compile("^[a-zA-Z0-9_\\-]{3,25}$", flags=re.A)
mention_regex = re.compile('(^|\\s|<p>)@(([a-zA-Z0-9_\\-]){1,25})', flags=re.A)
mention_regex2 = re.compile('<p>@(([a-zA-Z0-9_\\-]){1,25})', flags=re.A)
valid_password_regex = re.compile("^.{8,100}$", flags=re.A)
marsey_regex = re.compile("[a-z0-9]{1,30}", flags=re.A)
tags_regex = re.compile("[a-z0-9: ]{1,200}", flags=re.A)
valid_sub_regex = re.compile("^[a-zA-Z0-9_\\-]{3,20}$", flags=re.A)
query_regex = re.compile("(\\w+):(\\S+)", flags=re.A)
title_regex = re.compile("[^\\w ]", flags=re.A)
based_regex = re.compile("based and (.{1,20}?)(-| )pilled", flags=re.I|re.A)
controversial_regex = re.compile('["> ](https:\\/\\/old\\.reddit\\.com/r/[a-zA-Z0-9_]{3,20}\\/comments\\/[\\w\\-.#&/=\\?@%+]{5,250})["< ]', flags=re.A)
fishylinks_regex = re.compile("https?://\\S+", flags=re.A)
spoiler_regex = re.compile('''\\|\\|(.+)\\|\\|''', flags=re.A)
reddit_regex = re.compile('(^|\\s|<p>)\\/?((r|u)\\/(\\w|-){3,25})(?![^<]*<\\/(code|pre|a)>)', flags=re.A)
sub_regex = re.compile('(^|\\s|<p>)\\/?(h\\/(\\w|-){3,25})', flags=re.A)
unwanted_bytes_regex = re.compile("\u200e|\u200f|\u200b|\ufeff|\U0001242a")
'''
Bytes that shouldn't be allowed in user-submitted text
U+200E is LTR toggle, U+200F is RTL toggle, U+200B and U+FEFF are Zero-Width
Spaces, and U+1242A is a massive and terrifying cuneiform numeral
'''
whitespace_regex = re.compile('\\s+')
strikethrough_regex = re.compile('''~{1,2}([^~]+)~{1,2}''', flags=re.A)
mute_regex = re.compile("/mute @([a-z0-9_\\-]{3,25}) ([0-9])+", flags=re.A)
emoji_regex = re.compile(f"[^a]>\\s*(:[!#@]{{0,3}}[{valid_username_chars}]+:\\s*)+<\\/", flags=re.A)
emoji_regex2 = re.compile(f"(?<!\"):([!#@{valid_username_chars}]{{1,31}}?):", flags=re.A)
emoji_regex3 = re.compile(f"(?<!\"):([!@{valid_username_chars}]{{1,31}}?):", flags=re.A)
snappy_url_regex = re.compile('<a href=\\"(https?:\\/\\/[a-z]{1,20}\\.[\\w:~,()\\-.#&\\/=?@%;+]{5,250})\\" rel=\\"nofollow noopener noreferrer\\" target=\\"_blank\\">([\\w:~,()\\-.#&\\/=?@%;+]{5,250})<\\/a>', flags=re.A)
email_regex = re.compile('[^@]+@[^@]+\\.[^@]+', flags=re.A)
'''
Regex to use for email addresses.
.. note::
Technically this allows stuff that is not a valid email address, but
realistically we care "does this email go to the correct person" rather
than "is this email address syntactically valid", so if we care we should
be sending a confirmation link, and otherwise should be pretty liberal in
what we accept here.
'''
utm_regex = re.compile('utm_[a-z]+=[a-z0-9_]+&', flags=re.A)
utm_regex2 = re.compile('[?&]utm_[a-z]+=[a-z0-9_]+', flags=re.A)
# urls
youtube_regex = re.compile('(<p>[^<]*)(https:\\/\\/youtube\\.com\\/watch\\?v\\=([a-z0-9-_]{5,20})[\\w\\-.#&/=\\?@%+]*)', flags=re.I|re.A)
yt_id_regex = re.compile('[a-z0-9-_]{5,20}', flags=re.I|re.A)
image_regex = re.compile("(^|\\s)(https:\\/\\/[\\w\\-.#&/=\\?@%;+]{5,250}(\\.png|\\.jpg|\\.jpeg|\\.gif|\\.webp|maxwidth=9999|fidelity=high))($|\\s)", flags=re.I|re.A)
linefeeds_regex = re.compile("([^\\n])\\n([^\\n])", flags=re.A)
html_title_regex = re.compile(r"<title>(.{1,200})</title>", flags=re.I)
css_url_regex = re.compile(r'url\(\s*[\'"]?(.*?)[\'"]?\s*\)', flags=re.I|re.A)


@ -1,16 +1,92 @@
from typing import Any, TYPE_CHECKING, Optional, Union
from __future__ import annotations
from files.helpers.const import PERMS
import random
import urllib.parse
from typing import TYPE_CHECKING, Optional, Union
from sqlalchemy.orm import Session
from files.helpers.config.const import PERMS
if TYPE_CHECKING:
from files.classes import Submission, Comment, User
else:
Submission = Any
Comment = Any
User = Any
from files.classes import Comment, Submission, User
Submittable = Union[Submission, Comment]
def moderated_body(target:Union[Submission, Comment],
v:Optional[User]) -> Optional[str]:
def _replace_urls(url:str) -> str:
def _replace_extensions(url:str, exts:list[str]) -> str:
for ext in exts:
url = url.replace(f'.{ext}', '.webp')
return url
for rd in ("://reddit.com", "://new.reddit.com", "://www.reddit.com", "://redd.it", "://libredd.it", "://teddit.net"):
url = url.replace(rd, "://old.reddit.com")
url = url.replace("nitter.net", "twitter.com") \
.replace("old.reddit.com/gallery", "reddit.com/gallery") \
.replace("https://youtu.be/", "https://youtube.com/watch?v=") \
.replace("https://music.youtube.com/watch?v=", "https://youtube.com/watch?v=") \
.replace("https://streamable.com/", "https://streamable.com/e/") \
.replace("https://youtube.com/shorts/", "https://youtube.com/watch?v=") \
.replace("https://mobile.twitter", "https://twitter") \
.replace("https://m.facebook", "https://facebook") \
.replace("m.wikipedia.org", "wikipedia.org") \
.replace("https://m.youtube", "https://youtube") \
.replace("https://www.youtube", "https://youtube") \
.replace("https://www.twitter", "https://twitter") \
.replace("https://www.instagram", "https://instagram") \
.replace("https://www.tiktok", "https://tiktok")
if "/i.imgur.com/" in url:
url = _replace_extensions(url, ['png', 'jpg', 'jpeg'])
elif "/media.giphy.com/" in url or "/c.tenor.com/" in url:
url = _replace_extensions(url, ['gif'])
elif "/i.ibb.com/" in url:
url = _replace_extensions(url, ['png', 'jpg', 'jpeg', 'gif'])
if url.startswith("https://streamable.com/") and not url.startswith("https://streamable.com/e/"):
url = url.replace("https://streamable.com/", "https://streamable.com/e/")
return url
def _httpsify_and_remove_tracking_urls(url:str) -> urllib.parse.ParseResult:
parsed_url = urllib.parse.urlparse(url)
domain = parsed_url.netloc
is_reddit_twitter_instagram_tiktok:bool = domain in \
('old.reddit.com','twitter.com','instagram.com','tiktok.com')
if is_reddit_twitter_instagram_tiktok:
query = ""
else:
qd = urllib.parse.parse_qs(parsed_url.query)
filtered = {k: val for k, val in qd.items() if not k.startswith('utm_') and not k.startswith('ref_')}
query = urllib.parse.urlencode(filtered, doseq=True)
new_url = urllib.parse.ParseResult(
scheme="https",
netloc=parsed_url.netloc,
path=parsed_url.path,
params=parsed_url.params,
query=query,
fragment=parsed_url.fragment,
)
return new_url
def canonicalize_url(url:str) -> str:
return _replace_urls(url)
def canonicalize_url2(url:str, *, httpsify:bool=False) -> urllib.parse.ParseResult:
url = _replace_urls(url)
if httpsify:
url_parsed = _httpsify_and_remove_tracking_urls(url)
else:
url_parsed = urllib.parse.urlparse(url)
return url_parsed
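The tracking-parameter stripping above can be exercised in isolation. A stdlib-only sketch (`strip_tracking_params` is an illustrative name, not part of the codebase; it mirrors `_httpsify_and_remove_tracking_urls` for domains that are not query-exempt):

```python
import urllib.parse

def strip_tracking_params(url: str) -> str:
    # Drop utm_* / ref_* query parameters and force the https scheme,
    # rebuilding the URL the same way the helper above does.
    parsed = urllib.parse.urlparse(url)
    qd = urllib.parse.parse_qs(parsed.query)
    filtered = {k: v for k, v in qd.items()
                if not k.startswith('utm_') and not k.startswith('ref_')}
    return urllib.parse.ParseResult(
        scheme="https",
        netloc=parsed.netloc,
        path=parsed.path,
        params=parsed.params,
        query=urllib.parse.urlencode(filtered, doseq=True),
        fragment=parsed.fragment,
    ).geturl()

print(strip_tracking_params("http://example.com/a?utm_source=x&id=5"))
# → https://example.com/a?id=5
```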
def moderated_body(target:Submittable, v:Optional[User]) -> Optional[str]:
if v and (v.admin_level >= PERMS['POST_COMMENT_MODERATION'] \
or v.id == target.author_id):
return None
@@ -18,3 +94,39 @@ def moderated_body(target:Union[Submission, Comment],
if target.is_banned or target.filter_state == 'removed': return 'Removed'
if target.filter_state == 'filtered': return 'Filtered'
return None
def body_displayed(target:Submittable, v:Optional[User], is_html:bool) -> str:
moderated:Optional[str] = moderated_body(target, v)
if moderated: return moderated
body = target.body_html if is_html else target.body
if not body: return ""
if not v: return body
body = body.replace("old.reddit.com", v.reddit)
if v.nitter and '/i/' not in body and '/retweets' not in body:
body = body.replace("www.twitter.com", "nitter.net").replace("twitter.com", "nitter.net")
return body
def execute_shadowbanned_fake_votes(db:Session, target:Submittable, v:Optional[User]):
if not target or not v: return
if not v.shadowbanned: return
if v.id != target.author_id: return
if not (86400 > target.age_seconds > 20): return
ti = max(target.age_seconds // 60, 1)
maxupvotes = min(ti, 11)
rand = random.randint(0, maxupvotes)
if target.upvotes >= rand: return
amount = random.randint(0, 3)
if amount != 1: return
if hasattr(target, 'views'):
target.views += amount*random.randint(3, 5)
target.upvotes += amount
db.add(target)
db.commit()


@@ -5,7 +5,7 @@ from typing import Any, TYPE_CHECKING
from sqlalchemy.sql import func
from sqlalchemy.orm import Query
from files.helpers.const import *
from files.helpers.config.const import *
if TYPE_CHECKING:
from files.classes.comment import Comment

files/helpers/embeds.py (new file, 59 lines)

@@ -0,0 +1,59 @@
'''
Assists with adding embeds to submissions.
This module is not intended to be imported using the `from X import Y` syntax.
Example usage:
```py
import files.helpers.embeds as embeds
embeds.youtube("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
```
'''
import urllib.parse
from typing import Optional
import requests
from files.helpers.config.environment import YOUTUBE_KEY
from files.helpers.config.regex import yt_id_regex
__all__ = ('twitter', 'youtube',)
def twitter(url:str) -> Optional[str]:
try:
return requests.get(
url="https://publish.twitter.com/oembed",
params={"url":url, "omit_script":"t"}, timeout=5).json()["html"]
except:
return None
def youtube(url:str) -> Optional[str]:
if not YOUTUBE_KEY: return None
url = urllib.parse.unquote(url).replace('?t', '&t')
yt_id = url.split('https://youtube.com/watch?v=')[1].split('&')[0].split('%')[0]
if not yt_id_regex.fullmatch(yt_id): return None
try:
req = requests.get(
url=f"https://www.googleapis.com/youtube/v3/videos?id={yt_id}&key={YOUTUBE_KEY}&part=contentDetails",
timeout=5).json()
except:
return None
if not req.get('items'): return None
params = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
t = params.get('t', params.get('start', [0]))[0]
if isinstance(t, str): t = t.replace('s','')
embed = f'<lite-youtube videoid="{yt_id}" params="autoplay=1&modestbranding=1'
if t:
try:
embed += f'&start={int(t)}'
except:
pass
embed += '"></lite-youtube>'
return embed
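The `t`/`start` handling above can be checked on its own. A stdlib-only sketch (`start_seconds` is an illustrative name, not part of the module):

```python
import urllib.parse

def start_seconds(url: str):
    # Mirrors the t/start query handling in youtube() above: prefer ?t=,
    # fall back to ?start=, strip a trailing "s", default to 0.
    params = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
    t = params.get('t', params.get('start', [0]))[0]
    if isinstance(t, str):
        t = t.replace('s', '')
    try:
        return int(t)
    except ValueError:
        return None

print(start_seconds("https://youtube.com/watch?v=dQw4w9WgXcQ&t=90s"))  # → 90
```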


@@ -1,12 +1,14 @@
from __future__ import annotations
from collections import defaultdict
from typing import Callable, Iterable, List, Optional, Type, Union
from flask import g
from flask import abort, g
from sqlalchemy import and_, or_, func
from sqlalchemy.orm import Query, scoped_session, selectinload
from files.classes import *
from files.helpers.const import AUTOJANNY_ID
from files.helpers.config.const import AUTOJANNY_ID
from files.helpers.contentsorting import sort_comment_results
@@ -78,20 +80,22 @@ def get_account(
id:Union[str,int],
v:Optional[User]=None,
graceful:bool=False,
include_blocks:bool=False) -> Optional[User]:
include_blocks:bool=False,
db:Optional[scoped_session]=None) -> Optional[User]:
try:
id = int(id)
except:
if graceful: return None
abort(404)
user = g.db.get(User, id)
if not db: db = g.db
user = db.get(User, id)
if not user:
if graceful: return None
abort(404)
if v and include_blocks:
user = _add_block_props(user, v)
user = _add_block_props(user, v, db)
return user
@@ -387,8 +391,10 @@ def get_domain(s:str) -> Optional[BannedDomain]:
def _add_block_props(
target:Union[Submission, Comment, User],
v:Optional[User]):
v:Optional[User],
db:Optional[scoped_session]=None):
if not v: return target
if not db: db = g.db
id = None
if any(isinstance(target, cls) for cls in [Submission, Comment]):
@@ -408,7 +414,7 @@ def _add_block_props(
target.is_blocked = False
return target
block = g.db.query(UserBlock).filter(
block = db.query(UserBlock).filter(
or_(
and_(
UserBlock.user_id == v.id,


@@ -1,13 +1,19 @@
from os import listdir, environ
import random
import time
from os import listdir
from jinja2 import pass_context
from files.__main__ import app
from .get import *
from .const import *
from files.classes.cron.tasks import ScheduledTaskType
from files.helpers.assetcache import assetcache_path
from files.helpers.config.environment import (CARD_VIEW, DEFAULT_COLOR,
ENABLE_DOWNVOTES, FINGERPRINT_TOKEN, PUSHER_ID, SITE, SITE_FULL, SITE_ID,
SITE_TITLE)
from files.helpers.time import format_age, format_datetime
from .config.const import *
from .get import *
@app.template_filter("computer_size")
def computer_size(size_bytes:int) -> str:
@@ -31,36 +37,15 @@ def post_embed(id, v):
if p: return render_template("submission_listing.html", listing=[p], v=v)
return ''
@app.template_filter("timestamp")
def timestamp(timestamp):
if not timestamp: return ''
return format_datetime(timestamp)
age = int(time.time()) - timestamp
if age < 60:
return "just now"
elif age < 3600:
minutes = int(age / 60)
return f"{minutes}m ago"
elif age < 86400:
hours = int(age / 3600)
return f"{hours}hr ago"
elif age < 2678400:
days = int(age / 86400)
return f"{days}d ago"
now = time.gmtime()
ctd = time.gmtime(timestamp)
months = now.tm_mon - ctd.tm_mon + 12 * (now.tm_year - ctd.tm_year)
if now.tm_mday < ctd.tm_mday:
months -= 1
if months < 12:
return f"{months}mo ago"
else:
years = int(months / 12)
return f"{years}yr ago"
@app.template_filter("agestamp")
def agestamp(timestamp):
if not timestamp: return ''
return format_age(timestamp)
@app.template_filter("asset")
@@ -71,7 +56,6 @@ def template_asset(asset_path):
@app.context_processor
def inject_constants():
return {
"environ":environ,
"SITE":SITE,
"SITE_ID":SITE_ID,
"SITE_TITLE":SITE_TITLE,
@@ -84,8 +68,12 @@ def inject_constants():
"CC_TITLE":CC_TITLE,
"listdir":listdir,
"config":app.config.get,
"ENABLE_DOWNVOTES": ENABLE_DOWNVOTES,
"CARD_VIEW": CARD_VIEW,
"FINGERPRINT_TOKEN": FINGERPRINT_TOKEN,
"COMMENT_BODY_LENGTH_MAXIMUM":COMMENT_BODY_LENGTH_MAXIMUM,
"SUBMISSION_BODY_LENGTH_MAXIMUM":SUBMISSION_BODY_LENGTH_MAXIMUM,
"SUBMISSION_TITLE_LENGTH_MAXIMUM":SUBMISSION_TITLE_LENGTH_MAXIMUM,
"DEFAULT_COLOR":DEFAULT_COLOR,
"COLORS":COLORS,
"THEMES":THEMES,
@@ -95,6 +83,7 @@ def inject_constants():
"SORTS_COMMENTS":SORTS_COMMENTS,
"SORTS_POSTS":SORTS_POSTS,
"CSS_LENGTH_MAXIMUM":CSS_LENGTH_MAXIMUM,
"ScheduledTaskType":ScheduledTaskType,
}


@@ -1,18 +1,13 @@
# Prevents certain properties from having to be recomputed each time they
# are referenced
def lazy(f):
'''
Prevents certain properties from having to be recomputed each time they are
referenced
'''
def wrapper(*args, **kwargs):
o = args[0]
if "_lazy" not in o.__dict__: o.__dict__["_lazy"] = {}
if f.__name__ not in o.__dict__["_lazy"]: o.__dict__["_lazy"][f.__name__] = f(*args, **kwargs)
if f.__name__ not in o.__dict__["_lazy"]:
o.__dict__["_lazy"][f.__name__] = f(*args, **kwargs)
return o.__dict__["_lazy"][f.__name__]
wrapper.__name__ = f.__name__
return wrapper
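The caching contract is easiest to see with a toy class. A self-contained sketch (the decorator is reproduced verbatim so the snippet runs standalone; `Post` and `rendered` are illustrative names):

```python
# Per-instance memoisation: results cache under obj.__dict__["_lazy"],
# keyed by the wrapped function's name, exactly as in the helper above.
def lazy(f):
    def wrapper(*args, **kwargs):
        o = args[0]
        if "_lazy" not in o.__dict__:
            o.__dict__["_lazy"] = {}
        if f.__name__ not in o.__dict__["_lazy"]:
            o.__dict__["_lazy"][f.__name__] = f(*args, **kwargs)
        return o.__dict__["_lazy"][f.__name__]
    wrapper.__name__ = f.__name__
    return wrapper

class Post:
    calls = 0  # counts how often the decorated method actually runs

    @lazy
    def rendered(self):
        Post.calls += 1
        return "<p>expensive render</p>"

p = Post()
p.rendered()
p.rendered()
print(Post.calls)  # → 1: the second call is served from the cache
```

Note the cache lives on the instance, so a fresh object recomputes once more.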

files/helpers/listing.py (new file, 140 lines)

@@ -0,0 +1,140 @@
"""
Module for listings.
"""
import time
from typing import Final
from flask import g
from sqlalchemy.sql.expression import not_
from files.__main__ import cache
from files.classes.submission import Submission
from files.classes.user import User
from files.classes.votes import Vote
from files.helpers.contentsorting import apply_time_filter, sort_objects
from files.helpers.strings import sql_ilike_clean
FRONTLIST_TIMEOUT_SECS: Final[int] = 86400
USERPAGELISTING_TIMEOUT_SECS: Final[int] = 86400
CHANGELOGLIST_TIMEOUT_SECS: Final[int] = 86400
@cache.memoize(timeout=FRONTLIST_TIMEOUT_SECS)
def frontlist(v=None, sort='new', page=1, t="all", ids_only=True, ccmode="false", filter_words='', gt=0, lt=0, sub=None, site=None):
posts = g.db.query(Submission)
if v and v.hidevotedon:
voted = [x[0] for x in g.db.query(Vote.submission_id).filter_by(user_id=v.id).all()]
posts = posts.filter(Submission.id.notin_(voted))
if not v or v.admin_level < 2:
filter_clause = (Submission.filter_state != 'filtered') & (Submission.filter_state != 'removed')
if v:
filter_clause = filter_clause | (Submission.author_id == v.id)
posts = posts.filter(filter_clause)
if sub: posts = posts.filter_by(sub=sub.name)
elif v: posts = posts.filter((Submission.sub == None) | Submission.sub.notin_(v.all_blocks))
if gt: posts = posts.filter(Submission.created_utc > gt)
if lt: posts = posts.filter(Submission.created_utc < lt)
if not gt and not lt:
posts = apply_time_filter(posts, t, Submission)
if (ccmode == "true"):
posts = posts.filter(Submission.club == True)
posts = posts.filter_by(is_banned=False, private=False, deleted_utc = 0)
if ccmode == "false" and not gt and not lt:
posts = posts.filter_by(stickied=None)
if v and v.admin_level < 2:
posts = posts.filter(Submission.author_id.notin_(v.userblocks))
if not (v and v.changelogsub):
posts = posts.filter(not_(Submission.title.ilike('[changelog]%')))
if v and filter_words:
for word in filter_words:
word = sql_ilike_clean(word).strip()
posts = posts.filter(not_(Submission.title.ilike(f'%{word}%')))
if not (v and v.shadowbanned):
posts = posts.join(User, User.id == Submission.author_id).filter(User.shadowbanned == None)
posts = sort_objects(posts, sort, Submission)
if v:
size = v.frontsize or 25
else:
size = 25
posts = posts.offset(size * (page - 1)).limit(size+1).all()
next_exists = (len(posts) > size)
posts = posts[:size]
if page == 1 and ccmode == "false" and not gt and not lt:
pins = g.db.query(Submission).filter(Submission.stickied != None, Submission.is_banned == False)
if sub: pins = pins.filter_by(sub=sub.name)
elif v:
pins = pins.filter((Submission.sub == None) | Submission.sub.notin_(v.all_blocks))
if v.admin_level < 2:
pins = pins.filter(Submission.author_id.notin_(v.userblocks))
pins = pins.all()
for pin in pins:
if pin.stickied_utc and int(time.time()) > pin.stickied_utc:
pin.stickied = None
pin.stickied_utc = None
g.db.add(pin)
pins.remove(pin)
posts = pins + posts
if ids_only: posts = [x.id for x in posts]
g.db.commit()
return posts, next_exists
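`frontlist` detects whether a next page exists by fetching one row more than the page size (`limit(size+1)`). The pattern in isolation, sketched over a plain list (`paginate` is an illustrative name):

```python
def paginate(items: list, page: int, size: int) -> tuple[list, bool]:
    # Fetch size+1 entries; a surplus row means another page exists.
    window = items[size * (page - 1): size * page + 1]
    next_exists = len(window) > size
    return window[:size], next_exists

rows, more = paginate(list(range(10)), page=1, size=4)
print(rows, more)  # → [0, 1, 2, 3] True
```

This avoids a separate COUNT query just to decide whether to render a "next" link.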
@cache.memoize(timeout=USERPAGELISTING_TIMEOUT_SECS)
def userpagelisting(u:User, site=None, v=None, page=1, sort="new", t="all"):
if u.shadowbanned and not (v and (v.admin_level > 1 or v.id == u.id)): return []
posts = g.db.query(Submission.id).filter_by(author_id=u.id, is_pinned=False)
if not (v and (v.admin_level > 1 or v.id == u.id)):
posts = posts.filter_by(deleted_utc=0, is_banned=False, private=False, ghost=False)
posts = apply_time_filter(posts, t, Submission)
posts = sort_objects(posts, sort, Submission)
posts = posts.offset(25 * (page - 1)).limit(26).all()
return [x[0] for x in posts]
@cache.memoize(timeout=CHANGELOGLIST_TIMEOUT_SECS)
def changeloglist(v=None, sort="new", page=1, t="all", site=None):
posts = g.db.query(Submission.id).filter_by(is_banned=False, private=False,).filter(Submission.deleted_utc == 0)
if v.admin_level < 2:
posts = posts.filter(Submission.author_id.notin_(v.userblocks))
admins = [x[0] for x in g.db.query(User.id).filter(User.admin_level > 0).all()]
posts = posts.filter(Submission.title.ilike('_changelog%'), Submission.author_id.in_(admins))
if t != 'all':
posts = apply_time_filter(posts, t, Submission)
posts = sort_objects(posts, sort, Submission)
posts = posts.offset(25 * (page - 1)).limit(26).all()
return [x[0] for x in posts]


@@ -1,13 +0,0 @@
import re
youtube_regex = re.compile('(<p>[^<]*)(https:\\/\\/youtube\\.com\\/watch\\?v\\=([a-z0-9-_]{5,20})[\\w\\-.#&/=\\?@%+]*)', flags=re.I|re.A)
yt_id_regex = re.compile('[a-z0-9-_]{5,20}', flags=re.I|re.A)
image_regex = re.compile("(^|\\s)(https:\\/\\/[\\w\\-.#&/=\\?@%;+]{5,250}(\\.png|\\.jpg|\\.jpeg|\\.gif|\\.webp|maxwidth=9999|fidelity=high))($|\\s)", flags=re.I|re.A)
linefeeds_regex = re.compile("([^\\n])\\n([^\\n])", flags=re.A)
html_title_regex = re.compile(r"<title>(.{1,200})</title>", flags=re.I)
css_url_regex = re.compile(r'url\(\s*[\'"]?(.*?)[\'"]?\s*\)', flags=re.I|re.A)


@@ -1,20 +1,27 @@
import functools
import html
import bleach
from bs4 import BeautifulSoup
from bleach.linkifier import LinkifyFilter, build_url_re
from functools import partial
from .get import *
from os import path, environ
import re
from mistletoe import markdown
from json import loads, dump
from random import random, choice
import urllib.parse
from functools import partial
from os import path
from typing import Optional
import bleach
import gevent
import time
import requests
from files.helpers.regex import *
from files.__main__ import app
from bleach.linkifier import LinkifyFilter, build_url_re
from bs4 import BeautifulSoup
from flask import abort, g
from mistletoe import markdown
from files.classes.domains import BannedDomain
from files.classes.marsey import Marsey
from files.helpers.config.const import (embed_fullmatch_regex,
image_check_regex, video_sub_regex)
from files.helpers.config.environment import (MENTION_LIMIT,
MULTIMEDIA_EMBEDDING_ENABLED,
SITE_FULL)
from files.helpers.config.regex import *
from files.helpers.get import get_user, get_users
TLDS = ('ac','ad','ae','aero','af','ag','ai','al','am','an','ao','aq','ar',
'arpa','as','asia','at','au','aw','ax','az','ba','bb','bd','be','bf','bg',
@@ -43,7 +50,7 @@ allowed_tags = ('b','blockquote','br','code','del','em','h1','h2','h3','h4',
'tbody','th','thead','td','tr','ul','a','span','ruby','rp','rt',
'spoiler',)
if app.config['MULTIMEDIA_EMBEDDING_ENABLED']:
if MULTIMEDIA_EMBEDDING_ENABLED:
allowed_tags += ('img', 'lite-youtube', 'video', 'source',)
@@ -119,11 +126,9 @@ def render_emoji(html, regexp, edit, marseys_used=set(), b=False):
emoji = i.group(1).lower()
attrs = ''
if b: attrs += ' b'
if not edit and len(emojis) <= 20 and random() < 0.0025 and ('marsey' in emoji or emoji in marseys_const2): attrs += ' g'
old = emoji
emoji = emoji.replace('!','').replace('#','')
if emoji == 'marseyrandom': emoji = choice(marseys_const2)
emoji_partial_pat = '<img loading="lazy" alt=":{0}:" src="{1}"{2}>'
emoji_partial = '<img loading="lazy" data-bs-toggle="tooltip" alt=":{0}:" title=":{0}:" src="{1}"{2}>'
@@ -182,7 +187,7 @@ def sanitize(sanitized, alert=False, comment=False, edit=False):
# double newlines, eg. hello\nworld becomes hello\n\nworld, which later becomes <p>hello</p><p>world</p>
sanitized = linefeeds_regex.sub(r'\1\n\n\2', sanitized)
if app.config['MULTIMEDIA_EMBEDDING_ENABLED']:
if MULTIMEDIA_EMBEDDING_ENABLED:
# turn eg. https://wikipedia.org/someimage.jpg into ![](https://wikipedia.org/someimage.jpg)
sanitized = image_regex.sub(r'\1![](\2)\4', sanitized)
@@ -220,8 +225,8 @@ def sanitize(sanitized, alert=False, comment=False, edit=False):
names = set(m.group(2) for m in matches)
users = get_users(names,graceful=True)
if len(users) > app.config['MENTION_LIMIT']:
abort(400, f'Mentioned {len(users)} users but limit is {app.config["MENTION_LIMIT"]}')
if len(users) > MENTION_LIMIT:
abort(400, f'Mentioned {len(users)} users but limit is {MENTION_LIMIT}')
for u in users:
if not u: continue
@@ -232,7 +237,7 @@ def sanitize(sanitized, alert=False, comment=False, edit=False):
soup = BeautifulSoup(sanitized, 'lxml')
if app.config['MULTIMEDIA_EMBEDDING_ENABLED']:
if MULTIMEDIA_EMBEDDING_ENABLED:
for tag in soup.find_all("img"):
if tag.get("src") and not tag["src"].startswith('/pp/'):
tag["loading"] = "lazy"
@@ -281,13 +286,13 @@ def sanitize(sanitized, alert=False, comment=False, edit=False):
if "https://youtube.com/watch?v=" in sanitized: sanitized = sanitized.replace("?t=", "&t=")
if app.config['MULTIMEDIA_EMBEDDING_ENABLED']:
if MULTIMEDIA_EMBEDDING_ENABLED:
captured = []
for i in youtube_regex.finditer(sanitized):
if i.group(0) in captured: continue
captured.append(i.group(0))
params = parse_qs(urlparse(i.group(2).replace('&amp;','&')).query)
params = urllib.parse.parse_qs(urllib.parse.urlparse(i.group(2).replace('&amp;','&')).query)
t = params.get('t', params.get('start', [0]))[0]
if isinstance(t, str): t = t.replace('s','')
@@ -297,7 +302,7 @@ def sanitize(sanitized, alert=False, comment=False, edit=False):
sanitized = sanitized.replace(i.group(0), htmlsource)
if app.config['MULTIMEDIA_EMBEDDING_ENABLED']:
if MULTIMEDIA_EMBEDDING_ENABLED:
sanitized = video_sub_regex.sub(r'\1<video controls preload="none"><source src="\2"></video>', sanitized)
if comment:
@@ -318,8 +323,6 @@ def sanitize(sanitized, alert=False, comment=False, edit=False):
strip=True,
).clean(sanitized)
soup = BeautifulSoup(sanitized, 'lxml')
links = soup.find_all("a")
@@ -331,7 +334,7 @@ def sanitize(sanitized, alert=False, comment=False, edit=False):
href = link.get("href")
if not href: continue
url = urlparse(href)
url = urllib.parse.urlparse(href)
domain = url.netloc
url_path = url.path
domain_list.add(domain+url_path)


@@ -1,12 +1,11 @@
from werkzeug.security import *
from os import environ
from files.helpers.config.environment import SECRET_KEY
def generate_hash(string):
msg = bytes(string, "utf-16")
return hmac.new(key=bytes(environ.get("MASTER_KEY"), "utf-16"),
return hmac.new(key=bytes(SECRET_KEY, "utf-16"),
msg=msg,
digestmod='md5'
).hexdigest()
@@ -18,6 +17,5 @@ def validate_hash(string, hashstr):
def hash_password(password):
return generate_password_hash(
password, method='pbkdf2:sha512', salt_length=8)


@@ -2,15 +2,17 @@ import sys
import gevent
from pusher_push_notifications import PushNotifications
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import Session
from files.classes.leaderboard import (LeaderboardMeta, ReceivedDownvotesLeaderboard,
GivenUpvotesLeaderboard)
from files.__main__ import db_session, service
from files.classes.leaderboard import (GivenUpvotesLeaderboard,
LeaderboardMeta,
ReceivedDownvotesLeaderboard)
from files.helpers.assetcache import assetcache_path
from files.helpers.const import PUSHER_ID, PUSHER_KEY, SITE_FULL, SITE_ID
from files.__main__ import app, db_session
from files.helpers.config.environment import (ENABLE_SERVICES, PUSHER_ID,
PUSHER_KEY, SITE_FULL, SITE_ID)
if PUSHER_ID != 'blahblahblah':
if service.enable_services and ENABLE_SERVICES and PUSHER_ID != 'blahblahblah':
beams_client = PushNotifications(instance_id=PUSHER_ID, secret_key=PUSHER_KEY)
else:
beams_client = None
@@ -47,7 +49,7 @@ _lb_given_upvotes_meta = LeaderboardMeta("Upvotes", "given upvotes", "given-upvo
def leaderboard_thread():
global lb_downvotes_received, lb_upvotes_given
db:scoped_session = db_session() # type: ignore
db: Session = db_session()
lb_downvotes_received = ReceivedDownvotesLeaderboard(_lb_received_downvotes_meta, db)
lb_upvotes_given = GivenUpvotesLeaderboard(_lb_given_upvotes_meta, db)
@@ -55,5 +57,5 @@ def leaderboard_thread():
db.close()
sys.stdout.flush()
if app.config["ENABLE_SERVICES"]:
if service.enable_services and ENABLE_SERVICES:
gevent.spawn(leaderboard_thread)


@@ -7,9 +7,11 @@ def sql_ilike_clean(my_str):
return my_str.replace(r'\\', '').replace('_', r'\_').replace('%', '').strip()
# this will also just return a bool verbatim
def bool_from_string(input: typing.Union[str, bool]) -> bool:
def bool_from_string(input: typing.Union[str, int, bool]) -> bool:
if isinstance(input, bool):
return input
elif isinstance(input, int):
return bool(input)
if input.lower() in ("yes", "true", "t", "on", "1"):
return True
if input.lower() in ("no", "false", "f", "off", "0"):
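The new `int` branch is order-sensitive: `bool` is a subclass of `int` in Python, so the `isinstance(input, bool)` check must run first or `True`/`False` would be swallowed by the int branch. A standalone sketch (the hunk above is truncated, so the fall-through returning `False` for unrecognised strings is an assumption here):

```python
import typing

def bool_from_string(value: typing.Union[str, int, bool]) -> bool:
    # bool must be tested before int: isinstance(True, int) is also True.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return bool(value)
    if value.lower() in ("yes", "true", "t", "on", "1"):
        return True
    return False  # assumption: unrecognised strings fall through to False

print(bool_from_string("on"), bool_from_string(2), bool_from_string(False))
# → True True False
```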


@@ -1,20 +1,58 @@
import calendar
import time
from datetime import datetime
from datetime import datetime, timedelta
from typing import Final, Union
DATE_FORMAT: Final[str] = '%Y %B %d'
DATETIME_FORMAT: Final[str] = '%Y %B %d %H:%M:%S UTC'
AgeFormattable = Union[int, timedelta]
TimestampFormattable = Union[int, float, datetime, time.struct_time]
def format_datetime(timestamp:TimestampFormattable) -> str:
def format_datetime(timestamp: TimestampFormattable | None) -> str:
return _format_timestamp(timestamp, DATETIME_FORMAT)
def format_date(timestamp:TimestampFormattable) -> str:
def format_date(timestamp: TimestampFormattable | None) -> str:
return _format_timestamp(timestamp, DATE_FORMAT)
def _format_timestamp(timestamp:TimestampFormattable, format:str) -> str:
if isinstance(timestamp, datetime):
def format_age(timestamp: TimestampFormattable | None) -> str:
if timestamp is None:
return ""
timestamp = _make_timestamp(timestamp)
age:int = int(time.time()) - timestamp
if age < 60: return "just now"
if age < 3600:
minutes = int(age / 60)
return f"{minutes}m ago"
if age < 86400:
hours = int(age / 3600)
return f"{hours}hr ago"
if age < 2678400:
days = int(age / 86400)
return f"{days}d ago"
now = time.gmtime()
ctd = time.gmtime(timestamp)
months = now.tm_mon - ctd.tm_mon + 12 * (now.tm_year - ctd.tm_year)
if now.tm_mday < ctd.tm_mday:
months -= 1
if months < 12:
return f"{months}mo ago"
else:
years = int(months / 12)
return f"{years}yr ago"
def _format_timestamp(timestamp: TimestampFormattable | None, format: str) -> str:
if timestamp is None:
return ""
elif isinstance(timestamp, datetime):
return timestamp.strftime(format)
elif isinstance(timestamp, (int, float)):
timestamp = time.gmtime(timestamp)
@@ -22,3 +60,12 @@ def _format_timestamp(timestamp:TimestampFormattable, format:str) -> str:
raise TypeError("Invalid argument type (must be one of int, float, "
"datetime, or struct_time)")
return time.strftime(format, timestamp)
def _make_timestamp(timestamp: TimestampFormattable) -> int:
if isinstance(timestamp, (int, float)):
return int(timestamp)
if isinstance(timestamp, datetime):
return int(timestamp.timestamp())
if isinstance(timestamp, time.struct_time):
return calendar.timegm(timestamp)
raise TypeError("Invalid argument type (must be one of int, float, "
"datetime, or struct_time)")
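The bucket boundaries in `format_age` can be checked with synthetic timestamps. A condensed, stdlib-only sketch of the sub-month buckets (the month/year arithmetic is omitted for brevity):

```python
import time

def format_age(ts: int) -> str:
    # Same thresholds as files/helpers/time.py above: 60s, 1h, 1d, 31d.
    age = int(time.time()) - ts
    if age < 60:
        return "just now"
    if age < 3600:
        return f"{age // 60}m ago"
    if age < 86400:
        return f"{age // 3600}hr ago"
    return f"{age // 86400}d ago"

now = int(time.time())
print(format_age(now - 120))  # → 2m ago
```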

files/helpers/validators.py (new file, 192 lines)

@@ -0,0 +1,192 @@
import shutil
import time
import urllib.parse
from dataclasses import dataclass
from typing import Optional
from flask import Request, abort, request
from werkzeug.datastructures import FileStorage
import files.helpers.embeds as embeds
import files.helpers.sanitize as sanitize
from files.helpers.config.environment import SITE_FULL, YOUTUBE_KEY
from files.helpers.config.const import (SUBMISSION_BODY_LENGTH_MAXIMUM,
SUBMISSION_TITLE_LENGTH_MAXIMUM,
SUBMISSION_URL_LENGTH_MAXIMUM)
from files.helpers.content import canonicalize_url2
from files.helpers.media import process_image
def guarded_value(val:str, min_len:int, max_len:int) -> str:
'''
Get request value `val` and ensure it is within length constraints
Requires a request context and either aborts early or returns a good value
'''
raw = request.values.get(val, '').strip()
raw = raw.replace('\u200e', '')
if len(raw) < min_len: abort(400, f"Minimum length for {val} is {min_len}")
if len(raw) > max_len: abort(400, f"Maximum length for {val} is {max_len}")
# TODO: it may make sense to do more sanitisation here
return raw
def int_ranged(val:str, min:int, max:int) -> int:
raw:Optional[int] = request.values.get(val, default=None, type=int)
if raw is None or raw < min or raw > max:
abort(400,
f"Invalid input ('{val}' must be an integer and be between {min} and {max})")
return raw
@dataclass(frozen=True, kw_only=True, slots=True)
class ValidatedSubmissionLike:
title: str
title_html: str
body: str
body_raw: Optional[str]
body_html: str
url: Optional[str]
thumburl: Optional[str]
@property
def embed_slow(self) -> Optional[str]:
url:Optional[str] = self.url
url_canonical: Optional[urllib.parse.ParseResult] = self.url_canonical
if not url or not url_canonical: return None
embed:Optional[str] = None
domain:str = url_canonical.netloc
if domain == "twitter.com":
embed = embeds.twitter(url)
if url.startswith('https://youtube.com/watch?v=') and YOUTUBE_KEY:
embed = embeds.youtube(url)
if SITE_FULL in domain and "/post/" in url and "context" not in url:
id = url.split("/post/")[1]
if "/" in id: id = id.split("/")[0]
embed = str(int(id))
return embed if embed and len(embed) <= 1500 else None
@property
def repost_search_url(self) -> Optional[str]:
search_url = self.url_canonical_str
if not search_url: return None
if search_url.endswith('/'):
search_url = search_url[:-1]
return search_url
@property
def url_canonical(self) -> Optional[urllib.parse.ParseResult]:
if not self.url: return None
return canonicalize_url2(self.url, httpsify=True)
@property
def url_canonical_str(self) -> Optional[str]:
url_canonical:Optional[urllib.parse.ParseResult] = self.url_canonical
if not url_canonical: return None
return url_canonical.geturl()
@classmethod
def from_flask_request(cls,
request:Request,
*,
allow_embedding:bool,
allow_media_url_upload:bool=True,
embed_url_file_key:str="file2",
edit:bool=False) -> "ValidatedSubmissionLike":
'''
Creates the basic structure for a submission and validates it. The
normal submission API has a lot of duplicate code; while this is not
a pretty solution, it forces all submission-likes through a central
interface.
:param request: The Flask Request object.
:param allow_embedding: Whether to allow embedding. This should usually
be the value from the environment.
:param allow_media_url_upload: Whether to allow media URL upload. This
should generally be `True` for submission creation if file uploads
are allowed and `False` in other contexts (such as editing).
:param embed_url_file_key: The key to use for inline file uploads.
:param edit: The value of `edit` to pass to `sanitize`
'''
def _process_media(file:Optional[FileStorage]) -> tuple[bool, Optional[str], Optional[str]]:
if request.headers.get("cf-ipcountry") == "T1": # forbid Tor uploads
return False, None, None
elif not file:
# We actually care about falseyness, not just `is not None` because
# no attachment is <FileStorage: '' ('application/octet-stream')>
# (at least from Firefox 111).
return False, None, None
elif not file.content_type.startswith('image/'):
abort(415, "Image files only")
name = f'/images/{time.time()}'.replace('.','') + '.webp'
file.save(name)
url:Optional[str] = process_image(name)
if not url: return False, None, None
name2 = name.replace('.webp', 'r.webp')
shutil.copyfile(name, name2)
thumburl:Optional[str] = process_image(name2, resize=100)
return True, url, thumburl
def _process_media2(body:str, file2:Optional[list[FileStorage]]) -> tuple[bool, str]:
if request.headers.get("cf-ipcountry") == "T1": # forbid Tor uploads
return False, body
elif not file2: # empty list or None
return False, body
file2 = file2[:4]
if not all(file for file in file2):
# Falseyness check to handle <'' ('application/octet-stream')>
return False, body
for file in file2:
if not file.content_type.startswith('image/'):
abort(415, "Image files only")
name = f'/images/{time.time()}'.replace('.','') + '.webp'
file.save(name)
image = process_image(name)
if allow_embedding:
body += f"\n\n![]({image})"
else:
body += f'\n\n<a href="{image}">{image}</a>'
return True, body
title = guarded_value("title", 1, SUBMISSION_TITLE_LENGTH_MAXIMUM)
title = sanitize.sanitize_raw(title, allow_newlines=False, length_limit=SUBMISSION_TITLE_LENGTH_MAXIMUM)
url = guarded_value("url", 0, SUBMISSION_URL_LENGTH_MAXIMUM)
body_raw = guarded_value("body", 0, SUBMISSION_BODY_LENGTH_MAXIMUM)
body_raw = sanitize.sanitize_raw(body_raw, allow_newlines=True, length_limit=SUBMISSION_BODY_LENGTH_MAXIMUM)
if not url and allow_media_url_upload:
has_file, url, thumburl = _process_media(request.files.get("file"))
else:
has_file = False
thumburl = None
has_file2, body = _process_media2(body_raw, request.files.getlist(embed_url_file_key))
if not body_raw and not url and not has_file and not has_file2:
raise ValueError("Please enter a URL or some text")
title_html = sanitize.filter_emojis_only(title, graceful=True)
if len(title_html) > 1500:
raise ValueError("Rendered title is too big!")
return ValidatedSubmissionLike(
title=title,
title_html=title_html,
body=body,
body_raw=body_raw,
body_html=sanitize.sanitize(body, edit=edit),
url=url,
thumburl=thumburl,
)


@@ -1,11 +1,16 @@
from .get import *
from .alerts import *
from files.helpers.const import *
from files.__main__ import db_session
from random import randint
import user_agents
import functools
import time
import user_agents
from files.helpers.alerts import *
from files.helpers.config.const import *
from files.helpers.config.environment import SITE
from files.helpers.get import *
from files.routes.importstar import *
from files.__main__ import app, cache, db_session
def get_logged_in_user():
if hasattr(g, 'v'):
return g.v
@@ -80,68 +85,52 @@ def get_logged_in_user():
g.v = v
return v
def check_ban_evade(v):
if v and not v.patron and v.admin_level < 2 and v.ban_evade and not v.unban_utc:
v.shadowbanned = "AutoJanny"
g.db.add(v)
g.db.commit()
def auth_desired(f):
@functools.wraps(f)
def wrapper(*args, **kwargs):
v = get_logged_in_user()
check_ban_evade(v)
return make_response(f(*args, v=v, **kwargs))
wrapper.__name__ = f.__name__
return wrapper
def auth_required(f):
@functools.wraps(f)
def wrapper(*args, **kwargs):
v = get_logged_in_user()
if not v: abort(401)
check_ban_evade(v)
return make_response(f(*args, v=v, **kwargs))
wrapper.__name__ = f.__name__
return wrapper
def is_not_permabanned(f):
@functools.wraps(f)
def wrapper(*args, **kwargs):
v = get_logged_in_user()
if not v: abort(401)
check_ban_evade(v)
if v.is_suspended_permanently:
abort(403, "You are permanently banned")
return make_response(f(*args, v=v, **kwargs))
wrapper.__name__ = f.__name__
return wrapper
def admin_level_required(x):
def wrapper_maker(f):
@functools.wraps(f)
def wrapper(*args, **kwargs):
v = get_logged_in_user()
if not v: abort(401)
if v.admin_level < x: abort(403)
return make_response(f(*args, v=v, **kwargs))
wrapper.__name__ = f.__name__
return wrapper
return wrapper_maker