Beep Boop Bip

File 150448609042.jpg - (110.47KB , 1280x720 , mpv-shot0028.jpg )
1547 No. 1547 [Edit]
It doesn't matter if you're a beginner or Dennis Ritchie, come here to talk about what you are doing, your favorite language and all that stuff.
I've been learning Python because C++ was too hard for me (I'm sorry nenecchi, I failed you). I reached OOP and it feels weird compared to C++. Anyway, I never got it completely.
202 posts omitted. Last 50 shown.
>> No. 2746 [Edit]
>>2745
Ah I see I was looking at the docs for the go mysql package, not the sqlite one: https://pkg.go.dev/github.com/mxk/go-sqlite/sqlite3?utm_source=godoc#hdr-Concurrency

>A single connection instance and all of its derived objects (prepared statements, backup operations, etc.) may NOT be used concurrently from multiple goroutines without external synchronization.

So you're right, I think manually creating the threadpool is the best way to do it
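
A minimal sketch of what such a hand-made pool could look like, assuming a buffered channel is used to hand out connections. The `conn` type is a hypothetical stand-in for a real SQLite connection, and none of these names come from the driver's API. Each connection is held by one goroutine at a time, which is the constraint the docs describe.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a real SQLite connection.
type conn struct {
	id      int
	queries int // modified without a lock; only safe if access is exclusive
}

// pool hands out connections through a buffered channel, so each
// connection is held by at most one goroutine at a time.
type pool struct{ free chan *conn }

func newPool(size int) *pool {
	p := &pool{free: make(chan *conn, size)}
	for i := 0; i < size; i++ {
		p.free <- &conn{id: i}
	}
	return p
}

// with blocks until a connection is free, runs fn with it, then returns it.
func (p *pool) with(fn func(*conn)) {
	c := <-p.free
	defer func() { p.free <- c }()
	fn(c)
}

// totalQueries runs n jobs concurrently through the pool and returns how
// many "queries" the pooled connections handled in total.
func totalQueries(p *pool, n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.with(func(c *conn) { c.queries++ })
		}()
	}
	wg.Wait()
	total := 0
	for i := 0; i < cap(p.free); i++ {
		c := <-p.free // all connections are idle here, so this cycles through each once
		total += c.queries
		p.free <- c
	}
	return total
}

func main() {
	fmt.Println(totalQueries(newPool(4), 100))
}
```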
>> No. 2747 [Edit]
>>2745
>Did I get that right? It's very clever, is this an idiomatic Go pattern?
Yep. I got the idea and how to do it from this article.
https://turriate.com/articles/making-sqlite-faster-in-go

I looked at the mentioned other drivers, but apart from being much less popular, the more recently updated one(two years ago) had cryptic documentation and alien syntax(to me at least). So I figured it wasn't worth whatever performance gain it might bring.
>> No. 2748 [Edit]
File 165384761178.png - (138.88KB , 1042x555 , highlight.png )
2748
Small, but nice update. Linked to posts are now highlighted using CSS' :target selector. Both tc and lainchan rely on js for this.

Also, I used hey to test tc's speed with this link http://tohno-chan.com/navi/res/1547.html#2747
I only got 32 requests/second. It's not directly comparable considering the difference in database size(and I'm not sure what link is used to retrieve a post preview), but it looks really bad. Not sure how much of it is tc's implementation, and how much it is from their hosting.

On lainchan, with this link https://lainchan.org/%CE%A9/res/60314.html#60333 I got 318 requests/second.

edit: yeah, the link used makes a big difference, doing https://[200:c5b0:cfeb:5db:c054:d66d:eb6f:7412]/content/media/toggle/#no2 which is on my own site, gives me 299 requests/second.

edit2: using the network tool, I've figured out that the link tc uses to get a post preview is probably
"http://tohno-chan.com/read.php?b=navi&t=1547&p=2745&single="
This link gives me about 25 requests/second
Lainchan seems to load the entire page a post is on, then uses js to parse it out to get a preview. So the speed is no different from loading a page.

edit3: This link "http://tohno-chan.com/read.php?b=navi&t=1547&p=2744&single=" gave 15 requests/second the first time, then 28 requests/second the second time. Maybe caching has something to do with that difference. Also, the response is of type text/html with a size of 2.47 KB. The contents include unnecessary things like checkboxes and links, as you can see here https://files.catbox.moe/8lnz6r.txt

Mine send text/plain, of sizes ranging from 9 b to 1.5 kb, the contents of which are then just shoved into the page by htmx.

Post edited on 29th May 2022, 11:45am
>> No. 2749 [Edit]
>>2748
>Mine send text/plain
Have you also considered adding gzip compression on top? Probably won't help much for post preview since that's usually pretty small, but it could be useful for full page loads.
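
For reference, a rough sketch of gzip as net/http middleware using only the standard library. The handler `page` and the helper `fetch` are made up for the demo. It assumes the wrapped handler sets its own Content-Type and never sets Content-Length, since both would be wrong once the body is compressed.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// gzipWriter routes everything written to the response through a gzip.Writer.
type gzipWriter struct {
	http.ResponseWriter
	zw *gzip.Writer
}

func (w gzipWriter) Write(b []byte) (int, error) { return w.zw.Write(b) }

// withGzip compresses responses for clients that advertise gzip support.
func withGzip(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			next.ServeHTTP(w, r)
			return
		}
		w.Header().Set("Content-Encoding", "gzip")
		w.Header().Add("Vary", "Accept-Encoding")
		zw := gzip.NewWriter(w)
		defer zw.Close() // flushes the compressed stream
		next.ServeHTTP(gzipWriter{ResponseWriter: w, zw: zw}, r)
	})
}

// page is a hypothetical full-page handler with a repetitive body.
func page(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	io.WriteString(w, strings.Repeat("<p>post</p>\n", 200))
}

// fetch sends one request through the middleware and reports the
// Content-Encoding header, the bytes on the wire, and the decoded body.
func fetch(acceptEncoding string) (encoding string, wireBytes int, body string) {
	req := httptest.NewRequest("GET", "/res/1547.html", nil)
	if acceptEncoding != "" {
		req.Header.Set("Accept-Encoding", acceptEncoding)
	}
	rec := httptest.NewRecorder()
	withGzip(http.HandlerFunc(page)).ServeHTTP(rec, req)
	res := rec.Result()
	wireBytes = rec.Body.Len()
	var rd io.Reader = res.Body
	if res.Header.Get("Content-Encoding") == "gzip" {
		zr, err := gzip.NewReader(res.Body)
		if err != nil {
			panic(err)
		}
		rd = zr
	}
	b, err := io.ReadAll(rd)
	if err != nil {
		panic(err)
	}
	return res.Header.Get("Content-Encoding"), wireBytes, string(b)
}

func main() {
	_, plain, _ := fetch("")
	_, packed, _ := fetch("gzip")
	fmt.Println("plain bytes:", plain, "gzip bytes:", packed)
}
```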
>> No. 2750 [Edit]
>>2749
I did better, I use brotli compression. Talked about it and how it requires https here http://tohno-chan.com/ot/res/37253.html#39678
>> No. 2751 [Edit]
Using the <a> tag's download attribute, you can set a file to be downloaded with a different name from the one stored on the server. No need to edit headers. For imageboards, this feature would be especially useful, yet despite being available for at least 7 years now, it doesn't seem to be used by any.
>> No. 2752 [Edit]
>>2751
Don't imageboards running jschan do this? The <a> tag surrounding the user-supplied filename has its download attribute set to that filename, while the <a> tag surrounding the thumbnail does not.
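
A small sketch of how that link could be generated server-side in Go. The `/src/` path and the `upload` struct are assumptions for illustration, not taken from jschan or any other engine. html/template is used so that a user-supplied filename can't break out of the attribute.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// upload pairs the name a file is stored under with the name the
// uploader originally gave it (hypothetical struct).
type upload struct {
	StoredName   string
	OriginalName string
}

// fileLink puts the original name in the download attribute, so the
// browser saves the file under that name instead of the stored one.
var fileLink = template.Must(template.New("file").Parse(
	`<a href="/src/{{.StoredName}}" download="{{.OriginalName}}">{{.OriginalName}}</a>`))

func renderFileLink(u upload) string {
	var sb strings.Builder
	if err := fileLink.Execute(&sb, u); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	fmt.Println(renderFileLink(upload{
		StoredName:   "165384761178.png",
		OriginalName: "highlight.png",
	}))
}
```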
>> No. 2753 [Edit]
>>2752
Not sure. I didn't do much research and hadn't heard of jschan before. Don't recall seeing it in the wild.
>> No. 2754 [Edit]
>>2748
Are you doing everything with no JS?

I was considering writing my own imageboard until I concluded that I can't really think of any missing features I'd like to add. I'm fairly happy with imageboards as they are, it seems.
>> No. 2755 [Edit]
>>2754
Not everything. I'm using htmx, a js library, to take care of post previews. Doing it otherwise would be an architectural nightmare.
>> No. 2756 [Edit]
I'm on the fence about how to implement posting. I'm not sure whether to edit stored html files, or use templates to generate requested pages every time they're asked for. Which is standard, and which is faster?

edit: generating it every time seems better. My intuition says this is inefficient, but for "dynamic content" it makes a lot more sense. Adding, editing, and deleting posts can be handled entirely within the database without editing anything else. That's also probably how tc and others work.

Post edited on 31st May 2022, 7:05am
>> No. 2757 [Edit]
>>2756
Latter is better. Note that you can do a hybrid where you maintain a cache of generated html pages, and after updating raw text you invalidate cache for all pages that depend on the updated item (e.g. the thread, the catalog, the homepage).

The reason why editing stored html file directly is not a good option is that it results in duplication. For instance, let's say an OP post gets edited: you're going to have to edit html for the catalog, the post, and the post preview. Whereas in the latter approach you only have to implement page generation, and since the DB is effectively in "normalized form" you don't need to update anything else. It also presents issues with needing to backfill edits if e.g. you change the html structure.

So overall it's almost always a good idea to decouple the presentation layer from the underlying data sources. It allows you to do things like edit an attached image without also changing the text of the post

Post edited on 31st May 2022, 1:04pm
>> No. 2758 [Edit]
>>2757
I'm still learning how to use templates. I guess I'll generate a page when a thread is made, then whenever in that thread a post is added, deleted, or edited, I'll regenerate the page.

Not sure how else I'd invalidate cache.

Post edited on 31st May 2022, 3:38pm
>> No. 2759 [Edit]
>>2758
There's two ways you can use the cache: lazily or not-lazily. Let's assume you already have a function which will generate html given the raw db fields.
With the lazy approach, whenever a post is added/deleted/edited, you remove the thread from the cache. Then whenever someone next requests it, the cache lookup will miss and it will be generated and added to the cache. With the non-lazy approach, you immediately repopulate the cache with the updated html.

There's an ok-ish summary of the pros vs. cons of each in [1]

[1] https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/Strategies.html

I don't know if you've decided on the specific cache implementation yet. Maybe see if go has any simple in-memory key-value store libraries? Even the simple stdlib hashmap would probably work actually, since we don't need to set any expiration here.
If you want to get really fancy you could have a 2-tier in-memory and on-disk cache, and LRU spill from in-memory to on-disk.
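
To make the two strategies concrete, here's a minimal sketch using a plain map guarded by a mutex. All names are made up; `render` stands in for the function that generates html from the raw db fields.

```go
package main

import (
	"fmt"
	"sync"
)

// pageCache maps a thread ID to its generated HTML.
type pageCache struct {
	mu     sync.Mutex
	pages  map[int]string
	render func(threadID int) string // builds HTML from the raw DB fields
}

func newPageCache(render func(int) string) *pageCache {
	return &pageCache{pages: make(map[int]string), render: render}
}

// Get returns the cached page, generating and storing it on a miss.
func (c *pageCache) Get(id int) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.pages[id]; ok {
		return p
	}
	p := c.render(id)
	c.pages[id] = p
	return p
}

// Invalidate is the lazy strategy: drop the entry after a post is
// added, deleted, or edited, and let the next Get regenerate it.
func (c *pageCache) Invalidate(id int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.pages, id)
}

// Refresh is the non-lazy strategy: regenerate immediately on change.
func (c *pageCache) Refresh(id int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pages[id] = c.render(id)
}

func main() {
	renders := 0
	cache := newPageCache(func(id int) string {
		renders++
		return fmt.Sprintf("<html>thread %d, render #%d</html>", id, renders)
	})
	cache.Get(1547)
	cache.Get(1547) // served from the cache
	cache.Invalidate(1547)
	fmt.Println(cache.Get(1547), "after", renders, "renders")
}
```

This holds the lock while rendering, which keeps the sketch short but serializes page generation. That's probably fine for a small board.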

Post edited on 31st May 2022, 4:30pm
>> No. 2760 [Edit]
>>2759
For my use case, write-through makes more sense in my opinion. I'm not doing crazy, amazon/google level infrastructure. I don't need that much flexibility because I don't expect any kind of breaking, retrieval failure. It also won't take so much space that I would need to delete cached content.

>the specific cache implementation yet
I was thinking thread names would do all the work, since they're all nearly identical to each other and tied to actions that change them(adding, deleting posts and editing). There would be no difference between the pages served and the pages "cached". I don't think there's any reason to do something more fancy.

Post edited on 31st May 2022, 10:46pm
>> No. 2761 [Edit]
File 165428976695.png - (404.31KB , 1686x946 , template.png )
2761
I'm making progress with templates.
>> No. 2762 [Edit]
>>2761
What template library is that?
>> No. 2763 [Edit]
>>2762
Go's built-in text/template library.
https://pkg.go.dev/text/template

html/template has the same interface, but I don't need it for my purposes.

Post edited on 3rd Jun 2022, 5:03pm
>> No. 2764 [Edit]
File template_test.zip - (1.92MB , template test.zip )

2764
>>2763
And here's all the files if you're interested.

good summary of how to use the library
https://blog.gopheracademy.com/advent-2017/using-go-templates/

Post edited on 3rd Jun 2022, 5:08pm
>> No. 2765 [Edit]
>>2763
Neat, that's very powerful to include in stdlib. It's interesting that google decided to create separate template language for Go instead of reusing closure templates. I guess the stdlib template is more powerful since it allows interop'ing with Go function calls whereas closure templates is kind of in its own world (which is in some ways nice as it makes templates self-contained and usable between languages, but also means you have to work around things sometimes).
>> No. 2766 [Edit]
File backup2.zip - (7.24MB )

2766
update: I've(mostly) remade the model thread using templates. File contains the entire current project.

Post edited on 4th Jun 2022, 12:30am
>> No. 2772 [Edit]
File 165454185234.png - (329.31KB , 1920x644 , success.png )
2772
Basic thumbnail support added. I'll make a git repo for the project now.

edit: https://gitgud.io/nvtelen/ogai

Post edited on 6th Jun 2022, 12:23pm
>> No. 2773 [Edit]
File 165457130847.png - (3.79KB , 491x59 , replies to.png )
2773
Anyone know how "replies to:" is usually implemented? You can't have multiple values in an sql column, so I don't know how that info is usually stored.

Any suggestions?
>> No. 2774 [Edit]
>>2773
https://condor.depaul.edu/gandrus/240IT/accesspages/relationships.htm
>> No. 2775 [Edit]
>>2773
>Anyone know how "replies to:" is usually implemented

Most places I've seen do this client-side in JS; Tohnochan seems to be an exception in doing it server-side. As you mentioned, it's not easy to express this in a SQL query since there's no efficient way to do the join. (Theoretically, if you use a more heavyweight SQL engine like Postgres, you could probably store a list of referenced posts in the row and then do some sort of join and array filter, but it's probably not going to be efficient.) The other option is storing this reference data separately: a simple map from post id -> replies is all you need, but that's also not efficient since every new reply requires an update to the entry for an older post.

I think that TC does it by basically moving the client-side approach server-side; that is, they reconstruct the forward-referencing reply-tos after all the posts are retrieved. I'm guessing this based on the fact that "replies" only works within the same thread and doesn't work from the post preview, although I could be wrong.

So basically, after you retrieve the SQL output for the posts in a thread, just iterate over them and construct the map of replies. You can do a neat trick: since the posts are guaranteed to be in monotonically increasing order, you don't need a hashmap from post number to array of replies, but can instead allocate an array of length (# of posts in thread) and iterate based on the index.
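
A sketch of that trick in Go. One adaptation that is my assumption rather than part of the description above: post numbers within a thread are increasing but not contiguous (they're board-wide), so the index of a quoted post is found with a binary search instead of simple arithmetic. The `post` struct and regex are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// post holds the fields needed here (hypothetical struct).
type post struct {
	No   int
	Body string
}

var quoteRe = regexp.MustCompile(`>>([0-9]+)`)

// buildReplies returns, for each index in posts, the numbers of the posts
// in the same thread that quote it. posts must be sorted by No, ascending.
func buildReplies(posts []post) [][]int {
	replies := make([][]int, len(posts))
	for _, p := range posts {
		for _, m := range quoteRe.FindAllStringSubmatch(p.Body, -1) {
			target, err := strconv.Atoi(m[1])
			if err != nil {
				continue
			}
			// Binary search works because post numbers only increase.
			i := sort.Search(len(posts), func(i int) bool { return posts[i].No >= target })
			if i == len(posts) || posts[i].No != target {
				continue // quoted post is not in this thread
			}
			// Skip a repeated quote of the same post by the same replier.
			if n := len(replies[i]); n > 0 && replies[i][n-1] == p.No {
				continue
			}
			replies[i] = append(replies[i], p.No)
		}
	}
	return replies
}

func main() {
	thread := []post{
		{No: 2773, Body: "Anyone know how this is implemented?"},
		{No: 2775, Body: ">>2773\nMost places do this client-side."},
		{No: 2776, Body: ">>2775\nBased on >>2774 I have a solution."},
		{No: 2777, Body: ">>2776\nYeah."},
	}
	for i, r := range buildReplies(thread) {
		fmt.Println(thread[i].No, "replies:", r)
	}
}
```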
>> No. 2776 [Edit]
File 165458034195.png - (108.15KB , 1014x556 , behavior.png )
2776
>>2775
Based on >>2774 I've come up with a solution. Thread content is updated by putting values from an sql query into structs(golang), and then inserting the values of those structs into a template, the result of which overwrites a thread's html file.

However, you can probably have different members in one struct be populated by different queries' results. So I can get content data from one table, and reply data from another. I plan to make another table(one for each board) that contains a "source" column and "replier" column, and adding a reply member(string array) to the post struct.

When someone quotes another person, the reply table has a new row added. When the thread is updated in some way, it is reconstructed by getting values from both tables. Not sure how efficient this is, but it should work.
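
Roughly, that plan could look like the sketch below. The database queries are left out: `posts` and `rows` stand for the results of the two queries, the struct and column names are illustrative, and the reply member is an int slice here rather than a string array to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// post is filled from the content table; Replies is filled separately.
type post struct {
	No      int
	Body    string
	Replies []int
}

// replyRow mirrors one row of the reply table ("source" and "replier").
type replyRow struct {
	Source  int
	Replier int
}

// attachReplies copies reply-table rows into the matching posts.
func attachReplies(posts []post, rows []replyRow) {
	index := make(map[int]int, len(posts))
	for i, p := range posts {
		index[p.No] = i
	}
	for _, r := range rows {
		if i, ok := index[r.Source]; ok {
			posts[i].Replies = append(posts[i].Replies, r.Replier)
		}
	}
}

var threadTmpl = template.Must(template.New("thread").Parse(
	"{{range .}}No. {{.No}}{{if .Replies}} replies:{{range .Replies}} >>{{.}}{{end}}{{end}}\n{{end}}"))

// renderThread produces the text that would overwrite the thread's file.
func renderThread(posts []post) string {
	var sb strings.Builder
	if err := threadTmpl.Execute(&sb, posts); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	posts := []post{{No: 2773}, {No: 2775}, {No: 2776}}
	rows := []replyRow{{Source: 2773, Replier: 2775}, {Source: 2773, Replier: 2776}}
	attachReplies(posts, rows)
	fmt.Print(renderThread(posts))
}
```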

>"replies" only works within the same thread
Actually, I think they work across an entire board, but it's kinda glitchy(pic related).

Post edited on 6th Jun 2022, 10:40pm
>> No. 2777 [Edit]
>>2776
Yeah I think that's the "reference data separately (map from post id -> replies)" solution, although I personally don't like it since it means one single reply can invalidate the cache for (and thus force regeneration of) multiple threads. I.e. consider a "mass quote" type scenario (which is not uncommon on 4chan at least), where a reply quotes one post each from every thread that exists. Then you'll have to regenerate all threads.

It does have the advantage that it allows cross-thread replies to show up though. The performance probably won't be a concern for small image board though.
>> No. 2778 [Edit]
>>2777
There's a simple fix for this: only add a row to the reply table if the "source" and "replier" post have the same "parent" value(belong to the same thread). This is an acceptable tradeoff in my opinion because the "replies to" feature is mostly useful for following inner-thread conversations. You don't get that much from knowing a post was quoted in another thread.
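
As a sketch, the check could sit in whatever function builds the rows for a new post. `byNo` is a hypothetical lookup of existing posts on the board, and the field names just follow the "source", "replier", and "parent" wording above.

```go
package main

import "fmt"

// post carries a post number and the thread (parent) it belongs to.
type post struct {
	No     int
	Parent int
}

// replyRow mirrors one row of the reply table.
type replyRow struct {
	Source  int
	Replier int
}

// replyRowsFor returns the rows to insert for a new post that quotes the
// given post numbers, skipping unknown posts and posts in other threads.
func replyRowsFor(replier post, quoted []int, byNo map[int]post) []replyRow {
	var rows []replyRow
	for _, no := range quoted {
		src, ok := byNo[no]
		if !ok || src.Parent != replier.Parent {
			continue
		}
		rows = append(rows, replyRow{Source: src.No, Replier: replier.No})
	}
	return rows
}

func main() {
	byNo := map[int]post{
		2773: {No: 2773, Parent: 1547},
		900:  {No: 900, Parent: 800}, // lives in a different thread
	}
	newPost := post{No: 2776, Parent: 1547}
	fmt.Println(replyRowsFor(newPost, []int{2773, 900}, byNo))
}
```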

Post edited on 7th Jun 2022, 12:30am
>> No. 2779 [Edit]
File 165462606561.png - (158.17KB , 943x669 , progress5.png )
2779
"Replies to" added. The next thing I plan on adding is post formatting, which I think will be quite a hurdle because of the parsing part.

Post edited on 7th Jun 2022, 11:22am
>> No. 2780 [Edit]
>>2779
Regex should get you most of the way there unless you want to support something like markdown, and even then, go should have packages galore for that.
>> No. 2781 [Edit]
>>2779
>>2780
I'd also second the suggestion to consider markdown over bbcode just because bbcode syntax is really annoying when typing, but on the flipside I've never liked that markdown mangles line breaks, and bbcode is more classic-imageboard style.

Also, if you aren't doing escaping already you need to add this, otherwise you will be vulnerable to stored XSS attacks. You mentioned using text/template instead of html/template; this is a mistake, you really want to use the latter as it will do the escaping for you. Actually, it seems like even html/template has issues, and the suggested replacement is Google's SafeHTML, which should be a drop-in replacement. (One thing I liked about Closure Templates was that it had a robust set of sanitizers).
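
A quick way to see the difference, using the same one-line template with both packages (the template itself is a made-up example):

```go
package main

import (
	"fmt"
	htmltemplate "html/template"
	"strings"
	texttemplate "text/template"
)

const postTmpl = `<p>{{.}}</p>`

// renderText uses text/template, which inserts the body verbatim.
func renderText(body string) string {
	t := texttemplate.Must(texttemplate.New("post").Parse(postTmpl))
	var sb strings.Builder
	if err := t.Execute(&sb, body); err != nil {
		panic(err)
	}
	return sb.String()
}

// renderHTML uses html/template, which escapes the body for its context.
func renderHTML(body string) string {
	t := htmltemplate.Must(htmltemplate.New("post").Parse(postTmpl))
	var sb strings.Builder
	if err := t.Execute(&sb, body); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	body := `<script>alert(1)</script>`
	fmt.Println("text/template:", renderText(body))
	fmt.Println("html/template:", renderHTML(body))
}
```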
>> No. 2782 [Edit]
>>2781
The other thing about markdown is that its popular implementations don't support spoiling text, or at least that's been my experience. So that would have to be added separately.
>> No. 2783 [Edit]
>>2782
>popular implementations don't support spoiling text
That's a good point. The syntax I've seen in a few places like stackoverflow uses >! as the delimiter, but if it isn't already supported in the markdown library you'd probably need to add it yourself. Also same for rarer features like switching to ms pgothic font.

Also I never noticed the "test" bbcode button in TC, I wonder what that does... <test>test test</test> [test]test test[/test]

Post edited on 7th Jun 2022, 6:59pm
>> No. 2784 [Edit]
File 165465609418.png - (79.61KB , 485x208 , huhh.png )
2784
>>2781
I plan on copying schemeBBS' rules. I have no attachment to bbcode.

>using text/template instead of html/template
I'm already using html.EscapeString() for this. I don't want to use html/template because I'm just shoving all of a post's contents into a single <p> tag, so I need html, that I add, to be placed within the template. This is a lot simpler than making a scheme that breaks a post up into multiple <p> or <blockquote> tags.

This is secure enough in my opinion. I think a lot of the more "advanced" security measures exist to protect paying clients on sites with sensitive information, even if they're using something like internet explorer. My server has a CSP, so anybody using a modern browser should be fine. Correct me if I'm wrong though.
>> No. 2785 [Edit]
Hitting a bit of a snag because Go's standard regular expression library doesn't support negative lookahead. So > and >> are inoperable. Now I'm looking for an external library.
>> No. 2786 [Edit]
>>2785
I don't follow, this should be possible without negative lookahead? ^>[^>]* will match for the line quotes, and >>[0-9]+ will match the post numbers.
>> No. 2787 [Edit]
>>2786
^> won't match for quotes that aren't at the start of a line >like this. (actually, I don't think that works on tc either...)

[^ >] won't work because I've already replaced > with &gt; and [^ (&gt;)] gives false matches.

I have figured out a work around though. First search for &gt;&gt; and replace them with &#62;&#62;
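
Spelled out as a sketch, the workaround looks something like this. The regexes, the `quote` class, and the `#2786`-style href are my guesses at the details, not the actual code.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	// quoteRe matches an escaped ">" and the rest of its line.
	quoteRe = regexp.MustCompile(`&gt;[^\n]*`)
	// refRe matches a post reference after ">>" was rewritten to "&#62;&#62;".
	refRe = regexp.MustCompile(`&#62;&#62;([0-9]+)`)
)

// formatPost escapes the raw text first, then adds the markup.
func formatPost(raw string) string {
	s := html.EscapeString(raw)
	// Hide ">>" from the quote regex by switching to a different entity.
	s = strings.ReplaceAll(s, "&gt;&gt;", "&#62;&#62;")
	s = quoteRe.ReplaceAllString(s, `<span class="quote">$0</span>`)
	s = refRe.ReplaceAllString(s, `<a href="#$1">$0</a>`)
	return s
}

func main() {
	fmt.Println(formatPost(">>2786\n>like this"))
}
```

Both `&gt;` and `&#62;` render as the same character in a browser, so the reader can't tell which one was used.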
>> No. 2788 [Edit]
>>2787
>aren't at start of line
Yes, that was intentional to match the TC behavior (I think other imageboards also only allow quote on its own separate line). You could always drop the anchor though if you really wanted.

>and [^ (&gt;)] gives false matches
That's because it's doing negation on the individual characters instead of the entire string. It should technically be possible to convert a negative lookahead into a strictly regular expression, but it probably blows up your state space. E.g in a simple example of "foo(?!bar)" you could convert to the union of {foo, foo + ^b, foo + !ba, foo + !bar} and then manually unroll !ba (negation of regular language is also regular, but the actual regex for it is probably ugly due to exponential blowup of powerset construction).

But two things: why not just do a second pass over the text to exclude the negated match? I.e., if you want to match everything except foo, instead of trying to do this directly via a regex with (?!foo), just check for foo and then invert your result.

But there's no need to even worry about excluding match, since you can just prioritize match order: e.g. if ">>asdf" matches both >.* and >>.*, then just prioritize the latter match.

Also why not figure out the formatting before you do the escaping? You'll need to tag the quotes as a separate class anyway for css styling, so it seems easier to do escaping after?

Post edited on 9th Jun 2022, 12:33am
>> No. 2789 [Edit]
>>2788
>You could always drop the anchor
Before, I didn't want this> to be a valid quote. Now I don't really care.
>you can just prioritize match order
Go does not provide this out of the box. I checked. https://stackoverflow.com/questions/61836985/regexp-find-a-match-with-priority-order
>why not figure out the formatting before you do the escaping?
I already had escaping implemented and doing it first feels safer.

Anyway, the solution I came up with works and is simple.
>> No. 2790 [Edit]
File 165476357172.png - (21.24KB , 1116x312 , format.png )
2790
>>2789
Yep. It's all working.
>> No. 2791 [Edit]
>>2790
>Go does not provide this out of the box.
Just do the logic in go itself, instead of trying to do the prioritization inside a regex directly.
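
Something like this sketch, which works on already-escaped text. Go's regexp tries alternatives left to right at a given position, so listing the post-reference pattern first gives it priority, and the callback decides what to emit. The class name and href format are placeholders.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// tokenRe lists the post-reference pattern first so it wins when
	// both alternatives could match at the same position.
	tokenRe = regexp.MustCompile(`&gt;&gt;[0-9]+|&gt;[^\n]*`)
	// refRe checks whether a whole match is a post reference.
	refRe = regexp.MustCompile(`^&gt;&gt;([0-9]+)$`)
)

// formatEscaped adds markup to text that has already been HTML-escaped.
func formatEscaped(s string) string {
	return tokenRe.ReplaceAllStringFunc(s, func(m string) string {
		if sub := refRe.FindStringSubmatch(m); sub != nil {
			return `<a href="#` + sub[1] + `">` + m + `</a>`
		}
		return `<span class="quote">` + m + `</span>`
	})
}

func main() {
	fmt.Println(formatEscaped("&gt;&gt;2786 hi\n&gt;&gt;asdf"))
}
```

One limitation of this version: a reference inside a quoted line is swallowed by the quote match and not linked.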

>implemented and doing it first feels safer.
As long as you make sure to escape before surrounding with tags, it should be equally safe.

That said, as long as it's working the exact method doesn't matter I suppose.
>> No. 2792 [Edit]
File 165491770874.jpg - (131.39KB , 1202x1738 , e99b27493a98506f87c991a73f14912a.jpg )
2792
So I've been messing around with the finer details. When I tried using the hey tool both on the link that gets post previews, and on the link used to create posts, the database locked up.

So I experimented with various connection strings used when opening the database, and I arrived at setting the cache to private and using the wal journal mode.
https://sqlite.org/wal.html

This fixed the issue and had the added bonus of making preview retrieval nearly twice as fast. Now though, I want to prevent other people from spamming blank posts from the command line just by sending requests to the right link. Not sure how to accomplish this. The HTTP Referer header is apparently kinda deprecated and inaccurate.

Any suggestions?
>> No. 2793 [Edit]
>>2792
> I want to prevent other people from spamming blank posts from the command line just by sending requests to the right link
Any form of request checking based on client-provided information is just going to be bypassed by a determined adversary, especially since headless automation is hard to distinguish from a "real" client. Just avoid the cat-and-mouse game and use rate limiting with exponential backoff. Fail2Ban server-side usually works well as a catch-all.
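
If you'd rather do it in the application than with Fail2Ban, here's a bare-bones sketch of per-client backoff. The 10 second base and 10 minute cap are arbitrary numbers, and the key would typically be the client IP. Each attempt made during a cooldown doubles the wait, up to the cap.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	blockedUntil time.Time
	strikes      uint // attempts made while blocked
}

// backoff tracks one cooldown per client key.
type backoff struct {
	mu      sync.Mutex
	base    time.Duration
	max     time.Duration
	entries map[string]*entry
}

func newBackoff(base, max time.Duration) *backoff {
	return &backoff{base: base, max: max, entries: make(map[string]*entry)}
}

// newPostBackoff uses made-up limits for the demo.
func newPostBackoff() *backoff { return newBackoff(10*time.Second, 10*time.Minute) }

// at is a demo helper that turns a second count into a timestamp.
func at(sec int) time.Time { return time.Unix(int64(sec), 0) }

// Allow reports whether key may post at time now. An allowed post starts
// a cooldown of base; every attempt during a cooldown doubles the wait.
func (b *backoff) Allow(key string, now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	e, ok := b.entries[key]
	if !ok {
		e = &entry{}
		b.entries[key] = e
	}
	if now.Before(e.blockedUntil) {
		e.strikes++
		wait := b.base
		for i := uint(0); i < e.strikes && wait < b.max; i++ {
			wait *= 2
		}
		if wait > b.max {
			wait = b.max
		}
		e.blockedUntil = now.Add(wait)
		return false
	}
	e.strikes = 0
	e.blockedUntil = now.Add(b.base)
	return true
}

func main() {
	b := newPostBackoff()
	for _, sec := range []int{0, 1, 15, 54, 200} {
		fmt.Println("t =", sec, "allowed:", b.Allow("203.0.113.7", at(sec)))
	}
}
```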
>> No. 2794 [Edit]
>>2793
Yeah, I've been thinking the same thing. Would prefer to implement rate limiting myself though. So I'm gonna look for ideas in this longish article
https://mauricio.github.io/2021/12/30/rate-limiting-in-go.html
>> No. 2795 [Edit]
>>2794
Can you do rate limiting at nginx level?
>> No. 2796 [Edit]
File 165492601011.jpg - (369.49KB , 1295x1812 , 7d7a4f200804d6a0c20214693801be67.jpg )
2796
>>2795
I dunno, maybe? I probably wouldn't be able to make it as granular though. All requests to the imageboard go through one reverse proxy, which is located at the location /im/ (an actual folder which I just keep empty).

From there, requests go to different functions based on the rest of their path, like /im/post/ to the new post function. Thing is, I want to limit that much more than I limit how fast someone can get post previews. Doing it programmatically also has the added benefit of making the program more portable. My nginx configuration isn't really part of the program I'm writing.
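
A sketch of what per-route limits could look like with plain net/http. The `/im/preview/` path, the intervals, and the handler names are placeholders. Behind the reverse proxy, `r.RemoteAddr` would be nginx's own address, so a real version would presumably read the client IP from a header that nginx sets.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// limiter allows one request per interval per client IP.
// Old entries are never evicted in this sketch.
type limiter struct {
	mu       sync.Mutex
	interval time.Duration
	last     map[string]time.Time
}

func newLimiter(interval time.Duration) *limiter {
	return &limiter{interval: interval, last: make(map[string]time.Time)}
}

func (l *limiter) allow(ip string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if t, ok := l.last[ip]; ok && now.Sub(t) < l.interval {
		return false
	}
	l.last[ip] = now
	return true
}

// limit wraps a handler with its own limiter.
func limit(l *limiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}
		if !l.allow(ip, time.Now()) {
			http.Error(w, "slow down", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func newPost(w http.ResponseWriter, r *http.Request) { fmt.Fprintln(w, "posted") }
func preview(w http.ResponseWriter, r *http.Request) { fmt.Fprintln(w, "preview") }

// newMux gives posting a much stricter limit than previews.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/im/post/", limit(newLimiter(30*time.Second), http.HandlerFunc(newPost)))
	mux.Handle("/im/preview/", limit(newLimiter(100*time.Millisecond), http.HandlerFunc(preview)))
	return mux
}

// status sends one test request and returns the response code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code
}

func main() {
	mux := newMux()
	fmt.Println(status(mux, "/im/post/"), status(mux, "/im/post/"), status(mux, "/im/preview/"))
}
```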

Post edited on 10th Jun 2022, 10:41pm
>> No. 2799 [Edit]
File 165497425711.png - (387.04KB , 1460x835 , burichan.png )
2799
I added a theme picker by setting a cookie with Go and using nginx's substitution module to replace the name of the css file loaded on the page based on it. Zero javascript was needed. I got the idea from reading this largely unrelated article
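
The Go half of that could look roughly like the sketch below. The theme list, the cookie name `theme`, and the `/im/theme/` path are guesses for illustration. On the nginx side, the cookie would then be readable as `$cookie_theme`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// themes is the list of accepted values (hypothetical names).
var themes = map[string]bool{"yotsuba": true, "burichan": true}

// setTheme stores the chosen theme in a cookie and sends the user back.
func setTheme(w http.ResponseWriter, r *http.Request) {
	name := r.FormValue("theme")
	if !themes[name] {
		http.Error(w, "unknown theme", http.StatusBadRequest)
		return
	}
	http.SetCookie(w, &http.Cookie{
		Name:     "theme",
		Value:    name,
		Path:     "/",
		MaxAge:   365 * 24 * 60 * 60, // one year, in seconds
		SameSite: http.SameSiteLaxMode,
	})
	http.Redirect(w, r, "/im/", http.StatusSeeOther)
}

// pickTheme submits the form the way a browser would and returns the
// response status and the value of the theme cookie, if one was set.
func pickTheme(name string) (status int, cookie string) {
	form := url.Values{"theme": {name}}
	req := httptest.NewRequest("POST", "/im/theme/", strings.NewReader(form.Encode()))
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	rec := httptest.NewRecorder()
	setTheme(rec, req)
	for _, c := range rec.Result().Cookies() {
		if c.Name == "theme" {
			cookie = c.Value
		}
	}
	return rec.Code, cookie
}

func main() {
	fmt.Println(pickTheme("burichan"))
	fmt.Println(pickTheme("../evil"))
}
```

Checking the value against a fixed list keeps odd strings out of the cookie your own code sets, but a client can still send any cookie it likes, so the substitution side should probably fall back to a default for unknown values.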
https://scotthelme.co.uk/csp-nonce-support-in-nginx/

I tried adding the same functionality to my bbs a few months ago, before I knew about the substitution module, and the impression I got from my research was that js was the only possible way of doing it without having two separate html files (the solution I ended up going with).

I'm amazed by how simple this was, yet how uncommonly it's used(at least in this sort of way). Really it should be a standard part of webdevs' toolkit.

edit: side note, it's really annoying you can't have every option within the select tag act like the submit input. W3C arbitrarily decided clicking options can only do something on their own if you use js.

Post edited on 11th Jun 2022, 12:20pm
>> No. 2806 [Edit]
I built a massive function pipeline that made LLD kill itself. Both proud and annoyed.
>> No. 2807 [Edit]
>>2806
Try using mold: https://github.com/rui314/mold
>> No. 2808 [Edit]
>>2807
I would, but I'm using Windows.