
File 165518288236.png - (3.87MB , 3808x940 , comp6.png )
3322 No. 3322 [Edit]
I don't want to keep flooding the /navi/ thread, so I'll post updates here from now on. Crushed some bugs and added some features, including rate limiting (ended up only needing the stdlib for that).

I added a feature I'm slightly unsure about since it's pretty unconventional. CSS allows you to define the maximum height and width an image can occupy based on the viewport. I decided to use this to limit how large a thumbnail can expand. I think this improves the user experience, since you'll never have to scroll through an image piece by piece or open it in a new tab.

I'm a little worried it'll mislead people into thinking images are lower res than they really are. The file info does include the real image dimensions, though. Pic is a comparison of my engine's behavior with tc's.

https://gitgud.io/nvtelen/ogai

Post edited on 13th Jun 2022, 10:10pm
>> No. 3341 [Edit]
File 16554060588.png - (175.51KB , 1056x896 , update.png )
3341
I've implemented the reply-to feature, along with a file size limit and a character limit.

Next, I'm going to add support for a configuration file, which I'll use the TOML format for because I've heard good things about it. The location will be hardcoded to /etc/ogai.toml since that's kind of standard. This will let the user choose the imageboard's base path more easily.

After that, I'll finally get started on multiple-thread support.
>> No. 3342 [Edit]
So what will your imageboard be about? What is the main audience? It's always nice to see new chans coming to life.
>> No. 3343 [Edit]
File 165542332882.png - (198.58KB , 500x500 , saya.png )
3343
>>3342
The theme is Saya no Uta (hence the engine being called Ogai). It's hard to pin down the audience (I'm not sure they even exist), but for the most part I'd like it to be like tc with more of a technology and creativity focus.

I don't have a waifu myself, so I don't plan for that to be part of it, and I don't want there to be as many restrictions on 2d adult content. Posting loli anywhere will be fine. Made this thread a while ago http://tohno-chan.com/ot/res/32344.html

Ideally, people who have a personality more similar to mine would come, but I'm kind of doubtful that will happen. If nothing else, maybe what I'm making will suit someone's needs better than anything currently available.
>> No. 3344 [Edit]
>>3343
>It's hard to pin down the audience (I'm not sure they even exist), but for the most part I'd like it to be like tc with more of a technology and creativity focus.
What I fear is that unless a new imageboard has a very specific niche not served by existing ones, activity on it will usually cannibalize the others. As much as I'd like to say that it isn't a zero-sum game, people don't usually bother cross-posting threads, so any thread made on one is at the expense of the other, and so on.
>> No. 3345 [Edit]
>>3344
You're right. However, while I don't have a very specific niche in mind, I do feel tc is missing something. Few posts are related to what users make. Part of that is probably because not many people here make things. Maybe the people needed just aren't anywhere in the "image board space", but I'd like to imagine threads being made about long-term projects. "Productive" hobbies rather than just consumption hobbies. Somewhere people could bounce ideas around and give advice, and in other fields besides programming too.

Relaxed restrictions on what images people can post also interests me.
>> No. 3346 [Edit]
>>3345
I'd like that - I've enjoyed reading about your process of working on the imageboard, so a board centered around tech projects would be up my alley. Although I suppose there's a fine line between that and what's essentially a blog. TC has /cr/ but that's not really active, so there's a chance to get a foothold there.

I don't have an opinion on the relaxed image restrictions other than that TC's no 3D rule is the greatest thing since sliced bread. Relaxing that to ecchi as well shouldn't really make a difference in my opinion since post content is first and foremost, while the images are merely dessert.
>> No. 3348 [Edit]
I just want to say I've been silently following your progress as well and I'm even a very, very minuscule part of the process since you used my maneki neko gif to do some testing, which made me smile. Whatever the theme of your ib I'll try to post in there as well. If you keep it well moderated and free from spam and trash, you'll likely find an audience.

>TC's no 3D rule is the greatest thing since sliced bread.
Yes, I can't agree enough, it's brilliant.
>> No. 3349 [Edit]
>>3348
Thanks. Your gif is nice. Try making some like that with cute girls.
>> No. 3350 [Edit]
File 165573252299.png - (8.33KB , 288x96 , trio.png )
3350
>>3349
I tend to go with Japanese/Chinese myth/folklore creatures and lighthearted Buddhist stuff but I'll try to make a few cute girls too. Anyway, that's offtopic, please keep us posted on your progress.
>> No. 3351 [Edit]
File 165602958536.png - (122.90KB , 1263x1433 , multi thread update.png )
3351
Made quite a bit of progress on multiple threads. Pic doesn't show it, but they reorder based on which was last posted in.

Post edited on 23rd Jun 2022, 5:20pm
>> No. 3352 [Edit]
>>3345
>Few posts are related to what users make.
I've been trying to get into making midi music (and stuff for some older sound chips) over the past few years. I sorta wanted to post something on Tohno at some point, but I haven't really gotten anything I'm satisfied with yet. Maybe I'll post something on your new board if you get things up and running, but I've never played Saya no Uta.
>> No. 3353 [Edit]
>>3352
The creation process itself is interesting enough to post about I think. Not just final products.
>I've never played Saya no Uta
Do yourself a favor and experience it.
>> No. 3354 [Edit]
>>3353
>The creation process itself is interesting enough to post about I think. Not just final products.
I guess you're right, but I'm kind of apprehensive towards posting something that might not ever get finished, since I have hundreds of midi files in my folder but nothing that could really be called a complete song. I have been making a bit more headway recently, and I've even finished a few midi arrangements of some existing game music, so maybe I'll end up changing my mind.
>Do yourself a favor and experience it.
I want to try actually reading VNs for a change but it feels like I can never get around to it. I will try to read Saya, though.
>> No. 3355 [Edit]
File 165664402927.png - (440.79KB , 1263x638 , catalog progress.png )
3355
Added a catalog.
>> No. 3356 [Edit]
>>3355
Neat. Although I'm curious, does anyone use the catalog on TC? It doesn't seem useful on boards with little thread turnover since once you've gone through the board once, you already know which posts are there. I've only used it a few times for searching, but even then I'd prefer a dedicated full-text index so the replies could be searched as well.
>> No. 3357 [Edit]
>>3356
I use it whenever a thread I want to reply to isn't on the front page, or to look at older threads in general (often whenever I'm on lainchan). I've decided not to add multiple pages. Instead, after the first ten updated threads, everything else will be accessible only through the catalog.

I might look into a simple, exact text search engine later on.
>> No. 3358 [Edit]
>>3357
SQLite does full text search as well (FTS3/FTS4 is the term to google). So you don't have to do anything special, just add an extra index.
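For example, creating and querying a standalone FTS4 table from Go might look roughly like this (a sketch with hypothetical table/column names, assuming an open *sql.DB named db and a SQLite driver built with FTS support):

// Create a full-text index over post bodies; rows written to posts_fts can then be searched with MATCH.
if _, err := db.Exec(`CREATE VIRTUAL TABLE IF NOT EXISTS posts_fts USING fts4(body)`); err != nil {
	log.Fatal(err)
}
rows, err := db.Query(`SELECT rowid FROM posts_fts WHERE posts_fts MATCH ?`, "saya")
if err != nil {
	log.Fatal(err)
}
defer rows.Close()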
>> No. 3359 [Edit]
>>3358
https://www.sqlite.org/fts5.html
So I'd have to make "virtual tables" that contain another copy of every post, and if there's ever any down-time I'd have to remake these tables?

That doesn't sound very efficient or desirable, and I would consider using a separate module something special. Keep in mind, I'm not familiar with any of this.
>> No. 3360 [Edit]
>>3359
Note here that the "virtual" in virtual table is purely an implementation difference, it behaves like any other sqlite table. So you have several options here:

* You could do as you suggested and maintain a separate copy of posts in a separate FTS table, but as you said this is undesirable since it requires more logic to manage replicated state

- Note that this is not as wasteful as it seems, because FTS4 supports external content references, so the virtual table only really stores the index rather than the text itself. You still have to do state management though, as stated in the fts4 doc - "in order to keep an FTS in sync with an external content table, any UPDATE or DELETE operations must be applied first to the FTS table, and then to the external content table"

Not sure what you mean by needing to re-make tables during downtime though. Just that any operation performed on the main table will also need to be performed on the shadow table.

* You can just have only the fts4 virtual table and use it as your main post table itself. You can exclude columns from the index (e.g. binary blob data) using the notindexed option. I'm not sure if this has any performance impact, you could test it out and verify.

Both seem like OK options to me, depending on what your schema currently looks like. It's also pretty easy to drop in, so you could defer this until later.
>> No. 3361 [Edit]
>>3360
>Note sure what you mean by needing to re-make tables during downtime though
These tables aren't kept in permanent storage, but in RAM if I understand correctly, or maybe a temporary file. That means if the program ever terminates for whatever reason, these tables would need to be remade.

If they only contain references though, I guess that's not a huge deal. I'll think about it.
>> No. 3362 [Edit]
>>3361
No, as far as I understand they are proper tables. They're only "virtual" in that their on-disk format isn't necessarily the usual sqlite format. (If that didn't make sense, consider that for instance you can also map a csv file as a "virtual table", and then all sql operations will be materialized onto that csv file.)

Please link otherwise if I've misunderstood. I've used FTS3 for a very simple dictionary lookup program before, and the FTS index seemed to persist fine between terminations.

Post edited on 4th Jul 2022, 9:40pm
>> No. 3363 [Edit]
File 165714030222.png - (348.33KB , 1180x940 , subjects.png )
3363
Small update, subjects have been implemented. On a board, to go to a thread you click on its subject. If a thread has no subject, you click on a link that's labeled "Reply". I think this is both intuitive for users and cleaner than what's typically done (always a separate link).

edit: also, I've reached 1039 lines of code, not including templates and css. Sorry if that's disappointing to anybody.

Post edited on 6th Jul 2022, 1:54pm
>> No. 3364 [Edit]
>>3363
More than 1k lines of code is a bit surprising. Can you list line-counts by filename (or by function) and see what the bulk contributor is? It feels like it shouldn't take more than 1k
>> No. 3365 [Edit]
>>3364
Update is the longest at 260 lines
https://gitgud.io/nvtelen/ogai/-/blob/master/command/update.go

Maybe I could refactor to shave some lines down, but honestly I don't think it's worth it. The performance difference and decrease in executable size would probably be negligible.

Post edited on 7th Jul 2022, 6:39pm
>> No. 3366 [Edit]
>>3365
Also, to put things into a bit of perspective, kusaba x is over 4,000 lines and lynxchan is over 20,000 lines.
>> No. 3367 [Edit]
File 165740967864.jpg - (316.32KB , 2048x1821 , c3255351ea084256658105058decd2e7.jpg )
3367
I've run into a huge problem. Table names can't be parametrized in SQL apparently. So you can't have statements like 'SELECT Id FROM ?' or 'CREATE TABLE ?'. Everything I've written so far has been with the assumption that this would be possible, because it just makes sense and is intuitive. I never, ever would have guessed that there would be a stupid fucking caveman restriction like this.

What am I supposed to do? Shove everything into the same 3 bloated, obese tables? Or never use prepared statements and take a massive performance hit? What's even the point of being able to make as many tables as you want if you can't do something this simple? I'm so close to just giving up because of this.

Post edited on 9th Jul 2022, 5:33pm
>> No. 3368 [Edit]
>>3367
Why do you need multiple tables generated on-the-fly?
>> No. 3369 [Edit]
>>3368
I don't. I need three tables for every board, and functions to retrieve data from a specific table based on a "board" value they get from html forms.

I've settled on opening around (15 * 5 * # of boards) connections to the database, each with their own prepared statement.
>> No. 3370 [Edit]
>>3369
Can't you just add the board name as a column instead of needing 3 tables for each?
>> No. 3371 [Edit]
>>3370
I could, but that would bloat the tables and logic, so every single statement would have to check an extra value. It just feels wrong to me. You don't shove everything in the same folder, so why should I shove everything in the same table?
>> No. 3372 [Edit]
File 165741966580.png - (857.13KB , 1920x1054 , multi board progress.png )
3372
I have it set up now. Kind of overreacted and it ended up being pretty straightforward to implement. I don't know the difference between good and bad practice when it comes to this, but it's working.

The structure I have is roughly this:

chan map[string]map[string]*sql.Stmt

i.e. a channel carrying a map keyed by board name, whose values map statement names to prepared statements.
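For illustration, building such a structure might look something like this (a sketch; queryTemplates and its %s table-name placeholder are hypothetical, not the project's actual names):

// stmts[board][name] -> prepared statement against that board's tables.
stmts := make(map[string]map[string]*sql.Stmt)
for _, board := range Boards {
	stmts[board] = make(map[string]*sql.Stmt)
	for name, tmpl := range queryTemplates { // e.g. "insert": "INSERT INTO %s_posts (...) VALUES (...)"
		s, err := db.Prepare(fmt.Sprintf(tmpl, board))
		if err != nil {
			log.Fatal(err)
		}
		stmts[board][name] = s
	}
}
// The whole map can then be handed to the request handlers over a channel.
ch := make(chan map[string]map[string]*sql.Stmt, 1)
ch <- stmts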

Post edited on 9th Jul 2022, 7:29pm
>> No. 3373 [Edit]
>>3371
>I could, but that would bloat the tables and logic, so every single statement would have to check an extra value.
I don't see what's wrong with that. It's not bloat, it's just an extra dimension you filter on. The folder analogy doesn't work because an RDBMS isn't a hierarchical structure; each column is effectively another orthogonal dimension.
>> No. 3374 [Edit]
>>3373
Well, what's done is done. I hope there isn't any harm in the way I chose to implement things. Though if there is, I could change it.

>an RDBMS isn't a hierarchical structure
I see. I'd prefer a hierarchical structure since that's the way I'm used to thinking about data.
>> No. 3376 [Edit]
File 165747128836.png - (291.01KB , 559x1149 , sage update.png )
3376
Added a sage option.
>> No. 3383 [Edit]
File 165904177542.jpg - (673.23KB , 1600x1088 , 23b9de3c2cfb67b29387bd835e226fa7.jpg )
3383
I'm thinking about how to implement the home page and I've realized >>3373 is right. Things would be far simpler if I had all posts be in the same table with a column for board value.

I'll have to spend some time this weekend restructuring.
>> No. 3384 [Edit]
>>3383
Getting the schema right is one of the hardest things about working with a DB. Schema migrations are a pain to do once you've filled it with data, so better now than later. (That's probably why people are attracted to schema-on-read solutions like storing documents in json, but that's an even bigger mess). Also that's a great haifuri image.
>> No. 3385 [Edit]
>>3384
Makes sense. Doing it this way does introduce a new problem in that I can't just use autoincrement for post ids. The value of an id will have to also depend on the value in the "board" column, and it looks like sqlite doesn't have a way to do that automatically.

Post edited on 28th Jul 2022, 3:23pm
>> No. 3386 [Edit]
>>3385
Hm that's a good point I didn't think of. My initial thought would be to have a separate table with one row for each board that stores the next post number. You use this to get the post number to set whenever adding a new post, and then use a trigger (look into sqlite triggers) to automatically increment whenever a row gets added to posts. Maybe you can also avoid the additional DB lookup for each post by caching the next post number in memory, but you'd have to go through your code carefully to make sure that the in-memory counter is always consistent with the persisted on-disk one.

There's also the disadvantage that you can no longer have a primary key constraint on the post id column (since uniqueness can be violated).

I wonder how vichan/tinyboard have their schema set up.
>> No. 3387 [Edit]
>>3386
Based on https://github.com/vichan-devel/vichan/blob/19151def82e9c2033429e0439f2f86087452204d/templates/posts.sql

looks like they go with the "one table per board" approach. And they get around the prepared-statements issue with a hybrid approach where they build the sql string manually with the board name and then go ahead and parameterize the column values

https://github.com/vichan-devel/vichan/blob/master/post.php#L216

I guess you could do something similar where you basically cache the prepared statements for each table.

It's not clear to me offhand which approach is overall better though – one table for each board simplifies post-id related state (can use autoincrement, unique constraint on primary key probably speeds up post-id queries) but makes queries involving multiple boards annoying. Vice versa for one table for all boards.
>> No. 3388 [Edit]
>>3387
I'm thinking of using two statements, one which uses MAX to get the current largest id in a board, and another to actually insert the post.

Do you see a disadvantage in this approach?
>> No. 3389 [Edit]
>>3388
>one which uses MAX to get the current largest id in a board
You'd need to ensure both statements run in the same transaction for correctness guarantees. If you do that it should work, but I suppose my only potential concern with this is performance, for 2 reasons: the more minor one is that you're doing an indexed lookup on each post, but that's only a log(n) factor assuming they use a b-tree index or whatever (you should verify that sqlite does use a single index lookup as opposed to a full-table scan though, otherwise you'll kill your throughput. You probably want to create a multi-column index on board-name + post-id). The more concerning part is that it might impact throughput since every DB write now requires a read (from the same table) as well, leading to locking – I'm not well versed in the details of how sqlite handles transaction concurrency, and I think sqlite only supports DB-level locking anyway (as opposed to other RDBMS which can do fine-grained table or row-level locking), so it probably wouldn't make much of a difference whether you used a separate table to store the next post id or not.
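(For reference, such a multi-column index might be created like so – hypothetical table/column names:)

// Supports fast "largest post id within a board" lookups.
if _, err := db.Exec(`CREATE INDEX IF NOT EXISTS idx_board_post ON posts(board, id)`); err != nil {
	log.Fatal(err)
}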

But out of curiosity, why not go with the separate table to hold next post id approach? That way you can avoid having to do a MAX, and can let sqlite handle the lookups.

Basically your insert just becomes a single statement

INSERT INTO posts (id, board, body) SELECT id, 'board1', 'post body here' FROM next_post_id WHERE board_name = 'board1'

and then you have an ON INSERT trigger that updates the next_post_id table.
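A rough sketch of that trigger (hypothetical table and column names – posts(board, id, ...) and next_post_id(board_name, id)):

_, err := db.Exec(`
	CREATE TRIGGER IF NOT EXISTS bump_next_post_id
	AFTER INSERT ON posts
	BEGIN
		UPDATE next_post_id SET id = id + 1 WHERE board_name = NEW.board;
	END;`)
if err != nil {
	log.Fatal(err)
}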

(If you're really interested in optimizing, you should probably avoid looking up board-name by string and instead have an enum mapping somewhere from board-name to an int board id, and use the board id everywhere).
>> No. 3390 [Edit]
>>3389
>You'd need to ensure both statements run in the same transaction for correctness guarantee.
Well shoot. I have another problem then. I use a read statement to check whether a post's "parent" (given through an html form) exists, and if it doesn't, the post is treated as a new thread. I'm not sure whether or not this is a problem, but I don't think it can cause too many issues.

>But out of curiosity, why not go with the separate table to hold next post id approach?
I guess I will, but it feels clumsy. I'm growing increasingly frustrated with SQL's inflexibility. I'd rather use something that's a bit slower, but much smarter.
>> No. 3391 [Edit]
>>3390
> I use a read statement to check if a post's "parent"(given through an html form) exists or not, and if it doesn't, the post is treated as a new thread

My brain is a bit fried at the moment, why do you need to check if the parent exists or not? If a thread id is not supplied, you should treat it as a new thread. If it is supplied and the thread id is valid, treat it as a new post in that thread. If it is supplied and such a thread does not exist, reject the request.

Regardless, I think if you don't lookup thread id and write to the table in the same transaction, there's a risk of a race-condition if the thread is deleted after the lookup but before the insertion.

Consider (with T1 and T2 being separate system threads)

T1) Handle request: Post{thread_id = 123, body='my cool post'}
T1) Lookup thread_id = 123. Found
T2) Handle request: DeleteThread{thread_id = 123, password='hunter2'}
T2) Deletes Thread
T1) # At this point T1 believes the thread exists and tries to insert a new post for that thread. This is invalid state though, as the thread no longer exists so request should fail.

Note that despite SQLite preventing multiple writers, it's the lack of a transaction on the read-before-write that causes you to write inconsistent state. Isolation is the key property to look at here [1]. This is really something that all DBs should make clear, but unfortunately you have to read between the lines a lot to figure out exactly what guarantees a DB provides.

I think SQLite actually does provide read-write transactions though, via the paragraph that mentions that on an attempt to escalate from a read-snapshot to a write, the txn will abort if the data was modified by any other writer. This seems to be a form of optimistic locking, leaving you to deal with the retry. It also seems to have a form of pessimistic locking via "BEGIN IMMEDIATE". There are apparently some weird gotchas where transactions basically only work if each transaction is on a separate db connection, but I think this is usually what you do anyway, assuming you have one thread per connection.

Even Postgres honestly makes it hard to figure out exactly what guarantees it provides, by default it's "read committed" which doesn't actually give you a read-write transaction.

[1] https://www.sqlite.org/isolation.html

Also, once you start a transaction you can still do non-sql things before committing it. That way you aren't limited to doing logic in SQL.

>I'm growing increasingly frustrated with SQL's inflexibility. I'd rather use something that's a bit slower, but much smarter.
I think the restrictions imposed by SQL (the so-called inflexibility) are what force you to define things in a way that actually scales. If it were looser, then in order to guarantee correctness it would have to be even more pessimistic and basically serialize all transactions, hurting throughput.

But yes SQLite isn't the poshest db in terms of what you have to work with. It does the job, but it's a bit barren. Postgres is one level up (I think it's the best OSS db?) in terms of things it can do, but there's a lot more complexity in configuring it. And if you go to paid offerings, things like gcp bigquery can do absolutely amazing data transformations directly in sql.

Post edited on 28th Jul 2022, 11:08pm
>> No. 3392 [Edit]
>>3391
>why do you need to check if the parent exists or not?
To prevent replying to nonexistent threads. I didn't think to treat "no parent provided" as the new-thread condition, because the statements which insert new posts require a parameter for the parent thread. I also didn't know my implementation could cause issues.

Additional logic will need to be added, but doing it the way you suggest should be straightforward.
>> No. 3393 [Edit]
>>3391
On second thought,
>If it is supplied and such a thread does not exist
How would I do this without using sql to check? It seems the only solution is putting the parent check into the same transaction as the write, something I don't know how to do. And I have no idea how the logic of that would work, unless I write the logic in sql too. In that case, it doesn't really matter what condition I use to determine if a new thread should be made, except maybe a slight performance advantage.

Post edited on 29th Jul 2022, 7:44am
>> No. 3394 [Edit]
>>3393
>And I have no idea how the logic of that would work, unless I'd have to write the logic in sql too
No, as long as you issue a BEGIN TRANSACTION you can interleave code-logic with sql queries. In python, you'd do something like


# db is a sqlite3 connection opened with isolation_level=None,
# so BEGIN/COMMIT are issued explicitly rather than implicitly.
db.execute('BEGIN TRANSACTION')
v = db.execute('SELECT * FROM foo').fetchone()
v = frobincate(v)  # arbitrary non-SQL logic between the read and the write
db.execute('INSERT INTO foo VALUES (?)', v)
db.execute('COMMIT')


And the INSERT will throw an exception if the db-lock could not be escalated, leaving you to handle the retry.

Do note that SQLite doesn't support nested transactions (at least I think it doesn't?) so if you're already inside a transaction context you should re-use it when e.g. determining the next thread id number.
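Since the engine is in Go, a rough equivalent with database/sql would be (a sketch assuming a *sql.DB named db and a single-column table foo; frobincate is a stand-in for whatever logic you need, and error handling is abbreviated):

tx, err := db.Begin()
if err != nil {
	log.Fatal(err)
}
var v string
if err := tx.QueryRow(`SELECT val FROM foo LIMIT 1`).Scan(&v); err != nil {
	tx.Rollback()
	log.Fatal(err)
}
v = frobincate(v) // arbitrary non-SQL logic inside the transaction
if _, err := tx.Exec(`INSERT INTO foo (val) VALUES (?)`, v); err != nil {
	tx.Rollback() // e.g. SQLITE_BUSY if the lock couldn't be escalated; retry here
	log.Fatal(err)
}
if err := tx.Commit(); err != nil {
	log.Fatal(err)
}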
>> No. 3395 [Edit]
Although this does make me appreciate that vichan is actually pretty lightweight considering all it supports. I.e. when you consider that it also handles the long-tail of imageboard needs (mod support, markup, captchas, thumbnail generation, banning) then the line count seems pretty reasonable (at least much more so than lynxchan).
>> No. 3396 [Edit]
>>3394
Hmm, I'll look into this. Not sure if it'll work with prepared statements.
>> No. 3397 [Edit]
sigh
https://stackoverflow.com/questions/5391564/how-to-use-distinct-and-order-by-in-same-select-statement
https://blog.jooq.org/how-sql-distinct-and-order-by-are-related/
>The SQL language is quirky. This is mostly because the syntactical order of operations doesn’t match the logical order of operations.

Post edited on 29th Jul 2022, 3:43pm
>> No. 3398 [Edit]
>>3397
Yes, that's a peeve of mine too. And this leads to a bunch of annoyances like not being able to reference column aliases in the where clause (but you can in the group by).

C#'s LINQ is what sql should have been as a dsl.
>> No. 3399 [Edit]
>>3391
>isn't the poshest db in terms of what you have to work with
It's worth noting that sqlite supports extensions so you can get slightly closer to bigquery-level stdlib with something like https://github.com/nalgeon/sqlean

That said, these should really be thought of as row-valued functions (which can be used in e.g. a SELECT clause to transform some value into another value) and table-valued functions (which can produce an entire table, perhaps lazily). They can be useful to avoid dropping down into code for such data-transformations, but it wouldn't solve your particular issue.
>> No. 3400 [Edit]
File 165936936077.png - (505.50KB , 1896x1958 , homepage progress.png )
3400
It's not integrated yet, but I've been working on a prototype of the home page. I think for every other "tab", the contents will be pulled from a text file the owner edits manually. While this does require them to know a little html, it offers the most flexibility for them and the least amount of work for me.
>> No. 3401 [Edit]
>>3400
Looks good, and I like the feel of the board (although I personally prefer the cool blue of yotsuba v2 myself).
>> No. 3402 [Edit]
File 165938614891.png - (489.10KB , 1892x1944 , buri home.png )
3402
>>3401
>I personally prefer the cool blue of yotsuba v2 myself
Theming is fairly straightforward. Also made it more compact for functionality's sake.
>> No. 3411 [Edit]
File 166873284684.webm - (830.37KB , homedemo.webm )
3411
It's been a while. I've implemented half of the home page, the most flashy parts. File is my clumsy demo. Next I'll work on nitty gritty transaction stuff like discussed above.

edit: sometimes posting is very fast, other times it takes a while. File size is a factor, but doesn't seem to be the main one.

Post edited on 17th Nov 2022, 5:10pm
>> No. 3412 [Edit]
>>3411
Nice, I was wondering whether you were still alive. You should scatter some logging/timing statements about to see which part is taking the most time. Writing a single DB row should not take long at all, so I suspect something weird is going on, like you're using read/write transaction for read-only SQL, which will kill your throughput (I recently made this mistake).
>> No. 3413 [Edit]
File 166874465133.jpg - (96.52KB , 507x732 , 6bd4154a7d02b0e63fc189b82bf71b3e.jpg )
3413
>>3412
>Writing a single DB row should not take long at all
When a new post with an image is made, several things happen: the post is checked in various ways and goes through a bunch of formatting functions; its parent form value is checked (read) and, if it's not found, the latest id for the board is retrieved (read); its subject, if it has one, is added to the subject table (write); if it has an image attached, a thumbnail is created and the thumbnail's info is added to the homethumb table (write); the post is added to the posts table (write) and, if not empty, also to the homepost table (write); and if the post replied to anyone, all of those are added to the reply table (variable writes).

Then the thread (also home, board and catalog) is rebuilt by doing a bunch of reads and feeding that info into a template. So there's a lot going on every time a post is made. Adding timers sounds like a good idea, but I'm not sure whether there's a way to speed things up with all this stuff happening behind the scenes. So far I'm not using any transactions, so maybe that's the bottleneck?

Post edited on 17th Nov 2022, 8:14pm
>> No. 3414 [Edit]
Checked with timers. The initial check actually took the longest (but it only stalls sometimes), which has nothing to do with sql. Every other part takes less than a second.

Post edited on 18th Nov 2022, 9:20am
>> No. 3415 [Edit]
>>3414
>>3413
Can you post the timing breakdown? Even 1 second per post is too much, SQLite can easily handle _at least_ 10K QPS writes, so it should almost never be the bottleneck. Transactions should not make a difference in terms of speed for the case of 1 single post.

>So there's a lot going on every time a post is made
It honestly is not much, you should only be limited by IO throughput to/from disk, everything else is done in-memory, just simple lookups and string operations. So if it's taking more than a few dozen ms, something is very wrong, I suspect some of the template building is not as optimized as it should be.

> The initial check actually took the longest(but only stalls sometimes), which has nothing to do with sql. Every other part takes less than a second
Yes, that's what I would expect. SQLite will almost never be the bottleneck, and I'd bet good money on that. (And if your benchmarks revealed sqlite was a bottleneck, I'd again bet good money that it's because you're doing your queries inefficiently – e.g. using a write txn where it's not needed, not having enabled WAL mode, etc.)

Can you elaborate more on your checks? What are they doing? For the formatting, I recall you did this via regex, and go uses re2 so that should be a roughly linear pass over the input, which should be OK.
>> No. 3416 [Edit]
>>3415
>SQLite can easily handle _at least_ 10K QPS write
Actually that's not quite true, I was mixing up parallel and sequential workloads.

If doing sequential updates one after the other with standard rollback journal, you'll only get around 50 updates per second. The performance will be terrible unless you enable write ahead logging. With WAL enabled, you should be able to hit 1k or so. Doing multiple updates in a single transaction will also help.
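(Enabling WAL from Go is a one-liner – a sketch assuming a *sql.DB named db:)

// journal_mode=WAL is persistent, so this only needs to run once per database file.
if _, err := db.Exec(`PRAGMA journal_mode=WAL`); err != nil {
	log.Fatal(err)
}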

When talking about how SQLite can scale to multiple parallel writers, the answer is that it doesn't, it enforces db-level locking to serialize the writes. So multiple parallel writers is equivalent to multiple sequential writers, so refer to above paragraph.

For multiple sequential readers, SQLite can do pretty well: https://www.sqlite.org/np1queryprob.html. A given query should not take more than 1ms or so, so you get around 1k qps. Usually the bottleneck will be IO speed, or ffi interop between your language and sqlite.

Multiple parallel readers is where SQLite really shines, you'll probably hit file descriptor or memory limits before sqlite becomes the bottleneck. I'd guesstimate maybe 10k-100k, although apparently with a beefy machine and the right configs you can get to 4M

https://www.sqlite.org/speed.html
https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-qps-on-a-single-server/
>> No. 3417 [Edit]
>>3416
Although I must say I got sniped into benchmarking this on my personal usages of sqlite where I have a particular case that involves a ton of sequential bulk-lookups. Each individual lookup completes in about 2ms or less, but with 1000+ lookups that's easily 2 seconds. And the time for a single query is same whether I use sqlite3 cli or from my program, so ffi time is irrelevant here.

I do observe that batching reads using transactions does help, which was actually surprising to me, but I guess even read transactions involve posix file-level locking: https://stackoverflow.com/questions/7349189/optimizing-select-with-transaction-under-sqlite-3

So I guess 500-1k selects per second is indeed about right.
>> No. 3418 [Edit]
File 166879209673.png - (23.03KB , 400x400 , a90508beeacbc15deb0e36716223dee0.png )
3418
>>3415
>>3416
>>3417
>Can you post the timing breakdown?

2022/11/18 11:55:56 Starting
2022/11/18 11:56:02 Initial check complete
2022/11/18 11:56:02 Formatting complete
2022/11/18 11:56:02 Parent checking & subject adding complete
2022/11/18 11:56:02 Image check complete
2022/11/18 11:56:02 Image copying complete
2022/11/18 11:56:02 Thumbnail making complete
2022/11/18 11:56:02 Table inserting complete
2022/11/18 11:56:02 Template making complete
2022/11/18 11:56:02 Creating new thread file complete
2022/11/18 11:56:02 Post retrieval complete
2022/11/18 11:56:02 Subject retreival complete
2022/11/18 11:56:02 Template execution complete
2022/11/18 11:56:02 Complete
2022/11/18 11:56:02 Indirect time:


I don't think my source code is too spaghetti to read. The block of code from the starting point to the end of the initial check is this:

if req.Method != "POST" {
	http.Error(w, "Method not allowed.", http.StatusMethodNotAllowed)
	return
}
file, handler, file_err := req.FormFile("file")
no_text := (strings.TrimSpace(req.FormValue("newpost")) == "")
if file_err != nil && no_text {
	http.Error(w, "Empty post.", http.StatusBadRequest)
	return
}
req.Body = http.MaxBytesReader(w, req.Body, max_upload_size)
if err := req.ParseMultipartForm(max_upload_size); err != nil {
	http.Error(w, "Request size exceeds limit(10MB).", http.StatusBadRequest)
	return
}
post_length := len([]rune(req.FormValue("newpost")))
if post_length > max_post_length {
	http.Error(w, "Post exceeds character limit(10000). Post length: "+strconv.Itoa(post_length), http.StatusBadRequest)
	return
}
parent := req.FormValue("parent")
board := req.FormValue("board")
subject := req.FormValue("subject")
option := req.FormValue("option")
if parent == "" || board == "" {
	http.Error(w, "Board or parent thread not specified.", http.StatusBadRequest)
	return
}
if !(slices.Contains(Boards, board)) {
	http.Error(w, "Board is invalid.", http.StatusBadRequest)
	return
}


Maybe slices is the problem? It's part of the standard library and all I use it for is checking whether the given board is part of the valid board array. Maybe I should make the board array into a hashset or something and do away with it. In any case, I'll probably need to subdivide this block with timers to figure out exactly which part takes the longest.
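(Something like this, as a sketch – boardSet is a made-up name, and the lookup is unlikely to be the actual bottleneck:)

// Build a lookup set once at startup instead of scanning the slice on every request.
var boardSet = make(map[string]struct{}, len(Boards))

func init() {
	for _, b := range Boards {
		boardSet[b] = struct{}{}
	}
}

// then in the handler:
if _, ok := boardSet[board]; !ok {
	http.Error(w, "Board is invalid.", http.StatusBadRequest)
	return
}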
>> No. 3419 [Edit]
>>3418
Did the timestamps get cut off or did you only log at seconds-level granularity? That's way too coarse to be useful, log in millis.

Also Go should have some profiling tool that will give you a flamegraph.

Post edited on 18th Nov 2022, 10:45am
>> No. 3420 [Edit]
>>3419
Those are the default time stamps. I'll see if I can change that. And post results.
>> No. 3421 [Edit]
>>3420
Basically have a global var for lastTimestamp and a function like endTimeAndRecord(msg) which will log(msg, now - lastTimestamp) and set lastTimestamp = now. Then you just need to initialize lastTimestamp = now at the very beginning of the request, and anywhere in the request lifecycle you can add endTimeAndRecord(msg) and it will log the delta since the last log message.
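A minimal sketch of that helper in Go (names mirror the suggestion above; not the project's actual code):

import (
	"log"
	"time"
)

// A single global isn't safe under concurrent requests, but it's fine for local testing.
var lastTimestamp time.Time

// endTimeAndRecord logs the time elapsed since the previous call (or since the
// request started) and resets the marker.
func endTimeAndRecord(msg string) {
	now := time.Now()
	log.Printf("%s took %v", msg, now.Sub(lastTimestamp))
	lastTimestamp = now
}

// At the top of the request handler: lastTimestamp = time.Now()
// Then sprinkle endTimeAndRecord("formatting complete") etc. through the handler.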
>> No. 3422 [Edit]
>>3421
Go's log has a millisecond option. I think I've narrowed it down pretty conclusively. Two runs where it took a while -

2022/11/18 14:33:33.658508 Starting
2022/11/18 14:33:33.658688 Method check complete
2022/11/18 14:33:39.516935 FormFile get complete
2022/11/18 14:33:39.517187 No text get complete
2022/11/18 14:33:39.517293 Empty test complete
2022/11/18 14:33:39.517385 Size check complete
2022/11/18 14:33:39.517472 Text length check complete
2022/11/18 14:33:39.517559 Meta get complete
2022/11/18 14:33:39.517644 Empty parent or board check complete
2022/11/18 14:33:39.517731 Valid board check complete. Initial check complete.


2022/11/18 14:41:36.135516 Starting
2022/11/18 14:41:36.135969 Method check complete
2022/11/18 14:41:40.612392 FormFile get complete
2022/11/18 14:41:40.612965 No text get complete
2022/11/18 14:41:40.613319 Empty test complete
2022/11/18 14:41:40.613516 Size check complete
2022/11/18 14:41:40.613685 Text length check complete
2022/11/18 14:41:40.613844 Meta get complete
2022/11/18 14:41:40.613969 Empty parent or board check complete
2022/11/18 14:41:40.614087 Valid board check complete. Initial check complete.


Almost every step takes a fraction of a millisecond, except the FormFile one, which is a single line of code. It takes waaaaay longer.

file, handler, file_err := req.FormFile("file")

>> No. 3423 [Edit]
>>3422
I don't know anything about go, maybe see this [1]. It should not be taking 4 seconds to parse the input request, unless the client is intentionally throttling their upload.

[1] https://old.reddit.com/r/golang/comments/nihs0n/help_needed_for_uploading_large_file_with_little/
>> No. 3424 [Edit]
Also interesting https://sqlite-users.sqlite.narkive.com/2KMQOyUd/performance-of-select-in-transactions

If you're doing on the order of ~10k select queries, the cost of acquiring/releasing the shared lock on the DB starts to add up. The linked post mentions about a 1 sec difference (1 sec with all in 1 transaction, 2 sec if done individually), but I'm seeing much more, about a 6 seconds difference on about 50k reads (2 sec if done all in 1 transaction, 8 sec otherwise). It's possible that the language binding I'm using is doing some extra bookkeeping at the start of every transaction, and that's also contributing.
>> No. 3425 [Edit]
>>3423
The first recommended strategy in that thread (uploading in a stream) is good for uploading large files on systems with low ram, to prevent the OS from killing the program when ram runs out. My research did not indicate there would be any other advantages, like performance.

So I think the bottleneck is actually my hardware. I'm only working with 8gb of ram, a lot of which is being used up by other things (it's my laptop).

This link has a good explanation
https://dev.to/tobychui/upload-a-file-larger-than-ram-size-in-go-4m2i

Post edited on 18th Nov 2022, 7:11pm
>> No. 3426 [Edit]
>>3425
But isn't the file you're uploading only a few megs? Unless you're hitting swap I still don't see why there'd be a several second delay.

Some more tests to try: if you post replies without any images, is it always fast? I recommend you maybe log the request processing times to a file somewhere, so you can create a histogram and get the p50, p95, and p99 latencies. Is the lag dependent on file upload size? If you are indeed memory bound and hitting swap, uploading a kb file should not trigger anything while uploading a several MB file should.
>> No. 3427 [Edit]
File 16688294683.png - (1.51MB , 900x900 , yuki.png )
3427
>>3426
>if you post replies without any images, is it always fast?
Yes.

>log the request processing times
Don't know how to do that. Do you mean processing times according to my browser? At this point, I think I've already pinpointed the culprit to that one function.

>Is the lag dependent on file upload size?
Larger files always take longer. The ones below 1mb are usually instantaneous.

>uploading a kb file should not trigger anything
It's probably asking too much, but checking out the performance on another machine would be helpful. If the problem persists on a beefier machine, I'll know my hardware isn't likely the problem. If it counts for anything, TC is usually slower.

Post edited on 18th Nov 2022, 7:46pm
>> No. 3428 [Edit]
>>3427
Maybe someone else who already has a Go environment set up can assist with testing (I don't want to download golang just for this..)

I still don't think it's a hardware issue though. Assuming you're on linux with just 1 tab open on a browser, there should still be plenty of headroom. What's your swap utilization?
>> No. 3429 [Edit]
File 166883071273.png - (84.18KB , 2550x952 , curr usage.png )
3429
>>3428
>Assuming you're on linux with just 1 tab open on a browser
Nope, to all of that.
>> No. 3430 [Edit]
>>3429
You've still got about a gig left though. So I don't see why you'd be running into issues ingesting a few megs. I can't help much more, I don't know anything about that particular go library. You can probably profile inside that function with gdb or pprof or something, to see if there's anything blocking. Or use strace/bpftrace to see if you're being blocked on a syscall or something.

Worst case there should surely be some 3rd party replacement, just use that.
>> No. 3431 [Edit]
>>3430
>You can probably profile inside that function with gdb or pprof or something, to see if there's anything blocking. Or use strace/bpftrace to see if you're being blocked on a syscall or something.
That's way above my pay grade. I don't know if it was you, but somebody expressed a desire to use the engine I'm writing. If it's a problem for them as well, they can do that testing and propose a solution. I'm all ears. The advantage of oss is collaboration. I'm not selling a product.
>> No. 3432 [Edit]
>>3431
After doing more testing and looking at the source of things, I've narrowed it down to a function called ReadForm
https://cs.opensource.google/go/go/+/refs/tags/go1.19.3:src/mime/multipart/formdata.go;l=34

From the source code, it's described like this:
> ReadForm parses an entire multipart message whose parts have
> a Content-Disposition of "form-data".
> It stores up to maxMemory(parameter) bytes + 10MB (reserved for non-file parts)
> in memory. File parts which can't be stored in memory will be stored on
> disk in temporary files.

Can't dig any deeper because of private fields in necessary structs preventing me from copying code and adding timers.
>> No. 3433 [Edit]
File 166893071811.png - (371.17KB , 2269x2083 , profile005.png )
3433
>>3432
Also, here's a picture showing allocated memory. Doesn't really help me though.
>> No. 3434 [Edit]
>>3433
I've figured out that I can edit Go's source code, which comes with every installation of Go, and add logs to that. So I've narrowed it down even further to an io function called CopyN
https://cs.opensource.google/go/go/+/refs/tags/go1.19.3:src/io/io.go;drc=58a2db181b7cb2d51e462b6ea9c0026bba520055;l=361

> CopyN copies n bytes (or until an error) from src to dst.
> It returns the number of bytes copied and the earliest
> error encountered while copying.

I can't add logs to io, because the log package depends on io itself. I'm fairly certain though that io.copyBuffer is ultimately responsible.

https://cs.opensource.google/go/go/+/refs/tags/go1.19.3:src/io/io.go;drc=58a2db181b7cb2d51e462b6ea9c0026bba520055;bpv=1;bpt=1;l=405

> copyBuffer is the actual implementation of Copy and CopyBuffer.
> if buf is nil, one is allocated.

Even if I knew what could be changed, I don't really feel that editing Go's source code is an appropriate solution to any technical problem.
>> No. 3435 [Edit]
>>3434
Seems like it's just blocking on IO then. Are you on ssd or spinning rust? But I still can't see why it'd block for 4 whole seconds to write a few mb to disk.

Post edited on 20th Nov 2022, 1:20pm
>> No. 3436 [Edit]
>>3435
>Are you on ssd or spinning rust?
SSD, 256gb.
>> No. 3437 [Edit]
>>3436
WSL might be at fault. Will try moving data outside of /mnt/ and see if that helps.
https://github.com/Microsoft/WSL/issues/873#issuecomment-411845341

Edit: did that and maybe there's some marginal improvement, but it's still not fast on files over 1mb.

Edit2: others have the same problem
https://github.com/microsoft/WSL/issues/4498#issuecomment-1265178042

Post edited on 22nd Nov 2022, 9:51am
>> No. 3438 [Edit]
File 166903915462.webm - (616.83KB , speedshow.webm )
3438
Sorry for harping on this, but after just restarting my computer, it's now super fast, as you can see in this recording.
>> No. 3439 [Edit]
>>3438
I haven't been following closely as most of this thread's content is a bit beyond me, but I do like you having the reply box at the bottom of the page. Looking good.
>> No. 3440 [Edit]
File 167018546394.png - (3.89KB , 334x215 , tooltip.png )
3440
Finally changed posting and related queries to be inside an sql transaction. New threads are indicated by no parent being given, which removes the possibility of accidentally replying to a thread when you meant to make a new one.

Also changed the configuration format from TOML to INI. I'm not opinionated on configuration formats, given my limited experience. Reading this article convinced me that INI is better. Go also has a nice INI library.
https://github.com/madmurphy/libconfini/wiki/An-INI-critique-of-TOML
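For illustration, reading a key with an INI library might look roughly like this (a sketch – the go-ini package, the file path, and the key name here are assumptions, not necessarily what the engine uses):

import (
	"log"

	"gopkg.in/ini.v1"
)

// loadBasePath reads the imageboard's base path from the config file.
func loadBasePath() string {
	cfg, err := ini.Load("/etc/ogai.ini")
	if err != nil {
		log.Fatal(err)
	}
	return cfg.Section("").Key("base_path").String()
}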

Post edited on 4th Dec 2022, 12:25pm
>> No. 3441 [Edit]
File 167095428367.png - (325.11KB , 1890x854 , home link added.png )
3441
Small navigation improvements.
>> No. 3442 [Edit]
File 167106514813.webm - (896.22KB , crossdemo.webm )
3442
A demo showing the recent additions and improvements, cross board replying and banners being the two biggest.

Just like with themes, I used nginx text replacement to implement banners instead of javascript. While js is typically used now, some older engines use php.
>> No. 3443 [Edit]
File 16711668133.png - (308.37KB , 1340x942 , peachdemo.png )
3443
New theme. I'll now get started on the actual management side of things: deleting posts, authentication, user credentials, etc. Will have to research the proper way to do that...
>> No. 3444 [Edit]
File 167123265076.png - (400.55KB , 1882x858 , flash.png )
3444
I had a realization. I can implement reply viewing without javascript, by retrieving the reply content during page generation. When I wrote >>/navi/2731 I didn't know about template functions, but today I put two and two together.

The old approach worked like this: every time somebody hovered over a reply, javascript (through htmx) would make a request and place the response (from a query) into a box element based on html id. Now, during page generation, post content is searched for empty boxes; for each one a query is made and the result is placed into the box.

This has a few advantages and disadvantages. Advantages are that javascript isn't needed, reply viewing is instantaneous, and there's no need to make randomized ids to distinguish between target boxes (a requirement when using htmx). htmx would also inexplicably fail to fill a box sometimes, when the replies were hovered over in a certain order.

Disadvantages are that page generation will be slightly slower, but this may be mitigated by simpler content formatting. Another is that a bad actor could reply to like 100 posts, and until that post is deleted, it'll slow down every new post in that thread. Page loading itself could also slow down since the page will be larger. If I implement post editing, until a thread is updated, replies to an edited post will show a version from when the reply was made, and not the current version.

I decided to mitigate this by limiting the number of (viewable) replies in a post to 4; after that the boxes will just remain empty. 4 seems like a reasonable number to me, and I already think mass replying is annoying.
>> No. 3445 [Edit]
>>3444
I don't get it, are you basically just pre-rendering all the previews? Can you post a sample html of what this actually looks like from the client side? Why wouldn't it work with edits, you have to regenerate the page anyway? And doesn't this mean if 10 different people all quote the same post, you're going to have 10 copies of that post in the markup?

Post edited on 16th Dec 2022, 3:27pm
>> No. 3446 [Edit]
File 1.zip - (1.06KB )

3446
>>3445
Sure. Here's a zip file with the html a client gets.

In the thread html template, there is this
{{range slice .Posts 0 1}}
....other stuff.....
<p class="pcontent">{{getprev .Content}}</p>


getprev is a template function defined in update.go
var Prevboxreg = regexp.MustCompile(`<box class="prev" board="(\S+)" sid="(\S+)">`)

var prevfuncmap = template.FuncMap{
	"getprev": func(content string) string {
		counter := 0
		output := Prevboxreg.ReplaceAllStringFunc(content, func(match string) string {
			if counter > 3 {
				return match
			} else {
				cbox := Prevboxreg.FindStringSubmatch(match)
				prevcontent := Get_prev(cbox[1], cbox[2])
				counter++
				return match + prevcontent
			}
		})
		return output
	},
}


Get_prev does the database query using the subgroups in the Prevboxreg regular expression.
When a new post is made, the template is used to generate a new html page for the thread.
>> No. 3447 [Edit]
>>3445
>Why wouldn't it work with edits, you have to regenerate the page anyway?
If you reply to a post in another thread.
>if 10 different people all quote the same post, you're going to have 10 copies of that post in the markup?
Yes, but I don't think this will actually make much of a difference in load times. I could be wrong about that though.
>> No. 3448 [Edit]
>>3447
Ok, I see what you mean. And then I assume you're using CSS to hide the pre-rendered preview so it only shows on hover. I don't think this is a good idea, you've eliminated the JS dependency at the expense of introducing a CSS one, and you've polluted the HTML markup in the process. I don't think the tradeoff is worth it for this particular case. If you view the website without CSS (e.g. via lynx, or more usually via google cache, or text archives) then the result will be a mess where it's impossible to follow due to the duplication of posts.

As an example, view TC without stylesheets enabled at all. It's barren, but the essential structure of the document is still there and followable. It'd be better if TC used default tags like h1/h2 etc. and then restyled them via css so that by default there'd be more separation between posts, but other than that it's perfectly serviceable.

If you really do want to keep the pre-rendered HTML, maybe wrap it in html's details/summary tags and then style that. That way, any html5 compliant viewer will hide it by default, even without css enabled.

Post edited on 16th Dec 2022, 3:46pm
>> No. 3449 [Edit]
>>3448
>you've eliminated the JS dependency at the expense of introducing a CSS one
The CSS dependency was always there.
>maybe wrap it in html's summary tag and then style that
I will look into that, considering the problem of the cached version being a mess. Performance is definitely a concern for me, but in all honesty, I don't care about lynx users enough to base my decisions on them.

edit:
After looking into it, I decided to go back to the old method, which does not have a problem with css-less display.

Post edited on 16th Dec 2022, 5:02pm
>> No. 3450 [Edit]
File 167202914573.png - (1.28MB , 1240x944 , video embed.png )
3450
Added video embedding. Instead of directly embedding youtube videos, it embeds them through an invidious instance. Hopefully this has privacy benefits. Finding an instance that allows embedding in an iframe wasn't that easy.

The previously mentioned "details" tag is perfect for this use case, where you click to show or hide the embedded video. No js needed for that.
>> No. 3451 [Edit]
File 167212413877.png - (822.91KB , 1920x1864 , firefox is stupid.png )
3451
Firefox's implementation of lazy loading is dumb.
>The loading attribute on an <img> element (or the loading attribute on an <iframe>) can be used to instruct the browser to defer loading of images/iframes that are off-screen until the user scrolls near them.
It's not on-screen, you fuck, it's hidden by the details tag. That's even stated in your documentation
>The <details> HTML element creates a disclosure widget in which information is visible only when the widget is toggled into an "open" state.

>Chrome uses the following criteria to determine whether an iframe is hidden:
>display: none or visibility: hidden is applied.
Wow, that's nice. Firefox doesn't support that either. I don't even really want "lazy" loading, I want something to not load until another thing is clicked, without using js.

Post edited on 26th Dec 2022, 10:56pm
>> No. 3452 [Edit]
>>3451
Apparently firefox doesn't support it for iframes at all.
https://caniuse.com/loading-lazy-attr
https://bugzilla.mozilla.org/show_bug.cgi?id=1622090
>> No. 3453 [Edit]
>>3452
Just leave it as loading=lazy, it will at least work for webkit based browsers, and it's not too much of a bandwidth issue for firefox since the video itself is only actually loaded when the user clicks play.
>> No. 3454 [Edit]
File 167276327671.jpg - (238.38KB , 715x1024 , ea68e9049ab71829f3f89c417056cbeb.jpg )
3454
I've added account creation (for the admin at least) and session cookies. Using argon2 to store passwords, since I've heard it's the best option, and it won the Password Hashing Competition.
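For what it's worth, a minimal argon2id sketch with golang.org/x/crypto/argon2 (parameter values and names here are illustrative, not necessarily what the engine uses):

import (
	"crypto/rand"

	"golang.org/x/crypto/argon2"
)

// hashPassword derives an argon2id hash and a random salt; store both alongside the account.
func hashPassword(password string) (hash, salt []byte, err error) {
	salt = make([]byte, 16)
	if _, err = rand.Read(salt); err != nil {
		return nil, nil, err
	}
	// Illustrative parameters: 1 pass, 64 MiB of memory, 4 threads, 32-byte key.
	hash = argon2.IDKey([]byte(password), salt, 1, 64*1024, 4, 32)
	return hash, salt, nil
}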

Now I have a decision to make: add the controls to the posts (delete, ban, etc.), or put those options in separate, generated pages. I don't know what the norm is, or which would be better, so I'd like some suggestions. Editing the normal pages based on session cookies might be tricky without php, but would have the benefit of simplicity.
>> No. 3455 [Edit]
>>3454
You don't use the session cookie returned by the server to edit, at least not in the kusaba implementation. The session cookie only holds the default generated password, but the user can override that by supplying their own password if they want persistent sessions. This way also works without JS since it's just a form submit. You'd store that password along with the post. I don't immediately see the difficulty in having the link to edit the post inline; when they click on it, just go to a separate edit form.