Where the autistic get artistic.
[Return] [Entire Thread] [Last 50 posts]
Posting mode: Reply
Subject   (reply to 3322)
BB Code
File URL
Embed   Help
Password  (for post and file deletion)
  • Supported file types are: FLV, GIF, JPEG, JPG, MP3, OGG, PDF, PNG, RAR, SWF, WEBM, ZIP
  • Maximum file size allowed is 7000 KB.
  • Images greater than 260x260 pixels will be thumbnailed.
  • Currently 1022 unique user posts.
  • board catalog

File 165518288236.png - (3.87MB , 3808x940 , comp6.png )
3322 No. 3322 [Edit]
I don't want to keep flooding the /navi/ thread, so I'll post updates here from now on. Crushed some bugs, and added some features, including rate limiting(ended up only needing the stdlib for that).

I added a feature I'm slightly unsure about since it's pretty unconventional. CSS allows you to define the maximum height and width an image can occupy based on the viewport. I decided to use this to limit how large a thumbnail can expand. I think this improves the user experience, since you'll never scroll to look at an image piece by piece, or open it in a new tab.

I'm a little worried it'll mislead people into thinking images are lower res than they really are . The file info does include the real image dimensions though. Pic is a comparison of my behavior compared to tc's.


Post edited on 13th Jun 2022, 10:10pm
Expand all images
>> No. 3341 [Edit]
File 16554060588.png - (175.51KB , 1056x896 , update.png )
I've implemented the reply to feature. Also a file size and character limit.

Next, I'm going to add support for a configuration file, which I'll use the TOML format for cause I've heard good things about it. The location will be hardcoded to /ect/ogai.toml since that's kind of standard. This will let the base path of the imageboard be chosen by the user more easily.

After that, I'll finally get started on multiple-thread support.
>> No. 3342 [Edit]
So what will your imageboard be about? What is the main audience? It's always nice to see new chans coming to life.
>> No. 3343 [Edit]
File 165542332882.png - (198.58KB , 500x500 , saya.png )
The theme is Saya no Uta(hence why the engine is called Ogai). It's hard to pin down the audience(I'm not sure they even exist), but for the most part I'd like it to be like tc with more of a technology and creativity focus.

I don't have a waifu myself, so I don't plan for that to be part of it, and I don't want there to be as many restrictions on 2d adult content. Posting loli anywhere will be fine. Made this thread a while ago http://tohno-chan.com/ot/res/32344.html

Ideally, people who have a personality more similar to mine would come, but I'm kind of doubtful that will happen. If nothing else, maybe what I'm making will suit someone's needs better than anything currently available.
>> No. 3344 [Edit]
>It's hard to pin down the audience(I'm not sure they even exist), but for the most part I'd like it to be like tc with more of a technology and creativity focus.
What I fear is that usually unless new imageboards have a very specific niche not served by existing ones, activity on one usually will cannibalize the others. As much as I'd like to say that it isn't a zero-sum game, people don't usually bother cross-posting threads, so any thread made on one is at the expense of the other, and so on.
>> No. 3345 [Edit]
You're right. However, while I don't have a very specific niche in mind, I do feel tc is missing something. Few posts are related to what users make. Part of that is probably because not many people here make things. Maybe the people needed just aren't anywhere in the "image board space", but I'd like to imagine threads being made about long-term projects. "Productive" hobbies rather than just consumption hobbies. Somewhere people could bounce ideas around and give advice, and in other fields besides programming too.

Relaxed restrictions on what images people can post also interests me.
>> No. 3346 [Edit]
I'd like that - I've enjoyed reading about your process of working on the imageboard, so a board centered around tech projects would be up my alley. Although I suppose there's a fine line between having that turn into what's essentially a blog. TC has /cr/ but that's not really active, so there's a chance to get a foodhold there.

I don't have an opinion on the relaxed image restrictions other than that TC's no 3D rule is the greatest thing since sliced bread. Relaxing that to ecchi as well shouldn't really make a difference in my opinion since post content is first and foremost, while the images are merely dessert.
>> No. 3348 [Edit]
I just want to say I've been silently following your progress as well and I'm even a very, very minuscule part of the process since you used my maneki neko gif to do some testing, which made me smile. Whatever the theme of your ib I'll try to post in there as well. If you keep it well moderated and free from spam and trash, you'll likely find an audience.

>TC's no 3D rule is the greatest thing since sliced bread.
Yes, I can't agree enough, it's brilliant.
>> No. 3349 [Edit]
Thanks. Your gif is nice. Try making some like that with cute girls.
>> No. 3350 [Edit]
File 165573252299.png - (8.33KB , 288x96 , trio.png )
I tend to go with Japanese/Chinese myth/folklore creatures and lighthearted Buddhist stuff but I'll try to make a few cute girls too. Anyway, that's offtopic, please keep us posted on your progress.
>> No. 3351 [Edit]
File 165602958536.png - (122.90KB , 1263x1433 , multi thread update.png )
Made quite a bit of progress on multiple threads. Pic doesn't show it, but they reorder based on which was last posted in.

Post edited on 23rd Jun 2022, 5:20pm
>> No. 3352 [Edit]
>Few posts are related to what users make.
I've been trying to get into making midi music (and stuff for some older sound chips) over the past few years. I sorta wanted to post something on Tohno at some point, but I haven't really gotten anything I'm satisfied with yet. Maybe I'll post something on your new board if you get things up and running, but I've never played Saya no Uta.
>> No. 3353 [Edit]
The creation process itself is interesting enough to post about I think. Not just final products.
>I've never played Saya no Uta
Do yourself a favor and experience it.
>> No. 3354 [Edit]
>The creation process itself is interesting enough to post about I think. Not just final products.
I guess you're right, but I'm kind of apprehensive towards posting something that might not ever get finished, since I have hundreds of midi files in my folder but nothing that could really be called a complete song. I have been making a bit more headway recently, and I've even finished a few midi arrangements of some existing game music, so maybe I'll end up changing my mind.
>Do yourself a favor and experience it.
I want to try actually reading VNs for a change but it feels like I can never get around to it. I will try to read Saya, though.
>> No. 3355 [Edit]
File 165664402927.png - (440.79KB , 1263x638 , catalog progress.png )
Added a catalog.
>> No. 3356 [Edit]
Neat. Although I'm curious, does anyone use the catalog on TC? It doesn't seem useful on boards with little thread turnover since once you've gone through the board once, you already know which posts are there. I've only used it a few times for searching, but even then I'd prefer a dedicated full-text index so the replies could be searched as well.
>> No. 3357 [Edit]
I use it whenever a thread I want to reply to isn't on the front page, or to look at older threads in general(often whenever I'm on lainchan). I've decided not to add multiple pages. Instead, after the first ten updated threads, everything else will accessible only through the catalog.

I might look into a simple, exact text search engine later on.
>> No. 3358 [Edit]
Sqlite does full text search as well (FTS3/FTS4 is the the term to google). So you don't have to do anything special, just add an extra index.
>> No. 3359 [Edit]
So I'd have to make "virtual tables" that contain another copy of every post, and if there's ever any down-time I'd have to remake these tables?

That doesn't sound very efficient or desirable, and I would consider using a separate module something special. Keep in mind, I'm not familiar with any of this.
>> No. 3360 [Edit]
Note here that the "virtual" in virtual table is purely implementation difference, it behaves like any other sqlite table. So you have several options here:

* You could do as you suggested as maintain a separate copy of posts in a separate FTS table, but as you said this is undesirable since it requires more logic to manage replicated state

- Note that this is not as wasteful as it seems, because FTS4 supports external content references, so the virtual table only really stores the index rather than the text itself. You still have to do state management though, as stated in the fts4 doc - "in order to keep an FTS in sync with an external content table, any UPDATE or DELETE operations must be applied first to the FTS table, and then to the external content table"

Note sure what you mean by needing to re-make tables during downtime though. Just that any operation performed on the main table will also need to be performed on the shadow table.

* You can just have only the fts4 virtual table and use it as your main post table itself. You can exclude columns from the index (e.g. binary blob data) using the notindexed option. I'm not sure if this has any performance impact, you could test it out and verify.

Both seem like OK options to me, depending on how your schema currently looks like. It's also pretty easy to drop-in so you could defer this until later.
>> No. 3361 [Edit]
>Note sure what you mean by needing to re-make tables during downtime though
These tables aren't stored in permanent storage, but ram memory if I understand correctly, or maybe a temporary file. That means, if the program ever terminates for whatever reason, these tables would need to be remade.

If they only contain references though, I guess that's not a huge deal. I'll think about it.
>> No. 3362 [Edit]
No, as far as I understand they are proper tables. They're only "virtual" in that their on-disk format isn't necessarily the usual sqlite format. (If that didn't make sense, consider that for instance you can also map a csv file as a "virtual table", and then all sql operations will be materialized onto that csv file.)

Please link otherwise if I've misunderstood myself.. I've used FTS3 for a very simple dictionary lookup program before, and the FTS index it seemed to persist fine between terminations.

Post edited on 4th Jul 2022, 9:40pm
>> No. 3363 [Edit]
File 165714030222.png - (348.33KB , 1180x940 , subjects.png )
Small update, sujects have been implemented. On a board, to go to a thread you click on its subject. If a thread has no subject, you click on a link that's labeled "Reply". I think this is both intuitive for users and cleaner than what's typically done(always a separate link).

edit: also, I've reached 1039 lines of code, not including templates and css. Sorry if that's disappointing to anybody.

Post edited on 6th Jul 2022, 1:54pm
>> No. 3364 [Edit]
More than 1k lines of code is a bit surprising. Can you list line-counts by filename (or by function) and see what the bulk contributor is? It feels like it shouldn't take more than 1k
>> No. 3365 [Edit]
Update is the longest at 260 lines

Maybe I could refactor to shave some lines down, but honestly I don't think it's worth it. The performance difference and decrease in executable size would probably be negligible.

Post edited on 7th Jul 2022, 6:39pm
>> No. 3366 [Edit]
Also, to put things into a bit of perspective, kusuba x is over 4,000 lines and lynxchan is over 20,000 lines.
>> No. 3367 [Edit]
File 165740967864.jpg - (316.32KB , 2048x1821 , c3255351ea084256658105058decd2e7.jpg )
I've ran into a huge problem. Table names can't be parametrized in SQL apparently. So you can't have statements like 'SELECT Id FROM ?' or 'CREATE TABLE ?'. Everything I've written so far has been with the assumption that this would be possible, because it just makes sense and is intuitive. I never, ever would have guessed that there would be a stupid fucking caveman restriction like this.

What am I supposed to do? Shove everything into the same 3 bloated, obese tables? Or never use prepared statements and take a massive performance hit? What's even the point of being able to make as many tables as you want if you can't do something this simple? I'm so close to just giving up because of this.

Post edited on 9th Jul 2022, 5:33pm
>> No. 3368 [Edit]
Why do you need multiple tables generated on-the-fly?
>> No. 3369 [Edit]
I don't. I need three tables for every board, and functions to retrieve data from a specific table based on a "board" value it gets from html forms.

I've settled on opening around (15 * 5 * # of boards) connections to the database, each with their own prepared statement.
>> No. 3370 [Edit]
Can't you just add the board name as a column instead of needing 3 tables for each?
>> No. 3371 [Edit]
I could, but that would bloat the tables and logic, so every single statement would have to check an extra value. It just feels wrong to me. You don't shove everything in the same folder, so why should I shove everything in the same table?
>> No. 3372 [Edit]
File 165741966580.png - (857.13KB , 1920x1054 , multi board progress.png )
I have it set up now. Kind of overreacted and it ended up being pretty straight-forward to implement. I don't know the difference between good and bad practice when it comes to this, but it's working.

The structure I have is this:
channel (
map (key : string, value: map (
key: string, value : prepared sql statement)))

Post edited on 9th Jul 2022, 7:29pm
>> No. 3373 [Edit]
>I could, but that would bloat the tables and logic, so every single statement would have to check an extra value.
I don't see what's wrong with that. It's not bloat, it's just an extra dimension you filter on. The folder analogy doesn't work because an RDMS isn't a hierarchical structure, each column is effectively another orthogonal dimension.
>> No. 3374 [Edit]
Well, what's done is done. I hope there isn't any harm in the way I chose to implement things. Though if there is, I could change it.

>an RDMS isn't a hierarchical structure
I see. I'd prefer a hierarchical structure since that's the way I'm used to thinking about data.
>> No. 3376 [Edit]
File 165747128836.png - (291.01KB , 559x1149 , sage update.png )
Added a sage option.
>> No. 3383 [Edit]
File 165904177542.jpg - (673.23KB , 1600x1088 , 23b9de3c2cfb67b29387bd835e226fa7.jpg )
I'm thinking about how to implement the home page and I've realized >>3373 is right. Things would be far simpler if I had all posts be in the same table with a column for board value.

I'll have to spend some time this weekend restructuring .
>> No. 3384 [Edit]
Getting the schema right is one of the hardest things about working with a DB. Schema migrations are a pain to do once you've filled it with data, so better now than later. (That's probably why people are attracted to schema-on-read solutions like storing documents in json, but that's an even bigger mess). Also that's a great haifuri image.
>> No. 3385 [Edit]
Makes sense. Doing it this way does introduce a new problem in that I can't just use autoincrement for post ids. The value of an id will have to also depend on the value in the "board column" and it looks like sqlite doesn't have a way to do that automatically.

Post edited on 28th Jul 2022, 3:23pm
>> No. 3386 [Edit]
Hm that's a good point I didn't think of. My initial thought would be to have a separate table with one row for each board that stores the next post number. You use this to get the post number to set whenever adding a new post, and then use a trigger (look into sqlite triggers) to automatically increment whenever a row gets added to posts. Maybe you can also avoid the additional DB lookup for each post by caching the next post number in memory, but you'd have to go through your code carefully to make sure that the in-memory counter is always consistent with the persisted on-disk one.

There's also the disadvantage you no longer can have a primary key constraint on the post id column (since uniqueness can be violated).

I wonder how vichan/tinyboard have their schema set up.
>> No. 3387 [Edit]
Based on https://github.com/vichan-devel/vichan/blob/19151def82e9c2033429e0439f2f86087452204d/templates/posts.sql

looks like they go with the "one table per board" approach. And they get around the prepared-statements issue by doing a hybrid approach where they format the build the sql string manually with the board name and then go ahead and parameterize the column


I guess you could do something similar where you basically cache the prepared statements for each table.

It's not clear to me offhand which approach is overall better though – one table for each board simplifies post-id related state (can use autoincrement, unique constraint on primary key probably speeds up post-id query) but makes queries involving multiple boards annoying. Vice-vera for one table for all boards.
>> No. 3388 [Edit]
I'm thinking of using two statements, one which uses MAX to get the current largest id in a board, and another to actually insert the post.

Do you see a disadvantage in this approach?
>> No. 3389 [Edit]
>one which uses MAX to get the current largest id in a board
You'd need to ensure both statements run in the same transaction for correctness guarantee. If you do that it should work, but I suppose my only potential concern with this is performance, for 2 reasons: the more minor one is that you're doing an indexed lookup on each post, but that's only a log(n) factor assuming they use a b-tree index or whatever (you should verify that sqlite does use single index lookup as opposed to a full-table scan though, otherwise you'll kill your throughput. You probably want to create a multi-column index on board-name + post-id). The more concering part is that it might impact throughput since every DB write now requires a read (from the same table) as well, leading to locking – I'm not well versed in the details of how sqlite handles transaction concurrency, and I think sqlite only supports DB-level locking anyway (as opposed to other RDBMS which can do fine-grained table or row-level locking), so it probably wouldn't make much of a difference whether you used a separate table to store the next post id or not.

But out of curiosity, why not go with the separate table to hold next post id approach? That way you can avoid having to do a MAX, and can let sqlite handle the lookups.

Basically your insert just becomes a single statement

INSERT INTO POSTS (id, "board1", "post_body", etc.) SELECT id FROM next_post_id WHERE board_name = "board1"

and then you have an ON INSERT trigger that updates the next_post_id table.

(If you're really interested in optimizing, you should probably avoid looking up board-name by string and instead have an enum mapping somewhere from board-name to an int board id, and use the board id everywhere).
>> No. 3390 [Edit]
>You'd need to ensure both statements run in the same transaction for correctness guarantee.
Well shoot. I have another problem then. I use a read statement to check if a post's "parent"(given through an html form) exists or not, and if it doesn't, the post is treated as a new thread. I'm not sure whether or not this is a problem, but I don't think it can cause too much issue.

>But out of curiosity, why not go with the separate table to hold next post id approach?
I guess I will, but it feels clumsy. I'm growing increasingly frustrated with SQL's inflexibility. I'd rather use something that's a bit slower, but much smarter.
>> No. 3391 [Edit]
> I use a read statement to check if a post's "parent"(given through an html form) exists or not, and if it doesn't, the post is treated as a new thread

My brain is a bit fried at the moment, why do you need to check if the parent exists or not? If a thread id is not supplied, you should treat it as a new thread. If it is supplied and the thread id is valid, treat it is a new post in that threat. If it is supplied and such a thread does not exist, reject the request.

Regardless, I think if you don't lookup thread id and write to the table in the same transaction, there's a risk of a race-condition if the thread is deleted after the lookup but before the insertion.

Consider (with T1 and T2 being separate system threads)

T1) Handle request: Post{thread_id = 123, body='my cool post'}
T1) Lookup thread_id = 123. Found
T2) Handle request: DeleteThread{thread_id = 123, password='hunter2'}
T2) Deletes Thread
T1) # At this point T1 believes the thread exists and tries to insert a new post for that thread. This is invalid state though, as the thread no longer exists so request should fail.

Note that despite SQLite preventing multiple writers, it's the lack of transaction on the read-before-write that causes you to write inconsistent state. Isolation is the key property to look at here [1]. This is really something that all DBs should make clear, but unfortunately you have to read inbetween the lines a lot to figure out exactly what guarantees a DB provides.

I think SQLite actually does provide read-write transactions though, via the paragraph that mentions that on an attempt to escalate from a read-snapshot to a write, the txn will abort if it was modified by any other writer. This seems to be a form of optimistic locking, leaving you to deal with the retry. It also seems to have a form of pessimistic locking via "BEGIN IMMEDIATE". There are apparently some weird gotchas where transactions basically only work if each transaction is one a separate db connection, but I think this is usually what you do assuming you have one thread per connection.

Even Postgres honestly makes it hard to figure out exactly what guarantees it provides, by default it's "read committed" which doesn't actually give you a read-write transaction.

[1] https://www.sqlite.org/isolation.html

Also once you start a transaction you can still do non-sql things before committing the transaction. In such a way you aren't limited to doing logic in SQL.

>I'm growing increasingly frustrated with SQL's inflexibility. I'd rather use something that's a bit slower, but much smarter.
I think the restrictions imposed by SQL (the so-called inflexibility) is what forces you to define your things in a way that actually scales. If it was looser with things, then in order to guarantee correctness it would have to be even more pessimistic, and basically serialize all transactions, hurting throughput.

But yes SQLite isn't the poshest db in terms of what you have to work with. It does the job, but it's a bit barren. Postgres is one level up (I think it's the best OSS db?) in terms of things it can do, but there's a lot more complexity in configuring it. And if you go to paid offerings, things like gcp bigquery can do absolutely amazing data transformations directly in sql.

Post edited on 28th Jul 2022, 11:08pm
>> No. 3392 [Edit]
>why do you need to check if the parent exists or not?
To prevent replying to nonexistent threads. I didn't think to have the new thread condition being no provided parent because the statements which insert new posts require a parameter for parent thread. I also didn't know my implementation could cause issues.

Additional logic will need to be added, but doing it in the way you suggest should be straight forward.
>> No. 3393 [Edit]
On second thought,
>If it is supplied and such a thread does not exist
How would I do this without using sql to check? It seems the only solution is shoving the parent check into the same transaction as the writing, something I don't know how to do. And I have no idea how the logic of that would work, unless I'd have to write the logic in sql too. In that case, it doesn't really matter what condition I use to determine if a new thread should be made, except maybe a slight performance advantage.

Post edited on 29th Jul 2022, 7:44am
>> No. 3394 [Edit]
>And I have no idea how the logic of that would work, unless I'd have to write the logic in sql too
No, as long as you issue a BEGIN TRANSACTION you can interleave code-logic with sql queries. In python, you'd do something like

v = db.execute('SELECT * FROM foo').fetchone()
v = frobincate(v)
db.execute('INSERT INTO foo VALUES(?)', v)

And the second execute will throw an exception if the db-lock could not be escalated, leaving you to handle the retry.

Do note that SQLite doesn't supported nested transactions (at least I think it doesn't?) so if you're already inside a transaction context you should re-use it when e.g. determining the next thread id number.
>> No. 3395 [Edit]
Although this does make me appreciate that vichan is actually pretty lightweight considering all it supports. I.e. when you consider that it also handles the long-tail of imageboard needs (mod support, markup, captchas, thumbnail generation, banning) then the line count seems pretty reasonable (at least much more so than lynxchan).
>> No. 3396 [Edit]
Hmm, I'll look into this. Not sure if it'll work with prepared statements.
>> No. 3397 [Edit]
>The SQL language is quirky. This is mostly because the syntactical order of operations doesn’t match the logical order of operations.

Post edited on 29th Jul 2022, 3:43pm
>> No. 3398 [Edit]
Yes, that's a peeve of mine too. And this leads to a bunch of annoyances like not being able to reference column aliases in the where clause (but you can in the group by).

C#'s LINQ is what sql should have been as a dsl.
>> No. 3399 [Edit]
>isn't the poshest db in terms of what you have to work with
It's worth noting that sqlite supports extensions so you can get slightly closer to bigquery-level stdlib with something like https://github.com/nalgeon/sqlean

That said, these should really be thought of as row-valued functions (which can be used in e.g. a SELECT clause to transform some value into another value) and table-valued functions (which can produce an entire table, perhaps lazily). They can be useful to avoid dropping down into code for such data-transformations, but but it wouldn't solve your particular issue.
>> No. 3400 [Edit]
File 165936936077.png - (505.50KB , 1896x1958 , homepage progress.png )
It's not integrated yet, but I've been working on a prototype of the home page. I think for every other "tab", the contents will be pulled from a text file the owner edits manually. While this does require them to know a little html, it offers the most flexibility for them and the least amount of work for me.
>> No. 3401 [Edit]
Looks good, and I like the feel of the board (although I personally prefer the cool blue of yotsuba v2 myself).
>> No. 3402 [Edit]
File 165938614891.png - (489.10KB , 1892x1944 , buri home.png )
>I personally prefer the cool blue of yotsuba v2 myself
Theming is fairly straightforward. Also made it more compact for functionality's sake.
>> No. 3411 [Edit]
File 166873284684.webm - (830.37KB , homedemo.webm )
It's been a while. I've implemented half of the home page, the most flashy parts. File is my clumsy demo. Next I'll work on nitty gritty transaction stuff like discussed above.

edit: sometimes posting is very fast, other times it takes a while. File size is a factor, but doesn't seem to be the main one.

Post edited on 17th Nov 2022, 5:10pm
>> No. 3412 [Edit]
Nice, I was wondering whether you were still alive. You should scatter some logging/timing statements about to see which part is taking the most time. Writing a single DB row should not take long at all, so I suspect something weird is going on, like you're using read/write transaction for read-only SQL, which will kill your throughput (I recently made this mistake).
>> No. 3413 [Edit]
File 166874465133.jpg - (96.52KB , 507x732 , 6bd4154a7d02b0e63fc189b82bf71b3e.jpg )
>Writing a single DB row should not take long at all
When a new post with an image is made, several things happen. The post is checked in various ways, goes through a bunch of formatting functions, its parent form value is checked(read), if it's not found, the latest id for a board is retrieved(read) its subject, if it has one, is added to the subject table(write), if it has an image attached, a thumbnail is created, the thumbnail's info is added to the homethumb table(write), the post is added to the posts table(write) and if not empty, also the homepost table(write), if the post replied to anyone, all of those are added to the reply table(variable writes).

Then, the thread(also home, board and catalog) is rebuilt, by doing a bunch of reads and feeding that info into a template. So there's a lot going on every time a post is made. Adding timers sounds like a good idea, but I'm not sure whether there is a way to speed things up with all this stuff happening behind the scenes. So far, I'm not using any transactions, so maybe that's the bottleneck?

Post edited on 17th Nov 2022, 8:14pm
>> No. 3414 [Edit]
Checked with timers. The initial check actually took the longest(but only stalls sometimes), which has nothing to do with sql. Every other part takes less than a second.

Post edited on 18th Nov 2022, 9:20am
>> No. 3415 [Edit]
Can you post the timing breakdown? Even 1 second per post is too much, SQLite can easily handle _at least_ 10K QPS writes, so it should almost never be the bottleneck. Transactions should not make a difference in terms of speed for the case of 1 single post.

>So there's a lot going on every time a post is made
It honestly is not much, you should only be limited by IO throughput to/from disk, everything else is done in-memory, just simple lookups and string operations. So if it's taking more than a few dozen ms, something is very wrong, I suspect some of the template building is not as optimized as it should be.

> The initial check actually took the longest(but only stalls sometimes), which has nothing to do with sql. Every other part takes less than a second
Yes, that's what I would expect. SQLite will almost never be the bottleneck, and I'd bet good money on that. (And if your benchmarks revealed sqlite was a bottleneck, I'd again bet good money that it's because you're doing your queries ineffeciently (e.g. using a write txn where it's not needed, haven't enabled WAL mode, etc.).

Can you elaborate more on your checks? What are they doing? For the formatting, I recall you did this via regex, and go uses re2 so that should be a roughly linear pass over the input, which should be OK.
>> No. 3416 [Edit]
>SQLite can easily handle _at least_ 10K QPS write
Actually that's not quite true, I was mixing up parallel and sequential workloads.

If doing sequential updates one after the other with standard rollback journal, you'll only get around 50 updates per second. The performance will be terrible unless you enable write ahead logging. With WAL enabled, you should be able to hit 1k or so. Doing multiple updates in a single transaction will also help.

When talking about how SQLite can scale to multiple parallel writers, the answer is that it doesn't, it enforces db-level locking to serialize the writes. So multiple parallel writers is equivalent to multiple sequential writers, so refer to above paragraph.

For multiple sequential readers, SQLite can do pretty well: https://www.sqlite.org/np1queryprob.html. A given query should not take more than 1ms or so, so you get around 1k qps. Usually the bottleneck will be IO speed, or ffi interop between your language and sqlite.

Multiple parallel readers is where SQLite really shines, you'll probably hit file descriptor or memory limits before sqlite becomes the bottleneck. I'd guesstimate maybe 10k-100k, although apparently with a beefy machine and the right configs you can get to 4M

>> No. 3417 [Edit]
Although I must say I got sniped into benchmarking this on my personal usages of sqlite where I have a particular case that involves a ton of sequential bulk-lookups. Each individual lookup completes in about 2ms or less, but with 1000+ lookups that's easily 2 seconds. And the time for a single query is same whether I use sqlite3 cli or from my program, so ffi time is irrelevant here.

I do observe that batching reads using transactions does help, which was actually surprising to me, but I guess even read transactions involve posix file-level locking: https://stackoverflow.com/questions/7349189/optimizing-select-with-transaction-under-sqlite-3

So I guess 500-1k selects per second is indeed about right.
>> No. 3418 [Edit]
File 166879209673.png - (23.03KB , 400x400 , a90508beeacbc15deb0e36716223dee0.png )
>Can you post the timing breakdown?

2022/11/18 11:55:56 Starting: 2022/11/18 11:56:02 Initial check complete: 2022/11/18 11:56:02 Formatting complete: 2022/11/18 11:56:02 Parent checking & subject adding complete: 2022/11/18 11:56:02 Image check complete: 2022/11/18 11:56:02 Image copying complete: 2022/11/18 11:56:02 Thumbnail making complete: 2022/11/18 11:56:02 Table inserting complete: 2022/11/18 11:56:02 Template making complete: 2022/11/18 11:56:02 Creating new thread file complete: 2022/11/18 11:56:02 Post retrieval complete: 2022/11/18 11:56:02 Subject retreival complete: 2022/11/18 11:56:02 Template execution complete: 2022/11/18 11:56:02 Complete: 2022/11/18 11:56:02 Indirect time:

I don't think my source code is too spaghetti to read. The starting point to the initial check being finished is this block of code.

if req.Method != "POST" { http.Error(w, "Method not allowed.", http.StatusMethodNotAllowed) return } file, handler, file_err := req.FormFile("file") no_text := (strings.TrimSpace(req.FormValue("newpost")) == "") if file_err != nil && no_text { http.Error(w, "Empty post.", http.StatusBadRequest) return } req.Body = http.MaxBytesReader(w, req.Body, max_upload_size) if err := req.ParseMultipartForm(max_upload_size); err != nil { http.Error(w, "Request size exceeds limit(10MB).", http.StatusBadRequest) return } post_length := len([]rune(req.FormValue("newpost"))) if post_length > max_post_length { http.Error(w, "Post exceeds character limit(10000). Post length: " + strconv.Itoa(post_length), http.StatusBadRequest) return } parent := req.FormValue("parent") board := req.FormValue("board") subject := req.FormValue("subject") option := req.FormValue("option") if parent == "" || board == "" { http.Error(w, "Board or parent thread not specified.", http.StatusBadRequest) return } if !(slices.Contains(Boards, board)) { http.Error(w, "Board is invalid.", http.StatusBadRequest) return }

Maybe slices is the problem? It's part of the standard library and all I use it for it checking if the given board is part of the valid board array. Maybe I should make the board array into a hashset or something and do away with it. In any case, I'll probably need to subdivide this block with timers to figure out the exact thing that which takes the longest.
>> No. 3419 [Edit]
Did the timestamps get cut off or did you only log at seconds-level granularity? That's way too coarse to be useful, log in millis.

Also Go should have some profiling tool that will give you a flamgraph.

Post edited on 18th Nov 2022, 10:45am
>> No. 3420 [Edit]
Those are the default time stamps. I'll see if I can change that. And post results.
>> No. 3421 [Edit]
Basically have a global var for lastTimestamp and function like endTimeAndRecord(msg) which will log(msg, now - lastTimestamp) and set lastTimestamp = now. Then you just need to initialize lastTimestamp = now at the very beginning of the request, and anywhere in the request lifecycle you can add endTimeAndRecord(msg) and it will log the delta since the last log message.
>> No. 3422 [Edit]
Go's log has a millisecond option. I think I've narrowed it down pretty conclusively. Two times it took a while -

2022/11/18 14:33:33.658508 Starting 2022/11/18 14:33:33.658688 Method check complete 2022/11/18 14:33:39.516935 FormFile get complete 2022/11/18 14:33:39.517187 No text get complete 2022/11/18 14:33:39.517293 Empty test complete 2022/11/18 14:33:39.517385 Size check complete 2022/11/18 14:33:39.517472 Text length check complete 2022/11/18 14:33:39.517559 Meta get complete 2022/11/18 14:33:39.517644 Empty parent or board check complete 2022/11/18 14:33:39.517731 Valid board check complete. Initial check complete.

2022/11/18 14:41:36.135516 Starting 2022/11/18 14:41:36.135969 Method check complete 2022/11/18 14:41:40.612392 FormFile get complete 2022/11/18 14:41:40.612965 No text get complete 2022/11/18 14:41:40.613319 Empty test complete 2022/11/18 14:41:40.613516 Size check complete 2022/11/18 14:41:40.613685 Text length check complete 2022/11/18 14:41:40.613844 Meta get complete 2022/11/18 14:41:40.613969 Empty parent or board check complete 2022/11/18 14:41:40.614087 Valid board check complete. Initial check complete.

Almost every step takes about .00100 milliseconds, except the formfile one, which is a single line of code. It takes waaaaay longer.

file, handler, file_err := req.FormFile("file")

>> No. 3423 [Edit]
I don't know anything about go, maybe see this [1]. It should not be taking 4 seconds to parse the input request, unless the client is intentionally throttling their upload.

[1] https://old.reddit.com/r/golang/comments/nihs0n/help_needed_for_uploading_large_file_with_little/
>> No. 3424 [Edit]
Also interesting https://sqlite-users.sqlite.narkive.com/2KMQOyUd/performance-of-select-in-transactions

If you're doing on the order of ~10k select queries, the cost of acquiring/releasing the shared lock on the DB starts to add up. The linked post mentions about a 1 sec difference (1 sec with all in 1 transaction, 2 sec if done individually), but I'm seeing much more, about a 6 seconds difference on about 50k reads (2 sec if done all in 1 transaction, 8 sec otherwise). It's possible that the language binding I'm using is doing some extra bookkeeping at the start of every transaction, and that's also contributing.
>> No. 3425 [Edit]
The first recommended strategy in that thread (uploading in a stream), is good for uploading large files on systems with low ram, to prevent the OS from killing the program when ram runs out. My research did not indicate there would be any other advantages, like performance.

So I think the bottle neck is actually my hardware. I'm only working with 8gb of ram, a lot of which is being used up by other things(it's my laptop).

This link has a good explanation

Post edited on 18th Nov 2022, 7:11pm
>> No. 3426 [Edit]
But isn't the file you're uploading only a few megs? Unless you're hitting swap I still don't see why there'd be a several second delay.

Some more tests to try: if you post replies without any images, is it always fast? I recommend you maybe log the request processing times to a file somewhere, so you can create a histogram and get the p50, p95, and p99 latencies. Is the lag dependent on file upload size? If you are indeed memory bound and hitting swap, uploading a kb file should not trigger anything while uploading a several MB file shoould.
>> No. 3427 [Edit]
File 16688294683.png - (1.51MB , 900x900 , yuki.png )
>if you post replies without any images, is it always fast?

>log the request processing times
Don't know how to do that. Do you mean processing times according to my browser? At this point, I think I've already pinpointed the culprit to that one function.

>Is the lag dependent on file upload size?
Larger files always take longer. The ones below 1mb are usually instantaneous.

>uploading a kb file should not trigger anything
It's probably asking too much, but checking out the performance on another machine would be helpful. If the problem persists on a beefier machine, I'll know my hardware isn't likely the problem. If it counts for anything, TC is usually slower.

Post edited on 18th Nov 2022, 7:46pm
>> No. 3428 [Edit]
Maybe someone else who already has a Go environment set up can assist with testing (I don't want to download golang just for this..)

I still don't think it's a hardware issue though. Assuming you're on linux with just 1 tab open on a browser, there should still be plenty of headroom. What's your swap utilization?
>> No. 3429 [Edit]
File 166883071273.png - (84.18KB , 2550x952 , curr usage.png )
>Assuming you're on linux with just 1 tab open on a browser
Nope, to all of that.
>> No. 3430 [Edit]
You've still got about a gig left though. So I don't see why you'd be running into issues ingesting a few megs. I can't help much more, I don't know anything about that particular go library. You can probably profile inside that function with gdb or pprof or something, to see if there's anything blocking. Or use strace/bpftrace to see if you're being blocked on a syscall or something.

Worst case there should surely be some 3rd party replacement, just use that.
>> No. 3431 [Edit]
>You can probably profile inside that function with gdb or pprof or something, to see if there's anything blocking. Or use strace/bpftrace to see if you're being blocked on a syscall or something.
That's way above my pay grade. I don't know if it was you, but somebody expressed a desire to use the engine I'm writing. If it's a problem for them as well, they can do that testing and propose a solution. I'm all ears. The advantage of oss is collaboration. I'm not selling a product.
>> No. 3432 [Edit]
After doing more testing and looking at the source of things, I've narrowed it down to a function called ReadForm

From the source code, it's described like this:
> ReadForm parses an entire multipart message whose parts have
> a Content-Disposition of "form-data".
> It stores up to maxMemory(parameter) bytes + 10MB (reserved for non-file parts)
> in memory. File parts which can't be stored in memory will be stored on
> disk in temporary files.

Can't dig any deeper because of private fields in necessary structs preventing me from copying code and adding timers.
>> No. 3433 [Edit]
File 166893071811.png - (371.17KB , 2269x2083 , profile005.png )
Also, here's a picture showing allocated memory. Doesn't really help me though.
>> No. 3434 [Edit]
I've figured out that I can edit GO's source code, which comes with every installation of GO, and add logs to that. So I've narrowed it down even further to an io function called CopyN

> CopyN copies n bytes (or until an error) from src to dst.
> It returns the number of bytes copied and the earliest
> error encountered while copying.

I can't add logs to io, because the log package depends on io itself. I'm farily certain though that io.copyBuffer is ultimately responsible.


> copyBuffer is the actual implementation of Copy and CopyBuffer.
> if buf is nil, one is allocated.

Even if I knew what could be changed, I don't really feel that editing GO's source code is an appropriate solution to any technical problem.
>> No. 3435 [Edit]
Seems like it's just blocking on IO then. Are you on ssd or spinning rust? But I still can't see why it'd block for 4 whole seconds to write a few mb to disk.

Post edited on 20th Nov 2022, 1:20pm
>> No. 3436 [Edit]
>Are you on ssd or spinning rust?
SSD, 256gb.
>> No. 3437 [Edit]
WSL might be at fault. Will try moving data outside of /mnt/ and see if that helps.

Edit: did that and maybe there's some marginal improvement, but it's still not fast on files over 1mb.

Edit2: others have the same problem

Post edited on 22nd Nov 2022, 9:51am
>> No. 3438 [Edit]
File 166903915462.webm - (616.83KB , speedshow.webm )
Sorry, for harping on this, but after just restarting my computer, it's now super fast, as you can see in this recording.
>> No. 3439 [Edit]
I haven't been following closely as most of this thread's content is a bit beyond me, but I do like you having the reply box at the bottom of the page. Looking good.
>> No. 3440 [Edit]
File 167018546394.png - (3.89KB , 334x215 , tooltip.png )
Finally changed posting and related queries to be inside an sql transaction. New threads are indicated by no parent being given, which removes the possibility of accidentally replying to a thread when you meant to make a new one.

Also changed the configuration format from TOML to INI. I'm not opinionated on confirmation formats given my little experience. Reading this article convinced me that INI is better. Go also has a nice INI library.

Post edited on 4th Dec 2022, 12:25pm

View catalog

Delete post []
Report post

[Home] [Manage]

[ Rules ] [ an / foe / ma / mp3 / vg / vn ] [ cr / fig / navi ] [ mai / ot / so / tat ] [ arc / ddl / irc / lol / ns / pic ] [ home ]