File 165518288236.png - (3.87MB , 3808x940 , comp6.png )
3322 No. 3322 [Edit]
I don't want to keep flooding the /navi/ thread, so I'll post updates here from now on. Crushed some bugs, and added some features, including rate limiting(ended up only needing the stdlib for that).

I added a feature I'm slightly unsure about since it's pretty unconventional. CSS allows you to define the maximum height and width an image can occupy based on the viewport. I decided to use this to limit how large a thumbnail can expand. I think this improves the user experience, since you'll never scroll to look at an image piece by piece, or open it in a new tab.

I'm a little worried it'll mislead people into thinking images are lower res than they really are. The file info does include the real image dimensions though. Pic is a comparison of my behavior with tc's.

https://gitgud.io/nvtelen/ogai

Post edited on 13th Jun 2022, 10:10pm
>> No. 3383 [Edit]
File 165904177542.jpg - (673.23KB , 1600x1088 , 23b9de3c2cfb67b29387bd835e226fa7.jpg )
I'm thinking about how to implement the home page and I've realized >>3373 is right. Things would be far simpler if I had all posts be in the same table with a column for board value.

I'll have to spend some time this weekend restructuring.
>> No. 3384 [Edit]
>>3383
Getting the schema right is one of the hardest things about working with a DB. Schema migrations are a pain to do once you've filled it with data, so better now than later. (That's probably why people are attracted to schema-on-read solutions like storing documents in json, but that's an even bigger mess). Also that's a great haifuri image.
>> No. 3385 [Edit]
>>3384
Makes sense. Doing it this way does introduce a new problem in that I can't just use autoincrement for post ids. The value of an id will also have to depend on the value in the "board" column, and it looks like sqlite doesn't have a way to do that automatically.

Post edited on 28th Jul 2022, 3:23pm
>> No. 3386 [Edit]
>>3385
Hm that's a good point I didn't think of. My initial thought would be to have a separate table with one row for each board that stores the next post number. You use this to get the post number to set whenever adding a new post, and then use a trigger (look into sqlite triggers) to automatically increment whenever a row gets added to posts. Maybe you can also avoid the additional DB lookup for each post by caching the next post number in memory, but you'd have to go through your code carefully to make sure that the in-memory counter is always consistent with the persisted on-disk one.

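There's also the disadvantage that you can no longer have a primary key constraint on the post id column alone (the same id can appear on more than one board, so uniqueness would be violated). You'd need a composite key on board + post id instead.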

I wonder how vichan/tinyboard have their schema set up.
>> No. 3387 [Edit]
>>3386
Based on https://github.com/vichan-devel/vichan/blob/19151def82e9c2033429e0439f2f86087452204d/templates/posts.sql

looks like they go with the "one table per board" approach. And they get around the prepared-statements issue by doing a hybrid approach where they build the SQL string manually with the board name and then go ahead and parameterize the column values

https://github.com/vichan-devel/vichan/blob/master/post.php#L216

I guess you could do something similar where you basically cache the prepared statements for each table.

It's not clear to me offhand which approach is overall better though – one table for each board simplifies post-id related state (can use autoincrement, unique constraint on primary key probably speeds up post-id query) but makes queries involving multiple boards annoying. Vice versa for one table for all boards.
>> No. 3388 [Edit]
>>3387
I'm thinking of using two statements, one which uses MAX to get the current largest id in a board, and another to actually insert the post.

Do you see a disadvantage in this approach?
>> No. 3389 [Edit]
>>3388
>one which uses MAX to get the current largest id in a board
You'd need to ensure both statements run in the same transaction to guarantee correctness. If you do that it should work, but I suppose my only potential concern with this is performance, for 2 reasons: the more minor one is that you're doing an indexed lookup on each post, but that's only a log(n) factor assuming they use a b-tree index or whatever (you should verify that sqlite does use a single index lookup as opposed to a full-table scan though, otherwise you'll kill your throughput. You probably want to create a multi-column index on board-name + post-id). The more concerning part is that it might impact throughput since every DB write now requires a read (from the same table) as well, leading to locking – I'm not well versed in the details of how sqlite handles transaction concurrency, and I think sqlite only supports DB-level locking anyway (as opposed to other RDBMS which can do fine-grained table or row-level locking), so it probably wouldn't make much of a difference whether you used a separate table to store the next post id or not.
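For reference, a minimal sketch of the MAX-then-insert idea in Python (same sqlite engine, stdlib only). The table, column and index names here are made up for illustration, not taken from ogai. It runs both statements inside one BEGIN IMMEDIATE transaction and then asks sqlite, via EXPLAIN QUERY PLAN, whether the MAX lookup goes through the multi-column index.

```python
import sqlite3

# Hypothetical schema: all boards share one posts table.
# isolation_level=None means autocommit, so transactions are opened explicitly.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE posts (board TEXT NOT NULL, id INTEGER NOT NULL, body TEXT)")
# Multi-column index on board + post id, as suggested above.
db.execute("CREATE UNIQUE INDEX posts_board_id ON posts (board, id)")


def insert_post(board, body):
    """Read the board's current MAX(id) and insert the next id in one transaction."""
    # BEGIN IMMEDIATE takes the write lock up front, so no other writer can
    # slip in between the read and the insert.
    db.execute("BEGIN IMMEDIATE")
    try:
        (current_max,) = db.execute(
            "SELECT COALESCE(MAX(id), 0) FROM posts WHERE board = ?", (board,)
        ).fetchone()
        new_id = current_max + 1
        db.execute(
            "INSERT INTO posts (board, id, body) VALUES (?, ?, ?)",
            (board, new_id, body),
        )
        db.execute("COMMIT")
        return new_id
    except Exception:
        db.execute("ROLLBACK")
        raise


ids = [insert_post("an", "first"), insert_post("an", "second"), insert_post("foe", "hello")]

# Ask sqlite how it would execute the MAX lookup; the last column is the detail text.
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT MAX(id) FROM posts WHERE board = ?", ("an",)
).fetchall()
for row in plan:
    print(row[-1])
```

One thing to keep in mind with this approach: if the highest-numbered post on a board is deleted, MAX + 1 hands that number out again.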

But out of curiosity, why not go with the separate table to hold next post id approach? That way you can avoid having to do a MAX, and can let sqlite handle the lookups.

Basically your insert just becomes a single statement

INSERT INTO posts (id, board, body) SELECT id, 'board1', 'post_body' FROM next_post_id WHERE board_name = 'board1'

and then you have an ON INSERT trigger that updates the next_post_id table.

(If you're really interested in optimizing, you should probably avoid looking up board-name by string and instead have an enum mapping somewhere from board-name to an int board id, and use the board id everywhere).
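A rough sketch of the counter-table idea, again in Python with the stdlib sqlite3 module. The schema, the trigger name and the board names are assumptions for illustration only. The insert is a single INSERT ... SELECT that reads the next id from next_post_id, and an AFTER INSERT trigger bumps the counter for that board.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- One row per board holding the id the next post will get (hypothetical schema).
CREATE TABLE next_post_id (board_name TEXT PRIMARY KEY, id INTEGER NOT NULL);
CREATE TABLE posts (
    board TEXT NOT NULL,
    id    INTEGER NOT NULL,
    body  TEXT,
    PRIMARY KEY (board, id)
);
-- After every insert into posts, advance that board's counter.
CREATE TRIGGER bump_next_post_id AFTER INSERT ON posts
BEGIN
    UPDATE next_post_id SET id = id + 1 WHERE board_name = NEW.board;
END;
INSERT INTO next_post_id (board_name, id) VALUES ('an', 1), ('foe', 1);
""")


def insert_post(board, body):
    # "with db" wraps the statement in a transaction and commits on success.
    # The id comes straight from the counter table; the trigger does the rest.
    with db:
        db.execute(
            "INSERT INTO posts (board, id, body) "
            "SELECT board_name, id, ? FROM next_post_id WHERE board_name = ?",
            (body, board),
        )


insert_post("an", "first")
insert_post("an", "second")
insert_post("foe", "hello")
insert_post("nope", "ignored")  # unknown board: the SELECT is empty, nothing is inserted

rows = db.execute("SELECT board, id, body FROM posts ORDER BY board, id").fetchall()
counters = dict(db.execute("SELECT board_name, id FROM next_post_id"))
print(rows)
print(counters)
```

Because the counter only ever goes up, deleting the newest post doesn't free its number. An unknown board silently inserts nothing here, so real code would want to check for that and reject the request.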
>> No. 3390 [Edit]
>>3389
>You'd need to ensure both statements run in the same transaction for correctness guarantee.
Well shoot. I have another problem then. I use a read statement to check if a post's "parent"(given through an html form) exists or not, and if it doesn't, the post is treated as a new thread. I'm not sure whether or not this is a problem, but I don't think it can cause too much issue.

>But out of curiosity, why not go with the separate table to hold next post id approach?
I guess I will, but it feels clumsy. I'm growing increasingly frustrated with SQL's inflexibility. I'd rather use something that's a bit slower, but much smarter.
>> No. 3391 [Edit]
>>3390
> I use a read statement to check if a post's "parent"(given through an html form) exists or not, and if it doesn't, the post is treated as a new thread

My brain is a bit fried at the moment, why do you need to check if the parent exists or not? If a thread id is not supplied, you should treat it as a new thread. If it is supplied and the thread id is valid, treat it as a new post in that thread. If it is supplied and such a thread does not exist, reject the request.

Regardless, I think if you don't lookup thread id and write to the table in the same transaction, there's a risk of a race-condition if the thread is deleted after the lookup but before the insertion.

Consider (with T1 and T2 being separate system threads)

T1) Handle request: Post{thread_id = 123, body='my cool post'}
T1) Lookup thread_id = 123. Found
T2) Handle request: DeleteThread{thread_id = 123, password='hunter2'}
T2) Deletes Thread
T1) # At this point T1 believes the thread exists and tries to insert a new post for that thread. This is invalid state though, as the thread no longer exists so request should fail.

Note that despite SQLite preventing multiple writers, it's the lack of transaction on the read-before-write that causes you to write inconsistent state. Isolation is the key property to look at here [1]. This is really something that all DBs should make clear, but unfortunately you have to read between the lines a lot to figure out exactly what guarantees a DB provides.

I think SQLite actually does provide read-write transactions though, via the paragraph that mentions that on an attempt to escalate from a read-snapshot to a write, the txn will abort if it was modified by any other writer. This seems to be a form of optimistic locking, leaving you to deal with the retry. It also seems to have a form of pessimistic locking via "BEGIN IMMEDIATE". There are apparently some weird gotchas where transactions basically only work if each transaction is on a separate db connection, but I think this is usually what you do assuming you have one thread per connection.

Even Postgres honestly makes it hard to figure out exactly what guarantees it provides, by default it's "read committed" which doesn't actually give you a read-write transaction.

[1] https://www.sqlite.org/isolation.html

Also once you start a transaction you can still do non-sql things before committing the transaction. In such a way you aren't limited to doing logic in SQL.

>I'm growing increasingly frustrated with SQL's inflexibility. I'd rather use something that's a bit slower, but much smarter.
I think the restrictions imposed by SQL (the so-called inflexibility) are what force you to define your things in a way that actually scales. If it was looser with things, then in order to guarantee correctness it would have to be even more pessimistic, and basically serialize all transactions, hurting throughput.

But yes SQLite isn't the poshest db in terms of what you have to work with. It does the job, but it's a bit barren. Postgres is one level up (I think it's the best OSS db?) in terms of things it can do, but there's a lot more complexity in configuring it. And if you go to paid offerings, things like gcp bigquery can do absolutely amazing data transformations directly in sql.

Post edited on 28th Jul 2022, 11:08pm
>> No. 3392 [Edit]
>>3391
>why do you need to check if the parent exists or not?
To prevent replying to nonexistent threads. I didn't think to have the new thread condition being no provided parent because the statements which insert new posts require a parameter for parent thread. I also didn't know my implementation could cause issues.

Additional logic will need to be added, but doing it in the way you suggest should be straightforward.
>> No. 3393 [Edit]
>>3391
On second thought,
>If it is supplied and such a thread does not exist
How would I do this without using sql to check? It seems the only solution is shoving the parent check into the same transaction as the writing, something I don't know how to do. And I have no idea how the logic of that would work, unless I'd have to write the logic in sql too. In that case, it doesn't really matter what condition I use to determine if a new thread should be made, except maybe a slight performance advantage.

Post edited on 29th Jul 2022, 7:44am
>> No. 3394 [Edit]
>>3393
>And I have no idea how the logic of that would work, unless I'd have to write the logic in sql too
No, as long as you issue a BEGIN TRANSACTION you can interleave code-logic with sql queries. In python, you'd do something like


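db.execute('BEGIN TRANSACTION')
v = db.execute('SELECT * FROM foo').fetchone()  # a tuple, or None if foo is empty
v = frobnicate(v)  # arbitrary application logic between the queries
db.execute('INSERT INTO foo VALUES(?)', v)
db.execute('COMMIT')  # nothing is persisted until the commit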


And the second execute will throw an exception if the db-lock could not be escalated, leaving you to handle the retry.

Do note that SQLite doesn't support nested transactions (a BEGIN inside an open transaction fails; SAVEPOINT is the closest equivalent), so if you're already inside a transaction context you should re-use it when e.g. determining the next thread id number.
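To make that concrete, here's a self-contained sketch of the parent check and the insert sharing one transaction. It's only an illustration under assumptions: a single board, a hypothetical posts table where a thread is a post with no parent, and a made-up ThreadNotFound exception. BEGIN IMMEDIATE is used so the lock is taken before the read rather than escalated later.

```python
import sqlite3

# isolation_level=None: autocommit mode, transactions are managed by hand.
db = sqlite3.connect(":memory:", isolation_level=None)
# Hypothetical schema: a thread is simply a post whose parent is NULL.
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, parent INTEGER, body TEXT)")


class ThreadNotFound(Exception):
    """Raised when a reply names a thread that does not exist."""


def add_post(body, parent=None):
    # Take the write lock before reading, so the thread cannot be deleted
    # between the existence check and the insert.
    db.execute("BEGIN IMMEDIATE")
    try:
        if parent is not None:
            row = db.execute(
                "SELECT 1 FROM posts WHERE id = ? AND parent IS NULL", (parent,)
            ).fetchone()
            if row is None:
                # Parent supplied but no such thread: reject the request.
                raise ThreadNotFound(parent)
        # No parent supplied means a new thread; otherwise a reply.
        cur = db.execute(
            "INSERT INTO posts (parent, body) VALUES (?, ?)", (parent, body)
        )
        db.execute("COMMIT")
        return cur.lastrowid
    except Exception:
        db.execute("ROLLBACK")
        raise


thread_id = add_post("new thread")
reply_id = add_post("a reply", parent=thread_id)

try:
    add_post("reply to nothing", parent=999)
    rejected = False
except ThreadNotFound:
    rejected = True

post_count = db.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(thread_id, reply_id, rejected, post_count)
```

The check itself is plain Python between two SQL statements, so none of the logic has to be written in SQL.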
>> No. 3395 [Edit]
Although this does make me appreciate that vichan is actually pretty lightweight considering all it supports. I.e. when you consider that it also handles the long-tail of imageboard needs (mod support, markup, captchas, thumbnail generation, banning) then the line count seems pretty reasonable (at least much more so than lynxchan).
>> No. 3396 [Edit]
>>3394
Hmm, I'll look into this. Not sure if it'll work with prepared statements.
>> No. 3397 [Edit]
sigh
https://stackoverflow.com/questions/5391564/how-to-use-distinct-and-order-by-in-same-select-statement
https://blog.jooq.org/how-sql-distinct-and-order-by-are-related/
>The SQL language is quirky. This is mostly because the syntactical order of operations doesn’t match the logical order of operations.

Post edited on 29th Jul 2022, 3:43pm
>> No. 3398 [Edit]
>>3397
Yes, that's a peeve of mine too. And this leads to a bunch of annoyances like not being able to reference column aliases in the where clause (but you can in the group by).

C#'s LINQ is what sql should have been as a dsl.
>> No. 3399 [Edit]
>>3391
>isn't the poshest db in terms of what you have to work with
It's worth noting that sqlite supports extensions so you can get slightly closer to bigquery-level stdlib with something like https://github.com/nalgeon/sqlean

That said, these should really be thought of as row-valued functions (which can be used in e.g. a SELECT clause to transform some value into another value) and table-valued functions (which can produce an entire table, perhaps lazily). They can be useful to avoid dropping down into code for such data-transformations, but it wouldn't solve your particular issue.
>> No. 3400 [Edit]
File 165936936077.png - (505.50KB , 1896x1958 , homepage progress.png )
It's not integrated yet, but I've been working on a prototype of the home page. I think for every other "tab", the contents will be pulled from a text file the owner edits manually. While this does require them to know a little html, it offers the most flexibility for them and the least amount of work for me.
>> No. 3401 [Edit]
>>3400
Looks good, and I like the feel of the board (although I personally prefer the cool blue of yotsuba v2 myself).
>> No. 3402 [Edit]
File 165938614891.png - (489.10KB , 1892x1944 , buri home.png )
>>3401
>I personally prefer the cool blue of yotsuba v2 myself
Theming is fairly straightforward. Also made it more compact for functionality's sake.
>> No. 3411 [Edit]
File 166873284684.webm - (830.37KB , homedemo.webm )
It's been a while. I've implemented half of the home page, the most flashy parts. File is my clumsy demo. Next I'll work on nitty gritty transaction stuff like discussed above.

edit: sometimes posting is very fast, other times it takes a while. File size is a factor, but doesn't seem to be the main one.

Post edited on 17th Nov 2022, 5:10pm
>> No. 3412 [Edit]
>>3411
Nice, I was wondering whether you were still alive. You should scatter some logging/timing statements about to see which part is taking the most time. Writing a single DB row should not take long at all, so I suspect something weird is going on, like you're using read/write transaction for read-only SQL, which will kill your throughput (I recently made this mistake).
>> No. 3413 [Edit]
File 166874465133.jpg - (96.52KB , 507x732 , 6bd4154a7d02b0e63fc189b82bf71b3e.jpg )
>>3412
>Writing a single DB row should not take long at all
When a new post with an image is made, several things happen. The post is checked in various ways and goes through a bunch of formatting functions. Its parent form value is checked(read); if it's not found, the latest id for a board is retrieved(read). Its subject, if it has one, is added to the subject table(write). If it has an image attached, a thumbnail is created and the thumbnail's info is added to the homethumb table(write). The post is added to the posts table(write) and, if not empty, also the homepost table(write). If the post replied to anyone, all of those are added to the reply table(variable writes).

Then, the thread(also home, board and catalog) is rebuilt, by doing a bunch of reads and feeding that info into a template. So there's a lot going on every time a post is made. Adding timers sounds like a good idea, but I'm not sure whether there is a way to speed things up with all this stuff happening behind the scenes. So far, I'm not using any transactions, so maybe that's the bottleneck?

Post edited on 17th Nov 2022, 8:14pm
>> No. 3414 [Edit]
Checked with timers. The initial check actually took the longest(but only stalls sometimes), which has nothing to do with sql. Every other part takes less than a second.

Post edited on 18th Nov 2022, 9:20am
>> No. 3415 [Edit]
>>3414
>>3413
Can you post the timing breakdown? Even 1 second per post is too much, SQLite can easily handle _at least_ 10K QPS writes, so it should almost never be the bottleneck. Transactions should not make a difference in terms of speed for the case of 1 single post.

>So there's a lot going on every time a post is made
It honestly is not much, you should only be limited by IO throughput to/from disk, everything else is done in-memory, just simple lookups and string operations. So if it's taking more than a few dozen ms, something is very wrong, I suspect some of the template building is not as optimized as it should be.

> The initial check actually took the longest(but only stalls sometimes), which has nothing to do with sql. Every other part takes less than a second
Yes, that's what I would expect. SQLite will almost never be the bottleneck, and I'd bet good money on that. (And if your benchmarks revealed sqlite was a bottleneck, I'd again bet good money that it's because you're doing your queries inefficiently, e.g. using a write txn where it's not needed, not having enabled WAL mode, etc.)

Can you elaborate more on your checks? What are they doing? For the formatting, I recall you did this via regex, and go uses re2 so that should be a roughly linear pass over the input, which should be OK.
>> No. 3416 [Edit]
>>3415
>SQLite can easily handle _at least_ 10K QPS write
Actually that's not quite true, I was mixing up parallel and sequential workloads.

If doing sequential updates one after the other with standard rollback journal, you'll only get around 50 updates per second. The performance will be terrible unless you enable write ahead logging. With WAL enabled, you should be able to hit 1k or so. Doing multiple updates in a single transaction will also help.
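If you want to see the effect yourself, here's a small Python sketch that turns on WAL and compares one-transaction-per-update against a single batched transaction. The file name, table and row counts are arbitrary, and the timings will vary a lot by machine and filesystem, so treat the printed numbers as a rough indication only.

```python
import os
import sqlite3
import tempfile
import time

# WAL needs a real file; an in-memory database cannot use it.
path = os.path.join(tempfile.mkdtemp(), "bench.db")
db = sqlite3.connect(path, isolation_level=None)  # autocommit mode

# The pragma returns the journal mode that is actually in effect.
mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]

db.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, n INTEGER NOT NULL)")
db.execute("INSERT INTO counters VALUES ('hits', 0)")


def bump_individually(times):
    # Each UPDATE is its own transaction, so each one pays for a commit.
    for _ in range(times):
        db.execute("UPDATE counters SET n = n + 1 WHERE name = 'hits'")


def bump_batched(times):
    # All UPDATEs share one transaction and a single commit.
    db.execute("BEGIN")
    for _ in range(times):
        db.execute("UPDATE counters SET n = n + 1 WHERE name = 'hits'")
    db.execute("COMMIT")


for label, fn in (("individual", bump_individually), ("batched", bump_batched)):
    start = time.perf_counter()
    fn(100)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {elapsed_ms:.1f} ms for 100 updates")

total = db.execute("SELECT n FROM counters WHERE name = 'hits'").fetchone()[0]
print("journal mode:", mode, "total:", total)
```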

When talking about how SQLite can scale to multiple parallel writers, the answer is that it doesn't, it enforces db-level locking to serialize the writes. So multiple parallel writers is equivalent to multiple sequential writers, so refer to above paragraph.

For multiple sequential readers, SQLite can do pretty well: https://www.sqlite.org/np1queryprob.html. A given query should not take more than 1ms or so, so you get around 1k qps. Usually the bottleneck will be IO speed, or ffi interop between your language and sqlite.

Multiple parallel readers is where SQLite really shines, you'll probably hit file descriptor or memory limits before sqlite becomes the bottleneck. I'd guesstimate maybe 10k-100k, although apparently with a beefy machine and the right configs you can get to 4M

https://www.sqlite.org/speed.html
https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-qps-on-a-single-server/
>> No. 3417 [Edit]
>>3416
Although I must say I got sniped into benchmarking this on my personal usages of sqlite where I have a particular case that involves a ton of sequential bulk-lookups. Each individual lookup completes in about 2ms or less, but with 1000+ lookups that's easily 2 seconds. And the time for a single query is same whether I use sqlite3 cli or from my program, so ffi time is irrelevant here.

I do observe that batching reads using transactions does help, which was actually surprising to me, but I guess even read transactions involve posix file-level locking: https://stackoverflow.com/questions/7349189/optimizing-select-with-transaction-under-sqlite-3

So I guess 500-1k selects per second is indeed about right.
>> No. 3418 [Edit]
File 166879209673.png - (23.03KB , 400x400 , a90508beeacbc15deb0e36716223dee0.png )
>>3415
>>3416
>>3417
>Can you post the timing breakdown?

2022/11/18 11:55:56 Starting:
2022/11/18 11:56:02 Initial check complete:
2022/11/18 11:56:02 Formatting complete:
2022/11/18 11:56:02 Parent checking & subject adding complete:
2022/11/18 11:56:02 Image check complete:
2022/11/18 11:56:02 Image copying complete:
2022/11/18 11:56:02 Thumbnail making complete:
2022/11/18 11:56:02 Table inserting complete:
2022/11/18 11:56:02 Template making complete:
2022/11/18 11:56:02 Creating new thread file complete:
2022/11/18 11:56:02 Post retrieval complete:
2022/11/18 11:56:02 Subject retreival complete:
2022/11/18 11:56:02 Template execution complete:
2022/11/18 11:56:02 Complete:
2022/11/18 11:56:02 Indirect time:


I don't think my source code is too spaghetti to read. The starting point to the initial check being finished is this block of code.

if req.Method != "POST" {
    http.Error(w, "Method not allowed.", http.StatusMethodNotAllowed)
    return
}

file, handler, file_err := req.FormFile("file")
no_text := (strings.TrimSpace(req.FormValue("newpost")) == "")
if file_err != nil && no_text {
    http.Error(w, "Empty post.", http.StatusBadRequest)
    return
}

req.Body = http.MaxBytesReader(w, req.Body, max_upload_size)
if err := req.ParseMultipartForm(max_upload_size); err != nil {
    http.Error(w, "Request size exceeds limit(10MB).", http.StatusBadRequest)
    return
}

post_length := len([]rune(req.FormValue("newpost")))
if post_length > max_post_length {
    http.Error(w, "Post exceeds character limit(10000). Post length: " + strconv.Itoa(post_length), http.StatusBadRequest)
    return
}

parent := req.FormValue("parent")
board := req.FormValue("board")
subject := req.FormValue("subject")
option := req.FormValue("option")
if parent == "" || board == "" {
    http.Error(w, "Board or parent thread not specified.", http.StatusBadRequest)
    return
}

if !(slices.Contains(Boards, board)) {
    http.Error(w, "Board is invalid.", http.StatusBadRequest)
    return
}


Maybe slices is the problem? It's part of the standard library and all I use it for is checking if the given board is part of the valid board array. Maybe I should make the board array into a hashset or something and do away with it. In any case, I'll probably need to subdivide this block with timers to figure out the exact thing that takes the longest.
>> No. 3419 [Edit]
>>3419
>>3418
Did the timestamps get cut off or did you only log at seconds-level granularity? That's way too coarse to be useful, log in millis.

Also Go should have some profiling tool that will give you a flamegraph.

Post edited on 18th Nov 2022, 10:45am
>> No. 3420 [Edit]
>>3419
Those are the default time stamps. I'll see if I can change that. And post results.
>> No. 3421 [Edit]
>>3420
Basically have a global var for lastTimestamp and function like endTimeAndRecord(msg) which will log(msg, now - lastTimestamp) and set lastTimestamp = now. Then you just need to initialize lastTimestamp = now at the very beginning of the request, and anywhere in the request lifecycle you can add endTimeAndRecord(msg) and it will log the delta since the last log message.
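Sketched in Python for brevity (the same shape carries over to Go), that helper could look roughly like this. The class and method names are made up. One deliberate deviation from the description above: the last timestamp lives on a per-request object rather than in a global, so concurrent requests don't overwrite each other's state.

```python
import time


class StepTimer:
    """Logs the time elapsed since the previous log call (hypothetical helper)."""

    def __init__(self):
        # Initialise at the very beginning of the request.
        self.last = time.perf_counter()
        self.records = []

    def end_time_and_record(self, msg):
        now = time.perf_counter()
        delta_ms = (now - self.last) * 1000
        self.records.append((msg, delta_ms))
        print(f"{msg}: {delta_ms:.3f} ms")
        # The next call measures from here.
        self.last = now


timer = StepTimer()
time.sleep(0.01)  # stand-in for the initial checks
timer.end_time_and_record("Initial check complete")
sum(range(100_000))  # stand-in for formatting work
timer.end_time_and_record("Formatting complete")
```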
>> No. 3422 [Edit]
>>3421
Go's log has a microsecond option (log.Lmicroseconds). I think I've narrowed it down pretty conclusively. Two times it took a while -

2022/11/18 14:33:33.658508 Starting
2022/11/18 14:33:33.658688 Method check complete
2022/11/18 14:33:39.516935 FormFile get complete
2022/11/18 14:33:39.517187 No text get complete
2022/11/18 14:33:39.517293 Empty test complete
2022/11/18 14:33:39.517385 Size check complete
2022/11/18 14:33:39.517472 Text length check complete
2022/11/18 14:33:39.517559 Meta get complete
2022/11/18 14:33:39.517644 Empty parent or board check complete
2022/11/18 14:33:39.517731 Valid board check complete. Initial check complete.


2022/11/18 14:41:36.135516 Starting
2022/11/18 14:41:36.135969 Method check complete
2022/11/18 14:41:40.612392 FormFile get complete
2022/11/18 14:41:40.612965 No text get complete
2022/11/18 14:41:40.613319 Empty test complete
2022/11/18 14:41:40.613516 Size check complete
2022/11/18 14:41:40.613685 Text length check complete
2022/11/18 14:41:40.613844 Meta get complete
2022/11/18 14:41:40.613969 Empty parent or board check complete
2022/11/18 14:41:40.614087 Valid board check complete. Initial check complete.


Almost every step takes well under a millisecond (roughly 0.1 to 0.5 ms), except the FormFile one, which is a single line of code. It takes waaaaay longer: about 5.9 seconds in the first run and 4.5 seconds in the second.

file, handler, file_err := req.FormFile("file")

>> No. 3423 [Edit]
>>3422
I don't know anything about go, maybe see this [1]. It should not be taking 4 seconds to parse the input request, unless the client is intentionally throttling their upload.

[1] https://old.reddit.com/r/golang/comments/nihs0n/help_needed_for_uploading_large_file_with_little/
>> No. 3424 [Edit]
Also interesting https://sqlite-users.sqlite.narkive.com/2KMQOyUd/performance-of-select-in-transactions

If you're doing on the order of ~10k select queries, the cost of acquiring/releasing the shared lock on the DB starts to add up. The linked post mentions about a 1 sec difference (1 sec with all in 1 transaction, 2 sec if done individually), but I'm seeing much more, about a 6 seconds difference on about 50k reads (2 sec if done all in 1 transaction, 8 sec otherwise). It's possible that the language binding I'm using is doing some extra bookkeeping at the start of every transaction, and that's also contributing.
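A minimal way to reproduce this kind of comparison in Python, assuming a throwaway key/value table (the names and sizes are arbitrary). Both functions run the same selects; the only difference is whether they share one explicit transaction. How big the gap is depends heavily on the machine, filesystem and language binding.

```python
import os
import sqlite3
import tempfile
import time

# Use a real file so that file-level locking is actually involved.
path = os.path.join(tempfile.mkdtemp(), "reads.db")
db = sqlite3.connect(path, isolation_level=None)  # autocommit mode
db.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
db.execute("BEGIN")
db.executemany("INSERT INTO kv VALUES (?, ?)", [(i, f"value {i}") for i in range(5000)])
db.execute("COMMIT")

keys = list(range(5000))


def lookup_individually(keys):
    # Each SELECT acquires and releases the shared lock by itself.
    return [db.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()[0] for k in keys]


def lookup_in_one_transaction(keys):
    # The shared lock is taken on the first read and held until COMMIT.
    db.execute("BEGIN")
    try:
        return [db.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()[0] for k in keys]
    finally:
        db.execute("COMMIT")


start = time.perf_counter()
individual = lookup_individually(keys)
mid = time.perf_counter()
batched = lookup_in_one_transaction(keys)
end = time.perf_counter()

print(f"individual: {(mid - start) * 1000:.1f} ms")
print(f"one transaction: {(end - mid) * 1000:.1f} ms")
```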
>> No. 3425 [Edit]
>>3423
The first recommended strategy in that thread (uploading in a stream), is good for uploading large files on systems with low ram, to prevent the OS from killing the program when ram runs out. My research did not indicate there would be any other advantages, like performance.

So I think the bottleneck is actually my hardware. I'm only working with 8gb of ram, a lot of which is being used up by other things(it's my laptop).

This link has a good explanation
https://dev.to/tobychui/upload-a-file-larger-than-ram-size-in-go-4m2i

Post edited on 18th Nov 2022, 7:11pm
>> No. 3426 [Edit]
>>3425
But isn't the file you're uploading only a few megs? Unless you're hitting swap I still don't see why there'd be a several second delay.

Some more tests to try: if you post replies without any images, is it always fast? I recommend you maybe log the request processing times to a file somewhere, so you can create a histogram and get the p50, p95, and p99 latencies. Is the lag dependent on file upload size? If you are indeed memory bound and hitting swap, uploading a kb file should not trigger anything while uploading a several MB file should.
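Once the per-request times are collected somewhere, getting the percentiles is only a few lines. A small Python sketch, assuming the durations have already been parsed into a list of milliseconds (the function name and the sample numbers are invented):

```python
import statistics


def latency_percentiles(durations_ms):
    """Return the p50, p95 and p99 of a list of request durations."""
    # quantiles(n=100) returns the 99 cut points between percentiles;
    # index i - 1 holds the i-th percentile.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Invented sample: mostly fast requests with a few multi-second stalls.
sample = [12.0] * 90 + [40.0] * 6 + [4500.0, 5900.0, 4800.0, 5200.0]
print(latency_percentiles(sample))
```

If the p50 is fine and only the p95/p99 blow up, that points at occasional stalls rather than a uniformly slow code path.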
>> No. 3427 [Edit]
File 16688294683.png - (1.51MB , 900x900 , yuki.png )
>>3426
>if you post replies without any images, is it always fast?
Yes.

>log the request processing times
Don't know how to do that. Do you mean processing times according to my browser? At this point, I think I've already pinpointed the culprit to that one function.

>Is the lag dependent on file upload size?
Larger files always take longer. The ones below 1mb are usually instantaneous.

>uploading a kb file should not trigger anything
It's probably asking too much, but checking out the performance on another machine would be helpful. If the problem persists on a beefier machine, I'll know my hardware isn't likely the problem. If it counts for anything, TC is usually slower.

Post edited on 18th Nov 2022, 7:46pm
>> No. 3428 [Edit]
>>3427
Maybe someone else who already has a Go environment set up can assist with testing (I don't want to download golang just for this..)

I still don't think it's a hardware issue though. Assuming you're on linux with just 1 tab open on a browser, there should still be plenty of headroom. What's your swap utilization?
>> No. 3429 [Edit]
File 166883071273.png - (84.18KB , 2550x952 , curr usage.png )
>>3428
>Assuming you're on linux with just 1 tab open on a browser
Nope, to all of that.
>> No. 3430 [Edit]
>>3429
You've still got about a gig left though. So I don't see why you'd be running into issues ingesting a few megs. I can't help much more, I don't know anything about that particular go library. You can probably profile inside that function with gdb or pprof or something, to see if there's anything blocking. Or use strace/bpftrace to see if you're being blocked on a syscall or something.

Worst case there should surely be some 3rd party replacement, just use that.
>> No. 3431 [Edit]
>>3430
>You can probably profile inside that function with gdb or pprof or something, to see if there's anything blocking. Or use strace/bpftrace to see if you're being blocked on a syscall or something.
That's way above my pay grade. I don't know if it was you, but somebody expressed a desire to use the engine I'm writing. If it's a problem for them as well, they can do that testing and propose a solution. I'm all ears. The advantage of oss is collaboration. I'm not selling a product.
>> No. 3432 [Edit]
>>3431
After doing more testing and looking at the source of things, I've narrowed it down to a function called ReadForm
https://cs.opensource.google/go/go/+/refs/tags/go1.19.3:src/mime/multipart/formdata.go;l=34

From the source code, it's described like this:
> ReadForm parses an entire multipart message whose parts have
> a Content-Disposition of "form-data".
> It stores up to maxMemory(parameter) bytes + 10MB (reserved for non-file parts)
> in memory. File parts which can't be stored in memory will be stored on
> disk in temporary files.

Can't dig any deeper because of private fields in necessary structs preventing me from copying code and adding timers.
>> No. 3433 [Edit]
File 166893071811.png - (371.17KB , 2269x2083 , profile005.png )
>>3432
Also, here's a picture showing allocated memory. Doesn't really help me though.
>> No. 3434 [Edit]
>>3433
I've figured out that I can edit GO's source code, which comes with every installation of GO, and add logs to that. So I've narrowed it down even further to an io function called CopyN
https://cs.opensource.google/go/go/+/refs/tags/go1.19.3:src/io/io.go;drc=58a2db181b7cb2d51e462b6ea9c0026bba520055;l=361

> CopyN copies n bytes (or until an error) from src to dst.
> It returns the number of bytes copied and the earliest
> error encountered while copying.

I can't add logs to io, because the log package depends on io itself. I'm fairly certain though that io.copyBuffer is ultimately responsible.

https://cs.opensource.google/go/go/+/refs/tags/go1.19.3:src/io/io.go;drc=58a2db181b7cb2d51e462b6ea9c0026bba520055;bpv=1;bpt=1;l=405

> copyBuffer is the actual implementation of Copy and CopyBuffer.
> if buf is nil, one is allocated.

Even if I knew what could be changed, I don't really feel that editing GO's source code is an appropriate solution to any technical problem.
>> No. 3435 [Edit]
>>3434
Seems like it's just blocking on IO then. Are you on ssd or spinning rust? But I still can't see why it'd block for 4 whole seconds to write a few mb to disk.

Post edited on 20th Nov 2022, 1:20pm
>> No. 3436 [Edit]
>>3435
>Are you on ssd or spinning rust?
SSD, 256gb.
>> No. 3437 [Edit]
>>3436
WSL might be at fault. Will try moving data outside of /mnt/ and see if that helps.
https://github.com/Microsoft/WSL/issues/873#issuecomment-411845341

Edit: did that and maybe there's some marginal improvement, but it's still not fast on files over 1mb.

Edit2: others have the same problem
https://github.com/microsoft/WSL/issues/4498#issuecomment-1265178042

Post edited on 22nd Nov 2022, 9:51am
>> No. 3438 [Edit]
File 166903915462.webm - (616.83KB , speedshow.webm )
Sorry for harping on this, but after just restarting my computer, it's now super fast, as you can see in this recording.
>> No. 3439 [Edit]
>>3438
I haven't been following closely as most of this thread's content is a bit beyond me, but I do like you having the reply box at the bottom of the page. Looking good.
>> No. 3440 [Edit]
File 167018546394.png - (3.89KB , 334x215 , tooltip.png )
Finally changed posting and related queries to be inside an sql transaction. New threads are indicated by no parent being given, which removes the possibility of accidentally replying to a thread when you meant to make a new one.

Also changed the configuration format from TOML to INI. I'm not opinionated on configuration formats given my little experience. Reading this article convinced me that INI is better. Go also has a nice INI library.
https://github.com/madmurphy/libconfini/wiki/An-INI-critique-of-TOML

Post edited on 4th Dec 2022, 12:25pm