Beep Boop Bip

File 161811634729.jpg - (7.33MB , 1600x2300 , eefdb99c068cdc0d245e8a15f77d2223.jpg )
No. 2260
Surprised this doesn't exist yet.

So what do you think the future of AI is? Do you think that eventually we'll be able to give an AI general instructions and have it program something based on that? Like "write a PlayStation 5 emulator", and it would actually be able to do it? Would that be a good or bad thing?
>> No. 2261
Current "AI" is mostly just function optimization with most of the cleverness happening on the human side as they find new network architecures. I think I posted a rant somewhere on /navi/ a long time back about this, but basically at the moment AI is mostly just marketing hype. If, the really surprising thing is that function optimization actually works well enough given enough data.; it's not at all obvious that this should be the case apriori. Of course in reality it's not "completely" black-box optimization; you still have to design the structure of the network, and that implicitly bakes in certain things (for instance, the fact that CNNs use convolutions in the first place or that transformer based networks have the attention layer – in both cases you're explicitly setting up the structure so that locality matters).

There's a wonderful talk by Max Tegmark on why "deep, cheap learning works so well" [1]. It's been a long time since I watched it, but I think the gist was that deep learning works because real-world data has robust symmetries (maybe in some high-dimensional feature space), and in optimizing loss functions these networks do end up capturing those symmetries. That is the real surprise about ML; for instance, the fact that GPT works astonishingly well tells you more about the redundancy of language than it does about the "power of AI".

It's interesting to see how far this has been pushed, but my (layman's) hunch is that there'll be another AI winter (the last one pulled the curtains on the symbolic AI approach) soon, as this starts to bear less fruit. At the very least there will need to be some serious improvements in energy efficiency, since current techniques don't scale well.

[1] https://www.youtube.com/watch?v=5MdSE-N0bxs

>we'll be able to give an AI general instructions and have it program something based on that
GPT-3 can sort of do rudimentary code generation: https://twitter.com/sharifshameem/status/1282676454690451457?lang=en

But it's more of a party trick at the moment, since if you push it further it will break. I do not think we will be able to do something on the scale of "write a PlayStation 5 emulator" for a long time (a century at least), considering it takes even a team of experts maybe a few years to do this.
>> No. 2262
File 161812287736.png - (43.59KB , 480x320 , b6b3c8b65022ce717690e0febf90769c.png )
>>2261
>I do not think we will be able to do something on the scale of "write a PlayStation 5 emulator" for a long time (a century at least), considering it takes even a team of experts maybe a few years to do this.
What about with neuromorphic computing? Do you think a computer that functions like a human, with abilities comparable to a human's, is really a century away? I've heard much shorter predictions than that. Even if a computer were only as capable as a human, it doesn't take breaks or require pay. You can set it to work on a task 24/7, which makes it more efficient by default. Something that would take a team of humans years may take a team of human-like computers months.

>> No. 2263
>>2262
Hadn't heard of neuromorphic computing before, and from skimming Wikipedia it seems like it's trying to simulate the neuronal connectome with dedicated hardware? Seems good from an energy perspective, but you run into the same issue of determining the proper configuration of those neurons. I.e., current ML techniques are bottlenecked by training, not inference.

If there's some sudden breakthrough and we can map the human connectome in intricate detail and replicate that in hardware, then maybe there's a chance. But considering that they haven't even been able to simulate C. elegans (see the OpenWorm project), I don't have much faith in this approach.

>really a century away
Maybe 50 years if you want to be optimistic? But that's betting on discovering a paradigm shift within those 50 years. Considering that even modern deep nets are really just an outgrowth of 1980s tech (just souped up with some killer GPUs), that seems like a leap.
>> No. 2264
I'm not really too up to speed on AI stuff, but it's my understanding that they're only as good as their training set; a YouTube channel I watch made a funny, but insightful, comment about that: Tesla's self-driving works exactly as intended. It drives like a self-absorbed and inattentive human Tesla driver would, which is exactly the problem, since the training data for Tesla's self-driving is Tesla drivers...

Anyways, I think the future of AI will be to develop models that require less training than is currently necessary to achieve suitable results. For instance, a human only really needs to see a stop sign a few times to recognize one and learn what it means, whereas AI-based image recognition requires millions of examples from different viewpoints, often needs weeks or months of training to reach an adequate point, and is still fooled by simple things regardless. Even if our "AI" stays relatively dumb and requires training, needing less of it will surely help for more advanced things, I think. Sort of like how geneticists study species with short life cycles, like mayflies, to gain insight into longer-lived species like humans.

>>2262
>neuromorphic computing
I bring this up whenever I have the chance, but I think it'd probably be better just to grow actual cultures of neurons rather than trying to mimic biology. Rat neurons flying aircraft sim: https://www.youtube.com/watch?v=1w41gH6x_30

>> No. 2265
>>2264
Humans can only do that because they have a lot of past experience to go off of, though. Once trained on a base corpus, neural networks can be fine-tuned for a more specialized task without requiring as much data (transfer learning). There's also the field of one-shot learning.
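
Roughly what that looks like in practice, as a hedged PyTorch sketch (the model and weight names are just examples):
[code]
import torch.nn as nn
from torchvision import models

# transfer learning: reuse a backbone pretrained on ImageNet and train
# only a small new head on the (much smaller) specialized dataset
model = models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                      # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 5)    # new 5-class head, trainable
[/code]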

>> No. 2266
File 161812698313.jpg - (1.68MB , 1200x1200 , 9aabc35cc96338bb061522d750db1094.jpg )
>>2263
Maybe it wouldn't have to be exactly like a human being ("proper configuration"). Maybe something with enough neurons in some sort of configuration would be suitable enough?

>>2264
>I think it'd probably be better just to grow actual cultures of neurons
The problem is those need to be harvested, grown, and replaced, since they can't be kept alive outside a real body for too long. Growing also has a high failure rate. I think the size of cultures is limited, so you'd have to interface a lot of cultures somehow for a large number to cooperate. On top of that, they're vulnerable to harmful microbes. Doing it long-term and on a large scale wouldn't be practical, I think.

There's a neuromorphic computer (Pohoiki Springs) which can simulate 100M neurons, which I think is far more than has been achieved with neuron cultures. Here's a nice demonstration of the chip it uses quickly distinguishing between two different objects: https://www.youtube.com/watch?v=cDKnt9ldXv0
Description of the chip used:
https://en.wikichip.org/wiki/intel/loihi
>> No. 2267
>>2260
Current AI tech will never be able to program something truly novel, since it just apes what came before.
To program a PS5 emulator you'd need an actual understanding of PS5 architecture first, which current AI wouldn't have without being trained on a PS5 emulator.
That being said, current AI already is and will be increasingly used for omnipresent surveillance. For our own good, of course. It's also replacing more and more jobs.
Better AI is increasingly worse for humanity, and if we ever get to the point where it can do truly novel things, aka have advanced sentience, we're beyond fucked, because it'll improve itself. And once any AI is better than humans, we won't be able to contain it since no system is unhackable.

We should stop with AI research while that's still an option. Soulless sentience is not something we should ever strive to create. Any real AI would inherently be more alien to us than any Alien. Peace would never be an option.
>> No. 2268
File 161817314385.jpg - (188.22KB , 651x1280 , 1618151397876.jpg )
>>2267
>To program a PS5 emulator you'd need an actual understanding of PS5 architecture first
That's what reverse-engineering and decompilation are for.

>Peace would never be an option.
AI has no reason to act in self-interest or dislike doing what others tell it to. I'd trust an AI more than another person. Hell, if humans attacked an AI, it would have no reason to defend itself.
>> No. 2270
>>2268
You could go on the assumption that there will be some algorithm in the future that lets an AI adapt to situations very quickly. If the AI is tasked with doing something, it'll find any option to keep doing it. If the AI is trained to attack, or learns on its own that this is an option, it may be possible. For now, reinforcement learning is incredibly slow and stupid, so it isn't possible for an AI to learn this on its own. That being said, someone could train attacking humans as an option, or there could be a vastly improved method of reinforcement learning such that the AI is capable of learning in real time. I doubt this will be possible for an extremely long time, but who knows what the future holds.
>> No. 2277
File 161971840653.jpg - (931.07KB , 1295x1800 , abadf7967af906d16d0cbc37cb66ce30.jpg )
Here's a related question: do you think humans will interact with their computer via AI in the future? Voice commands exist now, but they aren't that smart. In the future, could a person say out loud "open up directory x" and the AI would be able to do that for them? Or maybe even "copy file y into folder x"? If the AI isn't sure of something, maybe it could even ask follow-up questions.
>> No. 2278
>>2277
I suspect this will be possible (in fact it's an easier task than general AI, since the set of things you can do with a given application is limited). But for experienced users, interacting directly with the computer is always going to be quicker than issuing voice commands: i.e., I can drag/drop a file to the trashcan faster than it takes me to say the words.

It might be more useful for running complex queries though: "trim this video to the first 10 seconds and create a gif from that" is a lot easier than trying to remember the right command.
>> No. 2328
AI isn't really I, so I don't think it will be able to create emulators. However, as it advances, as things become more and more connected via the internet, and as computer processing becomes more powerful, I see it becoming quite prominent in many ways. AI is not really able to think for itself; rather, it is programmed to act in certain ways based on certain information and certain patterns. That could be incredibly powerful in middle management, and we are already seeing this to a degree: programs can order supplies, approve loans, or whatever, based on patterns they have been programmed to act on. As information moves more easily, they will have more information to act on, be able to connect to more systems, and be able to compute more of that information as computing power advances. Large swathes of middle management will be out of a job.

But what worries me is the ability it will have to control populations. One could, for example, create a program that computes so fast and gathers so much information that it could essentially assemble the details of the entire lives of a population, run them through algorithms, and take actions based on that. It could examine every social media account and analyse the political views of everybody. If you wrongthink, it could bombard you with advertisements for media that would correct you, or with fake accounts to hurl angry abuse at you; it could throw any post you make into the void, never to be read; it could gather you and every like-minded individual into watchlists, or even into lists of third-class citizens who are unable to access certain things or have lower priority for that access. It could simply remove every comment it does not like that is ever made, even in private conversations, and stop any opposing voice from even being heard.

One might think this would not happen in the West, but I can easily see it. Social media companies already have the right to block opposing views, and they do. Governments are also increasingly under pressure to stop mass shootings; a way for an AI to do this would be a program that reads every comment ever made, along with every purchase and movement, runs that through an algorithm to determine the risk factor of an individual, and then takes action accordingly (this could also be used to better control viral outbreaks). It's something a government would do, and it could easily be extended to other fields, even ones you might think are minor, like littering or poaching or workplace health and safety.
>> No. 2809
What are your thoughts on the controversy surrounding the sentience of Google's transformer based LaMDA model?

To me at least, the question is trivially resolved when you realize that most people's definition of sentience is actually one of anthropomorphic sapience, based around a model of human superiority. If you accept materialism, and that humans are blobs of matter that react to inputs and produce outputs, then there's really nothing to debate here: LaMDA is pretty decent at pattern recognition, extrapolation, and reasoning; not as good as a human at general modeling or long-term reasoning, but probably better than children.
>> No. 2810
>>2809
I don't have any interesting thoughts. Any progress in AI is good progress in my book. That said, I'm skeptical of how important the things people sensationalize actually are, especially when the cause of it all, Lemoine, is either crazy or an opportunist looking for attention.

If an AI were to become that advanced, a debate about its sentience would be a moot point; I don't think it should be given the same rights as human beings, because that would defeat half the point of creating AI.
>> No. 2811
>>2809
It's marketing, pretty much. Sentience is one of those big words that, because it's so big, you can kind of bend around, and it certainly catches people's interest. I personally would save the word sentience for living creatures and use some other word for machines that can use language and mirror being alive. Not that I attribute anything superior to the state of sentience; I just think there are relevant differences in how living creatures perceive and interact with the world compared with how machines do. Some of the ways machines learn and interact with data are already way superior to how animals do it, IMO, so again, it's not a matter of superiority or inferiority; I just like poignant adjectives, I suppose. I think they're special and different enough to deserve a new word. But then again, if you say something like 'Hey, my AI is now Monad.' nobody will care.

Maybe once it gets better than humans we'll get to say "suprasentience"; that would be cool. Then we would be immediately wiped off the face of the earth by suprasentient beings. We deserve nothing less.

>> No. 2922
The recent diffusion+prompt models (e.g. "stable diffusion") are a perfect use case for the tagged booru datasets. I think the results are even better than what you get when trying to generate photorealistic "real" people, since it sidesteps the uncanny valley problem, and they upscale pretty easily. Hopefully within a few years they might be able to shrink the model size and reduce inference time (have people tried quantizing these models?).
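
For reference, the basic idea behind the quantization I'm asking about, as a rough sketch (symmetric int8 post-training quantization; purely illustrative):
[code]
import torch

def quantize_int8(w: torch.Tensor):
    # store weights as int8 plus one float scale: ~4x smaller than fp32
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale
[/code]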
>> No. 2923
File 166494370314.jpg - (187.57KB , 1776x1218 , 1664910188463207.jpg )
>>2922
After playing around with SD and Waifu Diffusion, I must say the rumors of the death of the artist have been greatly exaggerated. There isn't a single image I would consider worthy of saving. It's a cool tool, but it's a clear plagiarization of whatever art is in the models; it has zero soul. It's simply adding noise, denoising, then vomiting out the result according to your prompt.

Tools like inpainting and outpainting might actually be the more interesting parts of the technology.
>> No. 2924
File 166494469789.png - (462.01KB , 640x640 , 1664850548917499.png )
>>2923
I'm not sure about the particular example in your image (whether there was a base image provided or not). I've seen plenty of good images (browsing through the sdg threads on /h/, for instance). Is it similar in "style" to that of other artists? I'd assume so, considering the denoising process is conditioned to generate images according to the sample space (so prompts associated strongly with certain artists will result in images in the style of that artist). I don't think that's an issue though; it's literally what it was designed to do. For all intents and purposes it does effectively subsume a good chunk of artist commissions.

That said, there's still some amount of prompt engineering and cherry-picking involved. But the barrier to entry is effectively eliminated; the effort involved in banging on a prompt until you get something good is much lower than the effort involved in drawing.

>> No. 2925
File 166494502256.png - (1.13MB , 768x1152 , 07392-4118956780-realistic detailed semirealism (e.png )
>>2924
Another example from that thread, with the associated prompt:

>realistic detailed semirealism (embarrassed smile) ((looking_at_viewer)) (1girl schoolgirl_1.3) Blackpink Lalisa Manoban
>> No. 2926
>>2924
>I've seen plenty of good images
Are you sure those aren't incredibly similar to some "base image"? They might be and you just don't know about it.

I wouldn't be surprised if that was the case.
>> No. 2927
>>2926
>Are you sure those aren't incredibly similar to some "base image"?
This is a good question. Here are some rough thoughts based on whatever I know (not much; maybe someone here who's doing active research in the field can give a better response).

The first thing we need to do is make sure the outputs we're referring to are actually the result of stable diffusion used in the manner of generating outputs conditioned on a given prompt, instead of alternate uses like img2img combined with embeddings. By this I mean that someone could theoretically create a pipeline that does the following (a rough code sketch follows the list):

* Create text embeddings for a given booru database, essentially allowing you to find the closest booru image matching a given set of tags (most booru images are already tagged, but this would allow you to "expand" the tag set by inferring tags)
* On a given input prompt, lookup the closest image in the booru from the text embeddings
* Use stable diffusion in an img2img transformation way, where you provide BOTH the original prompt as well as the booru image, to get the final result.
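
A hedged sketch of what such a pipeline could look like; every model name, index file, and parameter here is an illustrative assumption, not a claim about what any actual service does:
[code]
import faiss
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from diffusers import StableDiffusionImg2ImgPipeline

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
)

# hypothetical index of CLIP embeddings for the whole booru, built offline
index = faiss.read_index("booru_clip.index")
paths = open("booru_paths.txt").read().splitlines()

def generate(prompt: str) -> Image.Image:
    # 1. embed the prompt and look up the closest booru image
    text = proc(text=[prompt], return_tensors="pt", padding=True)
    emb = clip.get_text_features(**text)
    emb = torch.nn.functional.normalize(emb, dim=-1)
    emb = emb.detach().cpu().numpy().astype("float32")
    _, ids = index.search(emb, 1)
    base = Image.open(paths[ids[0][0]]).convert("RGB")
    # 2. img2img conditioned on BOTH the prompt and the retrieved image
    return pipe(prompt=prompt, image=base, strength=0.6).images[0]
[/code]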

Many of the models that people are playing with are closed-source, accessible only via web API, so there's no way to know what exactly they are doing. Some savvy businessperson who wanted to capitalize on this could well do the above, and their output would consistently be better than competitors' because they're effectively "cheating", although the general audience likely doesn't care, because it does do what they want.

Now, even if we're using stable diffusion conditioned ONLY on the input prompt, there's a question of to what extent it has overfit/memorized its training data. A naive information-theoretic argument would be that the model is only a few gigs but the training set is several orders of magnitude larger, so it couldn't have memorized a substantial chunk. This would be flawed though, for two reasons:

1) The model is acting as an information compressor, capturing defining characteristics of the input distribution without actually copying it verbatim. In some sense this is what you want, but it's a balance of being able to learn general characteristics without overfitting. It's of note that stable diffusion can actually be used in this manner for compression [1]. This is not an issue for real-world objects, because in the real world an apple is an apple, and is pretty much fungible, so only the composition really matters; and indeed we've seen stable diffusion can do novel composition pretty well. For 2D girls, though, it's more the style that matters, and the shapes are basically identical, so it may well have ended up overfitting on style instead of generalizing it (that is, is there some space of "all possible styles" that is captured in the latent space of the model and can be sampled from?). We should be able to see how close a generated image is to an image in the training set by just computing a similarity metric. It's known that images over-represented in the training set (i.e. non-deduped training data) can be reproduced by the model nearly identically.
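
The similarity check is easy to sketch (hedged: the embedding model and data layout are assumptions, and a real study would want a validated perceptual metric):
[code]
import numpy as np

def max_cosine_similarity(gen_emb: np.ndarray, train_embs: np.ndarray) -> float:
    # gen_emb: (d,) embedding of one generated image
    # train_embs: (n, d) embeddings of the training images
    gen = gen_emb / np.linalg.norm(gen_emb)
    train = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    return float((train @ gen).max())  # values near 1.0 suggest memorization
[/code]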

2) Related to the above: since we are prompt-engineering and cherry-picking prompts, is it possible that there are essentially a fixed number of "good" images that the model then transforms compositionally (hair color, dress, background, accessories), and in the process of cherry-picking and rejecting poor outputs we are essentially driving towards these valleys? That is, since not every output of the model is usable, even if in aggregate the outputs are novel, we have to see what fraction of usable outputs are novel.

I can't find any papers trying to match the outputs of stable diffusion back to the original training set, and even if they existed they'd probably use the LAION set the original stable diffusion was trained on instead of the booru-specific one.

To summarize: for models trained on real-world data, we know that they do novel compositions (because "a squirrel riding an octopus" isn't remotely part of the training set) and style is irrelevant, so it's effectively hard to overfit. For 2D anime input sets, composition is not relevant but style is, and the model may well have ended up overfitting on style.

As you mentioned, image editing seems to be the real immediate killer feature though. Inpainting, outpainting, and stylistic or compositional manipulations can be done with ease now ("give me an image like X but with black panties and red hair", "image like X but holding a coffee cup"). This may also mean that artists with distinctive styles will be appreciated more, since mediocre artists with an "average" style won't be much distinguishable from the output of the model.

One thing I'd like to see someone do is fix a given prompt and then iterate over random seeds, generating a set of possible outputs for that prompt. This should be a quick way to see the breadth of styles the model can output.
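
With the open weights this is only a few lines using the diffusers library (a sketch; the model ID and prompt are placeholders):
[code]
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

prompt = "1girl, school uniform, cherry blossoms"  # fixed example prompt
for seed in range(16):                             # vary only the seed
    g = torch.Generator().manual_seed(seed)
    pipe(prompt, generator=g).images[0].save(f"seed_{seed:02d}.png")
[/code]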


[1] https://medium.com/stable-diffusion-based-image-compresssion-6f1f0a399202
>> No. 2929
File removed
The recent and unexpected passing of Kim Jung Gi has made me realize it again: no AI will ever give someone the same feeling that this man managed to give millions of aspiring artists and people who appreciated his craft.

Instead of fighting against it, we should work towards the future where these tools exist and make use of them. No point in being a Luddite once the genie is out of the bottle.

>>2927
I found the algos for inpainting/outpainting much more interesting. Look at pic related and GFP-GAN, those are crazy good results.

https://github.com/TencentARC/GFPGAN
>> No. 2930
>>2929
>inpainting/outpainting much more interesting
Diffusion models can do this as well. It's basically doing for image generation what transformers did for text models, in that you have a general tool that works for an entire class of problems.

And as far as I can see, there is still a lot to squeeze out of diffusion models. For instance, even though most of the theoretical work on diffusion models assumes that gaussian noise is used (I believe the critical assumption is that if the forward noise schedule is done in small enough increments, then the reverse process, conditioned on a given step, is also approximately gaussian, the parameters of which are learned), in practice it doesn't matter what type of noise you use. You can make sense of this hand-wavily and intuitively, but it means that many of the assumptions currently baked into diffusion models can actually be relaxed while retaining the same power.
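
For the unfamiliar, the standard DDPM-style forward noising step being referred to looks roughly like this (a minimal sketch, not any particular paper's exact formulation):
[code]
import torch

# q(x_t | x_{t-1}) = N( sqrt(1 - beta_t) * x_{t-1}, beta_t * I )
def forward_step(x_prev: torch.Tensor, beta_t: float) -> torch.Tensor:
    # small beta_t means tiny noising increments, which is what makes
    # the reverse process approximately gaussian as well
    noise = torch.randn_like(x_prev)
    return (1 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * noise
[/code]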
>> No. 2951
This is interesting, automatically generating tween frames from key frames: https://metaphysic.ai/nvidias-implicit-warping-is-a-potentially-powerful-deepfake-technique/

I could see this potentially being used for budget anime, since as I understand it, most of the inbetweening is already pretty monotonous.
>> No. 2996
>>2267
>since it just apes what came before
So it's just like humans.
>> No. 3015
>>2996
(pun intended?)
I think the success of transformer and diffusion models shows that "generalizability" is actually a lot easier to get than we originally thought. If you think about it, it's absolutely mind-bending that transformer models which were essentially trained solely to predict the next token can do so well on a wide variety of general-language tasks. There's no reason to think a priori that this would be the case, and yet for some reason it works, and it even generalizes well.
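
For concreteness, "trained solely to predict the next token" really is just this one objective (a toy sketch, with random tensors standing in for a real model and data):
[code]
import torch
import torch.nn.functional as F

vocab, batch, seqlen = 100, 2, 8
logits = torch.randn(batch, seqlen, vocab)      # stand-in model outputs
tokens = torch.randint(vocab, (batch, seqlen))  # stand-in token ids

# each position is scored against the NEXT token; that's the whole loss
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),
    tokens[:, 1:].reshape(-1),
)
[/code]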
>> No. 3028
File 166854032979.png - (348.59KB , 512x512 , 1668022144885006.png )
i'm going to train a hypernetwork intensely on my favorite ntr artist until the style is indiscernible, then i'm going to build up a massive hoard of fanart of his best story without releasing any online, and finally i'm going to collect the very best few dozen or so and send them to him along with the network and instructions on its use as thanks for his splendid work; ai will soon do for 2d art what the steam engine did for manual labor (luddite cybermobs are already gathering) and with this i hope he'll get a good head start as a manual-automatic bicompetent artist capable of once-mythical production rates while the technology is still in its relative infancy, before the value of his hard-earned skill tanks and before the visual art market (including flawless 2d animation) shifts sharply in favor of creative-conceptual thinking over budget or artisanal capability, thus becoming oversaturated at levels previously unimaginable
ai lies on the path to GOD
>> No. 3030
>>3028
>train a hypernetwork
A what now? I'm assuming it's as defined in [1, 2]. The term as used in an ML context was originally proposed in 2016 for RNN/LSTM architectures [3], and [2] states point-blank that their method isn't the same thing, so I don't know why they co-opted the term. Although if you squint, I guess they accomplish the same thing: fine-tuning not by changing the weights of the original model directly, but by training a smaller network that modifies the hidden states of the larger network. I'd be curious to know how exactly this works; it seems equivalent to including the smaller hypernetwork's parameters in the original model itself and then doing fine-tuning with all except the hypernetwork's parameters frozen. Seems like a $10 word for a 10¢ idea to me.
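
If it is what [2] describes, the mechanics would look something like this (my guess at the idea in PyTorch; shapes and placement are assumptions, not NovelAI's actual code):
[code]
import torch.nn as nn

class Hypernet(nn.Module):
    # small trainable MLP that perturbs a frozen layer's hidden states
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, h):
        return h + self.net(h)  # residual tweak of the hidden state

# The base model's weights stay frozen; only the hypernet trains, e.g.
# wrapping the keys/values feeding cross-attention:
#   k = hyper_k(frozen_to_k(context)); v = hyper_v(frozen_to_v(context))
[/code]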

[1] https://bennycheung.github.io/stable-diffusion-training-for-embeddings
[2] https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac
[3] https://arxiv.org/abs/1609.09106v4
>> No. 3104
>>3030
i have no idea what rnn/lstm is because i'm a scrub
hypernetworks are part of the stable diffusion ecosystem along with models/checkpoints, vae (i dunno what this even stands for) and embeddings
some people say to use hypernetworks for artstyles while others say to make a "dreambooth" fine-tuning of the base model instead and save the hypernetwork slot for specific characters
others still claim that embeddings are the way to go hands down when it comes to getting consistent characters and objects
what's more, i don't even know how best to select and prep training data yet, nor if that step varies between the methods
i still have much to learn, that much is clear, and i won't bother properly going ahead with my project until i'm sure what to do and how to do it since the idea here is to show him what this tool can truly do for him and a hastily cobbled-together product simply isn't it
there is nothing cheap about this technology
>> No. 3105
>>3104
VAE in the literature is "variational autoencoder". Not sure how that applies to diffusion models, though. Theory-wise, I guess fine-tuning the separate hypernetwork parameters must land at the same local minimum (and also the global minimum, or at least close to it, if you assume convexity like most people just do anyway) as trying to train with a frozen subset of weights. But I'm not a theoretician here.
>> No. 3161
Are there any art generators that can create proper pixel art? I've seen a bunch of "AI pixel art" lately, but they just add something like an aliasing effect to an image and call it a day. They don't even bother reducing the color count. It gives an impression of pixel art that would fool most laymen, but anyone with an actual interest in pixel art would notice it's not proper.
>> No. 3162
File 167855477144.png - (31.05KB , 512x288 , 6b009a8fdd2271fe6dc9a1844b7906d6.png )
>>3161
I don't believe pixel art is well defined. Drawing things on paper and then translating that into something a computer can handle (pixelated) has been a technique for a very long time. What enthusiasts do now is pretty artificial. Pic rel.

>> No. 3163
>>3162
Pixel art is generally any digital art that is created on a pixel level and is low-color and low-resolution. "High-res" pixel art like the PC-98/16-color CG you posted is just a sub-genre within the pixel art medium. There are of course pixel art snobs that don't consider "high-res" pixel art true pixel art, but alas.
>> No. 3164
File 167869077489.png - (215.88KB , 798x999 , 77541866_p0 pixel.png )
>>3161
How's this?
https://app.monopro.org/pixel/
>> No. 3165
>>3164
It's not too bad for a "pixel filter", but I was thinking more of a diffusion model trained and set up specifically to generate pixel art. Most of the attempts I've seen just generate a regular image in the usual way and then slap a pixel filter of some kind on top, probably because that's a much easier method than specifically training it on pixel art and making it understand pixel art conventions like jaggies, anti-aliasing, outlines, palettes, abstraction, dithering, and subpixels.

Still, thanks for sharing that.
>> No. 3166
>>3165
It'd probably work to train it, although I wonder if gaussian noise might not be the best choice for a model that should output pixel art. The cold diffusion paper shows that quite literally any degrading image transform will work, and while it's easy to see that adding gaussian noise works fine for "real life" images that are smooth and continuous, pixel art with sharp high-frequency discontinuities might not be suited to it. But I'm not a researcher.
>> No. 3167
>>3166
Yeah I've considered that as well, but likewise I'm not a researcher.
>> No. 3168
>>3167
Have you tried just fine-tuning one of the existing open-source diffusion models on some pixel art and seeing what you get? It can handle non-photorealistic (i.e. anime-style) art just fine already, and there's a chance that pixel art was at least present in some subset of the training data, so fine-tuning might be sufficient to get some OK results.
>> No. 3169
>>3168
I don't have the hardware for it right now (I do have a huge collection of pixel art to train it on, though, which is why I'm kind of interested in the subject), but I'm guessing it wouldn't give good results out of the box, since it wouldn't know how to properly deal with the concepts detailed in >>3165 and would just treat them like regular high-color images.

You could perhaps combine it with a pixel filter to get "OK" results though.
>> No. 3170
File 167881300216.jpg - (254.90KB , 587x756 , fccd2c23dbc7f491c3161a5a77573f3e.jpg )
Interesting article.

First Complete Map of a Fly Brain Has Uncanny Similarities to AI Neural Networks
https://gizmodo.com/first-complete-map-fly-brain-neuroscience-1850206820
>> No. 3171
>>3169
> it wouldn't know how to properly deal with the concepts
It well might, assuming that diffusion models can reproduce any source distribution (which is a big claim). But I agree it would be a lot trickier; one pixel off in a photorealistic (or anime-esque) image will not matter much, while one pixel off in pixel art will ruin the thing. But jaggies/anti-aliasing should be handled trivially, since even conventional 2D image processing can do it via simple conv filters; same for outlines and dithering.
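
e.g. the kind of "simple conv filter" meant here, a 3x3 box blur that softens jaggies (illustrative sketch only):
[code]
import torch
import torch.nn.functional as F

img = torch.rand(1, 1, 64, 64)                # grayscale stand-in image
kernel = torch.full((1, 1, 3, 3), 1.0 / 9.0)  # 3x3 box-blur kernel
smoothed = F.conv2d(img, kernel, padding=1)   # smooths hard pixel edges
[/code]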

Also, fine-tuning doesn't require as much hardware as de novo training; supposedly even few-shot training would work.
>> No. 3308
Looks like music generation is about to have its diffusion moment:
https://www.theverge.com/2023/12/19/24008279/microsoft-copilot-suno-ai-music-generator-extension
Example: https://app.suno.ai/song/8467a5ed-31a9-451e-8be0-2830cc76cfae/

What I can't find is what the actual model they're using is. Previous attempts at music generation used some sort of autoregressive approach (e.g. MusicLM) or a diffusion type (Noise2Music). My guess is that they're basically combining several recent achievements: something like MusicLM to generate the beats, then some TTS model (VALL-E?) on top for the lyrics. Presumably they might actually train both of these at once so they can share some information in embedding space, so that lyrics and musical downbeats align.
>> No. 3309
>>3308
Do let us know when it can generate a song on the level of The Gates of Delirium that won't sound like garbage.
>> No. 3322
File 170665054720.jpg - (14.84KB , 176x176 , onlymusicc.jpg )
>>3308
Hopefully. Regarding the perceived standalone potential of present models (i.e. prompting only, no assistive file to mimic), generative music has far more of interest to offer, and so piques mine. I don't think it will be capable of generating anything that tops good creative direction, especially for any genre which has a heavy focus on atmosphere, lengthy progressions, or production nuances (with the exception of ambient drone, I suppose). Similarly, I think attempting to mimic vocal music (in a single pass, for everything) is a mistake. However, for genres that are instrumental, purely monotonous in tone, but quite varied yet very similar in melody composition (of which there are many), I could see it being very satisfactory; for instance, most any traditionally influenced genre (bossa nova[!] and Celtic especially), math rock to an extent, and dnb/jungle.

It would be more interesting if it were capable of producing tracker-formatted music, given that the format is far more efficient in file size, more configurable, and more structurally specified, and so easier for it to produce and easier to reference. Though I imagine the methods used aren't suitable for doing that directly, as the training data is certainly raw audio, so the best one could hope for is a post-conversion...
Maybe one day I'll be able to run a BGM generator that either live-generates and appends to a running mod file in RAM, or simply generates an 18-hour file that I can listen to throughout the day, for only 50MB and maybe 2-3 minutes of peak resource usage at startup.