CodeWalrus

Development => PC, Mac & Vintage Computers => Topic started by: gameblabla on January 21, 2018, 01:03:38 AM

Title: Rule34 image downloader
Post by: gameblabla on January 21, 2018, 01:03:38 AM
So I met a guy on Discord and he made this image scraper in Python:
https://github.com/sunx2/r34py

Unfortunately, reading the source code was like trying to de-obfuscate it.
Plus, I had to use proxychains to make it work over Tor.

Hence why I decided to re-implement it in C as an exercise.
The main issue I had was implementing a function to crawl over the HTML files and find the image links.
This was the first time I did something like this, so it took me a few more hours.
I eventually decided I should look for a pattern that would let me find the links fairly easily.

<a href="http:///....>


I eventually shortened the code so it only looks for "<" and ":" and checks whether there is an "r" two characters before the link.
Believe it or not, it works, and I can now download all of my c goodness from the command line.
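
For illustration, here is a minimal sketch of this kind of scan in C. It is not the code from the repository: it assumes the page is already in memory as a NUL-terminated string, it matches the full `<a href="` pattern rather than the shortened check described above, and `next_href` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the repository): copies the next
 * <a href="..."> value found at or after *cursor into out and advances
 * *cursor past it. Returns 1 if a link was found, 0 otherwise. */
static int next_href(const char **cursor, char *out, size_t out_size)
{
    static const char marker[] = "<a href=\"";
    assert(out_size > 0);

    const char *start = strstr(*cursor, marker);
    if (start == NULL)
        return 0;
    start += sizeof(marker) - 1;           /* skip past the marker itself */

    const char *end = strchr(start, '"');  /* closing quote of the attribute */
    if (end == NULL)
        return 0;

    size_t len = (size_t)(end - start);
    if (len >= out_size)
        len = out_size - 1;                /* truncate rather than overflow */
    memcpy(out, start, len);
    out[len] = '\0';

    *cursor = end + 1;                     /* continue after this link */
    return 1;
}
```

Calling it in a loop (`while (next_href(&cur, url, sizeof url)) ...`) walks every link in the buffer. A real page would still need filtering to separate full-size images from thumbnails and other links.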

Then I extended it so it downloads from all the pages (just a loop really), and here are the results:
https://github.com/gameblabla/r34downloader_curl

It uses curl for downloading things. Once I work on the SDL GUI, it could be ported to other platforms too.

Lemme know what you think about it. You must compile it from source.
Title: Re: Rule34 image downloader
Post by: Dream of Omnimaga on January 23, 2018, 09:12:54 PM
Is the only content allowed rule 34/c material? :P
Title: Re: Rule34 image downloader
Post by: _iPhoenix_ on January 24, 2018, 12:08:05 AM
It's really easy to do it manually, too. Most OSes provide an option to download a webpage and its assets, and you can filter by image type.
Title: Re: Rule34 image downloader
Post by: gameblabla on January 24, 2018, 10:26:06 PM
Quote from: xlibman on January 23, 2018, 09:12:54 PM
Is the only content allowed rule 34/c material? :P
Well, the website itself only allows that, so yeah. This could be adapted to other websites, but most don't post direct links to the full pictures, so yeah...

Quote from: _iPhoenix_ on January 24, 2018, 12:08:05 AM
It's really easy to do it manually, too. Most OSes provide an option to download a webpage and its assets, and you can filter by image type.
Well yes, that is true, but the point was to only download what I wanted, namely just the images themselves and not the thumbnails, ads or other random c. I doubt the OSes allow that very easily.
Title: Re: Rule34 image downloader
Post by: gameblabla on January 25, 2018, 08:26:00 PM
Well, I tried it with some other tags like Mario and realized it would not pick up some images properly.
So I fixed that and it works properly now.
Title: Re: Rule34 image downloader
Post by: gameblabla on May 11, 2018, 08:26:45 PM
If I had one piece of advice for aspiring developers, it's to not use fgets.
Just use fread instead; it can do the same things (together with fseek) and is much less buggy and more portable.
It's also more predictable and less confusing.
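
As a rough sketch of the fread-plus-fseek approach (again, not the code from the repository), the whole file can be read into one buffer in a single call. `read_whole_file` is a hypothetical helper name, and this assumes the file is a regular file small enough to fit in memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the repository): reads a whole file into a
 * NUL-terminated heap buffer. Returns NULL on failure; the caller must
 * free() the result. */
static char *read_whole_file(const char *path, size_t *size_out)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Use fseek/ftell to learn the file size, then go back to the start. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return NULL; }

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    /* One fread call instead of a line-by-line fgets loop. */
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    buf[got] = '\0';            /* terminate so string functions can be used */
    if (size_out != NULL)
        *size_out = got;
    return buf;
}
```

Because the buffer holds the bytes exactly as they are on disk, nothing depends on line lengths the way a fixed-size fgets buffer does.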

I was using fgets because I was under the impression that the formatting would not be the same if I used fread.
Of course I was wrong, it doesn't matter...

So yeah, I fixed that and a few small things (such as arrays not being cleared properly), and the program now finally works properly for multiple pages.
Of course, it still works well over Tor. (I did not dare to do it over clearnet because I don't want the govt to know my norp habits...)
Title: Re: Rule34 image downloader
Post by: Dream of Omnimaga on May 14, 2018, 03:34:58 PM
This will be handy for downloading abacus c.