
Rule34 image downloader

Started by gameblabla, January 21, 2018, 01:03:38 AM



gameblabla

So I met a guy on Discord who made this image scraper in Python:
https://github.com/sunx2/r34py

Unfortunately, reading the source code felt like trying to de-obfuscate it.
Plus, I had to use proxychains to make it work over Tor.

So, as an exercise, I decided to re-implement it in C.
The main issue I had was writing a function that crawls through the HTML files and finds the image links.
This was the first time I did something like this, so it took me a few more hours.
I eventually decided to look for a pattern that would let me find the links fairly easily:

<a href="http:///....>


I eventually shortened the code so it only looks for "<" and ":", and checks whether there is an "r" two characters before the link.
Believe it or not, it works, and I can now download all of my c goodness from the command line.

Then I extended it so it downloads from all the pages (just a loop, really), and here is the result:
https://github.com/gameblabla/r34downloader_curl

It uses curl for downloading. Once I work on the SDL GUI, it could be ported to other platforms too.

Lemme know what you think about it. You'll have to compile it from source.
  • Calculators owned: None (used to own an Nspire and TI-89)

Dream of Omnimaga

Is the only content allowed rule 34/c material? :P
  • Calculators owned: TI-82 Advanced Edition Python TI-84+ TI-84+CSE TI-84+CE TI-84+CEP TI-86 TI-89T cfx-9940GT fx-7400G+ fx 1.0+ fx-9750G+ fx-9860G fx-CG10 HP 49g+ HP 39g+ HP 39gs (bricked) HP 39gII HP Prime G1 HP Prime G2 Sharp EL-9600C
  • Consoles, mobile devices and vintage computers owned: Huawei P30 Lite, Moto G 5G, Nintendo 64 (broken), Playstation, Wii U

_iPhoenix_

It's really easy to do it manually, too. Most OSes provide an option to download a webpage and its assets, and you can filter by image type.
  • Calculators owned: Two TI-84+ CE's
Please spam here: https://legend-of-iphoenix.github.io/spam/

"walruses are better than tuxedo chickens, all hail the great :walrii:" ~ me

gameblabla

Quote from: xlibman on January 23, 2018, 09:12:54 PM
Is the only content allowed rule 34/c material? :P
Well, the website itself only allows that, so yeah. This could be adapted to other websites, but most of them don't post direct links
to the full pictures, so yeah...

Quote from: _iPhoenix_ on January 24, 2018, 12:08:05 AM
It's really easy to do it manually, too. Most OSes provide an option to download a webpage and its assets, and you can filter by image type.
Well yes, that is true, but the point was to download only what I wanted, namely just the images themselves and not the thumbnails, ads or other random c. I doubt the OSes allow that very easily.

gameblabla

Well, I tried it with some other tags like Mario and realized it would not pick up some images properly.
So I fixed that, and it works properly now.

gameblabla

If I had one piece of advice for aspiring developers, it would be: don't use fgets.
Just use fread instead; together with fseek it can do the same things, and it is much less buggy and more portable.
It's also more predictable and less confusing.

I was using fgets because I was under the impression that the formatting would not be the same if I used fread.
Of course I was wrong, it doesn't matter...

So yeah, I fixed that and a few small things (such as arrays not being cleared properly), and the program now finally works properly for multiple pages.
Of course, it still works well over Tor. (I did not dare to do it over clearnet because I don't want the govt to know my norp habits...)

Dream of Omnimaga

This will be handy for downloading abacus c.
