inb4: IPFS doesn’t work, unfortunately, as you cannot provide the hash of an arbitrarily large file and retrieve it from the network. IPFS content IDs (CIDs) are a hash of the tree of chunks. Changes to chunk size can also change the hash!
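
To make the chunking point concrete, here is a toy sketch. It is not IPFS's actual algorithm (real CIDs are built from a UnixFS DAG with its own encoding); `chunked_root_hash` is a made-up helper that just hashes each chunk and then hashes the concatenated chunk digests. It only shows that a hash over chunks differs from the plain file hash and moves when the chunk size changes.

```python
import hashlib


def flat_hash(data: bytes) -> str:
    """Plain SHA-256 over the whole file, the hash most people have on hand."""
    return hashlib.sha256(data).hexdigest()


def chunked_root_hash(data: bytes, chunk_size: int) -> str:
    """Toy stand-in for a hash over a tree of chunks (NOT the real IPFS scheme)."""
    chunk_digests = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    # The "root" is the hash of all chunk digests joined together.
    return hashlib.sha256(b"".join(chunk_digests)).hexdigest()


data = bytes(range(256)) * 4096  # 1 MiB of sample data
flat = flat_hash(data)
root_256k = chunked_root_hash(data, 256 * 1024)
root_1m = chunked_root_hash(data, 1024 * 1024)

print("flat      :", flat)
print("256 KiB   :", root_256k)
print("1 MiB     :", root_1m)
```

Same bytes, three different identifiers, which is why a plain SHA-256 of the file can't be looked up directly.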

Basically, I’d like to take the SHA-256, SHA-3, BLAKE2, or MD5 hash of a file and either retrieve the file from a network or get a list of sources for it. Does something like that exist already or will I have to build it?
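
For reference, computing all of those digests in one pass over a file is straightforward with Python's `hashlib`. This is just a sketch: `hash_stream` and the `ALGORITHMS` tuple are names picked for illustration, and I'm assuming SHA3-256 and BLAKE2b as the concrete SHA-3 and BLAKE2 variants.

```python
import hashlib
import io
from typing import BinaryIO

# Assumed variants; "md5" may be unavailable on FIPS-restricted Python builds.
ALGORITHMS = ("sha256", "sha3_256", "blake2b", "md5")


def hash_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> dict[str, str]:
    """Read the stream once and feed every chunk to all hashers."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    while chunk := stream.read(chunk_size):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Usage with an in-memory stream; open(path, "rb") works the same way.
digests = hash_stream(io.BytesIO(b"hello"))
for name, value in digests.items():
    print(name, value)
```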

If I have to build it, it will be a really simple, dumb HTTP service with:

  • GET /uris/:hash:?alg=sha256|md5|blake
  • POST /uri/:hash: with the contents being a URI to the file
    Supported URI schemes would probably be HTTP/S and FTP. Maybe P2P protocols like IPFS, and if there’s a way to target a specific file in a torrent, maybe magnet links too. But that feels like risky territory.
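
A minimal sketch of what those two endpoints could look like, using only the standard library. Everything here is an assumption for illustration: the in-memory `SourceRegistry`, the JSON response shape, defaulting `alg` to `sha256`, and the status codes. It also stores URIs without checking that the file really has that hash, which a real service would have to do.

```python
import json
from collections import defaultdict
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

ALLOWED_SCHEMES = {"http", "https", "ftp"}  # schemes named in the post
ALLOWED_ALGS = {"sha256", "md5", "blake"}   # values of the ?alg= parameter


class SourceRegistry:
    """In-memory map of (alg, hash) -> URIs. Assumption: no persistence."""

    def __init__(self):
        self._sources = defaultdict(set)

    def add(self, alg, digest, uri):
        if alg not in ALLOWED_ALGS:
            raise ValueError(f"unsupported algorithm: {alg}")
        if urlparse(uri).scheme.lower() not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported URI scheme: {uri}")
        # NOTE: the URI is trusted as-is here; nothing is fetched or hashed.
        self._sources[(alg, digest.lower())].add(uri)

    def lookup(self, alg, digest):
        return sorted(self._sources.get((alg, digest.lower()), ()))


REGISTRY = SourceRegistry()


class Handler(BaseHTTPRequestHandler):
    def _reply(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _route(self, prefix):
        """Return (alg, hash) if the path starts with prefix, else None."""
        url = urlparse(self.path)
        if not url.path.startswith(prefix):
            return None
        alg = parse_qs(url.query).get("alg", ["sha256"])[0]
        return alg, url.path[len(prefix):]

    def do_GET(self):  # GET /uris/<hash>?alg=sha256|md5|blake
        route = self._route("/uris/")
        if route is None:
            return self._reply(404, {"error": "not found"})
        self._reply(200, {"uris": REGISTRY.lookup(*route)})

    def do_POST(self):  # POST /uri/<hash> with the URI as the request body
        route = self._route("/uri/")
        if route is None:
            return self._reply(404, {"error": "not found"})
        length = int(self.headers.get("Content-Length", 0))
        uri = self.rfile.read(length).decode().strip()
        try:
            REGISTRY.add(*route, uri)
        except ValueError as exc:
            return self._reply(400, {"error": str(exc)})
        self._reply(201, {"stored": uri})


if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```

Magnet and IPFS URIs are rejected in this sketch simply because they are not in `ALLOWED_SCHEMES`; adding them is a one-line change once the risk question is settled.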

Of course for hashing requests it would have a limited task queue (maybe 5 in parallel?), rate limiting by IP, and a size limit for retrieval (1GB feels like more than enough).
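
Roughly how those three limits could be wired up, as a sketch only. The names (`RateLimiter`, `hash_limited`, `run_hash_job`, `TooLarge`) are invented for illustration, the sliding-window rate limit is one possible policy among several, and I'm reading "1GB" as 10^9 bytes.

```python
import hashlib
import io
import threading
import time
from collections import defaultdict, deque

MAX_PARALLEL_JOBS = 5    # "maybe 5 in parallel?"
MAX_BYTES = 10 ** 9      # assumed reading of the 1GB retrieval limit


class RateLimiter:
    """Sliding-window limiter: at most max_requests per window per IP."""

    def __init__(self, max_requests, per_seconds, clock=time.monotonic):
        self._max = max_requests
        self._window = per_seconds
        self._clock = clock
        self._hits = defaultdict(deque)
        self._lock = threading.Lock()

    def allow(self, ip):
        now = self._clock()
        with self._lock:
            hits = self._hits[ip]
            # Drop timestamps that have fallen out of the window.
            while hits and now - hits[0] >= self._window:
                hits.popleft()
            if len(hits) >= self._max:
                return False
            hits.append(now)
            return True


class TooLarge(Exception):
    """Raised when a download exceeds the configured size limit."""


def hash_limited(stream, max_bytes=MAX_BYTES, chunk_size=64 * 1024):
    """Hash a stream with SHA-256, aborting once it grows past max_bytes."""
    hasher = hashlib.sha256()
    total = 0
    while chunk := stream.read(chunk_size):
        total += len(chunk)
        if total > max_bytes:
            raise TooLarge(f"more than {max_bytes} bytes")
        hasher.update(chunk)
    return hasher.hexdigest()


JOB_SLOTS = threading.BoundedSemaphore(MAX_PARALLEL_JOBS)


def run_hash_job(stream, max_bytes=MAX_BYTES):
    """Return the digest, or None if all job slots are currently busy."""
    if not JOB_SLOTS.acquire(blocking=False):
        return None
    try:
        return hash_limited(stream, max_bytes)
    finally:
        JOB_SLOTS.release()


# Usage: an in-memory stream stands in for the downloaded file.
print(run_hash_job(io.BytesIO(b"abc")))
```

Returning `None` when the slots are full lets the HTTP layer answer with something like a 503 instead of queueing work without bound.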

Can’t think of a way to do it with a DHT 🤷

  • tinkralge@programming.dev (OP)
    12 days ago

    I’m not sure what your concern is. I’d basically like to call a function retrieveFile(fileHash) and get bytes back, or call retrieveFileLocations(fileHash) and get back URIs where the file can be downloaded. Also, it’ll be open source, so there’s nothing to reverse engineer.
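
    A rough sketch of how retrieveFile could sit on top of the location list, assuming SHA-256 and a caller-supplied `fetch(uri)` function (hypothetical, so the example stays network-free). The key point is that the client re-hashes whatever it downloads, so a bad source can't hand back the wrong bytes unnoticed.

```python
import hashlib
from typing import Callable, Iterable, Optional


def retrieve_file(
    file_hash: str,
    locations: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Try each location in order; return the first download whose SHA-256 matches."""
    expected = file_hash.lower()
    for uri in locations:
        try:
            data = fetch(uri)
        except Exception:
            continue  # unreachable or broken source, try the next one
        if hashlib.sha256(data).hexdigest() == expected:
            return data
    return None  # no source served matching bytes


# Usage with a fake "network": one tampered source, one good one.
good = b"hello world"
fake_sources = {
    "https://a.example/evil": b"tampered",
    "https://b.example/file": good,
}
file_hash = hashlib.sha256(good).hexdigest()
data = retrieve_file(file_hash, fake_sources, fake_sources.__getitem__)
print(data)
```

    In a real client, `locations` would come from retrieveFileLocations(fileHash) and `fetch` would be an HTTP or FTP download.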

    • FiskFisk33@startrek.website
      12 days ago

      MD5, for example, is already broken. People have figured out how to craft different files that share the same MD5 hash (collision attacks). That means someone could engineer a deliberate collision, register the harmless file, and then serve you a different one under the same hash.

      SHA-256 doesn’t (I think) have this issue, so far, hah.