inb4: IPFS doesn’t work, unfortunately, as you cannot provide the hash of an arbitrarily large file and retrieve it from the network. IPFS content IDs (CIDs) are a hash of the tree of chunks, not of the file's raw bytes. Changes to chunk size can also change the hash!
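To illustrate the chunk-size point, here is a minimal sketch. It is a toy two-level tree (hash each chunk, then hash the concatenated chunk digests) and is NOT IPFS's actual CID or DAG encoding; `chunk_tree_root` and `flat_sha256` are made-up names for illustration only.

```python
import hashlib


def flat_sha256(data: bytes) -> str:
    """Plain SHA-256 over the raw bytes; independent of any chunking."""
    return hashlib.sha256(data).hexdigest()


def chunk_tree_root(data: bytes, chunk_size: int) -> str:
    """Toy tree hash (assumption, not the real IPFS format):
    hash every chunk, then hash the concatenation of the chunk digests."""
    leaves = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    return hashlib.sha256(b"".join(leaves)).hexdigest()


data = bytes(range(256)) * 64  # 16 KiB of sample data

root_1k = chunk_tree_root(data, 1024)  # 16 leaves
root_4k = chunk_tree_root(data, 4096)  # 4 leaves

print("flat sha256:    ", flat_sha256(data))
print("tree root (1K): ", root_1k)
print("tree root (4K): ", root_4k)
```

Same bytes, same flat hash every time, but the tree root depends on how the data was chunked, which is why a plain SHA-256 of a file can't be used to look it up by tree hash.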
Basically, I’d like to take the SHA-256, SHA-3, BLAKE2, or MD5 hash of a file and either retrieve the file from a network or get a list of sources for it. Does something like that exist already, or will I have to build it?
If I have to build it, it will be a really simple, dumb HTTP service with:
GET /uris/:hash:?alg=sha256|md5|blake
POST /uri/:hash:
with the request body being a URI to the file.
Supported URI schemes would probably be HTTP/S and FTP. Maybe P2P protocols like IPFS and, if there’s a way to target a specific file in a torrent, maybe magnet links too. But that feels like risky territory.
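A rough sketch of the index behind those two endpoints, without the HTTP layer. Everything here is an assumption for illustration: the `HashIndex` class, the in-memory storage, reading "blake" as `blake2b`, and the allowed-scheme list. It also skips the step where the service would fetch the URI and check the hash itself.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlparse

# Hex digest length per algorithm (assumption: "blake" means blake2b).
ALGORITHMS = {"sha256": 64, "md5": 32, "blake2b": 128}
ALLOWED_SCHEMES = {"http", "https", "ftp"}


class HashIndex:
    """Maps (algorithm, hex digest) to a set of source URIs."""

    def __init__(self):
        self._sources = defaultdict(set)

    @staticmethod
    def _key(alg: str, digest: str):
        digest = digest.lower()
        if alg not in ALGORITHMS:
            raise ValueError(f"unsupported algorithm: {alg}")
        is_hex = all(c in "0123456789abcdef" for c in digest)
        if len(digest) != ALGORITHMS[alg] or not is_hex:
            raise ValueError("digest is not valid hex of the expected length")
        return (alg, digest)

    def register(self, alg: str, digest: str, uri: str) -> None:
        """Roughly what POST /uri/:hash: would do."""
        if urlparse(uri).scheme not in ALLOWED_SCHEMES:
            raise ValueError("unsupported URI scheme")
        self._sources[self._key(alg, digest)].add(uri)

    def lookup(self, alg: str, digest: str) -> list:
        """Roughly what GET /uris/:hash:?alg=... would do."""
        return sorted(self._sources.get(self._key(alg, digest), ()))


index = HashIndex()
digest = hashlib.sha256(b"hello").hexdigest()
index.register("sha256", digest, "https://example.com/hello.txt")
print(index.lookup("sha256", digest))
```

Keying on the (algorithm, digest) pair keeps an MD5 and a SHA-256 of the same file as separate entries; linking them would need the service to hash the file itself.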
Of course, for hashing requests it would have a limited task queue (maybe 5 in parallel?), rate limiting by IP, and a size limit for retrieval (1 GB feels like more than enough).
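The size cap and the parallelism cap could look something like this. It is only a sketch: `hash_stream`, `hash_with_slot`, `TooLarge`, and the constants are hypothetical names, the stream is any object with a `read()` method, and rate limiting by IP is left out entirely.

```python
import hashlib
import io
import threading

MAX_BYTES = 1 << 30    # 1 GiB retrieval limit (from the post above)
MAX_PARALLEL = 5       # "maybe 5 in parallel"

_slots = threading.BoundedSemaphore(MAX_PARALLEL)


class TooLarge(Exception):
    """Raised when a stream exceeds the configured size limit."""


def hash_stream(stream, alg="sha256", limit=MAX_BYTES, block_size=64 * 1024):
    """Hash a file-like object block by block, aborting past `limit` bytes."""
    h = hashlib.new(alg)
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        total += len(block)
        if total > limit:
            raise TooLarge(f"stream exceeds {limit} bytes")
        h.update(block)
    return h.hexdigest()


def hash_with_slot(stream, **kwargs):
    """Refuse the job immediately if all hashing slots are busy."""
    if not _slots.acquire(blocking=False):
        raise RuntimeError("all hashing slots are busy, try again later")
    try:
        return hash_stream(stream, **kwargs)
    finally:
        _slots.release()


print(hash_with_slot(io.BytesIO(b"hello world")))
```

Reading in blocks means the service never holds the whole file in memory, and it can drop the connection as soon as the limit is crossed instead of downloading the rest.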
Can’t think of a way to do it with a DHT 🤷
If two files have the same hash, you may receive the file you request by hash, or you may receive a different, possibly malicious file.
https://en.m.wikipedia.org/wiki/Collision_attack
Strong cryptographic hashes are resistant to such attacks, but md5 is relatively weak.
Absolutely. An example of a malicious collision would be to request the file with the SHA-1 of 38762cf7f55934b34d179ae6a4c80cadccbb7f0a. But… there’s two of them here.
MD5 is so broken that its former status as a cryptographic hash function has been stripped. And efforts are underway to replace SHA-1 where it’s used, since although it takes some prerequisites to intentionally create a SHA-1 collision today, it’s worth remembering that “attacks always get better, they never get worse”.
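On the client side, one way to act on this is to re-hash whatever comes back and refuse to treat a weak digest as proof. A minimal sketch, where `verify` and the `WEAK` set are hypothetical names and the policy is just one possible choice:

```python
import hashlib
import hmac

# Algorithms with known practical collisions (MD5, SHA-1).
WEAK = {"md5", "sha1"}


def verify(data: bytes, alg: str, expected_hex: str, allow_weak: bool = False) -> bool:
    """Re-hash downloaded bytes and compare against the requested digest."""
    if alg in WEAK and not allow_weak:
        raise ValueError(f"{alg} is collision-prone; use a stronger hash to verify")
    actual = hashlib.new(alg, data).hexdigest()
    return hmac.compare_digest(actual, expected_hex.lower())


payload = b"file contents fetched from one of the listed sources"
wanted = hashlib.sha256(payload).hexdigest()

print(verify(payload, "sha256", wanted))
print(verify(payload + b"tampered", "sha256", wanted))
```

A matching MD5 or SHA-1 could still be used as a lookup key, as long as the file is then checked against a stronger digest before it is trusted.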