This thread has been archived. It is now read-only. Topics
Rules
More
Categories News Feedback Android Desktop Java Desktop Rust Relays DHT Bootstrap DKT Website Bug Report Dev All Categories

DKT Hotspot Concept

DKT

Octo 1/9/2025

Sharding

Sharding is the idea of splitting evenly, in the sense that a keyword is more popular than another we don't want to have hotspots so we would "shard" to split the load evenly.

Lets assume that we have a keyword like "1080p" we know that this keyword is extremely popular. To avoid 5 nodes having to take on a substantial amount of storage and bandwidth just to serve 1 keyword, while the other nodes have serve a small amount of data. We could force each node to only accept up to 50 results per keyword.

Considerations

We would still have to worry about keyword intersections (if you want to search for more than 1 keyword at a time). Another thing to account for would be knowing which nodes are full and which nodes are not (which nodes should receive the new result).

PUT

We should have a distributed tracking method for keywords being 5 nodes closest to the hash of the keyword that stores a count for keywords. For example if we have 56 total results for "1080p" it should be notified when a keyword is added with a cache time of 24 hours.

When a user wishes to add a keyword they will first ask for a count from the tracker for example:
COUNT_CHECK_REQ

1080p

COUNT_CHECK_RES

56

The user now is aware that there is 56 total results for that keyword so it should now hash the keyword in this manor sha((count/50)+keyword). It will then send this as a store keyword to the 5 nodes closest via XOR.
PUT_KEYWORD_REQ

1080p

PUT_KEYWORD_RES

stored

The nodes that received this should now notify the tracker that they are storing the keyword.
COUNT_UPDATE_REQ

1080p

COUNT_UPDATE_RES

updated

GET

The user will hash the keyword sha(keyword) and find the tracker and request a count example:
COUNT_CHECK_REQ

1080p

COUNT_CHECK_RES

56

Now that the user has an understanding of the total keywords it will select a random number 0 - count/50 which we will call x. It will then do sha(x+keyword) and XOR the nodes closest to that hash for results.
GET_KEYWORD_REQ

1080p

GET_KEYWORD_RES

list of infohashes

Now that we have discussed the issue regarding the sharding portion we should get into how we can handle keyword intersections. My theory is that we hash each keyword individually then place them in an index accordingly so that order will not be a factor. We can limit to 10 keywords per torrent, which is the same amount used for website keywords within the meta, most users only search 1.9 words when doing a query anyways as given by Google. We know that the total keywords that would have to be stored every 24 hours would be allot but not unreasonable, at a maximum of 1,023 total keywords stored per torrent.