postcd
Weaksauce
Joined: Nov 24, 2016
Messages: 94
Hello,
I want to bring up a problem that I have been thinking about for a long time and have been unable to solve, despite searching and asking on other sites. A summary is in the title of this thread.
I am currently on a Windows system that I do not want to leave.
I need to store around 7 TB of data, growing by maybe 150 GB/month: mainly movies, plus the data of an application that uses roughly a million files (mostly small files, about 500 GB in total). Currently everything is on HDDs (the app files on one external HDD, the movies etc. on another external drive, plus the system HDD), but I would welcome speeding the storage up by at least 100% (e.g. via read striping), and I am also running out of disk space.
ISSUE: I need more space than SSDs can affordably offer and more speed than a single HDD can offer.
I think I may need to create some storage "pool" out of my HDDs, and I want this pool to be encrypted once I shut down the computer. I think I would need to use USB enclosures, as I do not want to buy an expensive, power-hungry NAS, and I do not trust that a network-attached filesystem would be fully compatible with my Windows 10 PC (i.e. that all apps would have no problem using it).
I can use the Windows app DrivePool to combine multiple USB drives, but it does not offer deduplication: if the pooled NTFS filesystem contains the same file in multiple directories, the software cannot save physical disk space by storing it only once. But no problem; maybe I should find some software that does this for me periodically by replacing duplicates with hardlinks?
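That kind of offline hardlink deduplication could also be scripted directly. Below is a minimal sketch in Python, assuming the target volume actually supports NTFS hardlinks (worth verifying for a DrivePool virtual drive specifically) and that nothing writes to the files while the scan runs; the pool path is made up. Keep in mind that hardlinked "copies" share one physical file, so editing one name edits them all.

```python
# Hypothetical sketch: find byte-identical files under a root and
# replace duplicates with NTFS hardlinks to reclaim physical space.
import hashlib
import os
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def dedup_to_hardlinks(root: Path) -> None:
    # Pass 1: group files by size; only equal-size files can match.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)

    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue
        # Pass 2: hash the candidates to confirm true duplicates.
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in paths:
            by_hash[sha256_of(p)].append(p)
        for dupes in by_hash.values():
            keeper, *rest = dupes
            for dup in rest:
                if os.path.samefile(keeper, dup):
                    continue  # already hardlinked together
                tmp = Path(str(dup) + ".dedup-tmp")
                os.link(keeper, tmp)  # raises OSError across volumes
                os.replace(tmp, dup)  # atomically swap in the link

if __name__ == "__main__":
    dedup_to_hardlinks(Path(r"D:\Pool"))  # hypothetical pool root
```

Grouping by size before hashing keeps the scan cheap: only files that could possibly match ever get read in full.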
I do not need real-time deduplication. I have read that the deduplication feature of ZFS (and maybe Btrfs) requires something like 6 GB of RAM plus 1 GB of RAM per 1 TB of storage, as well as additional CPU time (https://hardforum.com/threads/zfs-dedupe-is-fixed-soon.1854062/); by that rule of thumb, my ~7 TB would already need on the order of 13 GB of RAM, which is too much for me. I am wondering whether there is a software solution that does not require buying an expensive mini PC with lots of RAM.
Maybe I am overcomplicating this, and I should just buy two large external HDDs and continue as I do now, but I am starting to need more space than the largest affordable SSDs offer and more speed (IOPS) than a single HDD provides; at least certain directories/files, around 500 GB in total, need faster reads than my current single HDD delivers. I cannot move them to an SSD, because they are part of the app, which also contains very large files needing terabytes of storage. I tried symlinks (symbolic links) pointing to a different HDD, but the app did not work with symlinks, and hardlinks on Windows work only within a single volume.
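For illustration, here is a small sketch of that last constraint: an NTFS hardlink cannot leave its volume, so any tool has to check that first. The paths are made up, and comparing drive anchors is a simplification that ignores mount points and junctions.

```python
# Hypothetical check before attempting a hardlink on Windows.
import os
from pathlib import Path

def same_volume(a: Path, b: Path) -> bool:
    # Compare drive anchors, e.g. 'C:\\' vs 'D:\\'. Simplified:
    # mount points can make two anchors map to one volume.
    return a.resolve().anchor.lower() == b.resolve().anchor.lower()

src = Path(r"C:\App\data\huge.bin")   # hypothetical existing file
dst = Path(r"D:\Fast\huge.bin")       # hypothetical link location

if same_volume(src, dst):
    os.link(src, dst)  # one physical copy, two directory entries
else:
    print("Different volumes: a hardlink is impossible; only a "
          "symlink or junction can point across drives.")
```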
I do not know what to do.
Note that I prefer external 2.5" USB drives because of their lower noise, power consumption, and price.
I do not wish to run some old, noisy computer/NAS just to connect 2-3 HDDs together, and I would like to avoid buying additional expensive or power-hungry hardware. So I prefer (but do not insist on) a software solution, or one using simple passive hardware. DrivePool can do everything I want except that it would waste my disk space (in my case, terabytes of it) because it cannot handle duplicates on the pool; maybe I should find Windows software that regularly scans for duplicates and replaces them with hardlinks, along the lines of the sketch above? Symlinks do not help, as mentioned.
Maybe I should buy some mini PC like a Raspberry Pi (<10 W) to offload my HDD IOPS and CPU cycles (CPU load is also becoming a problem). I just do not know how I would reliably and simply connect my Windows PC's storage and the Linux PC's storage (see the sketch below). From that point of view, a single storage location looks better.
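For what it is worth, the usual way to bridge the two is SMB: drives hung off a Raspberry Pi exported via Samba appear to Windows as a UNC path, and ordinary programs can open files on it like any local path. A minimal sketch, with a made-up host and share name (this demonstrates the mechanism only; whether every app tolerates UNC paths is exactly the compatibility worry raised above):

```python
# Hypothetical reachability and read-speed check for a Samba share
# exported by a Raspberry Pi, as seen from Windows via a UNC path.
import os
import time

UNC_DIR = r"\\raspberrypi\pool"  # hypothetical Samba share

def smb_read_speed(path: str, chunk: int = 1 << 20) -> float:
    """Read a file over SMB and return throughput in MB/s."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while block := f.read(chunk):
            total += len(block)
    return total / (time.perf_counter() - start) / 1e6

if os.path.isdir(UNC_DIR):
    test_file = os.path.join(UNC_DIR, "movies", "test.mkv")
    print(f"{smb_read_speed(test_file):.1f} MB/s")
else:
    print("Share not reachable; is Samba running and the share exported?")
```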
Advice on a good setup is very welcome. Thank you in advance, and sorry for the hard reading (I am not a native speaker).
---------------------
Interesting comments found:
https://news.ycombinator.com/item?id=15070542
HAMMER (the first version, not HAMMER2) has offline deduplication that is usually scheduled to run at night. This allows regular use to be quite performant and with a low memory footprint. Of course there are tradeoffs, in particular heavy disk usage at certain hours (which can be an issue depending on the workload) and the fact that space is reclaimed only after some time that it has been wasted.
https://leaf.dragonflybsd.org/cgi/web-man?command=hammer&section=8
dedup filesystem
(HAMMER VERSION 5+) Perform offline (post-process) deduplication. Deduplication occurs at the block level; currently only data blocks of the same size can be deduped, metadata blocks can not. The hash function used for comparing data blocks is CRC-32 (CRCs are computed anyways as part of HAMMER data integrity features, so there's no additional overhead). Since CRC is a weak hash function a byte-by-byte comparison is done before actual deduping. In case of a CRC collision (two data blocks have the same CRC but different contents) the checksum is upgraded to SHA-256.
...
The -m memlimit option should be used to limit memory use during the dedup run if the default 1G limit is too much for the machine.
(I am unsure whether the quoted text means this could somehow work in my case; also, the DragonFly download page shows no ARM CPU support, it only says "DragonFly BSD is 64-bit only".)
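The dedup check that man page describes (a cheap weak hash as a first pass, then byte-by-byte verification before any block is actually deduped) can be sketched in a few lines of Python. The block size and names here are my own illustration, not HAMMER code:

```python
# Illustration of the candidate check described in hammer(8):
# CRC-32 groups blocks cheaply, and a byte-by-byte comparison
# confirms equality before anything would actually be deduped.
import zlib
from collections import defaultdict

BLOCK_SIZE = 64 * 1024  # hypothetical fixed block size

def dedup_candidates(blocks: list[bytes]) -> list[tuple[int, int]]:
    """Return (i, j) index pairs of blocks verified byte-identical."""
    by_crc: dict[int, list[int]] = defaultdict(list)
    for i, blk in enumerate(blocks):
        by_crc[zlib.crc32(blk)].append(i)

    pairs = []
    for indices in by_crc.values():
        first = indices[0]
        for j in indices[1:]:
            if blocks[first] == blocks[j]:  # byte-by-byte verification
                pairs.append((first, j))
            # else: CRC collision; per the man page, HAMMER upgrades
            # the checksum to SHA-256 instead of deduping such blocks.
    return pairs

blocks = [b"a" * BLOCK_SIZE, b"b" * BLOCK_SIZE, b"a" * BLOCK_SIZE]
print(dedup_candidates(blocks))  # -> [(0, 2)]
```

The weak hash only narrows down candidates cheaply; correctness comes entirely from the byte comparison, which is why a CRC collision costs nothing worse than a skipped dedup.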