not getting 10Gb/s and totally perplexed.

applegrcoug

I'm trying to set up a 10Gb/s network between my main NAS, my Windows Emby/Minecraft server, and a test NAS.

The main NAS is a 4790K with a 10GbE card and the Windows box is an E5-2699 v3 on a Gigabyte X99-UD5 with a 10GbE card. The test NAS is an old FX-8320...nothing special. These boxes are sitting side by side and are connected directly into the switch. Most of my network cards are ASUS XG-C100C, which is a PCIe Gen 3 x4 card...I also have one card that is PCIe Gen 2 x8 for the old stuff that only runs at Gen 2.

When I run iperf between the two NAS boxes, I get 9 Gb/s, which seems reasonable.

However, when I run iperf to the Windows box, it seems to be plagued with gremlins...I get more like 3 Gb/s.

So then I swapped network ports on the switch and swapped cables...3 Gb/s. So how about using a different PCIe slot, right? The X99 board has two x16 and one x8. Nope, 3 Gb/s. Same with a different card. Then I tried safe mode with networking and got up to 5 Gb/s. Then I tried the boot drive from my test NAS and it was up around 9 Gb/s.

So, what gives with Windows and TrueNAS not playing nice???
 
Which model cards do you have and exactly how do you have them wired? (switch, DAC, etc.)
 
OK, if I'm reading this right, you swapped a bunch of stuff around, but basically the E5-2699 v3 on the Gigabyte X99-UD5 with Windows does 3 Gbps in normal mode and 5 Gbps in safe mode, and it doesn't much matter which NIC or cable etc.; but you get 9 Gbps if you boot that board with Linux?

If that's the case, double check drivers, but there's probably some sort of tuning issue. The easiest thing would be to try running iperf with its parallel mode, or just run multiple iperfs. You might try CPU-pinning iperf. I've never done high-performance network tuning for Windows, but I know it's a thing*; try fiddling with the options for receive side scaling and the number/size of tx/rx queues in the driver properties. I don't know if there's an easy way to set up which core processes interrupts for the NIC in Windows, but you'll get better perf for an application that's network I/O bound with near-zero application CPU use when the core that's running the application is also doing the rx and tx interrupts for the NIC; if Windows is trying to be smart and balance the load onto multiple cores, it backfires here because now there's a lot more cross-core communication, which is super slow.

* Actually, Windows pioneered Receive Side Scaling, which is amazing for high performance networking; so I have a lot of respect, but I'll do my hyperscaling on FreeBSD please, or Linux if I have to, before considering Windows.
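If you want to poke at RSS and the driver queues from PowerShell, something like this is a rough starting point; the adapter name "Ethernet 2" and the "Receive Buffers" display name are just examples, the exact strings vary by vendor and driver:

Get-NetAdapterRss                                                    # show current RSS config per adapter
Set-NetAdapterRss -Name "Ethernet 2" -Enabled $true -BaseProcessorNumber 2 -MaxProcessors 4
Get-NetAdapterAdvancedProperty -Name "Ethernet 2"                    # driver knobs like queue/buffer counts live here
Set-NetAdapterAdvancedProperty -Name "Ethernet 2" -DisplayName "Receive Buffers" -DisplayValue "4096"

Run it from an elevated PowerShell; changing these usually bounces the link for a few seconds.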
 
Have you tried testing with tools other than iperf? The Wikipedia page on it says the Windows support is unofficial and hasn't been maintained since 2016.

Maybe just try copying a file?

I get a little over 8.5 Gb/s in either direction just using Windows file copy between a Xeon E5-2687Wv2 running Windows and an i3-10100 running RedHat 8 & Samba. Or at least I do for the first few GB when receiving on the Windows box, or when sending a second time. That old machine only has SATA SSDs, so if the file isn't cached in RAM on a send, or the cache fills up on a receive, it slows down to a bit over half of that since it's going at SATA SSD speed.

Also, I'm using NVidia/Mellanox ConnectX-4 cards. So server NICs. That might make a difference.
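If you want a number instead of eyeballing the Explorer graph, timing the copy in PowerShell works; the paths here are made up, and a second run of the same file will be inflated by the RAM cache:

$t = Measure-Command { Copy-Item "\\nas\media\bigfile.mkv" "D:\temp\bigfile.mkv" }    # time the transfer
"{0:N1} MB/s" -f ((Get-Item "D:\temp\bigfile.mkv").Length / 1MB / $t.TotalSeconds)    # work out the average rate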
 
I did iperf with -P 8 and get 4.5ish send and 4.5ish receive. I'm thinking that adds up to 9ish...so it is how Windows handles stuff.
 
So you are using a 10 GbE switch? The NAS boxes and the Windows box are all connected directly to the 10 GbE switch? Are your cables good?
 
I did iperf with -P 8 and get 4.5ish send and 4.5ish receive. I'm thinking that adds up to 9ish...so it is how Windows handles stuff.

You should really be able to get 9ish in both directions at the same time, but it kind of is what it is. Maybe see if you can tell iperf to use bigger socket buffers? How does large file copy look?

iperf is fun, but you probably didn't put together a 10G network just to mess around and take benchmarks (although, to be honest, that's 95% of why I got 10G cards at home; that, and I didn't have a better way to see if I could run 10G between my network closets than getting a pair of cards for my servers)
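For the socket buffer idea, iperf3 takes -w; something like this, where the address is just an example for your NAS:

iperf3 -s                                      # on the NAS
iperf3 -c 192.168.1.10 -t 30 -i 1 -w 2M        # from the Windows box, 2 MB socket buffer
iperf3 -c 192.168.1.10 -t 30 -i 1 -w 2M -R     # -R reverses it so the server sends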
 
So you are using a 10 GbE switch? The NAS boxes and the Windows box are all connected directly to the 10 GbE switch? Are your cables good?

Straight in with 6ft brand new cables that I actually bought instead of making myself.

You should really be able to get 9ish in both directions at the same time, but it kind of is what it is. Maybe see if you can tell iperf to use bigger socket buffers? How does large file copy look?

iperf is fun, but you probably didn't put together a 10G network just to mess around and take benchmarks (although, to be honest, that's 95% of why I got 10G cards at home; that, and I didn't have a better way to see if I could run 10G between my network closets than getting a pair of cards for my servers)

It is getting even more frustrating for me this morning with the more I read and test.

On my TrueNAS, I store my Emby library. It is housed on 4x 20TB Seagate Exos drives in a RAID-Z1 (RAID 5). So, I decided to move a big file around...Fellowship of the Ring in 4K at 144.8 GB should do nicely. So I copied FOTR from the HDDs to the NVMe in the Windows box. It took 26:31, or 93.2 MB/s.

Then I transferred it back and got 424.9 MB/s; granted, I've got some RAM cache going on with my 32GB total, buuut still.

So I turned on the jumbo frame stuff for Windows and it made no difference.

Next, I tried regular iperf3 again...still 3 Gb/s, so then I tried the -R flag to have the server send, and it is more like 1.5 Mb/s. I am tempted to try my 5950X machine with its high single-core rating. It gets 5 Gb/s on iperf over the old copper. But to move it...that's a PITA.

ETA: changed the MTU on TrueNAS to 9014 and now I am seeing higher results in iperf.
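For reference, a quick way to confirm the jumbo path actually works end to end once both sides are set; the interface name and address below are examples, and the switch has to allow jumbo frames too. The Windows side can also be set in the NIC's advanced properties (usually a "Jumbo Packet" entry, name varies by driver):

netsh interface ipv4 set subinterface "Ethernet 2" mtu=9000 store=persistent
ping -f -l 8972 192.168.1.10                   # don't-fragment ping; 8972 = 9000 minus 28 bytes of IP/ICMP headers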
 
I wonder what speeds you will get if you connect your windows box directly to the NAS without the switch?
 
You don't have a virus.
You don't have a technical problem.
You don't have a cabling issue.
You have a case of not enough threads, too small a TCP window, etc... during the iperf test, try this command:

iperf -c x.x.x.x -i 1 -t 10 -f m -P 7

-c means client mode

-i 1 means report the speed every 1 second

-t 10 means run for 10 seconds

-f m reports in Mbits/sec (newer builds also accept -f g for Gbits/sec)

-P 7 runs seven parallel streams that are transferring data at the same time

You are only using a single data transfer stream with the default command, so you need to bump up the threads, not CPU threads, but parallel transfer streams.


Replace the X's with the obvious server address that you're using.

I'm just an old-school network engineer. iperf is one of the BEST test methods even today, especially over big-ass high-speed fiber pipelines.
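Note the capital -P; lower-case -p is the port number. The iperf3 equivalent looks about the same, and x.x.x.x stays whatever your server's address is:

iperf -s                                       # iperf2 server
iperf -c x.x.x.x -i 1 -t 10 -f m -P 7          # iperf2 client, 7 parallel streams
iperf3 -s                                      # iperf3 server
iperf3 -c x.x.x.x -i 1 -t 10 -P 7              # iperf3 client, same idea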
 
I came to that conclusion as well. However, I also determined I needed to change the MTU to 9000.

It is an interesting exercise...the older CPUs needed a -P 4 to saturate it. Not even a Zen 3 could saturate it with a single stream...it only gave about 55%, so a -P 2 was needed there.
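If anyone wants to see how much one core can really push, iperf3 can pin itself with -A; the address and core numbers here are examples, and the n,m form pins the client and server sides respectively:

iperf3 -c 192.168.1.10 -t 30 -A 2,2            # single stream, pinned to core 2 on both ends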

This of course makes me wonder: on an actual file transfer, is the CPU still only using a single thread? If so, how kneecapped is my old Xeon compared to a 13th-gen Intel?

My next oddity is that when I have a pool of a single SSD and I try to write to it, I am capped at 220 MB/s. That is like a third of 6 Gb/s. My 4x HDD pool kicks the crap out of it. I should get ambitious, put the OS on the SSD, and try the NVMe for transfers and see what happens.
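A quick way to take the network out of that SSD question is to write locally on the NAS itself; the dataset path is an example, zeros compress (so test against a dataset with compression off or it looks unrealistically fast), and oflag=direct is the Linux/SCALE spelling, drop it on CORE:

dd if=/dev/zero of=/mnt/ssdpool/testfile bs=1M count=8192 oflag=direct    # ~8 GiB sequential write straight to the pool
rm /mnt/ssdpool/testfile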
 
You don't need jumbo frames for 10Gbps. I can routinely saturate my 40Gbps setup with iperf3, without jumbo frames. You have something else going on. I've only skimmed the thread, but I don't see where you say what NIC or switch you're using other than "most of...", which isn't helpful. Specifically, which NICs, with what driver, and what switch? What's your VLAN setup? What does your switchport show - is it clean or are errors accumulating? Have you tried different cables and switchports? Have you tried connecting directly and skipping the switch?

You don't "add" 4.5 + 4.5 and get 9, and say you're operating at nearly 10Gbps. It's duplex, so you should get 10 in each direction simultaneously.
 
Windows file transfers are single threaded. Your old Xeon won't be much of a handicap there unless you have a lousy NIC. A good NIC will offload as much work as possible from the CPU. A cheap one will make the CPU do a bunch of work. So I'm really wondering what NICs you're using, since my old Xeon is whipping yours. My E5-2687Wv2 has a higher max turbo clock -- 4.0 vs. 3.6 -- but it's an older architecture than your E5-2699 v3, so my old Xeon should be less than 10% faster on a single-thread load, and might even be slower.

I played around with iperf3 and got about 9.1Gb with the Windows Xeon E5-2687Wv2 sending and 9.5Gb with the Linux i3-10100 sending over TCP. I can't seem to get that high with UDP, which means iperf3 is either busted or hard to set up right. It seems that the -l parameter on the client makes a big difference for TCP. I've been using -l 1M, so iperf3 uses a 1MB send buffer (I think). No -P flag, so running single threaded. Without -l I get a little over 7Gb.

You have something else going on other than an old CPU, and I'm really wondering what NICs, switches, etc. you're using. Especially NICs. Mine are all pretty top tier from a few years back. NVidia/Mellanox ConnectX-4 in these two machines. But do try that -l flag and see what happens. iperf3 seems to need a fair bit of tuning. Defaults are slower than a windows (SMB) file copy. I still haven't figured out how to get decent speed out of UDP, and UDP ought to be faster than TCP.
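Roughly what I've been running, in case it helps; the address is an example, and UDP in iperf3 crawls along at its default rate unless you explicitly ask for more with -b:

iperf3 -c 192.168.1.10 -t 30 -l 1M             # TCP, 1 MB application buffer
iperf3 -c 192.168.1.10 -t 30 -u -b 9G          # UDP, ask for 9 Gbit/s and watch the loss column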
 
It definitely seems to be thread limitations with iperf.

When I go with a -P 2 I can pretty much saturate it. The other thing I have found is that I need jumbo frames.

Most of my cards are this one from Asus:
https://www.amazon.com/dp/B072N84DG6

Then, because the old FX system only goes to PCIe Gen 2, I got this XZSNET one because it is an x8 card.

https://www.amazon.com/dp/B0BJKMBQWY

Since then, I figured out how to configure Emby to use Nvidia for transcodes on the test box, so I rolled it into my main box. That meant I had to reconfigure my main NAS to include my 1080 Ti for Emby transcodes. I could not get the 1080 Ti to fit in PCIe slot #1 and still clear the hard drives. Instead, I had to put it down in #2 and put either the NIC or the SATA expansion card in #1. For some reason, the ASUS card won't work if the GPU isn't in #1. Sooo, the x8 XZSNET is now in my main NAS. In case anyone wants to know, a 10Gbps PCIe x4 Gen 3 NIC running at Gen 2 gives about 6.2 Gbps.
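For anyone curious where that 6.2 Gbps lands, the negotiated PCIe link width and speed are easy to check from a TrueNAS SCALE shell; the device address is an example, grab the right one from the first command (on CORE/FreeBSD, pciconf -lvc shows similar link info):

lspci | grep -i ethernet                       # find the NIC's bus address
lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta' # compare what the card can do vs. what the slot negotiated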
 