The A55 design is constrained not merely by area and power but also by configurability. Being able to vary the L1 cache sizes from 16 KiB to 64 KiB means that the pipeline structure and cycle time are not optimized for one size. Targeting multiple processes and design factors (e.g., SRAM libraries can be tuned for different performance/area/power tradeoffs) also constrains optimization.
While ARM might have had in mind a particular implementation for optimization (for which it might provide hard cores), it is still limited to providing acceptable designs for other implementations. Some microarchitectural optimizations might strongly depend on implementation details which are outside of ARM's control.
There are probably also higher-risk design possibilities that were not explored simply because the resources were not available. Having multiple design teams with similar targets would typically mean wasted effort, but it also offers the potential for a better design. It would be difficult for ARM to charge for the cost of unused designs given that other designs are available.
Targeting a broad range of workloads also means a design will tend to be worse than a design targeting a narrower range of workloads.
Of course they could, but would those changes have permitted it to still be within the design constraints of the A55? Small die size and lower power are two characteristics that are not compromised for the A55. Faster is easy to do with more power but considering that the A55 is the little core, higher power consumption is to be avoided. Similarly a faster core might be done with a larger die area. There are trade offs here but the pressure from ARM's customers is to keep this as small as possible.
Considering those constraints, I consider any improvements to be rather impressive. If there is a silver bullet that ARM could have used to make it faster/smaller/consume less power in these designs without violating the constraints they have in place, I'd like to know what it was.
"What will be the goal for the next core, which will be coming from ARM’s Austin team that produced the A72? "
That was my main question too but my hope was that the next core is aiming for much higher IPC. They need it for server and dual big core configs in mobile on 7nm. Or maybe they don't quite need it really, A75 is really fast and if the next core adds 15-20% higher IPC combined with higher clocks enabled by the process, that's quite a lot and rather amazing from a perf density perspective.
Not much talk about area, any clue how A75 + DynamIQ compare to previous solutions - ofc the cache part is easy to factor in.
It is interesting that A75 scales better with higher clocks, any guesses for clocks at 2W? A laptop with 4b4L would be rather nice.
A55 not targeting higher clocks seems a bit odd, would mean that power goes down if folks move from A53 on 16FF to A55 on 7nm so maybe ARM has another update before 7nm.
I think these are still mobile CPUs. It's up to Cavium et al to do ARM ISA-compatible designs for servers. ARM's not that bothered; the mobile market is far larger.
ARM is very eager to go server and just a year ago ARM was targeting 25% share in server by 2020. This gen does highlight infrastructure as they call it, a large segment where they've been gaining share and the next step is server. 7nm is where it starts really, TSMC has the HPC version of the process and ARM needs to be ready too with the core that follows A75. What is unclear is the strategy. A75 is already desktop class so they could just increase IPC some more but maybe they can aim higher. It seems that the Austin team got an extra year to work on the next core so that's 3 years, could be an entirely new design.
ARM in the server space sounds much like the hype of Linux on the desktop: always 'next year'.
The challenges ARM designs have had have been to simply get out to market. AMD's Seattle chip is indeed out but suffered two years of delays and most of the design wins have evaporated due to it. AMD's K12 efforts are MIA right now. Similarly Cavium's ThunderX line is interesting but not the game changer it was hyped to be. Broadcom has exited the ARM server market after promoting an interesting design (SMT on ARM!). Applied Micro's efforts for ARM servers have been lost to corporate mergers. Calxeda folded years ago.
The one interesting ray of hope is that there are indeed some customers like Microsoft, Facebook, Google and Amazon who are interested in ARM's low power nature for certain workloads. Microsoft has a version of Windows Server running on ARM but is not releasing it publicly, rather keeping it tied to their Azure cloud services. I have yet to hear where MS has gotten their ARM hardware from though. Google has dipped their toe into chip development for their deep learning efforts and it would be a straightforward process to piece together their own server designs from licensed IP blocks now that they have the in-house expertise to do it (saying they can and them doing it are two different things). In the end, the big cloud providers who could have spurred the ARM server space for everyone may keep the ARM server idea private to themselves while the rest of the market gets to deal with x86. Considering that x86 is perceived as higher power and higher cost, this serves the cloud providers well as it gives companies an incentive to migrate to their cloud solutions instead of looking at ARM alternatives.
The other difficulty for ARM in the market place right now is that Intel preemptively released their response: the Xeon D. Intel was doing a performance/watt play there and it paid off for the low end server market. In most cases, the Xeon D for a pure single socket server was a better choice than the Xeon E5 1xxxx or Xeon E3 line up. I suspect that Intel management sees Xeon D as 'too good' and thus hasn't been quick to bring an updated Skylake version to market.
Please read: http://www.anandtech.com/show/11189/appliedmicro-x... - it says both Vulcan and XGene are alive. You forgot to mention QC's Centric (48 cores on 10nm, available this year). There are also 64-core/256GB DRAM beasts made by HiSilicon.
If you assume a 15-20% IPC gain over A75 for ARM's 7nm core and clock it past 4GHz for server, that's somewhat the worst case scenario for where ARM is in server in 2018-2019. We can assume DynamIQ evolves a bit by then too. That wouldn't be bad at all and ARM has extraordinary perf density. They might deliver more than that, we'll see.
Check https://community.arm.com/processors/b/blog/posts/... - at the end there is a whole section on how much faster Cortex-A75 is compared to Cortex-A72 on SPECrate in big server designs. And Cortex-A55 also has all the RAS features required for servers.
It's about damn time they're addressing the little cores. We might not get anything better until they go fully OoO on both clusters with 7nm and beyond.
In-order is more efficient, and that's important for 'workhorse' cores, whatever the node. Why waste energy when you don't need to? Like A53, A55 is about doing enough work quick enough, not doing as much as possible as fast as possible. (I think A35 was too weedy for life outside of IoT.)
Plus I suspect that the ARM ISA is suited to in-order.
Energy consumption goes down with each process node advancement. The A53 is spending most of its compute time well over 1.2GHz on flagship SoCs. At 1.5GHz and beyond, it starts wasting more energy than the big cluster at similar performance points. Your point is absolutely correct in an "ideal workload", but the real world is far from that with higher resolution screens and all the other crazy peripherals OEMs employ. We'll have to wait and see the performance energy curve first before determining the sweet spot of the A55 vs where most of the actual workload resides.
The 2.5x claim was on an ARM slide but it included process shrinks. The 50% claim seems to be at 2W per core, not 3GHz, but even then it clearly isn't in SPEC; maybe in Geekbench it gets close to that gain at 2W.
I wouldn't be surprised if we do see some configurations at 7nm like that (3GHz). ARM has always tended to give pretty unrealistic clock speed peak performance for its cores, which no chip maker actually ended up implementing, but I think this time it may be different due to Chromebooks and Windows 10 on ARM, where a 15W TDP or even higher is perfectly fine, as long as you also get good performance/W (compared to Intel/AMD).
However, chip makers and OEMs will also have to take into account Windows 10 on ARM emulation, so they'll need to keep those chips at least 10% lower TDP/power consumption than the Intel/AMD competitors at those levels of performance.
I don't think these chips will ship by late 2018. ARM typically announces its chips 2 years before they are shipped. To be shipped in early 2018, there would have to already be a Cortex A75 tapeout, which I don't think is the case. In 2019, Samsung likely intends to release Galaxy S10 with 7nm chips, so I'm going to assume it will be the A75.
This is aimed at 10nm and the cycle that starts in early 2018 or before. So SD845 at MWC 2018 and maybe Huawei does it again and has something this year. The slides mention 10nm but not 7nm, the article notes repeatedly late 2017-early 2018.
A73 was announced a year ago and Huawei had Kirin 960 last year, Qualcomm in first half of 2017. This is an unveiling for the public not ARM's partners. Also do remember that ARM has a new big core every year now.
As for Samsung, they'll likely stick with their own core next year and remains to be seen what ARM has for 7nm. It appears that the Austin team got an extra year to work on the next core and that could be a hint that the core aimed at 7nm is an entirely new design.
Do note that the numbers ARM quotes are for just the core, no cache, interconnect, IO, GPU. A SoC with 4x2W would use quite a bit more power than just 8W.
Sadly, I think DynamIQ doesn't mean that chip makers will use a single 8-core cluster, but that they will use both DynamIQ and big.LITTLE in configurations like 2+8 or 4+8, or even 8+8, mainly for marketing reasons. So the performance flexibility won't change much.
DynamIQ and big.LITTLE are not compatible, you can't mix and match. You either use older cores (A72/73 + A53) with big.LITTLE, or you use newer cores (A75+A55) with DynamIQ.
DynamIQ, IIUC, allows for multiple clusters, so you could get 8+8, 4+8, 2+8 and similar configurations. I doubt anyone would do that in a smartphone; but the Chinese OEMs seem obsessed with core counts, so they may do something weird (like the Helio tri-cluster setup).
What is branch prediction used for in the *in-order* A55? Is it just to try to prefetch the right instructions into L1I? Or can you do some speculative stuff (e.g. decode the expected next instruction) and still be called in-order?
In-order doesn't imply no speculation. Instructions after a branch start executing speculatively but cannot complete until the branch direction is determined.
Branch prediction is as important for an in-order core as it is for an OoO core. Without branch prediction every branch would take 8+ cycles rather than < 0.1 cycles on average.
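The average-cost claim above can be sanity-checked with a back-of-the-envelope model. This is only a sketch: it assumes a correctly predicted branch costs roughly zero extra cycles and every misprediction pays a fixed penalty. The 8-cycle penalty comes from the comment; the 1% misprediction rate and the function name are hypothetical, picked purely for illustration.

```python
def average_branch_cost(mispredict_rate: float, penalty_cycles: float) -> float:
    """Expected stall cycles per branch under a simple fixed-penalty model."""
    return mispredict_rate * penalty_cycles


# No predictor: every branch waits for resolution, i.e. a 100% "miss" rate.
no_predictor = average_branch_cost(1.0, 8.0)

# Hypothetical predictor that is wrong on 1% of branches.
with_predictor = average_branch_cost(0.01, 8.0)

print(f"without prediction: {no_predictor:.2f} cycles per branch")
print(f"with prediction:    {with_predictor:.2f} cycles per branch")
```

Under these assumed numbers the average falls below 0.1 cycles per branch, which is the order of magnitude the comment describes.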
I am a bit confused by this statement on the second page:
"Together with L3 cache and other control logic, the DSU is about the same area as an A55 core in its max configuration or half the area of an A55 in its min configuration."
Is this backwards (half A55 in max, or ~A55 in min), or is the second A55 supposed to be an A75?
Actual answer: Android is always doing loads of things at once which don't need doing particularly quickly (I.e. aren't at the direct behest of the user), but do need doing efficiently. The throughput needed is more than three small cores can provide.
This website has itself proven that Android does use 8 cores and these extra cores do bring improvements in overall experience.
Personally, I'm hopeful that SoC vendors actually do follow ARM here and kill off 8 little cores in favour of a 1+7 design. Would translate into a huge single threaded improvement for end users.
Well ARM says the TDP of A75 is from 750mW per core to 2W per core based on clock speed. Obviously an SoC has other components as well, and cache and the interconnects use power as well, not to mention the GPU. So depending on how you clock it and how much cache it has and what GPU it has, I'd say that an 8-core A75 SoC on 10nm process would have a TDP of somewhere between 12W to 25W.
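A rough sketch of the arithmetic behind that estimate. It takes the 750mW to 2W per-core range as quoted and treats the 12W to 25W SoC figure as the commenter's own guess; the function names are made up for illustration, and the "rest of SoC" split is simply what those two sets of numbers imply, not a measured value.

```python
def core_only_power(cores: int, watts_per_core: float) -> float:
    """Power of the CPU cores alone, ignoring cache, interconnect and GPU."""
    return cores * watts_per_core


def implied_rest_of_soc(soc_tdp: float, cores: int, watts_per_core: float) -> float:
    """Budget left for everything that is not a CPU core at a given SoC TDP."""
    return soc_tdp - core_only_power(cores, watts_per_core)


low_cores = core_only_power(8, 0.75)
high_cores = core_only_power(8, 2.0)
low_rest = implied_rest_of_soc(12.0, 8, 0.75)
high_rest = implied_rest_of_soc(25.0, 8, 2.0)

print(f"cores only: {low_cores} W to {high_cores} W")
print(f"implied rest of SoC: {low_rest} W to {high_rest} W")
```

So the 12W to 25W guess leaves several watts for cache, interconnect and GPU on top of the cores themselves.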
And as much as I'd like to see such an 8-core A75 SoC, this is pure fantasy. The volume of Chromebooks is too low to design bespoke SoCs for them, they'll just use whatever is designed for phones.
Sure, I would prefer a 2+4 to a 1+7 configuration as well. I've always thought that 2+4 is the sweet spot for big.LITTLE.
But 2+4 is going to be a lot bigger than 1+7. These LITTLE cores are extremely tiny, the A53 is about 1mm on 16nm fab process. Which is why the mid-range of the market has gone for 8-core A53s in the last couple of years.
2+4 is a lot bigger so it's not even a consideration for this end of the market. 2+4 could have been done with big.LITTLE but the fact that there were so few 2+4 SoCs tells you that the market just didn't think it was worth it. 1+7 on the other hand was not possible with big.LITTLE, but is possible with DynamIQ, and according to ARM is only a little bigger than 8 small cores, so I'm hoping that the market sees value in that for the midrange.
But die size will be bigger. The reason why ARM introduced 1+7 is because die size is 13% bigger than 8xA53 while 8xA55 is 10% more. So, it's only 3% more area than 8xA55 but provides 2x single core performance.
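The area comparison can be worked through in a couple of lines. A minimal sketch, assuming the 13% and 10% figures quoted above are both relative to an 8xA53 baseline; the variable and function names are illustrative only.

```python
BASELINE_8X_A53 = 1.00   # normalized area of 8x Cortex-A53
AREA_8X_A55 = 1.10       # 10% larger than the baseline (figure from the comment)
AREA_1_PLUS_7 = 1.13     # 13% larger than the baseline (figure from the comment)


def extra_area_in_baseline_points(a: float, b: float) -> float:
    """Difference between two designs, in percent of the 8x A53 baseline."""
    return (a - b) * 100


def extra_area_relative(a: float, b: float) -> float:
    """How much bigger design a is than design b, in percent of b."""
    return (a / b - 1) * 100


points = extra_area_in_baseline_points(AREA_1_PLUS_7, AREA_8X_A55)
relative = extra_area_relative(AREA_1_PLUS_7, AREA_8X_A55)

print(f"{points:.1f} points of baseline area, {relative:.1f}% relative to 8x A55")
```

Measured against 8xA55 itself rather than the A53 baseline, the increase comes out slightly under 3%.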
They only seem to bring some improvements because the big cores aren't all that great to begin with. If they had fewer little cores, and better big cores, that wouldn't be the case.
All the graphs are so misleading except for the 3rd one. A 1.18x performance increase looks like 200% and 1.97x looks like a 500% increase. I call this marketing scam/consumer fraud.
Wow, I hate those disproportional bar graph heights in the "Pushing the performance envelope" diagram. Such bullshit tricks (especially when as unnecessary as in this case) stick out like a sore thumb and spoil the whole enjoyment of the article because I'm constantly thinking what else they're trying to sugar coat...
Forgive me if I'm wrong, but I was under the assumption that the A53 had 2 simple Integer ALUs (Adds/Shifts) and a complex Integer ALU (Multi/Mac/Div) like the Austin CPUs and the Cortex A7. Am I wrong about that?
Fast forward to one year later: Android apologists will say how their barely faster than stock A75 custom cores still losing big time to Apple in ST is a "conscious design decision".
Maybe, maybe not. That would be almost twice the best we're seeing from the highest performing parts now, such as the 835. The A11 is showing over 4,500 single core and close to 9,000 multicore, assuming these numbers are real.
The A11 numbers likely are fake, but they are also plausible. A8 to A9 and A9 to A10 were both 40 to 50% increases. Going to 4500 is the same sort of increase. The new process allows for the frequency boost, and there remain realistic micro-architectural mechanisms that could provide for the IPC boost.
No, we don't know if they're fake. TSMC stated, months ago, that they were delivering 10nm parts to their largest customers, which one would presume, is Apple.
And my statement stands. If the best the 835 can do is just over 2,000, then these parts are slightly over twice as fast. And if the claim for the multiprocessing score is right, then that's well over the score for any 4 core ARM chip from anyone else.
"Traditionally", these scores that leak out, whether real or not, are remarkably close to what's tested after Apple's product does come out, often being somewhat lower than the "real" scores.
Apple's ST perf is marketing for folks like you, nothing more. ST perf is too high for mobile even with A75 and ST perf is not what matters. We would all be better off with less ST and higher efficiency. Sadly people like you are pushing the industry into pushing ST for no reason. Apple's core is huge compared to ARM's core and for what? ST perf you don't need, lower MT and efficiency.
From where I stand, the great majority of time on mobiles is spent either on the web, or in games. Javascript is still very much single-threaded, so higher ST performance directly results in better web experience.
Why do you think that "I don't need ST perf"?
Note: I don't have a single iOS device, though I'd have loved to have an Android device with A10 inside.
Nonsense! If this were a Qualcomm or Samsung chip, you wouldn't be saying that, and we both know that. While I don't know what other chips use, Apple's is about 3 watts, which is likely about what the others are. But Apple manages to get far better performance. That's never a bad thing.
I don't think you understand what smartphones are being used for.
While we don't know if the benchmarks that have been listed for the new A11 from Apple are real, though they seem to be what we would expect, individual cores are hitting over 4,500 and slightly under 9,000 multicore, with both cores.
With everything I've read here, I'm still not sure what we would expect from these parts. The highest performing ARM used on Android seems to be well below 2,000 per core, with almost 7,000 for 4 core multicore.
So, what's to expect here? And how much of this advantage is coming from the process shrink, rather than from core improvements?
I hope chip vendors don't push 8x A55 designs for the midrange because they're only good for the low end. Having so many similar cores is pointless because Android rarely uses all 8 cores.
I'd rather see more 2+4 or 4+4 designs with the A55 and A75, especially something like the old Snapdragon 650/652 with the latest cores and processes. I'm looking to upgrade my Mi Max a year from now and the relevant chips should appear by then. On the other hand, with constant driver updates, this phone could last for a few years still.
A Cortex-A55 at 2.5GHz (same as Helio P25) would get close to ST performance of Galaxy S6 (and match MT perf). That was top-end 2 years ago... So while I agree 1+7 or 2+6 would be much better than 8x A55, I don't think you could call an S6 a low-end phone even in 2018!
The Helios with their decacore design couldn't beat the real world speed and battery life of a Snapdragon 65x. It's foolish to run an A55 at 2.5 GHz when an A75 at lower speed uses similar amounts of power while being much faster. At one point, you move the load from the donkey and put it on a race horse :)
All of the Helio deca cores use Cortex-A72, just like Snapdragon 65x, and the higher clocks of Helio means it beats the 65x like you'd expect. So I'm not sure what your point is?
Cortex-A53 in Kirin 950 at its highest frequency is ~50% more efficient than Cortex-A72, with the crossover point at around 2.1GHz. On 10nm with Cortex-A55 it may be closer to 2.5GHz.
Yeah, my mistake, I was thinking of Helios with 8x A53s only. Anyway, perf/watt matters at high clock speeds, so a 50% efficient A53 still can't do as much work as an A72 at the same clock speed. I'd rather keep the efficient cores humming along at low speed and have the big cores come online in short bursts, like for app loading or web page rendering. Note that this might not work for constant gaming though, the big cores constantly being on will overheat the phone and kill battery life.
2 GHz a53 has the single thread performance of 2.3 GHz Krait 400/1.85 GHz cortex a15. Octa designs often have well over double the multithread performance of say a Snapdragon 800/801. Not low end..very much midrange
But the A53 or A55 at 2+ GHz is a huge power hog that's still slower than a similarly clocked A72 or A75. The octacore branding is a gimmick when all cores are the same design. Performance doesn't scale equally with increasing frequency and power consumption - at one point, it's better to switch the task to a high performance core rather than keep increasing speed on a low performance core.
A smart design (like the 650/652 which is a flagship killer) would use 4x or 6x A55 at low clock rates for multi threaded stuff and 2x A75 for pure single threaded performance, power consumption be damned.
Snapdragon 625 @ 2.02 GHz and 626 @ 2.3 GHz are certainly not battery hogs. They are both homogenous and ramp clock speed up on all cores very, very often. Much snappier performance than paper specs would suggest and incredible battery life
Midrange ARM big cores at actual midrange prices are already quite a rarity in the China phone market, let alone outside of it. Since they perform so close to actual flagship SoCs, most OEMs will either price the devices similarly to their actual flagships (cough Samsung A9 Pro), or not make them at all. If you ask me who to blame, it will be the non-Apple custom core designers sucking hard at their jobs.
Besides, an A55 SoC with presumably >1K ST GB4 scores is no slouch either, for a $120 device I'm certainly not complaining.
Maybe the 650/652 was a flash in the pan and the 660 could be a unicorn chip, one that's announced but never deployed. Interestingly Xiaomi moved to the 625 in the Redmi Note 4 and Mi Max 2, whereas predecessor models used the 650. Maybe OEMs really are afraid of good-enough chips in their midrange devices cannibalizing flagship sales.
All of these numbers are crap if the cache configs are not stated. DynamIQ is very different and most of the SPEC gains could be from L2/LLC increases. This is all marketing FUD
"These numbers, as well as the others shown in the chart, comparing the A55 and A53 are at the same frequency, same L1/L2 cache sizes, same compiler, etc. and are meant to be a fair comparison. The actual gains should actually be a little higher, because partner SoCs will benefit from adding the L3 cache, which these numbers do not include."
"ARM wants to push the A75 into larger form-factor devices with power budgets beyond mobile’s 750mW/core too by pushing frequency higher. Something like a Chromebook or a 2-in-1 ultraportable come to mind. At 1W/core the A75 delivers 25% higher performance than the A73 and at 2W/core the A75’s advantage bumps up to 30% when running SPECint 2006. If anything, these numbers highlight why it’s not a good idea to push performance with frequency alone, as dynamic power scales exponentially."
Perhaps, but it gives it a lot more headroom for use in things like tablets... and laptops. I'm thinking Windows on ARM could use an even faster SoC than the SD 835, and 2W is perfect. Right in Atom ULP territory, and there's no modern Atoms left to compete in the lower-price territory. Perhaps Intel will be forced to release cheaper gimped Core-based "Atoms" in the future? Or Celerons/Pentiums. ;)
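On the quoted remark that dynamic power scales "exponentially": the textbook approximation is P ≈ C·V²·f. The sketch below assumes, as a deliberate simplification, that supply voltage has to rise in proportion to frequency, in which case power grows with the cube of frequency (steep, but polynomial rather than literally exponential). The function name and the 20% clock bump are illustrative, not figures from ARM.

```python
from typing import Optional


def relative_dynamic_power(freq_ratio: float, voltage_ratio: Optional[float] = None) -> float:
    """Relative dynamic power from P ~ C * V^2 * f, with capacitance held constant.

    If no voltage ratio is given, assume voltage scales linearly with frequency.
    """
    if voltage_ratio is None:
        voltage_ratio = freq_ratio  # simplifying assumption, see the lead-in
    return voltage_ratio ** 2 * freq_ratio


# Hypothetical 20% frequency increase.
same_voltage = relative_dynamic_power(1.2, voltage_ratio=1.0)
scaled_voltage = relative_dynamic_power(1.2)

print(f"20% faster at the same voltage:      {same_voltage:.2f}x power")
print(f"20% faster with voltage scaled too:  {scaled_voltage:.2f}x power")
```

Either way, the extra power outpaces the extra clock speed once voltage has to rise, which is the point the article is making about chasing performance with frequency alone.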
Incremental update with no radical changes. Would LOVE to see a huge fat ARM core with a 5+ wide front end for premium devices, with single threaded throughput approaching that of the Core M series. Now that would be progress.
No reason why a dual core with two fat cores cannot work great on android, especially given the idea of race to sleep. Off load background tasks to DSPs, Microcontrollers etc or even use a third big core clocked at about half the frequency of the main two cores.
Sure, will be expensive and big, but you can be sure there will be customers for it, especially in the 700 USD plus market segment. As of now, manufacturers barely have any choice apart from qualcomm chipsets.
For the most part, Samsung's designs were straight from ARM. They didn't have an architectural license. It's only very recently that they've gotten one.
But Snapdragon has been Qualcomm's own designs, because they do have an architectural license, as does Apple. But, like the rest of the industry, they were discombobulated when Apple came out with the 64 bit A7.
They've never gotten totally back into the race. Their first one was an ARM design, and it had heat problems. The second was their design, but performance was fairly poor. The 835 is not much better than the preceding model. Samsung has fared no better. The problem they all have is that Apple is two years ahead there, and likely took their time with the A7, because there was no competition. These guys are rushing to catch up, and they are likely restrained by the expectation by Android buyers that more cores are better, rather than having better cores.
Now that Apple's GPU is a mostly fully custom part, expect the A11 to start another A7-esque domination over Android SoCs on graphics. I also expect an Apple custom LTE baseband to debut this year too, since Apple is definitely too paranoid to depend solely on Qualcomm and Intel's baseband proved to be donkey balls.
Besides, iPhones probably outsell everyone else's flagships combined yearly in a single launch quarter. The economies of scale for an Android flagship SoC make far less sense.
Clear and lucid article. I am from commerce background and can still understand most of it. Introduction of L3 cache in mobile is good but it being pseudo exclusive sucks! Because area and power efficiency is what matters most in mobile.
"Or will we see new 7+1 or 3+1 combinations with a single A75 surrounded by A55s?"
They would have too much sense. The loads are either single-threaded or multi-threaded, not 4 threaded. Because every modern computing environment from watches to server farms is power-limited, multithreaded loads need to be performed on the most efficient cores possible. Which certainly excludes any SPECULATIVE out-of-order execution.
colinw - Monday, May 29, 2017 - link
Motion to call "cache stashing" just "staching". All in favour say aye.
R0H1T - Monday, May 29, 2017 - link
Nay, motion denied due to lack of quorum.
0ldman79 - Monday, May 29, 2017 - link
Motion is seconded.
boeush - Tuesday, May 30, 2017 - link
I vote for "cashing" as a write-in
Alexvrb - Tuesday, May 30, 2017 - link
I vote for 'staching as a write-in.
nonz - Wednesday, May 31, 2017 - link
Sheesh, it should obviously be called 'cache me outside'
Eden-K121D - Monday, May 29, 2017 - link
They could've improved the A55 quite a bit more
lilmoe - Monday, May 29, 2017 - link
It's a start. There's only so much you can do and stay in-order.
Samus - Monday, May 29, 2017 - link
It isn't that bad for a YoY gap, but you are right, it's been two years...and let's face it, the little core is actually more important because that's where battery life improvements are made.
aryonoco - Monday, May 29, 2017 - link
A53 was announced in 2013. First SoCs implementing it came out in 2014.
A55 is being announced in 2017, SoCs implementing it will probably come out in 2018.
So it's a 4 year gap, not two.
Fact is, there is only so much you can do with an in-order die-constrained design.
Samus - Monday, May 29, 2017 - link
Ok, point taken, 4 year gap. Even more unacceptable. But you continue to imply the architecture hitting a wall because of the inherently in order nature of RISC so I ask you, why have Samsung and Apple continued to have great success deviating from ARM's reference designs, while Qualcomm has been married to them and paying the performance price (specifically looking at you, 808)
Death666Angel - Monday, May 29, 2017 - link
"die-constrained design" He gave the answer right there, at least as far as Apple is concerned. ARM is imposing a certain, small, die size for the A53/55 cores, that limits their potential performance a lot. Apple doesn't care about die size since their margins are high enough. The Apple A5X was as large as the Intel Ivy Bridge 4C die. Both were around in 2012, though Intel had the process node advantage. Still insane numbers. And Samsung is also very vertically integrated, they don't have to worry as much about die size as Qualcomm since they own the manufacturing as well. And QC doesn't have to play the performance game, since their choke hold on the modem technology allows them to still have plenty of design wins for the moment.
Wilco1 - Monday, May 29, 2017 - link
It's a 3 year gap. And Samsung still uses Cortex-A53 alongside Mongoose.
tipoo - Monday, May 29, 2017 - link
Not RISC constrained to in-order, nothing to do with RISC, rather that in-order is a design choice for this particular model for power and die size.
A75, Hurricane, etc are of course massively out of order ARM/RISC designs.
name99 - Wednesday, May 31, 2017 - link
Zephyr is NOT a massive OoO design. Probably 2-wide in order. We don't know its performance, but it certainly saves power (compared to A9) while not seeming to slow down the phone.
ARM seems to hurt itself by an insistence on these TINY designs. (Just like Intel on the other side hurts itself by an insistence on designs that are first server targeted). Apple wins partially by not trying to be everything to everyone...
helvete - Wednesday, August 30, 2017 - link
Ending up being nothing for nobody? /s
aryonoco - Monday, May 29, 2017 - link
There is nothing inherently in-order about RISC. Various ARM designs such as Cortex A57, A72, A73 and A75 are out of order.
Samsung has not really deviated from ARM's reference design by much, they still use the Cortex A53 in various SoCs including their latest and greatest Exynos 8895. And that 8895 is also falling behind Snapdragon 835, which is a standard A73 implementation for all intents and purposes (Qualcomm's marketing notwithstanding).
Apple, well, they can afford to dedicate the die area to a big core. No one else can.
ZeDestructor - Tuesday, May 30, 2017 - link
Everyone can go for the giant-die approach, but thanks to how marketing works, people will buy 8 A53s well before they even consider 4 A53 + 1 Cyclone-sized core.
name99 - Wednesday, May 31, 2017 - link
Ding ding ding. We have a winner
Wilco1 - Monday, May 29, 2017 - link
It was late 2014, and the first designs should appear late this year (just like Cortex-A73 was announced last year and appeared the same year). That is a 3 year gap, not 4 years.
Note Cortex-A53 scaled quite well, from 1.3GHz in Exynos 5433 to 1.8GHz in Kirin (and goes up to 2.5GHz in Helio P25). The big cores scaled via yearly new micro-architectures, so the big/little performance ratio has remained similar.
> Fact is, there is only so much you can do with an in-order die-constrained design.
And more importantly not only keeping but actually improving power efficiency. It could go much faster if it were allowed a similar power budget as the big cores.
Meteor2 - Monday, May 29, 2017 - link
How?
Paul A. Clayton - Monday, May 29, 2017 - link
The A55 design is constrained not merely by area and power but also by configurability. Being able to vary the L1 cache sizes from 16 KiB to 64 KiB means that the pipeline structure and cycle time is not optimized for one size. Targeting multiple processes and design factors (e.g., SRAM libraries can be tuned for different performance/area/power tradeoffs) also constrains optimization.
While ARM might have had in mind a particular implementation for optimization (for which it might provide hard cores), it is still limited to providing acceptable designs for other implementations. Some microarchitectural optimizations might strongly depend on implementation details which are outside of ARM's control.
There are probably also higher-risk design possibilities that were not explored simply because the resources were not available. Having multiple design teams with similar targets typically would mean wasting effort, but such provides a potential for a better design. It would be difficult for ARM to charge for the cost of unused designs given that other designs are available.
Targeting a broad range of workloads also means a design will tend to be worse than a design targeting a narrower range of workloads.
Kevin G - Monday, May 29, 2017 - link
Of course they could but would those changes have permitted it still be within the design constraints of the A55? Small die size and lower power are two characteristics that are not compromised for the A55. Faster is easy to do with more power but considering that the A55 is the little core, higher power consumption is to be avoided. Similarly a faster core might be done with a larger die area. There are trade offs here but the pressure from ARM's customers is to keep this as small as possible.
Considering those constraints, I consider any improvements to be rather impressive. If there is a silver bullet that ARM could have used to make it faster/smaller/consume less power in these designs without violating the constraints they have in place, I'd like to know what it was.
tipoo - Monday, May 29, 2017 - link
Alas, still waiting to find out how different Apple's Zephyr is from standard Little cores like it. It's nearly twice as big.
jjj - Monday, May 29, 2017 - link
"What will be the goal for the next core, which will be coming from ARM’s Austin team that produced the A72? "
That was my main question too but my hope was that the next core is aiming for much higher IPC. They need it for server and dual big core configs in mobile on 7nm.
Or maybe they don't quite need it really, A75 is really fast and if the next core adds 15-20% higher IPC combined with higher clocks enabled by the process, that's quite a lot and rather amazing from a perf density perspective.
Not much talk about area, any clue how A75 + DynamIQ compare to previous solutions - ofc the cache part is easy to factor in.
It is interesting that A75 scales better with higher clocks, any guesses for clocks at 2W? A laptop with 4b4L would be rather nice.
A55 not targeting higher clocks seems a bit odd, would mean that power goes down if folks move from A53 on 16FF to A55 on 7nm so maybe ARM has another update before 7nm.
Meteor2 - Monday, May 29, 2017 - link
I think these are still mobile CPUs. It's up to Cavium et al to do ARM ISA-compatible designs for servers. ARM's not that bothered; the mobile market is far larger.
jjj - Monday, May 29, 2017 - link
ARM is very eager to go server and just a year ago ARM was targeting 25% share in server by 2020. This gen does highlight infrastructure as they call it, a large segment where they've been gaining share and the next step is server.
7nm is where it starts really, TSMC has the HPC version of the process and ARM needs to be ready too with the core that follows A75.
What is unclear is the strategy. A75 is already desktop class so they could just increase IPC some more but maybe they can aim higher. It seems that the Austin team got an extra year to work on the next core so that's 3 years, could be an entirely new design.
Kevin G - Monday, May 29, 2017 - link
ARM in the server space is sounding much like the hype of Linux on the desktop: always 'next year'.
The challenges ARM designs have had have been to simply get out to market. AMD's Seattle chip is indeed out but suffered two years of delays and most of the design wins have evaporated due to it. AMD's K12 efforts are MIA right now. Similarly Cavium's ThunderX line is interesting but not the game changer it was hyped to be. Broadcom has exited the ARM server market after promoting an interesting design (SMT on ARM!). Applied Micro's efforts for ARM servers have been lost to corporate mergers. Calxeda folded years ago.
The one interesting ray of hope is that there are indeed some customers like Microsoft, Facebook, Google and Amazon who are interested in ARM's low power nature for certain workloads. Microsoft has a version of Windows Server running on ARM but is not releasing it publicly, rather keeping it tied to their Azure cloud services. I have yet to hear where MS has gotten their ARM hardware from though. Google has dipped their toe into chip development for their deep learning efforts and it would be a straightforward process to piece together their own server designs from licensed IP blocks now that they have the in-house expertise to do it (saying they can and them doing it are two different things). In the end, the big cloud providers who could have spurred the ARM server space for everyone may keep the ARM server idea private to themselves while the rest of the market gets to deal with x86. Considering that x86 is perceived as higher power and higher cost, this serves the cloud providers well as it gives incentive for companies to migrate to their cloud solutions instead of looking at ARM alternatives.
The other difficulty for ARM in the market place right now is that Intel preemptively released their response: the Xeon D. Intel was doing a performance/watt play there and it paid off for the low end server market. In most cases, the Xeon D for a pure single socket server was a better choice than the Xeon E5 1xxxx or Xeon E3 line up. I suspect that Intel management sees Xeon D as 'too good' and thus hasn't been quick to bring an updated Sky Lake version to market.
Wilco1 - Monday, May 29, 2017 - link
Please read: http://www.anandtech.com/show/11189/appliedmicro-x... - it says both Vulcan and XGene are alive. You forgot to mention QC's Centric (48 cores on 10nm, available this year). There are also 64-core/256GB DRAM beasts made by HiSilicon.
jjj - Monday, May 29, 2017 - link
If you assume a 15-20% IPC gain over A75 for ARM's 7nm core and clock it past 4GHz for server, that's somewhat the worst case scenario for where ARM is in server in 2018-2019. We can assume DynamIQ evolves a bit by then too.
That wouldn't be bad at all and ARM has extraordinary perf density. They might deliver more than that, we'll see.
lilmoe - Monday, May 29, 2017 - link
All of ARM's hope of having any significant, mainstream, server market share was demolished with the announcement of Naples.
Wilco1 - Monday, May 29, 2017 - link
Check https://community.arm.com/processors/b/blog/posts/... - at the end there is a whole section on how much faster Cortex-A75 is compared to Cortex-A72 on SPECrate in big server designs. And Cortex-A55 also has all the RAS features required for servers.
lilmoe - Monday, May 29, 2017 - link
It's about damn time they're addressing the little cores. We might not get anything better until they go fully OoO on both clusters with 7nm and beyond.
Meteor2 - Monday, May 29, 2017 - link
In-order is more efficient, and that's important for 'workhorse' cores, whatever the node. Why waste energy when you don't need to? Like A53, A55 is about doing enough work quick enough, not doing as much as possible as fast as possible. (I think A35 was too weedy for life outside of IoT.)
Plus I suspect that the ARM ISA is suited to in-order.
lilmoe - Monday, May 29, 2017 - link
Energy consumption goes down with each process node advancement. The A53 is spending most of its compute time well over 1.2GHz on flagship SoCs. At 1.5GHz and beyond, it starts wasting more energy than the big cluster at similar performance points. Your point is absolutely correct in an "ideal workload", but the real world is far from that with higher resolution screens and all the other crazy peripherals OEMs employ. We'll have to wait and see the performance energy curve first before determining the sweet spot of the A55 vs where most of the actual workload resides.
Meteor2 - Friday, June 2, 2017 - link
Haven't seen an energy/performance curve on Anandtech for ages :(
nandnandnand - Monday, May 29, 2017 - link
Low tier tech news outlets are saying the A75 has 50% more performance and that the A55 is 2.5x more power efficient:
Compare: https://venturebeat.com/2017/05/28/arm-wants-to-bo... which acknowledges "in some use cases"
To: http://www.zdnet.com/article/arm-launches-new-cort... which just says "up to"
50% figure comes from comparing A73 at 2.4 GHz to A75 at 3.0 GHz: https://www.theregister.co.uk/2017/05/29/arm_cpus_...
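As a rough sanity check on that last reading: if the 50% figure really does compare an A73 at 2.4 GHz with an A75 at 3.0 GHz, and if performance is assumed to scale linearly with clock (a simplification), the headline number splits into a clock part and a per-clock part. The function name here is made up for illustration.

```python
def split_speedup(total_speedup, old_ghz, new_ghz):
    """Split a total speedup into a clock factor and a per-clock factor.

    Assumes performance scales linearly with frequency, which is only
    an approximation for real workloads.
    """
    clock_factor = new_ghz / old_ghz
    per_clock_factor = total_speedup / clock_factor
    return clock_factor, per_clock_factor


# 1.5x total claim, A73 at 2.4 GHz vs A75 at 3.0 GHz (figures from the comment)
clock, per_clock = split_speedup(1.5, 2.4, 3.0)
print(f"clock: {clock:.2f}x, per-clock: {per_clock:.2f}x")
```

Under those assumptions about 1.25x of the gain would be clock alone, leaving roughly 1.2x per clock.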
jjj - Monday, May 29, 2017 - link
The 2.5x claim was on an ARM slide but it included process shrinks.
The 50% claim seems to be at 2W per core not 3GHz but even then it clearly isn't in SPEC, maybe in Geekbench it gets close to that gain at 2W.
Krysto - Monday, May 29, 2017 - link
I wouldn't be surprised if we do see some configurations at 7nm like that (3GHz). ARM has always tended to give pretty unrealistic clock speed peak performance for its cores, which no chip maker actually ended up implementing, but I think this time it may be different due to Chromebooks and Windows 10 on ARM, where a 15W TDP or even higher is perfectly fine, as long as you also get good performance/W (compared to Intel/AMD).
However, chip makers and OEMs will also have to take into account Windows 10 on ARM emulation, so they'll need to keep those chips at least 10% lower TDP/power consumption than the Intel/AMD competitors at those levels of performance.
jjj - Monday, May 29, 2017 - link
A75 is not aimed at 7nm, that's where the next core comes in.
Krysto - Monday, May 29, 2017 - link
I don't think these chips will ship by late 2018. ARM typically announces its chips 2 years before they are shipped. To be shipped in early 2018, there would have to already be a Cortex A75 tapeout, which I don't think is the case. In 2019, Samsung likely intends to release Galaxy S10 with 7nm chips, so I'm going to assume it will be the A75.
jjj - Monday, May 29, 2017 - link
This is aimed at 10nm and the cycle that starts in early 2018 or before. So SD845 at MWC 2018 and maybe Huawei does it again and has something this year.
The slides mention 10nm but not 7nm, the article notes repeatedly late 2017-early 2018.
A73 was announced a year ago and Huawei had Kirin 960 last year, Qualcomm in first half of 2017.
This is an unveiling for the public not ARM's partners.
Also do remember that ARM has a new big core every year now.
As for Samsung, they'll likely stick with their own core next year and remains to be seen what ARM has for 7nm.
It appears that the Austin team got an extra year to work on the next core and that could be a hint that the core aimed at 7nm is an entirely new design.
aryonoco - Monday, May 29, 2017 - link
You have obviously not read this article then.
These IPs will be seen in SoCs in late this year/early next year.
nandnandnand - Monday, May 29, 2017 - link
15 W TDP you say... maybe 8x 2 Watt A75s crammed into one laptop?
jjj - Monday, May 29, 2017 - link
Do note that the numbers ARM quotes are for just the core, no cache, interconnect, IO, GPU.
A SoC with 4x2W would use quite a bit more power than just 8W.
Krysto - Monday, May 29, 2017 - link
Sadly, I think DynamIQ doesn't mean that chip makers will use a single 8-core cluster, but that they will use both DynamIQ and big.LITTLE in configurations like 2+8 or 4+8, or even 8+8, mainly for marketing reasons. So the performance flexibility won't change much.
phoenix_rizzen - Monday, May 29, 2017 - link
DynamIQ and big.LITTLE are not compatible, you can't mix and match. You either use older cores (A72/73 + A53) with big.LITTLE, or you use newer cores (A75+A55) with DynamIQ.
DynamIQ, IIUC, allows for multiple clusters, so you could get 8+8, 4+8, 2+8 and similar configurations. I doubt anyone would do that in a smartphone; but the Chinese OEMs seem obsessed with core counts, so they may do something weird (like the Helio tri-cluster setup).
twotwotwo - Monday, May 29, 2017 - link
What is branch prediction used for in the *in-order* A55? Is it just to try to prefetch the right instructions into L1I? Or can you do some speculative stuff (e.g. decode the expected next instruction) and still be called in-order?
Wilco1 - Monday, May 29, 2017 - link
In-order doesn't imply no speculation. Instructions after a branch start executing speculatively but cannot complete until the branch direction is determined.
Branch prediction is as important for an in-order core as it is for an OoO core. Without branch prediction every branch would take 8+ cycles rather than < 0.1 cycles on average.
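To put rough numbers on that, here is a back-of-the-envelope sketch. The 8-cycle stall and the 0.1-cycle average come from the comment above; the base CPI of 1.0 and the branch frequency of one instruction in five are assumptions picked purely for illustration, not A55 figures.

```python
def effective_cpi(base_cpi, branch_fraction, avg_branch_cost):
    """Average cycles per instruction once per-branch cost is added in."""
    return base_cpi + branch_fraction * avg_branch_cost


BASE_CPI = 1.0         # assumed: one instruction per cycle when nothing stalls
BRANCH_FRACTION = 0.2  # assumed: 1 in 5 instructions is a branch

no_prediction = effective_cpi(BASE_CPI, BRANCH_FRACTION, 8.0)    # stall on every branch
with_prediction = effective_cpi(BASE_CPI, BRANCH_FRACTION, 0.1)  # averaged predicted cost
slowdown = no_prediction / with_prediction

print(f"CPI without prediction: {no_prediction:.2f}")
print(f"CPI with prediction:    {with_prediction:.2f}")
print(f"slowdown without prediction: {slowdown:.2f}x")
```

With these made-up inputs the core without prediction comes out roughly 2.5x slower, which is why even a small in-order design spends area on a predictor.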
alpha64 - Monday, May 29, 2017 - link
I am a bit confused by this statement on the second page:
"Together with L3 cache and other control logic, the DSU is about the same area as an A55 core in its max configuration or half the area of an A55 in its min configuration."
Is this backwards (half A55 in max, or ~A55 in min), or is the second A55 supposed to be an A75?
alpha64 - Monday, May 29, 2017 - link
Or, perhaps, is this talking about Max and Min configurations of the DSU itself, and not the core (which is not clear in the sentence either)?
jjj - Monday, May 29, 2017 - link
Got to be DSU max and min but can't be with the L3$ as 4MB L3$ is huge.
hMunster - Monday, May 29, 2017 - link
In what mobile usage scenario is having 7 small cores and 1 big one an advantage over having just 3 small cores and 1 big one?
Meteor2 - Monday, May 29, 2017 - link
Are you trolling...?
Actual answer: Android is always doing loads of things at once which don't need doing particularly quickly (i.e. aren't at the direct behest of the user), but do need doing efficiently. The throughput needed is more than three small cores can provide.
aryonoco - Monday, May 29, 2017 - link
This website has itself proven that Android does use 8 cores and these extra cores do bring improvements in overall experience.
Personally, I'm hopeful that SoC vendors actually do follow ARM here and kill off 8 little cores in favour of a 1+7 design. Would translate into a huge single threaded improvement for end users.
Eden-K121D - Monday, May 29, 2017 - link
Better to have 2 + 6 or even 2+4
phoenix_rizzen - Monday, May 29, 2017 - link
Yeah, a 2+6 A75/55 arrangement would be neat to see. 4+4 seems like overkill in a phone, but could be useful in a high-res tablet or Chromebook.
Wonder what the TDP would be for an 8-core cluster of just A75s. :) Chromebook or laptop?
aryonoco - Monday, May 29, 2017 - link
Well ARM says the TDP of A75 is from 750mW per core to 2W per core based on clock speed. Obviously an SoC has other components as well, and cache and the interconnects use power as well, not to mention the GPU. So depending on how you clock it and how much cache it has and what GPU it has, I'd say that an 8-core A75 SoC on 10nm process would have a TDP of somewhere between 12W to 25W.
And as much as I'd like to see such a 8-Core A75 SoC, this is pure fantasy. The volume of Chromebooks is too low to design bespoke SoCs for them, they'll just use whatever is designed for phones.
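A quick sketch of the core-only part of that estimate, using the 750mW and 2W per-core figures quoted above. Everything else on the SoC (cache, interconnect, GPU) sits outside these numbers, so the gap between this range and the 12W to 25W guess is an implied budget for the rest of the chip, not something ARM has stated.

```python
def core_power_range(num_cores, low_w_per_core, high_w_per_core):
    """Return (low, high) total core power in watts, cores only."""
    return num_cores * low_w_per_core, num_cores * high_w_per_core


# 8x A75 at the quoted 0.75 W and 2 W per-core points
low, high = core_power_range(8, 0.75, 2.0)
print(f"8x A75, cores only: {low:.1f} W to {high:.1f} W")

# What the 12 W to 25 W SoC guess would leave for everything that is not a CPU core
print(f"implied non-core budget: {12 - low:.1f} W to {25 - high:.1f} W")
```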
aryonoco - Monday, May 29, 2017 - link
Sure, I would prefer a 2+4 to a 1+7 configuration as well. I've always thought that 2+4 is the sweet spot for big.LITTLE.
But 2+4 is going to be a lot bigger than 1+7. These LITTLE cores are extremely tiny, the A53 is about 1mm on 16nm fab process. Which is why the mid-range of the market has gone for 8-core A53s in the last couple of years.
2+4 is a lot bigger so it's not even a consideration for this end of the market. 2+4 could have been done with big.LITTLE but the fact that there were so few 2+4 SoCs tells you that the market just didn't think it was worth it. 1+7 on the other hand was not possible with big.LITTLE, but is possible with DynamIQ, and according to ARM is only a little bigger than 8 small cores, so I'm hoping that the market sees value in that for the midrange.
0iron - Monday, May 29, 2017 - link
But die size will be bigger. The reason why ARM introduces 1+7 is because die size is 13% bigger than 8xA53 while 8xA55 is 10% more. So, it's only 3% more area than 8xA55 but provides 2x single core performance.
Meteor2 - Friday, June 2, 2017 - link
I like my 808 :)
melgross - Thursday, June 8, 2017 - link
They only seem to bring some improvements because the big cores aren't all that great to begin with. If they had fewer little cores, and better big cores, that wouldn't be the case.
sonny73n - Monday, May 29, 2017 - link
All the graphs are so misleading except for the 3rd one. 1.18x performance increased looks like 200% and 1.97x looks like 500% increased. I call this marketing scam/consumer fraud.
Daniel Egger - Monday, May 29, 2017 - link
Wow, I hate those disproportional bar graph heights in the "Pushing the performance envelope" diagram. Such bullshit tricks (especially when as unnecessary as in this case) stick out like a sore thumb and spoil the whole enjoyment of the article because I'm constantly thinking what else they're trying to sugar coat...
Wardrive86 - Monday, May 29, 2017 - link
Forgive me if I'm wrong, but I was under the assumption that the A53 had 2 simple Integer ALUs (Adds/Shifts) and a complex Integer ALU (Multi/Mac/Div) like the Austin CPUs and the Cortex A7. Am I wrong about that?
StrangerGuy - Monday, May 29, 2017 - link
Fast forward to one year later: Android apologists will say how their barely faster than stock A75 custom cores still losing big time to Apple in ST is a "conscious design decision".
Wilco1 - Monday, May 29, 2017 - link
I'd expect a 3GHz Cortex-A75 to have a Geekbench score of ~3200, ie. very close to A10.
melgross - Monday, May 29, 2017 - link
Maybe, maybe not. That would be almost twice the best we're seeing from the highest performing parts now, such as the 835. The A11 is showing over 4,500 single core and close to 9,000 multicore, assuming these numbers are real.
Wilco1 - Monday, May 29, 2017 - link
835 based phones score just over 2000. Cortex-A73 in Kirin 960 also does 2000 at 2.4GHz, so 34% IPC gain will get that to 3200 at 3GHz.
As for the A11 claim, those scores are fake, see eg. https://9to5mac.com/2017/04/25/iphone-8-fake-bench...
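The scaling estimate above can be reproduced with simple proportional arithmetic. This is a sketch that assumes the score scales linearly with both IPC and clock, which real devices rarely manage (memory latency does not shrink with frequency), so it lands somewhat above the ~3200 quoted. The function name is illustrative only.

```python
def scale_score(base_score, ipc_gain, base_ghz, target_ghz):
    """Scale a benchmark score by an IPC gain and a clock ratio.

    Assumes perfectly linear scaling with both, which is optimistic.
    """
    return base_score * (1.0 + ipc_gain) * (target_ghz / base_ghz)


# Kirin 960 A73: ~2000 at 2.4 GHz; claimed 34% IPC gain; target clock 3.0 GHz
estimate = scale_score(2000, 0.34, 2.4, 3.0)
print(f"ideal-scaling estimate: {estimate:.0f}")
```

Ideal scaling gives about 3350, so a ~3200 expectation already allows a few percent for less-than-perfect frequency scaling.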
name99 - Wednesday, May 31, 2017 - link
The A11 numbers likely are fake, but they are also plausible. A8 to A9 and A9 to A10 were both 40 to 50% increases. Going to 4500 is the same sort of increase.
The new process allows for the frequency boost, and there remain realistic micro-architectural mechanisms that could provide for the IPC boost.
melgross - Wednesday, May 31, 2017 - link
No, we don't know if they're fake. TSMC stated, months ago, that they were delivering 10nm parts to their largest customers, which one would presume, is Apple.
And my statement stands. If the best the 835 can do is just over 2,000, then these parts are slightly over twice as fast. And if the claim for the multiprocessing score is right, then that's well over the score for any 4 core ARM chip from anyone else.
"Traditionally", these scores that leak out, whether real, or not, are remarkably close to what's tested after Apple's product does come out, often being somewhat lower than the "real" scores.
jjj - Monday, May 29, 2017 - link
Apple's ST perf is marketing for folks like you, nothing more.
ST perf is too high for mobile even with A75 and ST perf is not what matters. We would all be better off with less ST and higher efficiency.
Sadly people like you are pushing the industry into pushing ST for no reason.
Apple's core is huge compared to ARM's core and for what, ST perf you don't need, lower MT and efficiency.
aryonoco - Monday, May 29, 2017 - link
I'm curious to know what you are basing this on.
From where I stand, the great majority of times on mobiles are spent either on the web, or in games. Javascript is still very much single-threaded, so higher ST performance directly results in better web experience.
Why do you think that "I don't need ST perf"?
Note: I don't have a single iOS device, though I'd have loved to have an Android device with A10 inside.
melgross - Wednesday, May 31, 2017 - link
Nonsense! If this were a Qualcomm or Samsung chip, you wouldn't be saying that, and we both know that. While I don't know what other chips use, Apple's is about 3 watts, which is likely about what the others are. But Apple manages to get far better performance. That's never a bad thing.
I don't think you understand what smartphones are being used for.
melgross - Monday, May 29, 2017 - link
While we don't know if the benchmarks that have been listed for the new A11 from Apple are real, though they seem to be what we would expect, individual cores are hitting over 4,500 and slightly under 9,000 multicore, with both cores.
With everything I've read here, I'm still not sure what we would expect from these parts. The highest performing ARM used on Android seems to be well below 2,000 per core, with almost 7,000 for 4 core multicore.
So, what's to expect here? And how much of this advantage is coming from the process shrink, rather than from core improvements?
tipoo - Monday, May 29, 2017 - link
Yeah that's what I'm wondering, how much is IPC improvement and how much is just clocking it higher on a new node.
Wardrive86 - Monday, May 29, 2017 - link
Shouldn't that be 2-128 bit NEON/FPU pipelines for the A75? If not that's a Max 4 flops per clock and lower than the cores it is replacing
serendip - Monday, May 29, 2017 - link
I hope chip vendors don't push 8x A55 designs for the midrange because they're only good for the low end. Having so many similar cores is pointless because Android rarely uses all 8 cores.
I'd rather see more 2+4 or 4+4 designs with the A55 and A75, especially something like the old Snapdragon 650/652 with the latest cores and processes. I'm looking to upgrade my Mi Max a year from now and the relevant chips should appear by then. On the other hand, with constant driver updates, this phone could last for a few years still.
Wilco1 - Tuesday, May 30, 2017 - link
A Cortex-A55 at 2.5GHz (same as Helio P25) would get close to ST performance of Galaxy S6 (and match MT perf). That was top-end 2 years ago... So while I agree 1+7 or 2+6 would be much better than 8x A55, I don't think you could call an S6 a low-end phone even in 2018!
serendip - Tuesday, May 30, 2017 - link
The Helios with their decacore design couldn't beat the real world speed and battery life of a Snapdragon 65x. It's foolish to run an A55 at 2.5 GHz when an A75 at lower speed uses similar amounts of power while being much faster. At one point, you move the load from the donkey and put it on a race horse :)
Wilco1 - Tuesday, May 30, 2017 - link
All of the Helio deca cores use Cortex-A72, just like Snapdragon 65x, and the higher clocks of Helio means it beats the 65x like you'd expect. So I'm not sure what your point is?
Cortex-A53 in Kirin 950 at its highest frequency is ~50% more efficient than Cortex-A72, with the crossover point at around 2.1GHz. On 10nm with Cortex-A55 it may be closer to 2.5GHz.
serendip - Tuesday, May 30, 2017 - link
Yeah, my mistake, I was thinking of Helios with 8x A53s only. Anyway, perf/watt matters at high clock speeds, so a 50% efficient A53 still can't do as much work as an A72 at the same clock speed. I'd rather keep the efficient cores humming along at low speed and have the big cores come online in short bursts, like for app loading or web page rendering. Note that this might not work for constant gaming though, the big cores constantly being on will overheat the phone and kill battery life.
No easy solutions then.
Wardrive86 - Tuesday, May 30, 2017 - link
2 GHz A53 has the single thread performance of 2.3 GHz Krait 400/1.85 GHz Cortex A15. Octa designs often have well over double the multithread performance of say a Snapdragon 800/801. Not low end..very much midrange
serendip - Tuesday, May 30, 2017 - link
But the A53 or A55 at 2+ GHz is a huge power hog that's still slower than a similarly clocked A72 or A75. The octacore branding is a gimmick when all cores are the same design. Performance doesn't scale equally with increasing frequency and power consumption - at one point, it's better to switch the task to a high performance core rather than keep increasing speed on a low performance core.
A smart design (like the 650/652 which is a flagship killer) would use 4x or 6x A55 at low clock rates for multi threaded stuff and 2x A75 for pure single threaded performance, power consumption be damned.
Wardrive86 - Tuesday, May 30, 2017 - link
Snapdragon 625 @ 2.02 GHz and 626 @ 2.3 GHz are certainly not battery hogs. They are both homogenous and ramp clock speed up on all cores very, very often. Much snappier performance than paper specs would suggest and incredible battery life
StrangerGuy - Tuesday, May 30, 2017 - link
Midrange ARM big cores at actual midrange prices are already quite a rarity in the China phone market, let alone outside of it. Since they perform so close to actual flagships SoCs, most OEMs will either price the devices similarly to their actual flagships (cough Samsung A9 Pro), or not doing them altogether. If you ask me who to blame, it will be the non-Apple custom core designers sucking hard at their jobs.
Besides an A55 SoC with presumably >1K ST GB4 scores are no slouches either, for a $120 device I'm certainly not complaining.
serendip - Friday, June 2, 2017 - link
Maybe the 650/652 was a flash in the pan and the 660 could be a unicorn chip, one that's announced but never deployed. Interestingly Xiaomi moved to the 625 in the Redmi Note 4 and Mi Max 2, whereas predecessor models used the 650. Maybe OEMs really are afraid of good-enough chips in their midrange devices cannibalizing flagship sales.
legume - Tuesday, May 30, 2017 - link
All of these numbers are crap if the cache configs are not stated. DynamIQ is very different and most of the SPEC gains could be from L2/LLC increases. This is all marketing FUD
Wilco1 - Tuesday, May 30, 2017 - link
Forgot to read the article?
"These numbers, as well as the others shown in the chart, comparing the A55 and A53 are at the same frequency, same L1/L2 cache sizes, same compiler, etc. and are meant to be a fair comparison. The actual gains should actually be a little higher, because partner SoCs will benefit from adding the L3 cache, which these numbers do not include."
legume - Wednesday, May 31, 2017 - link
iso is not the same as knowing the values
Matt Humrick - Wednesday, May 31, 2017 - link
The L1/L2 cache sizes for A53/A55 are stated in the article.
Great_Scott - Tuesday, May 30, 2017 - link
Fantastic article, Matt. Best CPU tech article I've read in years, and I read most of them.
Alexvrb - Tuesday, May 30, 2017 - link
"ARM wants to push the A75 into larger form-factor devices with power budgets beyond mobile’s 750mW/core too by pushing frequency higher. Something like a Chromebook or a 2-in-1 ultraportable come to mind. At 1W/core the A75 delivers 25% higher performance than the A73 and at 2W/core the A75’s advantage bumps up to 30% when running SPECint 2006. If anything, these numbers highlight why it’s not a good idea to push performance with frequency alone, as dynamic power scales exponentially."
Perhaps, but it gives it a lot more headroom for use in things like tablets... and laptops. I'm thinking Windows on ARM could use an even faster SoC than the SD 835, and 2W is perfect. Right in Atom ULP territory, and there's no modern Atoms left to compete in the lower-price territory. Perhaps Intel will be forced to release cheaper gimped Core-based "Atoms" in the future? Or Celerons/Pentiums. ;)
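On the quoted remark about power and frequency: the usual first-order model for dynamic power is P = C * V^2 * f, which is polynomial rather than strictly exponential. The sketch below assumes supply voltage has to rise in proportion to frequency, which is a simplification, and works in normalized units rather than any real A75 figures.

```python
def dynamic_power(capacitance, voltage, freq):
    """First-order CMOS dynamic power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * freq


def relative_power(freq_ratio):
    """Power relative to baseline when voltage scales linearly with frequency.

    The linear voltage/frequency relationship is an assumption for
    illustration; real voltage/frequency curves are process-specific.
    """
    baseline = dynamic_power(1.0, 1.0, 1.0)
    return dynamic_power(1.0, freq_ratio, freq_ratio) / baseline


for ratio in (1.0, 1.25, 1.5, 2.0):
    print(f"{ratio:.2f}x clock -> {relative_power(ratio):.2f}x power")
```

Under this model power grows with the cube of the clock, so doubling frequency costs about eight times the power, which fits the small gain from 25% to 30% when the per-core budget doubles from 1W to 2W.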
LiverpoolFC5903 - Wednesday, May 31, 2017 - link
Meh.
Incremental update with no radical changes. Would LOVE to see a huge fat ARM core with a 5+ wide front end for premium devices, with single threaded throughput approaching that of the Core M series. Now that would be progress.
No reason why a dual core with two fat cores cannot work great on android, especially given the idea of race to sleep. Off load background tasks to DSPs, Microcontrollers etc or even use a third big core clocked at about half the frequency of the main two cores.
Sure, will be expensive and big, but you can be sure there will be customers for it, especially in the 700 USD plus market segment. As of now, manufacturers barely have any choice apart from qualcomm chipsets.
lizanosi - Wednesday, May 31, 2017 - link
I ask you, why have Samsung and Apple continued to have great success deviating from ARM's reference designs, while Qualcomm has been married to them and paying the performance price (specifically looking at you, 808)
melgross - Wednesday, May 31, 2017 - link
For the most part, Samsung's designs were straight from ARM. They didn't have an architectural license. It's only very recently that they've gotten one.
But Snapdragon has been Qualcomm's own designs, because they do have an architectural license, as does Apple. But, like the rest of the industry, they were discombobulated when Apple came out with the 64 bit A7.
They've never gotten totally back into the race. Their first one was an ARM design, and it had heat problems. The second was their design, but performance was fairly poor. The 835 is not much better than the preceding model. Samsung has fared no better. The problem they all have is that Apple is two years ahead there, and likely took their time with the A7, because there was no competition. These guys are rushing to catch up, and they are likely restrained by the expectation by Android buyers that more cores are better, rather than having better cores.
StrangerGuy - Friday, June 2, 2017 - link
Now that Apple's GPU is a mostly fully custom part, expect the A11 to start another A7-esque domination over Android SoCs on graphics. I also expect an Apple custom LTE baseband to debut this year too, since Apple is definitely too paranoid to depend solely on Qualcomm, and Intel's baseband proved to be donkey balls. Besides, iPhones probably outsell everyone else's flagships combined yearly in a single launch quarter. The economies of scale for an Android flagship SoC make far less sense.
Suraj tiwari - Thursday, June 1, 2017 - link
DynamIQ is a welcome move; it should be adopted by SoC manufacturers immediately. No other CPU manufacturer (Intel, AMD) has a technology like this!
Anato - Saturday, June 3, 2017 - link
I would prefer 2+2 over 8 A55 cores any day and pay for it, but marketing disagrees :-(
slee915 - Wednesday, June 28, 2017 - link
This article shows the A73 has a 3-stage AGU LD/ST memory pipeline, but last year's A73 article http://www.anandtech.com/show/10347/arm-cortex-a73... shows it has a 4-stage AGU LD/ST. So which one is correct?
skiffc - Wednesday, August 16, 2017 - link
As described in the ARM A75 TRM, the A75 L1 data cache is PIPT and 16-way set-associative.
Suraj tiwari - Monday, October 9, 2017 - link
Clear and lucid article. I am from a commerce background and can still understand most of it. The introduction of an L3 cache in mobile is good, but its being pseudo-exclusive sucks, because area and power efficiency are what matter most in mobile.
roshanraju - Wednesday, January 3, 2018 - link
What is the replacement policy used in the L2 cache? Also, is the L2 cache PIPT?
peevee - Tuesday, March 20, 2018 - link
"Or will we see new 7+1 or 3+1 combinations with a single A75 surrounded by A55s?"
That would make too much sense. The loads are either single-threaded or multi-threaded, not 4-threaded. Because every modern computing environment, from watches to server farms, is power-limited, multithreaded loads need to be performed on the most efficient cores possible. Which certainly excludes any SPECULATIVE out-of-order execution.