Archives

These are unedited transcripts and may contain errors.


Routing Working Group, 2nd November 2011:

CHAIR: Hello. Welcome. Hope you all had a good lunch. This is the Routing Working Group. If you meant to be in the other working group, that's the one behind.

So we have the draft agenda posted up there. I'd like to start by reminding everyone that if they want to go up to the microphone and speak, please state your name and affiliation for the benefit of the remote audience. My name is João Damas. I'm chair of the Routing Working Group. The co-chair is Rob Evans. If you have any complaints please feel free to address us later. The session is being stenographed for your benefit as well, and available online. The scribe and Jabber room person, Rob told me 60 seconds ago and I've already forgotten. Emile is doing the minutes and Robert is doing the Jabber. Hopefully you had a chance to look at the minutes from RIPE 62. If there are no comments, we'll declare them final. The final item in this preliminary part is, as you may have noticed, the IPv6 routing recommendations document we have been discussing for some time, which went to last call. We issued that in the summer and, much to my shame, I failed to close it when it was due. Two weeks ago I reissued the last call with an end date of today. So far we have had no further comments, and the comments from the previous last call were addressed, I think. So unless there is any other comment at this time from the room, we'll declare that last call closed and ask the RIPE NCC to publish the document. No one? Thanks.

Then we go to the more regular programme. We have four speakers this afternoon: Geoff Huston, Thomas Mangin, and two little brothers called Randy Bush. So that's the draft agenda. There will be some time for AOB. Or does anyone have anything to say about the agenda? Oh, Daniel.

AUDIENCE SPEAKER: Daniel. Maybe under the AOB, if the working group wants to touch on the routing beacon address assignment, I think you may want to make 30 seconds for it.

CHAIR: Thank you for noting that. Let's get started.

GEOFF HUSTON: Good afternoon. My name is Geoff Huston. I'm with APNIC. I have a lot of slides and exactly 20 minutes, so if it starts to look like a movie, that's deliberate. I want to have a look at BGP over the last few months. Back in 2006, the IAB came out with the alarming statement that of all of the problems facing the network at the time, including IPv6, I might add, the one they thought was the most important was routing scalability, and it must be solved. That's the conventional wisdom about BGP: that we're all going to die because routing tables are not constrained. You unleash those nasty /24s and we're going to die a horrible death. In v6 it's the same story and all of our routers are going to melt. The beauty about conventional wisdom is nobody looks at it. I looked at this and I liked this definition of conventional wisdom: even though these beliefs are widely held, they're generally unexamined. So let's examine this idea that we're all going to die a routing heat death. I'd like to have a look at this with data. There's an enormous number of graphs and all of them go up and to the right. Here you go. But interestingly, there's a few little details. This is the last, oh, 22 months. And you'll notice that sort of late last year, someone decided they'd be a nice person and get rid of some bogus routes, so they got rid of about 6,000 routes, and that lasted for a few weeks. The real thing is in April, because over there in Asia Pacific we ran out of v4 addresses, and the rate of growth in BGP has slowed since then. So the table is growing more slowly than it has done for some time since runout, which is kind of interesting. So whatever was predicted about a resulting explosion of routes because of address exhaustion: not yet. How many addresses are being routed? About 150 /8s out of 256. There's been a lot of folk advertising /8s. A lot of the work of debogonising is testing the traffic there, and the work I've been doing, with others, is on the amount of dark traffic: you advertise a vacant /8 and see who wants to send you traffic anyway. We've been doing work on that, and you can see it in looking at the address span. On the whole the stuff is panning out to the right. This one I like: this is the number of autonomous systems in the network. When you bring up addresses you're erratic, have a bit of an on day and an off day, but where ASs are concerned, you guys are process bound: every day you must put 10.2 new ASs into the network. It's a clean straight line. You're weird. You are so weird. I'm like, why do you do this? So what happened? That's the whole set of stats. If you look at the right, the general thing is the v4 network, no matter how you look at it, grew at about 12% in the last few years; it's about 12% on most of the statistics. So what's going on? Not much. Address span is growing more slowly than anything else, probably because we have fewer addresses as we run out. But other than that, nothing much is going on. So if that's v4 and that's a bit boring, what about v6? The dreaded J curve. So, yes, we're actually seeing a phenomenal amount of acceleration in v6. We didn't grow to 100,000, but we did grow from a mighty 2,500 to 6,800 routes. Notice World IPv6 Day. Notice what happened after World IPv6 Day. The message was you were meant to keep going, not stop. Okay? Keep going. So, yes, the world sort of tailed off: oh, that was all too much, now I'm going on holiday all summer. Back to work, everyone. Here's the routed address span.
V6 is bizarre because, quite frankly, /8s are enormous, huge, and someone decided to advertise a few. That wasn't me. That wasn't me. But I was advertising a /12 and those little bits, that's me. Not much you can say about that kind of graph; it's lumpy. And here's autonomous systems, and again it's more obvious before v6 Day. We got another 200 autonomous systems, and as soon as the day was over, that was it, no more. Right up until now. No more. So whatever effect v6 Day had, the continued impetus has been slow. So, yeah, just that one point, nothing more. Here are the stats on v6: even though the numbers are small, the percentages are enormous, up around 60% in some cases. So this stuff is growing like crazy. You can start doing some comparisons. You can't compare the number of entries in the routing table; the v6 routing table is different from v4 because v4 has a huge amount of legacy swamp inside it. But one thing you can compare that I think makes more sense is the number of autonomous systems that play in each protocol. If you look at that and start extrapolating forward, at that rate of growth, which for AS numbers is 80% per year, by mid-2016 the two counts will be about the same, which means if you keep up doing what you're doing and don't slack off, you know, there may be hope at the end of the tunnel. So we're growing at about 10% in v4, 100% in v6. Where is all this heading? Are we heading into some kind of routing scaling heat death of BGP, where your routers are going to melt and we're all going to be in hell? Or is this so boring there's simply nothing to see? Let's understand which of these options we're going to go to.

The best way of doing this is by applying maths, so I did. The model I'm using is an order-2 polynomial, a quadratic. So you take that graph and, instead of looking at the last couple of years, that's BGP since 2004, that's seven years. That's the first order differential. Don't ever let it be said that we're not part of conventional mainstream business: that's the global financial crisis mark one in late 2008. And that's IANA exhaustion in February. For the rest of 2011, I blame the Greeks, and the rest of you guys that just stopped working, which is kind of true. There's little money left and the growth rate has come down heaps. So the daily growth rate is back to the long-term median after a sudden peak. You can take that model and generate a quadratic. Do you remember those? I certainly do. Extrapolate the quadratic forward and you get to this kind of table. Up and to the right, and you notice that by about 2016, if we keep on doing what we're doing, we'll get to half a million routes. Most of your routers today can cope with that. It's hardly alarming news. What's alarming? What makes the number alarming? Well, I would suggest that what makes things alarming is generally Moore's Law: if you've got a system that's growing faster than the capability of silicon, whatever you're doing is getting more expensive. If you have a system that's growing more slowly, whatever you're doing is going to get cheaper over time. Maybe we should be comparing that growth potential to Moore's Law. It's been relatively consistent and it's all about the number of gates on a silicon chip doubling every 18 and a half months. Go and look it up. My memory is fading. That's Moore's Law compared to the size prediction. I can't find any data there that causes me to panic. Whatever we're doing in v4 is kind of boring compared to where silicon capability is going at this point in time. So the growth factors in v4 don't seem to me to be cause for alarm. So there's the projections for v4: by 2015, half a million entries. That's assuming you don't go nuts and advertise /24s. If you do, all bets are off, but you did it to yourselves, so so what. That's v6, that's the table. Look at that daily growth rate. Look at what happened with World IPv6 Day. That's quite an amazingly large curve that you actually fit to. I can't do a fit across all the seven years, so what I did was a fit across 18 months, and that gives me that quadratic over there. Push it forward and it starts to look a wee bit interesting: by 2016 you'll be up to 60,000 routes in v6. Hardly big numbers. That's the kind of projection we see. That's a comparison to Moore's Law. Moore's Law is still higher than those projections at this point. So as far as I can see, even if you put all these numbers together, as long as you guys don't go crazy, you'll be fine. Don't go crazy. Interestingly, the same sort of unit cost of routing will persist over the next five years. So is it a problem? I don't think so. In price performance terms, if you're using ASICs, if you can get large amounts of memory in line with Moore's Law, you don't need to worry about FIB compression or trying to compress your routing tables, or even too much about route aggregation. Whatever you're doing at the moment is okay. Just don't go nuts. Don't do any more or less than what you're doing now, and you're within sustainable margins. So from that message, things are okay. There are very few women in this room, so this question is topical to all of you men: does size really matter?
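A rough sketch of the quadratic extrapolation just described; this is illustrative, not Geoff's actual code, and the daily table-size series and 18.5-month doubling period are assumptions for the example.

```python
import numpy as np

# Illustrative input: day index and BGP table size on that day.
# A real run would use the daily RIB counts behind the graphs above.
days = np.array([0, 365, 730, 1095, 1460, 1825, 2190, 2555])
table_size = np.array([150_000, 175_000, 205_000, 240_000,
                       280_000, 315_000, 345_000, 380_000])

# Fit the order-2 polynomial (quadratic) and extrapolate five years.
model = np.poly1d(np.polyfit(days, table_size, deg=2))
future_day = days[-1] + 5 * 365
predicted = model(future_day)

# Moore's Law comparison: capability doubling every ~18.5 months.
doubling_days = 18.5 * 30.4
moore = table_size[-1] * 2 ** ((future_day - days[-1]) / doubling_days)

print(f"quadratic projection: {predicted:,.0f} entries")
print(f"Moore's Law capability from today's base: {moore:,.0f}")
```

If the fitted curve stays below the Moore's Law curve, the unit cost of routing holds or falls, which is the argument being made here.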
Is size really the major metric here? Interestingly, whenever we talk about scaling in BGP, it comes up that it's not the size that matters, it's actually the amount of activity in routing: the level of updates that happen, the amount of crunching that you need to do, because every single time you receive a BGP update some processor needs to go whir, whir, whir. So let's have a look at updates. This is all the updates I can find in my router in AS 2.0 since January 2010. BGP is boring except for the exciting days. Most of the time there's nothing happening, then all of a sudden there's millions and millions of updates. Most of the time it's boring. Let's get rid of the exciting days, because that's when a router did a reset and fed me a full routing table. On all the other days since 2005, I've only seen 100,000 updates every day. That's not an up-and-to-the-right curve, it's flat. So I'm getting curious at this point; that shouldn't be happening. Since 2005 we've grown the routing table, and by and large there should be more things being unstable. There's more things in the routing space, more people to do fat fingers and get stuff wrong. Right? Wrong. Somehow we're pumping more stuff into the routing table but the amount of activity is kind of constant. If I project forward, even to 2016, I'm still at about 100,000 updates per day. This is weird, really, really unnatural. So then I go, well, how about withdrawals? Maybe it's just me; withdrawals are meant to be kind of global. Same picture, around 10,000 a day. Nothing has changed since 2007. The stuff is really flat. So if I do a linear projection, you go up to a phenomenal 14,000 withdrawals per day, one every five seconds. This is not exactly exciting stuff. And I'm kind of sitting there thinking: on my daily drive to work, I do notice that when I drive to work on a Saturday, accidentally, there's no one else on the road and it's really quick. All the other days there's everyone else on the road and my drive to work is really, really slow. BGP doesn't do this. Somehow we've doubled the number of cars on the road, the number of people doing updates, but the number of updates is stable. That is really, really weird and anomalous. What the hell is going on here? So instead of looking at just the updates, I start looking at the prefixes. I don't care how many times they got updated every day; if they're the subject of an update, I'll count them once. So on the days when there was a full routing table reset, I see the full routing table: every prefix gets updated. On the days when I don't see someone close to me doing a reset, I see 20,000 prefixes a day getting updated, and that's been the same since 2007. Going back to around 2004, only 20,000 prefixes a day on average get updated when it's a quiet day. And I'm thinking, what's going on here? It's kind of like you people at the back, you're the bad people, right? You're allowed to do updates and everyone else isn't. If you feel like doing an update you have to swap with someone in the bad room. There's only 20,000 folk who can do an update every day. This is bizarre. For some reason we're only doing a fixed number every day. Is it the same people? Are there folk who are persistently bad, bad for a long time, or is badness a short thing? How long does badness last? I thought badness was a long-term trait: once you were bad, you were bad and you were crazy forever. Not so. Within two days you're not bad anymore.
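A sketch of the per-day prefix counting just described, under the assumption that the update feed has already been parsed into (timestamp, prefix) pairs; reading the raw MRT dumps is omitted, and the sample pairs are made up.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Illustrative feed: (unix timestamp, prefix) pairs from BGP updates.
updates = [
    (1300000000, "192.0.2.0/24"),
    (1300000005, "192.0.2.0/24"),    # repeat: still one prefix today
    (1300086400, "198.51.100.0/24"),
]

prefixes_per_day = defaultdict(set)
for ts, prefix in updates:
    day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
    prefixes_per_day[day].add(prefix)   # count each prefix once a day

for day in sorted(prefixes_per_day):
    print(day, len(prefixes_per_day[day]), "distinct prefixes updated")
```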
Somehow the set of folk who are unstable keeps on changing every single day, but there are only 20,000 of them. I don't know what strange law BGP operates on, it's a secret to me, but it's truly weird. That distribution is strange. Down at the right are the routing beacons: eight prefixes unstable over the entire period, and we know who they are. They're unstable by definition. So generally 90% of prefixes are unstable for two days or less, and I can see the beacons. So this is weird. Then I'm thinking to myself, the whole idea of BGP was to compute the topology of the network. And the real scare was that once we had a network with a million entries, BGP would never converge: it would constantly thrash and take longer and longer for the BGP distance vector algorithm to come to a stop. Let's look at that theory now and understand what convergence is about.

Here's an example that we typically see with a route withdrawal. This is a beacon in Moscow, and I picked it because it's a long way away and you can see path hunting going on. In this case, one withdrawal actually managed to generate four updates and then a withdrawal. That's pretty typical: the bigger the network, the more you will see path hunting. So the theory is that the bigger we make the Internet, the more we're going to get this update, update, update, the more we're going to get BGP itself amplifying the problem. So how many convergence events do I see every day? If I count that entire sequence as one event, how many of those do I see a day? Again, since 2009 that looks pretty flat, around 20, 30, 40,000 of them. It's not growing. I separate out the singles and multis and still see flat. Now I'm getting fascinated. For some reason we've got a network that grows but the underlying issue, convergence, isn't; it's phenomenally stable. What can I say about this? For some reason BGP is doing an amazing job. On average, since 2009, the network has grown like crazy but it still takes 2.9 updates to reach convergence, and has done every day. Really stable. If I look at the distribution of those numbers of updates, most of the time the network converges really quickly: about 99% converge within three updates. BGP is working really, really well. Here's the amount of time it takes. I'm sitting in Australia, a small isolated island. If anything is going to take a long time to converge, it's going to be to me. The average time to converge is 70 seconds. Really stable. Nothing has changed. We've doubled the size of the routing table, yet it still takes 70 seconds for BGP to reach convergence. This is bizarre. You start to think, what's going on? The first thing I can think of is that convergence has a lot to do with the average AS path length of the network. So then I went back and looked at all of Route Views' data over all of time, which in this case is since 1998, the universe only started then. All of the peers of Route Views have not seen their average AS path length increase. Somehow, as we built the network, we haven't made it bigger, we've made it denser. And the diameter of the network has actually been constant. And I suspect that that constant diameter is why convergence has been bounded. If we start building a long and stringy network, it's going to have a lot more updates. As long as we keep the topology, you're going to have a network where the BGP convergence factor is going to remain stable. This is a good thing. What you're doing now works; don't stop it. So that's okay. I can understand the first part of the problem. But those bad boys over there, why are there only 20,000 of them? Why is there only a certain number of unstable prefixes every single day? Why isn't that growing? Because in any kind of system where you double its population, normally instability should be a function of the population: more folk, more things, more stuff going unstable. Why is that part of BGP limited? I have no idea. And the folk that I've asked about this, when you pump them the data, they simply go: absolutely no clue whatsoever. Somehow we have a system where we understand one part: BGP convergence is bounded because topology is bounded, and this is great. What we don't understand is why the raw instability events, the unstable prefixes, haven't grown at the same pace as the routing table. For some reason that particular metric has been flat. And why is a complete mystery.
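A sketch of how one might group a prefix's updates into the convergence events described above (the withdrawal plus its path-hunting updates counted as one event). The 150-second quiet gap that closes an event is an assumed threshold, not a figure from the talk.

```python
from collections import defaultdict

QUIET_GAP = 150  # seconds of silence that ends an event (assumption)

def convergence_events(updates):
    """updates: (timestamp, prefix) pairs sorted by timestamp.
    Returns {prefix: [(n_updates, duration_seconds), ...]}."""
    events = defaultdict(list)
    open_event = {}   # prefix -> [start_ts, last_ts, n_updates]
    for ts, prefix in updates:
        ev = open_event.get(prefix)
        if ev and ts - ev[1] < QUIET_GAP:
            ev[1], ev[2] = ts, ev[2] + 1      # extend current event
        else:
            if ev:                            # close the previous one
                events[prefix].append((ev[2], ev[1] - ev[0]))
            open_event[prefix] = [ts, ts, 1]
    for prefix, ev in open_event.items():     # flush still-open events
        events[prefix].append((ev[2], ev[1] - ev[0]))
    return events

# The Moscow beacon example (four updates then a withdrawal) would
# appear as one event of five updates; the talk's mean is ~2.9.
```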
If you have any ideas I'd be keen to hear them even now. If you have any questions I'd be happy to answer them. Thank you.

(APPLAUSE.)

AUDIENCE SPEAKER: This is just an idea and I don't have numbers to back it up, but you asked for feedback, so here it is. The Internet is becoming more dense, and the readiness of a network to peer with other networks is probably getting more and more common, because we see Internet Exchange points marketing much more widely to enterprise networks who run BGP. Even the networks we think of as tacked on to the end of the network, as stubs, maybe just a large website, are actually getting involved with the middle of the network and peering. The AS path length, I can believe why it's so short: maybe it's because people who peer more widely are enjoying the lowest convergence time, and it's becoming more typical to peer more widely. This could be the answer. It might be nice to do some research into that and maybe draw a conclusion.

GEOFF HUSTON: There has been some work in that area. I remember work done at UCLA, where she compared the number of updates against the position in the tiering, and she was pointing out that the lower you are in the pecking order of tiers, you generally experience a higher number of updates than folks higher up the tree. This idea that you connect as quickly as you can to the middlest bit of the network, and that's a good thing, is probably true: the more you search for the best connectivity, the more you're doing yourself a favour, and oddly enough the rest of the network a favour as well. That's part of it.

AUDIENCE SPEAKER: Thank you for that. And peer with me.

CHAIR: Question in the back.

AUDIENCE SPEAKER: Thanks for a really interesting talk. As usual it's great to get actual data on all of this conventional wisdom stuff. I'm not sure if I really share your lack of concern about routing table or routing engine mechanisms, because it's not clear to me that they are necessarily keeping up with the requirements of large-scale routing on the Internet. There are two main mechanisms, TCAM and RLDRAM, and they have good points and bad points. When you say the DFZ is whatever it is, 375,000 prefixes today, and IPv6 is 6,000 prefixes, you didn't also say that in general an IPv6 prefix will take up four times the amount of lookup space that an IPv4 prefix will do. And you also ignored iBGP tables and also VPN systems within interior networks. And the consequence of this is that in many large networks, the conventional wisdom is that one million prefixes, which is the standard size for large-scale routers today, is not really sufficient any longer. And once you go beyond that one million, it becomes quite expensive and very difficult to get to the next stage, which is two million, which will last you for another couple of years. Do you have any comments on this?

GEOFF HUSTON: A few, thank you. The first of these comments is, interestingly enough, in terms of the amount of processor required: if you had an engine that could process all of the BGP updates you were seeing in 2004, the same engine copes today. That's weird but it's true.

AUDIENCE SPEAKER: I would disagree with that. Have you run a Sup720 in practice?

GEOFF HUSTON: I said engine, not the router itself. The other part is your issue with forwarding tables.

AUDIENCE SPEAKER: The engine on a Sup720 is inadequate for routing these days.

GEOFF HUSTON: The number of updates per day hasn't changed.

AUDIENCE SPEAKER: Maybe it was...

GEOFF HUSTON: I've also had the opportunity to look at a major carrier's internal iBGP network, and I was looking for the same signature of whether it was growing or flat. We've yet to get any published data on that exercise, but from a preliminary look at it, oddly enough, even iBGP in that environment doesn't seem to be doing an up-and-to-the-right. It seems to be the same kind of bizarre, relatively flat profile. So from that perspective, it's hard for me to agree with the IAB. It's very hard for me to say. This system is unbounded; there is no natural law that prevents anyone from doing anything in BGP. But there's no observation that says that, not only is it unbounded, it's growing so quickly it's out of control. It doesn't look to be out of control. It looks to be much flatter than we all thought.

AUDIENCE SPEAKER: Okay. Thank you very much.

CHAIR: Okay, thank you Geoff.

(APPLAUSE.)


CHAIR: Next up we have Thomas.

THOMAS MANGIN: Hello. Geoff just told us that iBGP was, if anything, more unstable, so I'm going to tell you how to increase that.
Why would you want to use an application to inject routes? The main reason is well known: DDoS filtering. You may have a government that wants you to do interception, and you may want to do it via BGP. Traffic engineering. Suspending customers. Some people working more on the server side may want to use BGP to announce prefixes: you see it for anycast, you see it from cloud providers, using BGP to move servers from one zone to another quite easily. So up to now the way to do it was to find BGP software, generate a configuration, then run the application. It's quite clear that that's not very practical. You have to work within the limits of the software you're using, and you will not be able to integrate very easily with your back end or whatever you use. From that I came to the point that something better should be done. And I tried to do it.

So I wrote some BGP software, a route injector, which allows you to inject BGP routes from simple scripts. The way it works: you fire up your text editor and write a simple script, which will print on the screen what you want to announce and what you want to remove. This very short shell script will announce a route and then remove it ten seconds later. All you have to do then is define a neighbour, set up your BGP session and tell the program to start announcing those routes.
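The injector being described here is Thomas's ExaBGP. As a minimal sketch, assuming its line-based helper-process API (the daemon runs your script and reads announce/withdraw commands from its stdout), the ten-second example might look like this; the prefix and next-hop are illustrative values, not from the talk.

```python
#!/usr/bin/env python3
# Hedged sketch of an ExaBGP helper process: ExaBGP runs this script
# and reads one command per line from its stdout. Values are made up.
import sys
import time

def emit(command: str) -> None:
    sys.stdout.write(command + "\n")
    sys.stdout.flush()          # ExaBGP reads line by line

emit("announce route 192.0.2.0/24 next-hop 10.0.0.1")
time.sleep(10)
emit("withdraw route 192.0.2.0/24 next-hop 10.0.0.1")

while True:                     # stay alive so the session stays up
    time.sleep(60)
```

In the configuration you would then define the neighbour and point a process entry at this script, matching the "define a neighbour, set up your BGP session" step above.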

So that was pretty good and simple. But some people told me it wasn't simple enough, so using user feedback I designed something else. Some routes are tied to a service, and if your service is down you may want to remove the routes, as they're not needed anymore. A good example is anycast: if your DNS software goes down, you may want to stop announcing the anycast route from that location, otherwise you are still routing traffic there. In that case you can define routes manually, give them a name, and later on check whatever the reason is for them to be valid, and if they're still valid keep announcing them, or remove them in bulk. Some of you may like to deal with DDoS; a good way to do that is FlowSpec. I had time on my hands, an implementation had to be written, and I added FlowSpec support to the application. It's a way to pass firewall-like rules via BGP to your router. So you can define the rule, the traffic you want to match, exactly like a firewall, and then the action, to drop the traffic or move it, and you can do that in the application. I was informed you must be careful of the limitations of the hardware: if you are pushing many routes or complex rules it may affect the router. If you want more information I can point you to someone who knows about it. That's an example of what the flow rules look like: you can define source and destination, and that's it. I hope this was useful. I tried to make sure that the application is useful to people in the room or outside it, and what I'd like to know is whether any of you have any feedback. And I will answer any questions. Thank you very much.
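For reference, a hedged sketch of the kind of FlowSpec rule described above, again written as an ExaBGP helper process emitting the rule on stdout. The exact flow syntax has varied across ExaBGP versions (early versions may have taken flow rules from the configuration file only), so treat this as illustrative; the addresses and port are made up.

```python
#!/usr/bin/env python3
# Emit a FlowSpec-style (RFC 5575) rule through ExaBGP's text API:
# match traffic from one source to one destination/port and discard it.
import sys

rule = ("announce flow route "
        "{ match { source 203.0.113.7/32; "
        "destination 192.0.2.10/32; "
        "destination-port =80; protocol tcp; } "
        "then { discard; } }")

sys.stdout.write(rule + "\n")
sys.stdout.flush()
```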

(APPLAUSE.)



CHAIR: Any comments or questions? Or your email address, so that people can get a look?

THOMAS MANGIN: Yes, it's thomas [at] mangin [dot] com.

CHAIR: There's a typo. That's clear, I think. Okay. Any more comments? No, well, thank you very much.

(APPLAUSE.)


CHAIR: Now we have Randy and then Randy.

RANDY BUSH: Geoff's right, you people in the back have been very bad and you get to hear me twice.

Okay, let me figure out how to work this. Looks good. I'm sure you're all aware that back in March we had an unfortunate event in Japan, many people killed, et cetera. But the question is: what happened to the network? And as Geoff tried to say, life, when you actually look at it, is pretty boring. This is pretty boring, and that's my point. Okay. The agenda: I'll show you the events, what happened as viewed through routing, what happened as viewed through traffic, and maybe something about the impact of other disasters. For those who don't know the shape of Japan: this is the area where the quake was, this is Tokyo, and most of the networking in Japan, and a lot of the networking in Asia, goes through this area. And secondary would be here. Okay? And then, of course, there are the links over to the States, and there are links from this coast over to the rest of Asia and down through the Taiwan Strait, which had fun the other year. The actual events: March 11th, the quake, 14:46. This is viewed from IIJ. Let me give you perspective. We're an Asia-centered provider; we're Japan's first commercial Internet provider. We don't serve many consumers, NTT has the consumers. We serve the small enterprises, the government of Japan, Toyota, all the big banks, and we also go through Asia to the States. So the inter-Japan links, to Sendai, et cetera, are the ones that got hit. The earthquake occurred. Two minutes later the in-house power switched over. At 14:48 we lose the links to Sendai. One link to the US fails seven hours later; what's really happening is that those links get hit by undersea landslides that take out the cable much later. So three and a half or four hours later we lose two more links between Tokyo and the US. The next morning, six in the morning, so about 16 hours in, the first link to Sendai is recovered, the data centre goes to external power, and the first US link comes up; there are other US links, as you'll see. The second link comes up, and a third link to the States and a second link to Sendai come up.

Routing viewpoints: OSPF, I'm still fighting that. And we have a view from a large neighbour who's a good friend, also an international provider, and we can see their view of us, which is interesting. Okay? The OSPF data we got through Route Explorer from Packet Design. For the backbone, there are about 1,500 links and 325 nodes. This is at the end of February; the event occurred in March. We'll bucket the events; these are bucketed in events per hour.

So watch out: from your culture's standpoint, green and red are reversed. Red is good, green is bad. So here are neighbour drops per hour. Okay? This, of course, is the main event. Okay? And here are neighbours added back in. What you're seeing is routing noise. You'll notice normal noise in OSPF as things pick up; a couple more circuits come up, et cetera. I always say routing noise is like white blood cells: routing noise is the system attacking the problem. The fact that there's routing noise is good. Okay?

So here are the events seen per hour, and we see the drops. Okay? As links to Sendai go down, we see OSPF timeouts for the whole region occurring much later. Okay? Then we see the failure of a US link. And then we see stuff start to come up. Okay? And that's just events per hour on those circuits.

Connectivity to Sendai was lost for 15 hours. Out of a dozen or so trans-Pacific links, three links failed. OSPF churn is low compared to the number of refreshes. Okay?

So, again, here's where it happened, about 100 prefixes disappeared and this is not a log scale, it's just a little noise.

From our kind neighbour, we record all their iBGP. They peer at many points with us, so we get to see how we look from their viewpoint. And here we see the withdrawals per hour; there's the great fun right there. Okay? The different symbols are their different routers that peer with us, so you can guess we peer at about a dozen points. And here we get to see stuff coming up. Okay? Here are the updates, right? This was the withdrawals, these are the updates. This is normal. So the stuff we saw is boring. It's just part of the normal noise. Nothing radical, nothing big. So let's look at traffic. We see the traffic for Miyagi, one of the prefectures up there, go boom, and then slowly pick up; days later now, they slowly recover. Japan takes a big hit right here. Essentially the telco network was out far longer than the links to Sendai; the links stayed up. On your handset telephone you could not get a voice connection, you could not text, but email worked. Okay? And I was sitting at home, and once I washed all the coffee off the walls and desks, my connection was fine. Okay? But what happened was that actual human activity was severely disrupted. My wife took six hours to walk home from class. Right? Everything stopped. So that's what you're seeing here in Japan. It's not that the traffic from Sendai was such a big contributor, but we're seeing Japan slowly recovering. Okay? One trans-Pacific link disappears, another one takes it up. Another trans-Pacific link takes the increase, but no flat-topping, no nothing. No congestion. We look at the big exchange points. One took a serious hit, and stays down; we're not sure why. All of the exchange points kind of stayed down after the quake. Here's the Osaka exchange point. Maybe they switched from JPNAP to JPIX, I don't know. Okay? So what did we see? OSPF: one link further north failed because it had a shared fate with these, but other links stayed up. Okay? It was just Sendai that went down. We also found that in our neighbour, one router had, essentially, a massive session reset once a day at the same time. They didn't know it; we caught it for them. So we found abnormal behaviour in our friend. Essentially, this was all pretty boring. I think it was Krugman, the economist, who said that when banking got exciting, we all were ruined. Banking is supposed to be boring. Internet operations are supposed to be boring. Sendai was disconnected for 15 hours; no effect on customers that weren't there. Significant impact on trans-Pacific links, but nothing to see. Okay? The Internet works, by the way. We don't use MPLS. I looked at another provider that had fixed MPLS trans-Pacific sharing; both of their virtual circuits got killed, and of course things got ugly. We have been a v6 provider since '97 or something like that. We don't have any secret sauce; we run vanilla, stupid Internet. Very prudent and high-quality operations. Aside from the physical isolation due to two broken links that nobody is going to do anything about, no impact on customers; routing spikes solved the problem. The Internet is boring. Okay? And it should be. Unfortunately, we will probably have other events in the world that are even more significant and will make things even less boring. Let's hope not. Questions on this one? How do I make it go to the next presentation?

AUDIENCE SPEAKER: I have one.

RANDY BUSH: Yes.

AUDIENCE SPEAKER: Sorry for that. What's interesting for me is: what was the difference between the MPLS and the plain IP? What was the reason that the MPLS got killed...

RANDY BUSH: This was not us. It was a neighbour that I talked to, and they had built traffic-engineered virtual circuits, primary and backup. They had more physical circuits than that, but since the primary and backup got hit, they didn't have routing. They had circuits.

AUDIENCE SPEAKER: Okay.

RANDY BUSH: Right?

AUDIENCE SPEAKER: I have one more question.

RANDY BUSH: Who are you?

AUDIENCE SPEAKER: [name unclear]. Have you looked at the latencies, and have you noticed the failover times? I mean, that may seem like a bit of nitpicking.

RANDY BUSH: How long did it take to fail over? The links to Sendai couldn't fail over. They were dead.

AUDIENCE SPEAKER: The other ones.

RANDY BUSH: The trans-Pacific links? No, we didn't look at detailed changes in latency. I think most of the circuits are essentially about the same. And they're vastly overprovisioned.

AUDIENCE SPEAKER: Okay. Thank you.

RANDY BUSH: Okay. Because things are calming down in RIPE in the RPKI area, we needed to make more trouble. So after origin validation, the next thing is BGPsec. It's years away: path validation. The AS path is essentially signed at every hop. In this design, AS0 is sending this BGP update to AS1, and it signs that it's going to send it to AS1, so nobody else can say that they sent it, and so on and so forth. Two signatures: AS0 sends it to AS1, AS1 sends it to AS2, and so forth. What I tried to look at: we were very worried, and the whole community thought, that the cost of this signing and the cost of verifying it would be huge. I got a BGP announcement, it's got a path length of 5, I have to verify the crypto for five hops, my router is going to die, oh my god, it's the end of the universe, news at 11. So how do we model what this is really going to cost? The way this is designed, when you look at origin validation with ROAs, that's a thousand points of light: somebody decides to do it here, somebody decides to do it there, and the places that decide to do it are autonomous from each other. This technology deploys in islands: I and my neighbour and my neighbour's neighbour start signing. Okay? Those who are not playing the game are not on the island. So this deploys in islands. So what does an island look like? And just to be clear, when you deploy origin validation with ROAs, you get the validation locally, immediately; this, by contrast, depends on how big an island you have. Okay? And we usually draw pictures like this: AS1 signs, AS2 signs, and there are the signatures of the AS path, et cetera, et cetera. But I don't like this picture, because ASs are not really these homogeneous green circles, and it's not that the colour is wrong, it's the idea that they're homogeneous and simple that is wrong. Life is more like this. ISP B peers at a whole lot of points with ISP A, and ISP A has a big customer cone, customers, and it's a big cone. But what's interesting: when I look at ISP B and I look at the AS paths here from A, I see, whatever it is, ten paths, and I think, I'm going to be validating all those? No. This router only hears one path for a particular prefix from the neighbour. So it's really the load of one single path per prefix coming from A and A's customer cone.
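A toy sketch of the forward signing just described: each AS signs over the update, the AS it is sending to, and the signatures so far, so the path cannot be quietly altered. This is not the BGPsec wire format, and an HMAC stands in for the real per-router ECDSA signatures; ASNs and keys are made up.

```python
import hashlib
import hmac

def sign(key: bytes, data: bytes) -> bytes:
    # Stand-in for an ECDSA signature with the router's private key.
    return hmac.new(key, data, hashlib.sha256).digest()

def forward_sign(update: bytes, hops):
    """hops: list of (asn, key, next_asn) along the path.
    Each hop signs update + 'from->to' + all previous signatures."""
    sigs = []
    for asn, key, next_asn in hops:
        blob = update + f"{asn}->{next_asn}".encode() + b"".join(sigs)
        sigs.append(sign(key, blob))
    return sigs

# AS0 signs "to AS1", AS1 signs "to AS2": two signatures, as in the
# slide's two-hop example.
chain = forward_sign(b"prefix 192.0.2.0/24",
                     [(64500, b"key-as64500", 64501),
                      (64501, b"key-as64501", 64502)])
print(len(chain), "signatures on the path")
```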

How big is that customer cone? Okay, here are actual measurements. These are some of the world's largest ISPs, a couple of medium ones and some smaller ones. What we're looking at is the AS path length and how many BGP prefixes are that many hops down. So this very large global one: they themselves only have 1,300 prefixes, their static links, whatever. They have 21,000 first-level customers. Those customers have 6,000, and so on down to here. These paths are actually fairly long; we were kind of amazed. And yes, we removed prepending. Okay? So the model says that this router hears all that from its peer. Okay? So now we can look at what's the cost of those multiple hops, because we now have weights and counts on them. Next we want to ask: how many signatures can I validate per second, and how many can I sign per second? This is on a Sandy Bridge processor, 2011 model, using only one of its cores. Okay? And it can validate about 2,200 prefixes a second; it can sign about 2,500 signatures a second. Okay? This is what it looks like. Because no matter how long the AS path is, when I sign it and send it to my neighbour, I'm only doing one signature. So the signature lines are all flat, for the different processors. This is the one I'm going to focus on, this Sandy Bridge: verify, sign. Of course the verifies have curves depending on the AS path length. Watch out, that's a log scale.

So this is the actual result when you model it, router R hears A and let's make believe the entire AS cone below A is signed, all the prefixes, 34 seconds. That is one or two orders of magnitude less than we were afraid of. Right? This says when the session to A bounces, it's going to take me 34 seconds to validate the whole thing. Similarly B, this is their customer cone, notice all the costs tend to be in the first couple hops. Notice this one is much more concentrated up here. This one's got a little bit bigger tail. But low cost. Okay?
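A back-of-the-envelope sketch of that cost model. The cone profile below is made up (only the 1,300 own prefixes and the 21,000 first-level figure echo the slide), and it treats the 2,200-per-second figure as individual signature checks, with a path of length L costing L checks; both are assumptions for illustration.

```python
VERIFY_PER_SEC = 2_200          # Sandy Bridge figure from the talk

# prefixes_at_depth[d]: prefixes whose AS path is d+1 hops long.
# Illustrative cone; real input would be the measured customer cone.
prefixes_at_depth = [1_300, 21_000, 6_000, 2_500, 800]

total_checks = sum(n * (depth + 1)
                   for depth, n in enumerate(prefixes_at_depth))
seconds = total_checks / VERIFY_PER_SEC
print(f"full validation of the cone: {seconds:.0f} s")  # ~34 s here
```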

Signing: you only sign once, and you only sign towards BGPsec speakers. If you have a neighbour who doesn't speak BGPsec, you have to strip out the goop, and that could be on the order of the cost of signing.

There's another trick: if I am just an end customer and I'm multi-homed, I don't want to validate the whole universe; he and she are validating it for me, so I don't want that crypto goop from them, but I do want to sign my prefixes. All I have to do is sign one or two prefixes, and that takes almost no hardware and no time. So I can probably do this on existing hardware I have today. Now, what's interesting is when we look at those customer cones. Look at the last ten years: 84% of all ASs were stubs, non-transit. So when we look at the customer cones of some ISPs, here are four ISPs; notice one is very different. We ask: how many BGP peers do they have on a router, what's the largest number they have on a router, and what's the largest number of customers they have on a router? She's going to have a bit of trouble. These are fairly low numbers. So they have three, four BGP peers, and 20 customers. And remember, we just said that probably 84% of those customers were stubs.

So except for poor W, who's in trouble, you have two or three BGPsec customers per aggregation router. Say you're trying to sign 400K routes at 2,500 or 3,000 signatures per second; the math comes out, and now we're a little costly. The start-up of all those sessions at once is going to cost you. But this presumes that the entire Internet is signed, and that's a long way from now. But as I said, she's going to have a problem. Okay?

So if I look at it as our poor little friend B has a peer A who signs the entire customer cone, and is passing it to three BGP peers and having to sign all of these, how much is it going to cost? About 73 seconds. That's cheap. That's cheap. So don't panic.

(APPLAUSE.)

AUDIENCE SPEAKER: James Blessing. I think we're quite a bit bigger than W.

RANDY BUSH: Again.

AUDIENCE SPEAKER: Limelight Networks.

RANDY BUSH: I think we're.

AUDIENCE SPEAKER: Bigger than W in terms of how many sessions

RANDY BUSH: I'd love data from a bunch of people on that.

AUDIENCE SPEAKER: Those calculations are assuming that that processor is doing nothing else?

RANDY BUSH: That was one core of the route processor.

AUDIENCE SPEAKER: Yes. So 25%... when you're maybe two orders of magnitude higher on a single box?

RANDY BUSH: Yeah.

AUDIENCE SPEAKER: That's not going to work.

RANDY BUSH: How many of those are stubs? How many of those are going to be active when? You're talking about something that's not deploying this year, something that's deploying five years from now. What this says is hey, we're in range. It's not going to be fun but it's not dead. We thought we were in serious trouble. It doesn't look like we are.

AUDIENCE SPEAKER: The other question is what about v6?

RANDY BUSH: What about it?

AUDIENCE SPEAKER: I take it it signs in approximately the same amount of time, there's no additional?

RANDY BUSH: I'm not going to arbitrate.

AUDIENCE SPEAKER: Nick Hilliard, INEX. Just to continue what James was talking about: you modelled this on a Core i7, a top-of-the-range server CPU for today.

RANDY BUSH: Which tends to be what's in routers three years from now.

AUDIENCE SPEAKER: Maybe a little bit more than three years from now. There's a second issue as well: it's a multi-core CPU. There's been very little indication of router vendors implementing multi-threaded route processor software on their routers, and I'm not really confident that we're going to see that in the next couple of years. Okay, well, you can have a chip like a Core i7 in a router four or five years from now, but, you know, only one of those cores is going to be used. So these figures that you quote of 2,300...

RANDY BUSH: I think router vendors would disagree with your statement.

AUDIENCE SPEAKER: Maybe we have some router vendors here and they could comment.

RANDY BUSH: Let's give the Jabber room priority.

AUDIENCE SPEAKER: Thank you for describing the problem with BGPsec: it's a protocol which fails to get deployed even in the easiest model. I do understand router vendors would like us to buy new routers, but please forgive me for not supporting it.

RANDY BUSH: Didn't ask you to support it, Lutz.

AUDIENCE SPEAKER: Actually, I did some similar calculations, fairly roughly, in a discussion I had two weeks ago. I took one full table we get from our upstreams, not stripping out all the prepending stuff, just counting the AS numbers, assumed RSA at 2,096 bits, and let my notebook see how long it would take to verify the table. It came up with 40 seconds, and okay, that doesn't sound that bad, I thought. The other thing: we just had the argument that this kind of processor is not in a router. But once BGPsec becomes an issue, it is no problem to build, you know, a €150 CPU into a line card that costs 500,000 euro. So I don't think that CPU time is much of an issue at this point, and I agree with you.

RANDY BUSH: A couple of points. By the way, I gave this presentation at a Cisco architecture meeting last week and they said, "we'll beat the hell out of that." That was their engineers, not the marketing department. We'll see. The second thing is, remember, as I said, this deploys in islands. What I think we're going to see is, just like we chased the routing engines, you know, upgrades with existing routers, we're going to chase it here. Some of the big islands deploy and start whining at the vendors, and the vendors catch up and make faster and faster boards. The third point I want to make, by the way, is that nobody's asked the other question: what about memory? Daniel, do you mind if I rant on a second?

AUDIENCE SPEAKER: Go ahead.

RANDY BUSH: This is what's fun. I've only got rough numbers, which is why it's not in the presentation; I just did some modelling on the plane here. Okay, let me back off a second. If I'm validating, even if it's an island, I don't know what routers are going to be on that path, so I have to assume that I have the keys for all the routers on the Internet, because I don't know who's going to have signed. So I have to have, let's say, a million keys. And the keys are about 400 bytes. You do the math. That's not too bad. So it looks like it's not going to be magic, or fun, or easy, but we thought it was going to be death. This is going to be within engineering scale. This is cheering news. That's the only point. Daniel.
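The "do the math" step, spelled out with the numbers quoted (a million keys at roughly 400 bytes each):

```python
keys = 1_000_000        # keys for every router on the Internet
bytes_per_key = 400     # rough per-key size quoted above
total = keys * bytes_per_key
print(f"{total / 2**20:.0f} MiB of key storage")   # ~381 MiB
```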

AUDIENCE SPEAKER: Daniel. I have a slightly nitpicky question. Can you go back to the slide where you introduced the cones first? Next one. That one, yes. When you define those cones, you really talk about BGP hops; it's not some other aggregation, like your customers and your customers' customers. It's just purely BGP hops, or is it some other form of aggregation?

RANDY BUSH: It's AS path length.

AUDIENCE SPEAKER: Exactly.

RANDY BUSH: With prepends removed.

AUDIENCE SPEAKER: Okay. Thank you.

RANDY BUSH: And I chose that particularly because with the current proposal for BGP SEC that's what the signature count is.

AUDIENCE SPEAKER: That's what I assumed but I wanted to be sure.

CHAIR: You say as long as... sorry, I'm talking into this one here. You're basically saying that as long as deployment is slow enough, this is not going to be a problem; just keep growing things in islands. And my question is what happens when you join a new neighbour: you won't know if that neighbour is part of an island.

RANDY BUSH: When islands join, it's not going to be... hopefully, as time goes on, the island you join will be bigger and bigger, because the islands will be bigger.

CHAIR: Is there a way to predict how big the island will be before you join?

RANDY BUSH: Sure. You can measure it. This slide came from a 20-line Python script, you know? This was cheap.

CHAIR: Okay. Anyone else? No.

RANDY BUSH: Coffee.

(APPLAUSE.)

CHAIR: Not quite ready for coffee yet. I mean, this is the Routing Working Group; do you expect to finish before time? Come on. Daniel brought something for AOB. At the last session of the morning yesterday, they basically put forward the case for the beacon addresses, and I think Daniel wanted to see if there was further input here. Perhaps you could give a little bit of background.

DANIEL: My question is: we have to formally apply to the Plenary for the address space that is beacon space, and I wonder if the working group or any individuals have a view on whether the beacons are useful, whether to switch them off, increase their number, reduce their number, or whatever. It would really be helpful if we got a little bit of feedback here on the usefulness of this stuff. And I know Randy has an opinion.

RANDY BUSH: Randy Bush. We put up the first beacons. We put them up as part of an experimental design for a specific experiment, and we took ours down when the experiment was over. The RIPE NCC said this would be useful for lots of routing research: let's put up some more permanent beacons. I design lots of experiments; RIPE left theirs up. People actually write to me and say "I want to conduct an experiment and your beacons aren't there," and I say "RIPE, RIPE, RIPE." And this happens a couple of times a year. I think people are actually using your beacons.

SPEAKER: That tends to be my experience as well. Is there anything else?

GEOFF HUSTON: I certainly use them, because I know them. I understand their behaviour. I understand their location. In terms of trying to understand the dynamics of the protocol, as distinct from what causes these interactions, the beacons are really useful as a control point, where I look at them and compare with what I see elsewhere. I certainly appreciate this and would like to see it continue.

AUDIENCE SPEAKER: Thank you. I think that's as far as we can get. Thank you.

CHAIR: So we are basically done. Just one more thing: early in the week we called for lightning talks, and we didn't realise the link to submit lightning talks wasn't really visible. We fixed that; if you are thinking about it, it's now on the front page of the event, RIPE 63. That should make things easier. The talks happen on Friday at the closing Plenary, the last slot before lunch. So with that, we are done, unless anyone has anything else to add, which it doesn't look like. Thank you very much for coming. Time to go for coffee.

(APPLAUSE.)
