These are unedited transcripts and may contain errors.
MAT session: 3rd November 2011 at 4 p.m.
CHAIR: Welcome to the MAT Working Group. For all of you who have not read e-mail in the last 20 minutes, have not heard the announcement that this is not the Cooperation Working Group room, and have not seen the small slide: you might be in the wrong room. Now it's too late.
Let's start. We are here today with two of the co-chairs, that is Richard and me; Ian didn't make it this time, so you will have to make do with us.
We have a colourful agenda as usual. We have some activities around what the NCC does and the tools coming out of those projects, and as usual we have invited a couple of other people, starting with the RIPE TTM part, which is interesting because that's where the whole MAT Working Group emerged from. Then a lot around Measurement Lab, and so on.
So, where is my first presenter? Because I -- here you go. I haven't seen him the whole day. I was a little bit worried.
ANTONIO MARCOS MOREIRAS: Hello. I'll talk about what we are doing at NIC.BR. We have five TTM boxes there, and we are getting the data in real time with a system we developed. So first I'd like to say a few words about NIC.BR and why we are doing that.
So, NIC.BR is a non-profit organisation in Brazil that manages the local ccTLD .BR, and we also have Internet Exchanges: we created and manage 18 Internet Exchanges. And we have projects to measure the quality of the Internet infrastructure in Brazil. So basically, we get our funding from the .BR ccTLD and reapply this funding in projects which foster Internet development in Brazil.
So we have some projects related to measuring Internet quality in Brazil. Basically, we rely on RIPE TTM to see how the main Brazilian operators are interconnected with the rest of the world, and we have the Simon project to see more or less the same thing in Latin America. We also have Brazilian projects: one to measure the quality of our backbone and backhaul, and another project to measure the quality of the last mile of home users.
Simet is the project we have for last-mile measurements. We have servers in our IXPs and we have two versions: a thin client plus a GPS that we install in some home users' residences -- we have about 100 of those for now -- and a website, a Java version, that can do about the same tests. So for the ISPs that are directly connected to our IXPs, this is a very good measurement, where we will get the last-mile quality and don't have interference from other networks.
We have SAMAS, which is our project to measure the quality of our backbone and backhaul. We use about the same probes: they are thin clients with GPS and software that we developed. We deployed the first 100 probes about a month ago. So, we have latency, jitter, packet loss and traceroute.
So about TTM: TTM has very valuable information for us, but it's difficult to read. So, we try to get the raw data, or even the consolidated data, and show it in easier ways.
For example, I presented a comparison between IPv4 and IPv6; this is TTM data shown in another way. We also have this map, a kind of quality map: we have a kind of grade for the interconnections with different parts of the world, and this is TTM data shown in another way.
Well, TTM data was day minus 1 until some time ago. Now it's day minus 2: we have the data from two days ago. And that's not good for us. We would like to get TTM data faster, and TTM has a way to do that. There is a telnet interface: we can connect via TCP to port 9142 on our box, and there is a protocol with messages, some documented and some not documented. What do we get? If I have only one box, I can connect to it, and I can get the latency and jitter from it to any other box, and I can get the traceroute path from my box to any other box. So I only get those two pieces of information: I can't get the latency from the other boxes to my box, or the traceroute in the opposite direction, and I can't get any information about packet loss this way.
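To make that interface concrete, here is a minimal Java sketch of a collector that connects to a box on TCP port 9142 and reads the measurement messages as they arrive. The host name and the line-oriented framing are assumptions for illustration only, since parts of the protocol are undocumented:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.Socket;

    // Connect to one TTM box and print the measurement messages it emits.
    // The host name is a placeholder; the real messages carry latency,
    // jitter and traceroute information from this box to the other boxes.
    public class TtmReader {
        public static void main(String[] args) throws Exception {
            String box = args.length > 0 ? args[0] : "ttm-box.example.org";
            try (Socket socket = new Socket(box, 9142);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // parse and filter here
                }
            }
        }
    }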
But we have five boxes in Brazil, so we can do that for each pair of boxes, on both boxes at the same time. If we do that and we combine the messages, we have the information in real time: latency, jitter, packet loss and traceroutes in both directions, and it's relatively easy to do.
So, we developed software to do that. It's Java software. This is a diagram of how it is constructed: there are threads that listen to the boxes, filter the messages and put them in in-memory queues, and then other threads, one for each of the combinations, that consume this information, filter it again and put it into RRD files. So, this is the result. The web interface is not very good -- we spent more time on the program that gets the data in real time in the correct way -- but it is working. In the first matrix up on the screen we have the real-time data in both directions; these are our boxes. In the rest of the matrix, we have the data in only one direction. So, that's it.
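For the curious, the listener/combiner thread pattern described above could look roughly like the following sketch. The Measurement type, the queue and the pairing logic are invented placeholders, not the actual NIC.BR code; the real software filters the messages per box pair and writes the combined samples into RRD files:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // One parsed message from a box: a one-way measurement.
    class Measurement {
        final String fromBox, toBox;
        final double latencyMs;
        Measurement(String fromBox, String toBox, double latencyMs) {
            this.fromBox = fromBox; this.toBox = toBox; this.latencyMs = latencyMs;
        }
    }

    // Listener threads (not shown) call offer(); this thread consumes the
    // queue. Real code would pair each measurement with the one in the
    // opposite direction and update an RRD file with the combined sample.
    class Combiner implements Runnable {
        private final BlockingQueue<Measurement> queue = new LinkedBlockingQueue<>();

        public void offer(Measurement m) { queue.add(m); }

        @Override
        public void run() {
            try {
                while (true) {
                    Measurement m = queue.take();
                    System.out.printf("%s -> %s: %.2f ms%n",
                            m.fromBox, m.toBox, m.latencyMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }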
We are willing to share the Java software if anybody is interested. And we would like to have real-time data: permission to read the telnet interface of other boxes -- maybe all the boxes in North America, Asia and Africa, because there are few there, and maybe three or four boxes in Europe.
Our system doesn't scale to be a substitute for TTM or anything like that. We have tested it with 30 boxes, and there are more than, I don't know, 100, 150 in TTM. But we would like to have this general perception of how our interconnection is doing, and a few boxes in each region would do. If anyone is interested, please contact me.
That's it. Thanks. Any questions?
(Applause)
CHAIR: Any questions?
AUDIENCE SPEAKER: One common request that I have heard of at NIC.BR -- I am Todd Underwood from Google -- is that it's very difficult for other researchers to use your data because it's not released in a sort of standard format. I notice that you are taking the RIPE data and putting a different interface on it. Would it be possible to release the original data that you produce, so that other people could compare it to their results?
ANTONIO MARCOS MOREIRAS: I don't think that I understood your question. You mean the data we put in RRDs, or not?
AUDIENCE SPEAKER: A number of things. For example, the WHOIS data, which would help people map sub-prefixes to the allocated entities. I think Brazil is one of the only places in the world where it's not possible to get bulk WHOIS access; is that still true? I thought it was.
ANTONIO MARCOS MOREIRAS: You are not talking about that.
AUDIENCE SPEAKER: I am talking about in general the notion of --
ANTONIO MARCOS MOREIRAS: Sorry, my English is not very good. Please speak slowly. Now I understand what you are talking about, but I don't know about .BR WHOIS; I am not in that department. Sorry.
AUDIENCE SPEAKER: Never mind. Thank you.
ANTONIO MARCOS MOREIRAS: But there is someone -- maybe [Petara] is -- yes, talk with [Petara].
CHAIR: Any other direct or indirect question?
ANTONIO MARCOS MOREIRAS: About the matter please.
CHAIR: Okay. Thanks a lot.
(Applause)
CHAIR: The next one is Tiziana Refice from Google/M-Lab, or whatever direction you want.
TIZIANA REFICE: Hi everybody, I am Tiziana Refice from Google and I am here to talk to you about M-Lab, which stands for Measurement Lab. I am going to start with a list of questions that I am pretty sure you are very familiar with, and they are all about broadband. I am going to start with Internet users: a problem that Internet users face daily is which ISP to use, and once they have one, they want to know what they are paying for. Then there are regulators who want to know what the broadband status is in their countries, and finally there are researchers who are doing research on broadband and need data to do all this. All these questions have something in common: they need open data about broadband, and it's very difficult to find any.
So this was the reason that, about two years ago, brought a bunch of researchers and people from the industry together in California, in a workshop organised by Vint Cerf. They were trying to find a solution to this problem, and the answer was M-Lab. So what is M-Lab?
M-Lab's goal is to provide open data to Internet users, regulators and researchers -- open data worldwide -- and the way we do this, basically, is that we provide three different pieces. We provide a server infrastructure that is used by researchers to deploy Internet measurement tools. These tools measure different things; they are required to be open source, and they are available to every Internet user to test different characteristics of their Internet and broadband connections. Whenever these tests run, they provide some information to the specific user about the measurement data, and we collect the same data in a central repository and share all the data with everybody. So this is an extremely unique characteristic of this data set, and in particular there are three unique characteristics of this data set that I want to share with you.
First of all, again, the data is completely open, and it is openly collected, which means we use open source tools to collect it. If you want to know what the data means, you can look at the data, you can look at the code, and it is absolutely free. Really everybody -- researchers, Internet users, everybody -- can look at the data without signing NDAs, without paying for anything. And that is one part. The other part is about consistency: the platform is a worldwide platform and the data is collected consistently worldwide.
On the M-Lab team, I talk to different regulators and researchers, and one of the biggest problems is that people keep collecting data about broadband everywhere: different tools, different methodologies, different definitions of broadband, and it's extremely difficult to compare those across different countries. We are trying to tackle this problem by providing a platform that spans multiple countries. We have been collecting data for the last two years, so you can also do trend analysis over the last two years.
Finally, we share all the data, everything, all the raw data, all the full TCP dumps with full addresses. I am not just talking about aggregated statistics; I am talking about the full data set. And this is really a unique case.
Before going any further, it's important to understand who is behind this. This is not just a Google project: there are many partners from industry and many research groups working on this. Different partners contribute to the project in different ways -- some provide servers, some provide measurement tools, some analyse data -- and each one of them is extremely important. I hope that at the end of this meeting I will be able to add a few more logos there, so please help me with that.
So now we can go into details. Let's start with the platform. The M-Lab platform is a set of servers distributed worldwide. You can think of it as something like a PlanetLab-like platform, with the difference that it is designed to allow accurate broadband measurement, which means, for example, that we make sure that resources are reserved for specific tools, that processes are not preempted, and things like that; it's important for having accurate measurements. Also, the platform features a specific kernel implementation called Web100 that allows collecting the full TCP state of every TCP connection on the platform. Those are just two of the many characteristics of the platform.
Right now, we have servers in 18 distinct locations, mostly covering North America, Europe, Australia and New Zealand, and a new server is coming up in Japan, but as you can see there are many spots over there that have no server. After yesterday's social, I believe we are going to have a server in Norway -- thanks to the social, and to the beer, I guess. But we really need many more of those. So please come to me at the end of the presentation if you are interested in hosting a new node; I'll talk more in a moment about why you should do that.
So, M-Lab, again, is a platform, a server platform, and it's a set of tools. Right now on the platform we are running nine different tools. They all have different characteristics: some of them are software tools, some of them are hardware based. The software-based tools you can run through a browser, some of them through a command line. Some of them are embedded in other tools; for example, there is a test that's embedded in a µTorrent client -- a specific flavour of BitTorrent -- and you can run a test to configure the client the first time you run a µTorrent client. That's just an example. Some other tools can be run on a mobile platform; right now all the mobile tools run on Android, but we are working to put them on iOS. Finally, there are the hardware tools, and you might recognise the SamKnows tool: we are basically providing the server side of the SamKnows experiment.
So much for the specific tools. All these tools are client-server applications, and they only run active measurement tests. This is important because we want to be able to share everything, all the data; if we were doing passive measurement, we wouldn't be able to do that. On top of that, these tools are open source, which means that you can take a specific tool -- the server side, the client side -- and you can customise it. That's what we have been doing: we have been working with several regulators and they have basically done that, and I am going to show you a few examples.
The tools running right now on M-Lab collect different statistics; it really depends on the tool. Some of them collect basic performance statistics, like throughput, bandwidth, latency. Others do something more sophisticated than that: they analyse ISP traffic management techniques, deep packet inspection, traffic shaping. It really depends on the specific tool. The background of this slide is meant to show you some very geeky characteristics of one of the tools running on M-Lab. I am talking about a speed-test-like tool: you run the tool and you get, as you would expect, download throughput, upload throughput, latency, jitter and so on. Then you can have a more geeky version of the results: you get 150 variables that give you all the details of a specific TCP connection, and the screen right now captures just a few of them.
So again, M-Lab is a platform, M-Lab is a set of tests, and M-Lab is a repository of data. Whenever a user runs a test, the test hits the closest server in the platform, runs the test and gets the results for that specific test, and all the measurement data is also collected in a central repository. We collect all the raw data, and at this point we have more than 400 terabytes of data. Right now we are serving about 150k tests per day, and we have users from all over the world. In this map, you can see there is a bubble for each country; the size of a bubble is proportional to the number of users, where in this case a user is identified by an IP address, so in fact we have more than this.
And then, of course, sharing the data. That's the whole point of this exercise. So, we collect the data and we share the data in raw format. You can get the whole 400 terabytes from me; you don't even have to tell me, it's there. There are tools to make your job easier, but it's also, of course, a little work to manage 400 terabytes of data, so we are working to provide different alternatives to analyse the data in the cloud. Right now we are providing a tool with a SQL-like interface that allows you to run SQL queries, and in the near future we are going to release more tools to allow you to do some easy analysis of the data.
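As a hypothetical illustration of the kind of SQL-like query this enables -- the table and column names here are invented, and the real schema is documented on the M-Lab website:

    -- Average download throughput per country for one month
    -- (invented table and column names, for illustration only).
    SELECT country,
           COUNT(DISTINCT client_ip) AS users,
           AVG(download_throughput_kbps) AS avg_download_kbps
    FROM ndt_tests
    WHERE test_date BETWEEN '2011-10-01' AND '2011-10-31'
    GROUP BY country
    ORDER BY avg_download_kbps DESC;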
Having open data has been extremely powerful so far, and we have a few examples of it being, again, extremely innovative in this field. First of all, with regulators. We started working with the FCC in the US and we are collaborating with them in different ways. First of all, they were building a speed-test kind of tool, and they decided to use M-Lab -- one of the tools running on M-Lab -- to power their consumer broadband test. All of the data collected by the FCC consumer broadband test is visualised in the broadband map. Plus, in the last year, they also supported a new study using SamKnows: they distributed thousands of SamKnows boxes in the US, collected data, and based on that they wrote the first FCC broadband report for the whole of the US. The amazing thing is that the report is based on open data, which is really a unique case in the US. And this is really one of the biggest results so far of using the M-Lab data.
One other interesting result is what we had in Greece, and I am going to give you more details about this because I see that many regulators are interested in this case. Yesterday we went, for example, to talk to the local regulator, RTR; they wanted to know a lot of details about this. So, what happened with EETT, which is the Greek telecommunications regulator: they specifically approached us, they were looking for a solution to do broadband measurement, and we told them, you know, we have this platform, and they tried it out. They ran some tests, they decided to deploy a server in Athens, so all of a sudden they had a platform for free, except for the cost of the server, and they started running it, they started collecting data. They found the data interesting, so they decided to build a map of all the data for the Greek users, and they are showing this data on their website. Finally, they are going to use the data to inform decisions, I mean, to make some policies on broadband. That's the best example that we have seen so far of building incrementally on top of the M-Lab platform, where everything -- at least the platform, the tools and the data -- is completely free.
Open data of course empowers research, and we have had incredible results from the research community in the last two years. Here I have just a few examples. One example is from an MIT group -- Dave Clark, Steve Bauer and others -- who analysed one of the subsets of the M-Lab data set, and they found that in many cases users have problems with connections not because of congestion in the network but because of the configuration of the clients, in this specific case because of the receive window. You really can do that only if you have access to real data; you cannot just use a simulation model to figure that out. Then there are other examples, coming for example from Georgia Tech, where some researchers studied traffic shaping and PowerBoost, and then finally, a few months ago, other researchers published some results about deep packet inspection, especially focused on the US and Europe. You can find all the details on the M-Lab website and on their websites.
As an additional example, I built some visualisations on top of the M-Lab data, and the idea is really just to show its power; I wasn't really doing specific analyses. I wanted to do a demo here, but due to some technical problems I won't be able to do that. You can find the information online. I am going to show you some screenshots quickly, just to see what you are able to see there.
In this case, you see here a map of the world. There is a bubble for each country; the colour represents download throughput, where red is high and blue is low, and the size of the bubble is proportional to the number of users -- of IP addresses. As you can see, you can browse the data over time; there is a time bar, so you can see how these numbers change over time, from the beginning of 2010 until now. And you can zoom in to a specific country -- for example here the US, but you can do the same thing in your own country -- down to the city level. Then the same chart allows different kinds of visualisation. For example, you can see how numbers change over time, and there is also a little box on the top left of the chart where the numbers are broken down by ISP, so just go there, see what's in your country, where your ISPs rank in your own country, and you can play with that. Everything is open and, on top of that, all the data is open, so you can do this and you can do much more than this.
Well, download throughput is a very basic statistic. Based on the M-Lab data, you can also understand if a test is bottlenecked by the network or by, for example, the client -- the receive window. I don't have time right now to explain everything to you, but I am more than happy to do a live demo at the end of the presentation.
So, to conclude: what's next? First of all, I need your help here. We need more servers. So, whoever is interested in hosting servers, please come and talk to me afterwards; I am going to be here today and tomorrow. And especially now that we are going to support SamKnows in a European study for the next year, we need more servers -- we are going to double our server platform in Europe. We really need your help there.
More tools: if you are building a new cool tool that measures something on the network, again, please come talk to us; we are interested in expanding our platform even more. And finally, we have tonnes of data, and we are looking for people -- researchers and really everybody -- who want to analyse it; in particular, Google is sponsoring this project with research grants. And we always want more regulators, so if you are a regulator, again, please come talk to us.
Thank you.
(Applause)
CHAIR: Thanks. There might be a question or two. So who has a question?
AUDIENCE SPEAKER: Hi, thank you for sharing this great presentation.
CHAIR: Can you state your name?
AUDIENCE SPEAKER: Jan George: Our regulator is looking into finding something like this, and probably I will try to connect with our regulator.
SPEAKER: That's beautiful. Thank you.
AUDIENCE SPEAKER: Secondly, about hosting the server, you were not very clear: do you send the hardware, and the hoster just needs to provide electricity and connectivity, or --
TIZIANA REFICE: We have different models. The ideal situation is that you provide everything: the servers, connectivity and power. But we are willing to adapt to different situations -- that's what we have been doing in the past. Different situations, different economies; we really want to expand this, so we have situations where we provide the servers and you provide connectivity. We are willing to compromise.
AUDIENCE SPEAKER: So, what is the usual motivation for a hoster to host your server?
TIZIANA REFICE: To have data, to have a very insightful perspective on your own users, to support the Internet, for the good of the Internet, transparency. I mean, there are ISPs who use this with their regular customers to understand what a problem is: we have seen, for example, that Verizon in the US was providing links to customers who were calling customer service, to understand what problem they were having. So there are benefits both for the ISP and for the whole Internet.
AUDIENCE SPEAKER: There are two questions on Jabber that are kind of related. The first one is from James Blessing, asking if you have thought about integrating with RIPE Atlas, and the second question, from Alexander, is: would Google and M-Lab sponsor RIPE Atlas?
TIZIANA REFICE: Let's start with the simple question. Well, I have a meeting with Robert tomorrow morning, so definitely -- and I guess that first answer is satisfying.
The second one: I guess as part of the conversation tomorrow morning we can address that point. I mean, we always share results; everything is on the website. The website is not super pretty, but you can find most information there, and please come to me at the end of the presentation, I can give you my business card and you can ask me questions directly. Come to us at the end of the presentation anyway -- there is a colleague of mine, Meredith, here, so it's not just me. Now, our next question, from our partner from Norway: what do you want to know?
AUDIENCE SPEAKER: Thank you. [Ryanon Vincent] from Altibox, Norway. I was just wondering about a technical question, actually, because one of the challenges we are seeing as a home ISP is the capability of measuring the speeds we are operating with now: we are typically operating at 50 megabits, 100, 400 megabits, and there are many great speed test tools out there, but none of them are capable of doing these speeds. Can you comment on what speeds you are able to actually accurately -- well, semi-accurately -- measure?
TIZIANA REFICE: Right now, not all countries have this incredible situation like Norway has, to be fair. So we haven't tested the platform in this incredibly nice scenario. But, you know, since we are going to have a server there, what better situation than to do a test run there, right? I mean, honestly, I will give you more details; I don't have details right now on tests with such high speeds. I have to say, these tests have been deployed in universities in the US, which have extremely high connectivity, high speeds, so they started in that kind of environment. It's not the same thing, and it doesn't necessarily address your question, but they have been developed with that in mind.
AUDIENCE SPEAKER: Okay. We'll find out.
TIZIANA REFICE: Exactly. We'll find that out. I like that.
AUDIENCE SPEAKER: Antonio Marcos Moreiras: Using completely open source clients and servers, in partnership with regulators that can eventually apply penalties based on the test results -- could that not lead to a situation where someone could be tempted to fake the results and put those results into the system?
TIZIANA REFICE: So, right now we have two answers to that question. The first thing is that right now we are doing nothing. We understand the problem. The kind of solution is that we are really keeping everything open: there are many people keeping eyes on the data, and we have contact with the ISPs. So right now, the way we address this problem is to be completely open. But we understand that it is not enough. As the platform grows, we are going to face more and more of these problems, and we are looking for different solutions to address this. Right now we don't have any. That's absolutely a valid concern.
AUDIENCE SPEAKER: There is another question here from Christian -- I don't think that's you, Christian, right? -- on Jabber, asking what client technology for the browser has been chosen, and why?
TIZIANA REFICE: So, the client tools -- clients can run in a browser; right now most of them are using Java, and a researcher is building a new one using JavaScript. I have to say this is not under M-Lab's control. What we do is talk to researchers who build interesting tools, and it's really up to them: they made the choice, they made the choice about the methodology. As long as all the tools are open source, as long as the community can comment on them, that's fine with us. We are not judging the work; we want to provide a platform to researchers, we want to provide data. We are not saying something is good or something is bad. As long as everything is completely open, everybody can make a judgement and decide what they want to choose. That's why we want to feature more tools: not all of them are equally good in every scenario. You can choose.
AUDIENCE SPEAKER: Daniel Karrenberg, in charge of RIPE Atlas. In addition to Tiziana's answer, I'd like to clarify a few things.
First of all, RIPE Atlas is not in the business of broadband measurement, so we are not there for the last mile, for various reasons, but the most important reason is that there are actually already others in this space, M-Lab included. So that needs to be clear.
I think there is a lot of synergy in other areas and that's why we are talking, but we are not there to say, hey, you know, let's move Atlas into a broadband measurement thing. That's not the idea.
The second thing, about the sponsorship. I like the question, but I'd also like to make clear that what we want to do with Atlas is to remain as independent as possible from specific big sponsors. So if the gist of the question was "why doesn't Google just pay for it", I don't think that would be the right solution. With the RIPE NCC, we have an organisation that's funded by almost 8,000 ISPs and is therefore quite independent and can act quite independently, and if the suggestion was to look for a few large sponsors, that would be going in the wrong direction. That's all I wanted to say. Sorry it wasn't a question.
TIZIANA REFICE: Thanks for your clarifications, they are extremely valid.
AUDIENCE SPEAKER: Maybe I wasn't paying attention -- I always start my questions like this -- but did you mention the bandwidth requirements of your system, if we were interested in hosting your servers?
TIZIANA REFICE: We have all the requirements on the website. If you come to me, I will send you the specific URL. Everything is on the website -- again, it's not super easy to navigate, but everything is there.
AUDIENCE SPEAKER: Last question, for real: do you have some sort of preference for who can host your servers? For example, are you looking for some kind of profile -- an ISP, or somebody who just has a website, or something like this? Do you understand my question?
TIZIANA REFICE: We have some specifications that cover this in the document. The idea is that ideally we would like to have a node in each ISP, but since we are far from there, we are trying to start -- I mean, right now we accept everything, because we really want to expand. If we had a choice in a specific location, we would rather choose a network that is highly connected, because this would provide benefit for all the users in that area. But since we usually don't have a choice, we just take whatever we have -- I mean, whatever is offered.
AUDIENCE SPEAKER: Okay. Thank you.
CHAIR: Thanks.
(Applause)
So, the next one is Martin Levy from Hurricane Electric, probably the only person who has more air miles than IPv6 addresses.
MARTIN LEVY: I'd love to talk about Hadoop, but let's get real. Martin Levy, Hurricane Electric. Got two sets of information for you, obviously about World v6 Day and some of the stats out of that.
But first, what I really want to do is just give you a little bit of insight into our use of, well one page insight into our use of the RIPE Atlas probes.
We have placed three of them in our network, one in Hong Kong, one in California and one in New York City, and it actually gave us phenomenal insight. It just shows how much potential data you could look at day in and day out if you are trying to adjust a network. So here are a couple of things that I want to point out.
So, literally within days of putting the probe into Hong Kong, we realised that there was something definitely going wrong with our connectivity to the M root server. I mean, obviously there is something going on here, and this actually was an Anycast routing issue which was not us -- it was coming from outside of our network. As you can tell, with the appropriate number of e-mails over a few days, due to time zones, you really can fix a problem and fix it very successfully.
But what's interesting is that you get this visualisation. The brain can do stuff with graphs that you just can't do any other way. So here is M root in three different POPs: Hong Kong, California and New York. Here is the equivalent in v6. You will notice that there are actually some differences between these graphs. But what happens is that you start going back and forth with e-mails, talking about where -- in this case -- Anycasted servers are placed globally, where you have Internet peering points that are disconnected between one provider and another geographically, even though you are everywhere. So, you know, here is a good example of finally realising that two providers are missing a particular city globally and significantly fixing the performance. There is the same thing going on here. This one is a little bit more complicated to work out what was going on, but this one is again about bringing up a peering point somewhere.
You'd love all these graphs to be down to single-digit milliseconds, but, as much as I'd like that, there are more complicated reasons why each and every root name server isn't instantaneously within sub-ten milliseconds of every POP.
I wanted to give you one example -- you can definitely spend quite some time looking at this data, and it is valuable. These are RIPE Atlas probes sitting in POPs, versus on broadband, but they are useful.
That's that page. Now we'll talk about v6.
So, some simple background, but there are some curves here that you want to see. We are seeing obviously a lot of uptake of v6 from the routing point of view. If you look at the number of ASes, the number of routes, and where those ASes are -- this came up in the IPv6 Working Group -- the bottom line is that yes, an awful lot of the core has enabled v6, much less on the edge; when I say edge, I am talking about the single-homed or dual-homed ASes, versus the core in the middle. So these graphs are going up. The percentages are very impressive, but -- and I try never to use "but" in a presentation -- you will notice that there is this interesting sort of knee in the curve occurring here, and you'll see it's sort of the same here in the prefixes; the ASNs are sort of more interesting to notice. The conjecture has been -- and you can look at this in various different ways -- that a lot of people got ready for World v6 Day at the beginning of June here, and the reality is that there is a drop-off after that. This is a natural tendency; people do this. There was a lot of effort, there was a lot of stuff done for World v6 Day. We'll see this in another graph. Just for your information, if you want to compare this with v4: this is v4, and if you actually look at the number of routes there it's pretty phenomenal, and nothing is slowing down the number of new ASNs that are joining the routing table. I know Geoff has graphs that put this together, but I wanted to make sure we just looked at the v4 graph as well.
One more place where you see a knee in the curve: IANA runout actually got a lot of people interested in v6 earlier in the year, so we saw this sort of change in slope. Then World v6 Day -- there is a slight drop-off there. Unfortunately this is about a month-old graph; I didn't upload the latest one, and unfortunately the latest one still shows a bit of a drop-off. I'd like to think it's due to vacation time in Europe, but I think it's more complicated than that.
What happened on World v6 Day? We looked at various ways of measuring v6 traffic. In the end we decided that World v6 Day was about websites, and therefore only flow data to port 80 and port 443 should have any relevance to this measurement. We did measurements on what happened on DNS, and we did measurements on other stuff; there is an interesting stat on what happened to netnews and NTP traffic on that day. I'll talk about that in a moment.
So, if you look purely at flow data measured over our network: at 0 UTC, v6 comes alive. There are no ifs, ands or buts; it went from 0 to 100 miles an hour pretty much instantaneously. By the time we finished the day, if you look at the peak, you end up realising there is a 5x jump in bandwidth from what we were seeing before. After World v6 Day ended 24 hours later, with the properties that stayed on -- the ones that did roll and stick, where they actually stayed on afterwards -- we see basically about a 3x jump. Now, this is great for the day, there is no doubt about it. Although the numbers are small, it would be very hard to measure whether this is truly new traffic, which it's unlikely to be; it's probably just movement from v4. Let's look at it on a longer basis: if you look at it over the months, what you realise is that the traffic has truly stayed there, and what's nice about this graph is that if you look at the ups and downs on a daily or weekly basis, this is human-driven traffic. This isn't stuff that's coming in machine to machine and, if I may try -- and I have done this since June -- to put a nail in the coffin on this: this is not ping and traceroute traffic. This is real v6 traffic. The majority of it is coming from Google video -- that's been presented already; prior to that we had a lot of Google Maps image data that showed up -- but there are other data sources in there as well. The fact that it stayed and continued on a sort of human-usage basis means, I think, we can point to this in a good way.
Let's look into the data a little bit. Life gets a bit more complicated.
So, this is the same graph, but now split between native v6 traffic and traffic that originated at a 6to4 address or a Teredo address. The Teredo traffic was amazingly low; it would not show up on this graph. We can talk about 6to4: this is traffic where AAAAs are being given to end users, end users are enabled for 6to4 and they are using that. Now, there is some subtlety about whether they should; there have been graphs put together by Google in regard to this. But this is measured at the backbone, at the relay level. So, one of the things that comes out of this is, again, the graph scale seems to match; it's definitely cause and effect started by World v6 Day. The relays definitely took a lot more traffic -- we had to deploy a bunch more relays, we knew that prior to this -- and it hasn't gone away. It is definitely human-created traffic and, until we see more deployment at the end-user level, we will continue to see this 6to4 transition traffic.
On a lighter note: a few people asked me this, so I wanted to make sure people knew. The question is: what was I doing? Because I have spent an awful number of years focusing on v6, as have a lot of other people, so what was I doing on this very important day? I took the day off. I put in my vacation months and months ahead of time, in full knowledge of what this day was. No ifs, ands or buts, and I have a plausible reason for that: my daughter graduated from high school, and I have my priorities. So...
That applause should go to her, not me. Okay, but the reality is, there really wasn't that much to do. Being ahead of this, as were other backbones, was a good thing. But let's go look at what we saw and what we realised. I have actually got the wrong slide up here -- I had updated this slide. Never mind.
So, this has got some of the peering stuff but I'll talk about this in context.
MTU filtering -- PMTU filtering on ICMP. There was an interesting discussion earlier about Jumbo Frames in EIX; forget that. In the v6 world, if you are filtering the ICMP messages used for PMTU discovery, you will have problems with v6, no ifs, ands or buts. This was a problem for people up until days before World v6 Day: they were doing their testing and they didn't know why certain things didn't work. Most of the time it's things like Teredo, but it will cause them issues on their general web services. So the reality is that there is a lot more effort that needs to be done in this regard, I'm convinced. Interestingly enough, some of the fixes got in pretty quickly, within hours. The other issue was that, if you look at the 6to4 traffic, people were still routing half-way around the globe for some of this stuff. You can try and fix this; you can't do much more.
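As a minimal sketch of that point, on a Linux router or host a v6 firewall must let through the ICMPv6 "packet too big" messages that path MTU discovery depends on, whatever else it blocks:

    # Allow the ICMPv6 messages that PMTU discovery depends on,
    # both to this host and to networks routed through it.
    ip6tables -A INPUT   -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
    ip6tables -A FORWARD -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT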
The rest of the stuff is about peering, which is uninteresting to talk about here. There were two other graphs in the presentation; I'll make sure they get put up in the PDF online. One was that there was an uptake of up to about 2,000 requests for tunnel broker accounts on World v6 Day, where the normal sort of run rate is about 200 to 300, so obviously there was some end-user interest. And then there was also a jump on the Alexa list, which is not too bad for hosting companies, but a lot of them turned it back off again, and we haven't worked out why, even to this day.
That's it. Thanks very much. That's what I wanted to talk about.
(Applause)
CHAIR: Thanks for bringing us back in time again. Any questions? Comments? Robert?
AUDIENCE SPEAKER: Yes, point of order. Robert [Kisteleki], RIPE NCC. Point of order: I didn't pay for the beers yesterday, he did.
More to the point: you mentioned path MTU as an existing problem that causes pain nowadays. I am wondering if you believe that this would be something that RIPE Atlas could measure or should want to measure, or do you believe that this will go away anyway?
MARTIN LEVY: Okay. Should it measure? Yes, it should measure; there is no reason not to, because it's so easy to measure: try putting a big packet through and see if you get a response or you get an ICMP message back. That's easy; that gets you CPE-level testing. Will this problem go away? No. It has been how many years since these specs were -- I mean, the RFC is ten, eleven years old, and we are still sitting here saying please don't filter ICMP in v6. There are some interesting problems. The particular use case is that there is -- this is a complicated conversation -- the particular use case --
AUDIENCE SPEAKER: You might want to take it off line.
MARTIN LEVY: This is going to be with us for years to come.
AUDIENCE SPEAKER: I think I heard your message. Thank you.
AUDIENCE SPEAKER: I think the lady --
AUDIENCE SPEAKER: Jan, Google. First of all, thank you; I am very glad to hear your experience on World v6 Day is similar to ours. Just one minor comment on the MTU and ICMP "packet too big" problem. I am probably too optimistic, but I hope that we will also see some improvement, because in v6 "packet too big" is a dedicated ICMP type and it's not just a code under "unreachable" like in v4 -- some vendors are still recommending "no unreachables" on interfaces, which actually breaks PMTU discovery on v4. Just a reminder: in v6 it's a separate ICMP type every time; you can filter unreachables, but please don't --
MARTIN LEVY: Absolutely. Proving again that this subject will not go away. It will take --
AUDIENCE SPEAKER: Not immediately.
AUDIENCE SPEAKER: Patrick Gilmore. I want to say we are big supporters of v6 day; we worked very hard for it and everything. But I like data, and you showed some data, but all of your v6 graphs had nothing on the left side. The problem is, you can say well, it was five times, but if the five times went from 1 kilobit to 5 kilobits, that could have been one extra user. It would be useful if we knew whether this was a sustained, real increase. The problem here is that I am expecting it not to be a very large increase: even though the magnitude in percentage was larger than expected, the absolute values in megabits or kilobits would be small. I would like to see that data, even if it's bad, despite the fact we are supporting v6 day, etc.
MARTIN LEVY: I learnt the trick of not putting the Y axis on from Google. I still think it's a wise lesson.
We are routing tens of gigs of v6 traffic at the present moment in time, and that's after World v6 Day. We have not made those graphs public, predominantly because -- and this is a simple fact that we also know at the moment -- there are a finite number of companies creating that traffic today, and therefore one can do some logical deductions; it's the usual story: you can't provide too much data on peering, and you definitely can't provide too much data on transit from customers. So, the lack of a Y axis I understand, but to give you as easy an answer as possible: it's tens of gigs of traffic. The second part -- the question that normally comes straight after that -- is: well, what is that as a percentage of v4? And the reality is that, as a percentage, it is still unbelievably small. We know that, but that is a point we just have to separate today, because there is not equality between A and AAAA distribution in the number of sources out there.
AUDIENCE SPEAKER: To be honest, I am perfectly happy with tens of gigabits. There is not any single web server or any single group of end users that could have caused that, so we saw an actual increase across the board, which matches what we saw.
MARTIN LEVY: Sorry, there is another set of graphs, but it starts pulling out where we actually split it by continent. It absolutely is running over every continent, with, if I remember correctly, Europe winning the battle on that one.
CHAIR: Last question, and then I have got to interrupt because we are already running behind time.
AUDIENCE SPEAKER: Benedikt Stockebrand. I actually think we won't have that much of a problem with the packets, simply because filtering them with IPv6 will hurt so badly that people will fix it. It's not like v4, where you can sort of get away with it and blame others for causing the problems. They are hard to debug, but at least we get so many problems that we can pinpoint them and put the pressure on people to fix them. My personal opinion on that.
MARTIN LEVY: Okay. Well, I'll keep this down to one minute. I think you have got a very valid point, but you have to keep in mind one, I think, fundamental issue with v6: you want anybody coming into the v6 world to work properly on day 0. You don't want somebody to say, okay, let's give this v6 thing a try, have it fail for some reason or other -- and this is only one issue where it could fail -- and then say, well, forget this v6 thing, I have got other things I want to focus on. And to respond to Patrick: even if it was a kilobit of traffic versus tens of gigs of traffic, that kilobit should be taken care of and routed properly and be nurtured as much as any other kilobit of traffic. You want the v6 experience to be good, so that the consumers and the providers of data take it as a positive sign and move forward. Then we can get past these talks about just this little bit of traffic; we want to talk about this in general. That's all. Thank you.
(Applause)
CHAIR: So, the next presenter is Wolfgang from the NCC, talking about Hadoop.
WOLFGANG NAGELE: Good afternoon. This is one of those things that sounded like a good idea: to do my last presentation with a live demo. I am not that sure any more. What I'm here to share with you today is that, with all the systems that we operate, we run into one problem continuously, and that is we do big data. And when we say big data, we are talking terabytes of raw information here. To give you a little bit of a picture: at the top here is K-root, which is pretty prominently known in this community; that alone produces 1.5 terabytes of PCAP data every month. When we talk about processing this, it is more in the area of 5 to 6 terabytes to actually crunch on CPUs. Then there is a variety of other systems here that we also operate in the same sort of ranges. You can extrapolate: we are talking about several terabytes every month.
Now, the obvious question becomes: well, this is DNS data, so there are various tools out there, why not just use them? I have some of those up there; there are several others. The basic problem with them is that vertical scaling with those tools only works to a certain degree, and when we are talking about terabytes of data it certainly doesn't work any more. We have two options there: we can either use those tools, run them in parallel and do all of the merging of the results and that stuff ourselves, or we can actually use something that does that very well. One of those things that is open source, does it very well and has gotten quite some attention recently is Apache Hadoop. Now, for those of you not familiar with the matter, I'll briefly go over this. Apache Hadoop started out as HDFS, and that was based on a white paper published back in 2003 by Google about their Google File System. The essence of that file system is that you split the two concerns of the metadata registry and the actual data holding, and what that gives you is a very simple-to-scale data storage model.
Over time, this has evolved. Google had another thing that they called MapReduce, a couple of years later again published in a white paper, and it fits really well into the distributed system already being used for HDFS. So, in order to make use of those otherwise quite idle CPU cores on the HDFS side, we can use them and distribute the computing over them. And distributed computing is hard. What's hard about it is first and foremost for programmers to wrap their minds around it. So, one of the things that this white paper describes is a very simple programming pattern where you basically have two stages which can run independently on any number of systems, and you as a programmer just have to think in that pattern; all the rest of the scaling is actually done by the underlying system.
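The canonical illustration of that two-stage pattern is word counting with Hadoop's Java API; counting, say, DNS query names in decoded PCAP records has exactly the same shape. This is a generic sketch, not the RIPE NCC code:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map stage: runs independently on any number of nodes and emits
    // (token, 1) for every token it sees in its slice of the input.
    public class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // Reduce stage: the framework groups all values by key, so summing
    // them yields the total per token; the scaling is handled for you.
    class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }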
Now, for more information on the matter I refer you to a book from O'Reilly. There is not much in print on the topic so far -- it's still quite a young technology -- but there is lots and lots of information on the Internet.
So, coming back to our use case. The base infrastructure is Apache Hadoop, but when we started to do this about a year ago, there was no PCAP library available to do this kind of computation in it. So we actually went ahead, we did some prototyping on this, and we were quite pleased with the results. And since we are using this quite extensively in our own environment now, we thought it would be good to share it with the wider community. So, at this URL you can actually find the source, and anybody can go away and play with it. It's licensed under the LGPL, and we are very happy to take patches on this.
And with that, I am going to try to give you a little bit of a live demo of the capabilities of this. What I'm going to do is use a sub-project of Hadoop called Hive, which provides a data warehouse on top of this. What you get is that you only have to run SQL statements and they will automatically be translated into MapReduce jobs and deliver the results; you don't have to think in that programming pattern any more.
I'm not sure how well you can read this. So, I have prepared a couple of statements here. Starting out, basically, we are just going to create a table here within Hive, and if you look at the statement and you are familiar with SQL, you will see this is similar to regular SQL; it has a few extra clauses to tell it which Java classes to use to decode the information, the stream that it's going to get. Other than that, this is pretty much standard SQL.
To have some data to crunch on, I am going to add a couple of partitions. Each and every one of those partitions refers to the data set that I described earlier today in the DNS Working Group, about a traffic incident we had back in June. One of the things we wanted to know back in June was which of our Anycast nodes were affected by this traffic pattern, and can we go and start to contact those ISPs. So, once we have this in here, once we have the raw PCAP -- you have to think of all of those partitions as basically referencing a directory in HDFS, with several PCAPs in there. So if I now go ahead: I have prepared a select statement, and I think that's visible here. If you look at the last line here, you see that the select statement is standard SQL. What I am trying to see is how many sources there were and how many queries they sent during that time period to our node in Tanzania -- that was the question during that incident -- and then it's just standard SQL to group it, order it, have the top one all the way up there, and limit it to 25, just so that we don't get too much output. When I fire this query off, what happens is that Hive does all of the work: it creates the MapReduce jobs for you and figures out which MapReduce stages are going to be necessary to create this result for you.
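Reconstructed from that description, the three kinds of Hive statements look roughly like this. All table, column, node and path names are invented for illustration, and the clauses naming the PCAP-decoding Java classes are omitted:

    -- Create a table over raw PCAP data; the real statement adds
    -- ROW FORMAT SERDE / STORED AS INPUTFORMAT clauses naming the
    -- Java classes from the PCAP library that decode the stream.
    CREATE EXTERNAL TABLE pcaps (
      ts BIGINT, src STRING, dst STRING, qname STRING
    )
    PARTITIONED BY (node STRING, dt STRING);

    -- Point one partition at a directory of PCAP files in HDFS.
    ALTER TABLE pcaps ADD PARTITION (node='tz-anycast', dt='2011-06-28')
      LOCATION '/data/pcap/tz-anycast/2011-06-28';

    -- Standard SQL: which sources sent the most queries to this node?
    SELECT src, COUNT(*) AS queries
    FROM pcaps
    WHERE node = 'tz-anycast'
    GROUP BY src
    ORDER BY queries DESC
    LIMIT 25;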
When we go here, this is the overview of the MapReduce part of Hadoop running, and you can see there is one job running right now; in this panel we can follow the progress. It's going through the map stage right now; this takes a good hundred seconds or so. I also have [Ganglia] here to show you, and what you can see is that the cluster gets under load when it actually does this. So basically it distributes all those tasks here.
Since I didn't want to make us wait around here for this result to come up, I ran this just before, and I assure you that it works. So, what we can see is: oh yes, we actually received some of this traffic in Tanzania, and we can also see where the large part of it came from. Mind you, this is limited, so it doesn't show the full set of sources, but it shows me the ones that were most severely affected, and there is a considerable drop-off. So what that gives us as an operator is that we can very quickly go and contact those ISPs and see what's going on in their networks.
For the live demo part, that's all that I wanted to show you.
Coming back to my presentation, there are a few conclusions to draw out of that. Number one is that this really works well at scale. When we talk scale here, this is not something for when you want to run something in parallel on two computers or two servers; it's really 100-plus CPU cores. In our case we currently have a system with 16 data nodes with a total of 128 CPU cores, so we can run on 128 of them in parallel, and that works really well when we feed in terabytes and terabytes of data. You will also notice that the example I gave took 200 seconds and we actually only crunched 500 megabytes, which is something that libraries can do in way less time; that is basically because there is a lot of overhead when you do smaller data sets. This is really interesting for large, large data sets.
There is a screencast, which I have the URL for here, which explains how you can use this on Amazon EC2 -- Amazon offers Apache Hadoop in a cloud-based fashion -- and just do it right there. It goes step by step through how to do this, and that screencast also shows how well the scaling works, because they were able to crunch, for instance, 80 gigabytes of data in less than three minutes. This is basically what I wanted to share with you, and I'd be happy to take your questions now.
(Applause)
CHAIR: Comments? Questions?
TIZIANA REFICE: So, you described all the nice things about using this technology. What were the biggest problems, the biggest challenges?
WOLFGANG NAGELE: Well, there are really two things. One, of course, was that there was no PCAP library out there at the time. That was quite trivial to solve: if you look at the library we created, it's a couple of hundred lines in total, so it's not too hard. The biggest challenge is that Apache Hadoop is still quite young in terms of technology and it's moving fast, so in order to deploy this in a stable environment you really need to understand how it works. If you really want to go for this, first of all stick with the suggested parameters; there are lots and lots of parameters that you can change in the setup, and sometimes if you get one wrong, your results are considerably worse in terms of performance.
AUDIENCE SPEAKER: Thank you.
CHAIR: Thanks Wolfgang.
CHAIR: The next one is Vesna Manojlovic, whom you might know as a trainer. Now she will tell us what she is actually doing these days.
VESNA MANOJLOVIC: Hi. So, I still work for the RIPE NCC; I just changed departments. I have spoken at RIPE Meetings before, but this is the first time that I am actually giving a presentation in the MAT Working Group, so I am quite nervous.
So, I want to tell you three things today. I have only ten minutes, which is absolutely not enough for me; I am used to speaking, like, the whole day. First, I want to give you a short status update.
Then, I want to reveal the mystery of what MCB stands for, and then I want to pose some questions to you.
The status report was actually already given by Daniel in the NCC Services Working Group, and here is the link, so you can see all the details there. I only want to repeat it here very briefly, so that if you have any questions, either I or some of the colleagues who are also here can give you some answers.
So, what happened since the previous RIPE Meeting? Well, for RIPE Stat we actually have these monthly demos and we report the progress in every demo; we have half an hour on WebEx, we get remote participation and questions, and we present the newest features there. The biggest achievement was that we published the mobile app for iPhone.
Then in DNSMON, we improved some of the timing parameters and we did some improvements on the back end also.
RIPE Atlas is keeping us very busy; we have more than 800 probes active and you can see the map there. We are introducing more measurement types and making more maps; every month we actually have a goal to publish a new map. And we are busy with user-defined measurements, but that's covered in the presentation by Robert which follows, so I won't talk more about that.
So, what is the strategy? Daniel also presented this, so this is a short outline of our strategy, and we already received some feedback that these are nice goals to have set up, but what is the actual road map? What are the steps that we are going to take to achieve this? We are working on this, and we expect a lot of feedback and support from you. So please think about it, and I will pose some questions later on in the presentation.
The most involved project for me personally was the future of the Test Traffic Measurements project, which has been sadly neglected for quite a while. So we got back in contact with all the hosts and we conducted a survey. We got quite a lot of responses, which was surprising to us, because people seem to love this service still. So, we are planning to continue with it, but since it's quite an old project, let's say, it needs to evolve, it needs to change. Together with the feedback that we collected from the operators, we are going to create a plan. We only talked to people who were present at the RIPE Meeting, but we will also organise another session for remote participation quite soon, and we will then present the results to the Working Group and ask for more feedback from you.
So, that's what MCB stands for: we have a new department that does community building. The goal is to involve the users, and we want to do that in several ways. We are going to give presentations at all kinds of community events where the operators meet, and we are going to go there and actively seek your feedback and ask you what it is that you want our tools to do for you. Then we will publish articles, and we are actually present on all kinds of mailing lists, social media and so on, and we are bringing this feedback back to the developers. So we are kind of an interface between the users and the geeks that are sitting back in the office programming all this; they don't want to be bothered talking to anybody, so they want us to go and talk to people and then translate that back to them. And we will also be making the business case for the new services and the new tools that we will be building.
So that's who we are. You see me on the stage and Anne is over there. I know some of you from giving training courses to you or meeting you at the conferences where I was speaking for many, many years representing the RIPE NCC. Anne is quite new to the RIPE community, but she has been a programmer, she has been very active in the Perl community for many years, and she also has a background and education which is business oriented. So we are hoping to be able to be this interface, as I said, between the developers and the users. On the other hand, we don't want to stand between you and the developers. So, if you think that you can talk to them in their own language, that's great: just do that. They are also here at the RIPE Meeting, they also subscribe to all the mailing lists and they are listening to your feedback, but we are trying to be an addition to this communication channel.
So what did we plan to do at the RIPE Meeting? We actually did most of these things already. We have been mingling around, trying to meet a lot of you, and we were sitting in the info booth; tomorrow is actually the last day, so in the coffee break you can come there and talk to us. We had a lot of meetings, participated in the live demo, the presentation is almost over, and there is maybe an opportunity for a lightning talk. And if not, then I will tell you everything about it now in one minute.
This is actually not a measurement. It's more like a rating system but it's my baby and I love to talk about it and I got the speaking slot, so that's it.
We have been busy with this. Actually, the goal from the training services was to see where we have to go: to which country do we have to go to promote IPv6? So we wanted to somehow measure the deployment of v6 per country, but we only looked into the RIPE NCC services. So, if an LIR, a member of the RIPE NCC, has IPv6 address space, if they have requested reverse delegation, if they have registered the route6 object, and if they are actually visible in RIS, they get four stars, and then we publish their name on the list of all the LIRs that have four stars. They get a T-shirt from us, and we actually publish these results per country in these beautiful colours from red to green. We also have a competition: who is going to guess when we are going to get to 50% of all the LIRs having v6? Well, we are not there yet.
And we are going to make this into a production service, adding new features all the time: new maps and so on.
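As a toy illustration of the four-star rating just described: the criteria come straight from the talk, but the record layout and field names below are made up.

```python
# A toy sketch of the four-star IPv6 rating described above; the criteria
# come from the talk, the data structure and field names are hypothetical.
CRITERIA = (
    "has_ipv6_allocation",      # the LIR holds IPv6 address space
    "has_reverse_delegation",   # requested reverse DNS delegation
    "has_route6_object",        # registered a route6 object
    "visible_in_ris",           # prefix actually seen in RIS
)

def ripeness_stars(lir: dict) -> int:
    """Return 0-4 stars for one LIR record."""
    return sum(1 for c in CRITERIA if lir.get(c))

# Example: this LIR would get 3 stars (not yet visible in RIS).
example = {"has_ipv6_allocation": True, "has_reverse_delegation": True,
           "has_route6_object": True, "visible_in_ris": False}
print(ripeness_stars(example))
```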
So now, finally, questions for you:
So this is what we want to hear from you. Well, we have the social event, so you can grab me and Anne and other people during the social and tell us everything about this, and we may remember it or not, depending on how late in the social that conversation happens. So the best thing is if you actually write to the mailing list. You can also say it now at the microphone, and it's going to be minuted, but the better way of communicating with the community is to actually post to the mailing list or, alternatively, to write a comment on the RIPE Labs articles.
So the hot questions are: well, not the first one, because Robert is going to talk about that, the beta testers for the user-defined measurements. But if you have any thoughts on how to integrate the test traffic measurements with Atlas, or you have some other brilliant idea on how we can develop the traffic measurements further, talk to us. We also value use cases for RIPE Stat, so that we can adjust the presentation of RIPE Stat. We already published a video, and we want more use cases, and if you come to us, we will give you a prize. And that is, well, I forgot; it's over there where I am sitting, so it's either a mug or a configurable IPv6 fridge magnet or stickers. We have all kinds of goodies, depending on how good your use case is.
And please visit the public demos. We put a lot of effort into them and they are announced everywhere. They are happening once a month, so I want to see at least five new faces there next time. Daniel also presented a proposal about RIPE Atlas for every LIR; only members of the RIPE NCC can actually say what they think about it, whether they think it's a good idea and so on, so you can comment on the members list, the NCC Services list. And once we publish our road map, we would like to get some comments from you on the list.
So, this is more links and questions and e-mail addresses to contact us. Thank you.
(Applause)
CHAIR: So, there is a little bit of a timing issue and we get kicked out because the BoF starts. I want to put the different questions from Robert and you together, because they are at the end of the day more or less the same questions, I guess. Thanks.
ROBERT KISTELEKI: My name is Robert Kisteleki, I am doing research at the RIPE NCC, and this is pretty much the RIPE Atlas update, despite what the title says, but I am also going to talk about what the title says.
Some of you may have noticed that we haven't been spamming you with mails about try this new feature, try that new feature inside Atlas the last couple of times, but that was because we are preparing something bigger, which is more or less the user-defined measurements. We have been doing some background work to enable this, and we are actually in the process of cashing in on that, so we would like to give this out to you and show it to you; but in order to get there we had to do some background work, and there are a couple of examples here. You can imagine that if you are operating this distributed measurement system, there are a couple of things you actually have to do to get somewhere.
So, what we have done in the last couple of weeks or months is roll out a set of new features for you. More built-in ping destinations: if you plug in an Atlas probe, it will do more ping measurements than before, roughly twice as many or so. We also do traceroute measurements to all of these destinations. We are experimenting with DNS Anycast checks. And we are preparing ourselves for the user-defined measurements.
More built-in ping destinations: virtually all root name servers are on the list, as well as those operating inside the Atlas infrastructure, to give you a beacon so you can detect problems, or at least look at these results. We also do traceroutes to these particular destinations, with the idea that a historical record of all these traceroutes is going to be useful, especially if there is some kind of event you want to identify and explain: what is going on if my RTT drops from this huge amount to that low amount, or worse, increases from the low amount to something else.
You can actually look into the data, or you will be able to look into that data very soon.
But this is something that is, I think, in some sense way more interesting, because as far as I know no one has really produced these kinds of maps before, and this really tries to stress the power of RIPE Atlas, not only in terms of ping and traceroute measurements, because those could be considered similar to what existed before; this is something different, this is DNS. So what you are looking at here is RIPE Atlas and all the nodes of RIPE Atlas that were alive, I think roughly a week ago or so, wherever they are geolocated on the map by the hosts. They are coloured by whichever instance of C-root, in this case, they see when they do a DNS query. There is this feature where you can ask: hey, who are you? Please identify yourself, and you get back a response. So we coloured all of these nodes based on who responded to this particular DNS query. Now, one of the reasons why I chose C-root is because it illustrates very well that if you do your routing right, and maybe you are lucky in some sense, this is more or less as clear as it gets in terms of distribution. The operators actually have one node in LA, one in ..., that's the purple one, one in Chicago, coloured red, one in Washington, that's the white one, one I think in Frankfurt, that's the blue one, and the yellow one is Madrid. So, what you can see is pretty much what you would expect: the probes that try to reach a node of C-root almost always end up at the closest instance possible. This is a pretty clear picture.
Another example is I-root. The colour scheme is a bit different here, because I-root has way more nodes, so here red actually means European nodes: if a probe is coloured red, that means that whenever it did a DNS Anycast discovery, it ended up at a European node. The purple ones are the US, and the blue ones are Asia. So the colouring is more or less regional; I think yellow is Africa.
So what you can see on this one is that in most cases the probes really end up at the closest or a relatively close instance, but it's not always the case. The power of these maps is that you actually get a better insight into what is, let's say, the gravitational radius of your Anycast instance, if you have such a thing. This is a feature that will be available inside the user-defined measurements framework, so if you are operating an Anycast DNS service, you can also get the power of this from 1,000 probes, or 2,000, or maybe 20,000.
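For reference, the "who are you?" query mentioned above is commonly implemented as a CHAOS-class TXT query for hostname.bind (or id.server), which Anycast servers answer with the identity of the instance that received the query. A small sketch using the dnspython library, with c.root-servers.net as the example target, might look like this:

```python
# A sketch of the Anycast instance-discovery query described above.
# Requires dnspython; 192.33.4.12 is c.root-servers.net.
import dns.message
import dns.query
import dns.rdataclass
import dns.rdatatype

def which_instance(server_ip: str, timeout: float = 3.0) -> str:
    # CHAOS-class TXT query for "hostname.bind": the classic
    # "please identify yourself" probe for Anycast DNS servers.
    q = dns.message.make_query("hostname.bind",
                               dns.rdatatype.TXT,
                               dns.rdataclass.CH)
    r = dns.query.udp(q, server_ip, timeout=timeout)
    for rrset in r.answer:
        for item in rrset:
            return item.to_text().strip('"')
    return "(no answer)"

print(which_instance("192.33.4.12"))
```

Running this from many vantage points and colouring each by the answer it gets is, in essence, how maps like the one described here can be produced.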
Okay. So on to the juicy stuff: user-defined measurements. We chose the strategy of starting with something smaller than what you could imagine if you wanted to do everything right from the start. So, we are going to allow the users of UDM to specify their own measurements, and we are going to allow more and more functions, more and more features and more and more resources over time.
And all of this of course will tie into the credit system so the more difficult things you do the more expensive it will get.
Let me walk you through the dialogue that we just came up with for specifying measurements, because this will really illustrate how we are going to scale this up; a sketch of what such a specification could look like follows after this walkthrough.
When you want to schedule a measurement, you will obviously be able to specify what kind of measurement you want to run. At the beginning it's going to be ping on v4 and v6, and traceroute; DNS queries are in the making, and maybe other types of measurements: SSL, HTTP, you name it. Okay.
From where do you want to do these measurements? At the beginning we will say: we will select the probes for you. But later on we want to give you more control, so you will be able to select from which region you want to measure, or from which country, or from which AS, or from which prefix, or even, at the very end of the line, we'll see if we get there, from which particular probe you want to measure; some of you have actually asked for that feature.
What is your target? That's pretty trivial. Then there is an interesting feature: do you want this name to be resolved on the probes, or centrally? There are different use cases for both: if you want to test, I don't know, a CDN, you probably want to resolve on the probe, but if you want to test your own service, you are fine with central resolution as well.
Controlling when and how long the measurement should run: that's easy. Start as soon as you can and don't ever finish, just let it run; or you can say start it tomorrow and finish one week later. That will be possible.
How many probes do you want to assign to this thing? Obviously, the more probes you want, the more expensive it's going to be, but you should have a basic idea of what you want. Now, we are talking about a distributed system where probes come and go; they go down, they come up, and we have virtually no control over how many probes are actually active at any point in time. Therefore we introduced the concept of: this is the number of probes you want to use, and if it falls below a certain limit, which we call the minimum threshold, then do something. That something is an action you will also be able to define, and we are thinking in terms of actions like: never mind, just carry on and see what happens; or: reschedule the measurement with 100 probes again, just like I said before; or: that's not useful any more, stop the measurement. So, there is an action that can happen if we fall below a certain threshold.
Reporting frequency: currently we set it to as soon as possible, but it may be that some people don't need that, and it might be cheaper to say once a day. That's okay.
There are other parameters that we are thinking about, for example, with ping measurements we do them with a certain frequency but you can say, that's just too often, I don't need that much, it's okay if you do it less often. Or, please send me a mail if there is a configuration change because you had to reschedule the measurement or something like that. And at the very end, make this data explicitly public.
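Putting the whole walkthrough together, a user-defined measurement specification could plausibly look like the structure below. This is purely illustrative and is not the actual UDM API; it is just the parameters named above, collected in one place with hypothetical field names.

```python
# A hypothetical illustration of the measurement specification built up in
# the dialogue above -- not the actual UDM API, just the parameters named
# in the talk gathered into one structure.
udm_spec = {
    "type": "ping",                  # ping/traceroute first; DNS, SSL, HTTP later
    "af": 4,                         # IPv4 or IPv6
    "target": "atlas.ripe.net",
    "resolve_on_probe": False,       # per-probe resolution (e.g. CDNs) vs. central
    "start": "now",                  # or a timestamp, e.g. tomorrow
    "stop": None,                    # None = never finish, let it run
    "probes": {
        "requested": 100,
        "selection": "system",       # later: region, country, AS, prefix, probe ID
        "minimum_threshold": 60,     # below this, take the action...
        "on_threshold": "carry_on",  # or "reschedule" or "stop"
    },
    "interval": 240,                 # seconds between pings (tunable; costs credits)
    "reporting": "asap",             # or e.g. "daily", which might be cheaper
    "notify_on_change": True,        # mail me if the measurement is rescheduled
    "public": True,                  # make this data explicitly public
}
```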
Okay. And this is what you get. This is an actual measurement that we ran on a production network using the UDM feature; I think the destination was atlas.ripe.net and we used ten probes or something like that. These are relatively simple. The top one just shows purely the RTTs, per probe, to the destination over time. You can imagine that graph; it is simple.
The middle one and the bottom one: those of you who know TTM or DNSMON will be familiar with this, it's a stacked plot of what the individual probes saw over time. The middle graph is actually the RTT, colour coded: if it's green, then it's very short, 10 milliseconds; if it's red, then you were 500 or more away. The bottom graph shows the packet losses.
What this example illustrates is that probe number 3, on both graphs, actually started way later than the others. That's okay; this is the distributed nature of how we are going to do measurements. If you have a stable set of big servers that you can rely on, then you are unlikely to have these gaps, but they happen anyway, and in this case this is just perfectly normal.
So what you can really see here is that basically everything was fine apart from occasional hiccups where some probes lost some packets, but generally it's all fine. Again, if you know how to interpret and read DNSMON graphs, then you will know that a vertical bar is a problem and a horizontal bar is not.
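As a toy version of that colour coding: the 10 ms and 500 ms anchors come from the talk, but the bands in between are assumptions.

```python
# A toy version of the RTT colour coding described for the middle graph.
# The 10 ms (green) and 500 ms (red) anchors come from the talk; the
# intermediate bands are assumptions for illustration.
def rtt_colour(rtt_ms: float) -> str:
    if rtt_ms < 10:
        return "green"    # very short RTT
    if rtt_ms < 100:
        return "yellow"
    if rtt_ms < 500:
        return "orange"
    return "red"          # 500 ms or more away

print([rtt_colour(v) for v in (3, 42, 250, 800)])
```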
We are starting tests soon, so if you are interested in becoming a beta tester, please drop a mail to that address. I am pretty sure we will receive more e-mails than we can handle, so I make no guarantees that you will be included, but anyway, please tell us.
And then, just to finish this off: planned next steps. We haven't forgotten about the other features that you asked for: access to the raw data, APIs to interact with the system, automatically scheduling and stopping measurements, automatic alerts and notifications. All of those things are coming.
With that, I am finished.
AUDIENCE SPEAKER: Richard Barnes, BBN: So I notice that there is the public check box to say if you want to make your measurement results public. I was going to suggest that we make the default for that checked.
ROBERT KISTELEKI: Defaults have huge power, we know that. Noted.
AUDIENCE SPEAKER: Thomas. I have a question regarding scalability of the measurements. I mean, how many measurements are the probes able to do, and how do you control or reserve the resources of the individual probes so that at the end of the day you do not get a carpet bombing of measurements?
ROBERT KISTELEKI: It's a fair question; we hear that from more and more hosts. Once this is rolled out, they want to have some kind of control over how many measurements their probes should be doing. We are thinking in terms of bandwidth limitations, so you say: I am dedicating 10 kilobits or 100 kilobits to my probe, and if it's over that, it shouldn't be involved in any more measurements. That is roughly what we are thinking about. Another line of thinking is that we would like to give you control over what kind of measurements can be run on your probe. For example, some people might want to opt out from HTTP for various reasons, but it's also true that the more types of measurements people allow the community to schedule in such a system, the more useful the system is. The way of thinking is more towards bandwidth limitations, but then again, if you have ideas besides that, or instead of that, for what the preferred way would be, then we can work on that and build that in.
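The bandwidth-limit idea sketched in this answer could, hypothetically, reduce to a simple budget check on the scheduler side. Everything below is illustrative, not the actual Atlas logic.

```python
# A hypothetical sketch of the host-side bandwidth cap being considered:
# the host dedicates, say, 10 or 100 kbit/s to the probe, and the scheduler
# skips the probe for new measurements once its current assignments would
# exceed that budget. Names and numbers are illustrative only.
def can_assign(probe_budget_kbps: float,
               current_usage_kbps: float,
               measurement_cost_kbps: float) -> bool:
    return current_usage_kbps + measurement_cost_kbps <= probe_budget_kbps

# A probe with a 10 kbit/s budget, already using 8, cannot take a
# measurement estimated at 4 kbit/s; with a 100 kbit/s budget it can.
print(can_assign(10.0, 8.0, 4.0))   # False
print(can_assign(100.0, 8.0, 4.0))  # True
```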
CHAIR: The next one was Martin.
AUDIENCE SPEAKER: Martin Levy. Thank you, good stuff. I have got two questions. Page 16: is there any sort order, 1 through 10, you know, and should there be?
ROBERT KISTELEKI: It's an interesting question, but these are probe IDs. We might come up with better names; maybe we will use, or should use, the names that the hosts actually gave. Maybe not.
AUDIENCE SPEAKER: Anonymous 1 through anonymous 10 is fine.
ROBERT KISTELEKI: Short answer is no, these are random probe IDs.
AUDIENCE SPEAKER: But the actual data value here could be sorted based upon value?
ROBERT KISTELEKI: It could be. The question is what happens...
MARTIN LEVY: That's fine. Now go back to the maps for the root name servers. Will it be the UDM mechanism that allows somebody to add, let's say, a ccTLD Anycast name server or an enterprise's Anycast name server? What method does an operator of Anycast something-else use to get access to this type of graph?
ROBERT KISTELEKI: I think that would actually be the power of Atlas, if you can do that, so that's what we are aiming at indeed.
MARTIN LEVY: Will this be A through M, will this be public?
ROBERT KISTELEKI: Yes; right now it is not public because it's only available on our development network, but I think it will be public within two weeks or so.
MARTIN LEVY: Public to everybody, not just to the operators of the DNS root name servers?
ROBERT KISTELEKI: That is the intention. This is whatever, it's v6 or v4, whatever the probe gets. And you will notice that there are a couple of shadows here; those didn't get a response. So, that happens also.
AUDIENCE SPEAKER: Sebastian. If you do global measurements, how are the probes selected? Is it some sort of algorithm, is it purely random? Will I probably end up with ten probes in the same city or...
ROBERT KISTELEKI: Yeah, good question. Currently it is random. We'll look at how busy the probes are, in order not to overload them, but within the allowed parameters it's pretty much random. This is where more control will actually come in, where you can say: I don't care about the US, I want to measure from Europe.
AUDIENCE SPEAKER: Then it would be nice if I could check: I want one from Europe and one from the US, and you know...
ROBERT KISTELEKI: Yes, I would consider that as an advanced level feature but yes, that's the way of thinking, yes.
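A sketch of the selection logic discussed in this exchange: effectively random among available probes today, with an optional region filter of the kind being promised. The probe records and field names are hypothetical.

```python
# A sketch of the probe-selection logic described above: random among
# available probes, optionally restricted to a region. The probe records
# and field names are hypothetical, not the actual Atlas data model.
import random

def select_probes(probes, count, region=None):
    candidates = [p for p in probes
                  if p["up"] and not p["busy"]
                  and (region is None or p["region"] == region)]
    return random.sample(candidates, min(count, len(candidates)))

# Example pool: 20 probes alternating between EU and US.
pool = [{"id": i, "up": True, "busy": False,
         "region": "EU" if i % 2 else "US"} for i in range(20)]
print([p["id"] for p in select_probes(pool, 5, region="EU")])
```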
AUDIENCE SPEAKER: Wolfgang, RIPE NCC. If you go back to the map about the Anycast deployment: we already shared this as an early prototype with the root server operators, and these being some of the systems deployed for the longest time as global Anycast deployments, everybody agreed that this is the most comprehensive view of Anycast effectiveness that we have ever seen so far. So it's very well received to actually see from real-life data what the Anycast deployment effects are, and it also helps with long-term planning: where we should deploy next. So, more a statement than a question.
ROBERT KISTELEKI: Thank you.
CHAIR: Last one please.
AUDIENCE SPEAKER: Ivan Beveridge. I am just wondering what the plans are with regard to the user-defined tests and how many requests would get run at the same time, as in, you know, the same second, just for the potential for DoS or DDoS, or whether that's worth considering?
ROBERT KISTELEKI: Absolutely. So, ultimately this is going to be controlled, from the user's point of view, by the amount of credits you have: if you run more probes, you can do more measurements. That was the basic idea behind the whole project. From the point of view of the hosts who are running the probes, you don't want your network to be overloaded; the previous question was pretty much along the same lines, that you want to have some kind of control there, so do not allow too much, because it's ultimately your network. But the power of the system comes from: if you participate, you get back something; if you want to get back something, you should participate. And Daniel has a better answer.
DANIEL KARRENBERG: The question was more about rate limiting towards the destination. That is also part of the architecture. The scheduling of the user-defined measurements is actually a central process, one of the few central processes that we have, and there is, in the design, a limiting possibility per destination, so that this cannot happen. This is there for two reasons: one is the DoS reason that was just mentioned; otherwise this could be used at least for pretty small DoS attacks, but still.
And the other reason is that when something happens and somebody on NANOG says blah is down, we don't want everybody and their mother doing user-defined measurements and thereby adding to the problems or doing redundant measurements. So what the system will do, it doesn't right now, but the architectural possibility is there, is basically to say: I am already running 50 measurements of google.com, I am not going to run the 51st one. Does that answer the question?
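Daniel's per-destination limit could look, schematically, like the counter below; the cap of 50 echoes the google.com example, and the rest is illustrative rather than the actual scheduler.

```python
# A schematic sketch of the per-destination limit described above: the
# central scheduler counts running measurements per destination and refuses
# new ones over a cap, both to prevent small DoS attacks and to avoid a
# pile-up of redundant measurements. Cap and structure are illustrative.
from collections import Counter

class Scheduler:
    def __init__(self, per_destination_cap: int = 50):
        self.cap = per_destination_cap
        self.running = Counter()

    def try_start(self, destination: str) -> bool:
        if self.running[destination] >= self.cap:
            return False          # e.g. the 51st measurement of google.com
        self.running[destination] += 1
        return True

    def finish(self, destination: str):
        self.running[destination] -= 1

s = Scheduler(per_destination_cap=50)
print(all(s.try_start("google.com") for _ in range(50)))  # True
print(s.try_start("google.com"))                          # False: over the cap
```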
AUDIENCE SPEAKER: Hi. My name is Ben Gordon. I asked a question last December about tagging down time in Atlas. I didn't receive any answer; I just received one plus-one from another user. I would like to tag things, down times, stuff I know about, but that the probe obviously doesn't know about.
ROBERT KISTELEKI: I remember such a question from a week or two ago or something...
AUDIENCE SPEAKER: It was in December last year.
ROBERT KISTELEKI: Apologies for not answering that. It is possible. We could deduce why the probe reconnected every now and then, so we can tag all the ups and downs with that kind of information; I think that's useful. And there could be a couple of reasons why that actually happens: it could be a local network problem, it could be our network problem, it could be a network problem at our providers, so reasoning about it is a bit more difficult. But what we can definitely do is say: yes, the probe itself was up and running, power was there and the local Internet or something like that was there, as opposed to: the probe had to reboot for whatever reason. And we have to see how valuable that really is in real life.
AUDIENCE SPEAKER: Okay.
CHAIR: Thanks. There is one last point, despite the fact that I am already overrunning 15 minutes and will never chair a session again because it will be taken away from me. There is one additional point of interest from David, and as it's a hot topic this week, I thought he should have two minutes.
DAVID FREEDMAN: My name is David Freedman, I am from Claranet; some of you may remember me from such panel sessions as the RIPE Meeting panel on Monday, in which I sat with Malcolm Hutty, Steve Kent and Sandra Murphy, the latter two representing the IETF. There was a discussion which went something along the lines of: if there was an attack on the registry by a government now, if somebody asked for registrations to be removed from the registry, what would happen, what would be the result of that? And really, what are the implications for people that already filter? The counter-argument to this was: well, not a lot of filtering actually goes on. I have gone around and, on the topic of RPKI, asked people in a crowded room: put your hand up if you filter to your customers; then I said keep your hands up if you filter to your peers, and a lot of hands went down. Having spoken to Kevin from the Database department, he says he has had a number of calls saying that, because their resources aren't registered correctly in the RIPE Database, they actually have issues with reachability. I thought we should be testing this, especially in light of the results of the General Meeting, which, I am not sure if many of you know, is where options A and B were voted out, which means that we are continuing with the RPKI activities as planned. My question was: how do we measure the reachability of a prefix that is considered rogue? I want to test this. And I want to test this by taking a prefix that isn't registered correctly, propagating it and seeing how far it will propagate, measuring quantitatively and qualitatively what the Internet experience is for such a prefix, and how long it lasts from when the prefix becomes valid to when it becomes invalid. It's an experiment I wish to propose, and I believe the infrastructure that we have here, the Atlas probes and TTM, can tell us a lot about it. In the interests of time, to keep this short: this is just an idea. I think I am going to come back with a solid proposal for how we are going to proceed, but if anybody wants to talk to me about it, please approach me afterwards, or if we have time for any questions now, please let me know.
WOLFGANG NAGELE: I did a presentation in the Plenary earlier this week, and since we got the agreement for the RIS resources, we are able to carry this forward. We currently announce anchor prefixes from RIS: one with a valid ROA and one with an actually broken ROA.
DAVID FREEDMAN: I am not talking about ROAs at all. I am talking about prefixes that don't have route objects, that won't be built into filters: prefixes that should not have any reachability. As contentious as it sounds, invalid prefixes from an RPSL standpoint.
RICHARD BARNES: I am interested in helping with this in any way I can. I was going to suggest, with my Chair hat on, that if you don't have a chance to get all the information you need here, the MAT Working Group mailing list would be a good place.
DAVID FREEDMAN: Sure. I have spoken to a number of people, both in the RIPE community and in the security community outside. I have asked for some advice as to the best way to go about doing this without creating too much disruption, and I have had a lot of positive feedback, so watch this space. Thank you.
CHAIR: That brings us to the end of the MAT for this time. Thanks a lot for staying with us 20 minutes longer. In Russia I had 45 minutes longer.
There is a BoF going on for the IPv6 privacy issues, which is actually in the other room. So if you want to go there, this is your chance. Otherwise, see you at the social.