A Tutorial in multi-homing with BGP on a Cisco
!
version 10.3
service timestamps debug uptime
service password-encryption
!
! encrypts your passwords so that you can't tell what they are if someone
! does a "show config" command at your terminal when left unattended
!
hostname Router-GW
!
enable secret 5 $1$Wnbx$hq5a9foDayCB3VAuX3qB/.
enable password 7 07052F585C081E160453
!
!
interface Ethernet0
 ip address xxx.xxx.xxx.xxx 255.255.255.0
!
interface Serial0
 description Internet Link to Sprint AS 1795
 ip address 144.228.xxx.xxx 255.255.255.252
!
interface Serial1
 description Internet Link to Net99 AS 4388
 ip address 204.157.1.xxx 255.255.255.252
!
! basic IGP config using OSPF
!
router ospf 3830
 redistribute static
 passive-interface Serial0
 passive-interface Serial1
 network 0.0.0.0 255.255.255.255 area 0
!
! basic IGP config using RIP
!
router rip
 network <local network number>
 default-metric 1
 redistribute static
!
router bgp 4992
 redistribute static
! Sprint neighbor statement
 neighbor 144.228.xxx.xxx remote-as 1795
 neighbor 144.228.xxx.xxx filter-list 1 out
 neighbor 144.228.xxx.xxx route-map set-sprint-weight in
! Net99 neighbor statement
 neighbor 204.157.1.xxx remote-as 4388
 neighbor 204.157.1.xxx filter-list 1 out
 neighbor 204.157.1.xxx route-map set-net99-weight in
!
! It's a good idea to take any CIDR blocks you were allocated, and add the
! following statement. This will force any more specifics you might have, to
! be suppressed so that you do not inject more routes than you need to into
! the global routing table.
!
 aggregate-address 10.10.xxx.xxx 255.255.0.0 summary-only
!
ip domain-name yourdomain.net
! ns.internic.net -- Just an Example
ip name-server 198.41.0.4
!
! This shouldn't be needed if you get full routes from both providers,
! but it will force an outside router to generate the ICMP messages, which will
! mean less CPU load for you. For overseas links, the delay may be
! substantial enough to improve turn-around on icmp destination and host
! unreachables by removing the default route (a.k.a. run defaultless)
!
ip route 0.0.0.0 0.0.0.0 Serial0
!
! just a static route example so there is something to distribute above
!
ip route x.x.x.x x.x.x.x x.x.x.x
!
! Filter list to prevent you from redistributing routes from one provider
! to the other.
!
ip as-path access-list 1 deny _4388_
ip as-path access-list 1 deny _1795_
ip as-path access-list 1 permit .*
!
! Filter list used in route-map set-sprint-weight below...
!
ip as-path access-list 10 permit _690_ (ANS)
ip as-path access-list 10 permit _701_ (Alternet)
!
! Filter-list used in route-map set-net99-weight below...
!
ip as-path access-list 11 permit _3561_ (MCI)
!
! here we go...set the weights on all AS Path's matching the regular
! expressions listed in as-path access-list 10
! The 'permit 10' is only a identifier which specifies the order in which it
! should be processed. If you have a second route-map, and you want it
! executed after the first one, simply use a permit 20 (for example)
! NOTE: This translates to "weight all ANS and Alternet routes with a 10
! that we received from this peer"
!
route-map set-sprint-weight permit 10
 match as-path 10
 set weight 10
!
! same as above, except as-path access-list 11
! NOTE: This translates to "weight all MCI routes with a 10 that we
! received from this peer"
!
route-map set-net99-weight permit 10
 match as-path 11
 set weight 10
!
snmp-server community public RO
!
line con 0
line aux 0
line vty 0
 password 7 15202A0F0F27232566
 login
line vty 1
 password 7 073D004F4504110459
 login
line vty 2
 password 7 003632050F56030741
 login
line vty 3
 password 7 097E6F0A12081F1345
 login
line vty 4
 password 7 02342558000B072002
 login
!
end
Side note: an IGP would be a local network protocol such as RIP, OSPF, IGRP, EIGRP, or IS-IS. The use of these protocols, and the design of your IGP, is discussed in a different document, "Choosing and designing the IGP for an ISP".
It is also really bad to redistribute the routes you receive from one peer to another, e.g., Sprint to MCI and vice versa. There are two methods of preventing this (filter-lists or distribute-lists) and two different ways to implement each method (inclusive or exclusive). You may use one or the other or both depending on the rest of the configuration, and how much you trust yourself.
router>show ip bgp xx.xx.xx.x
BGP routing table entry for xx.xx.0.0 255.255.0.0, version 244607
Paths: (2 available, best #2, advertised over IBGP, EBGP)
*1 1239 1740 31
198.32.136.11 from 198.32.136.11 (144.228.105.1)
Origin IGP, metric 21, weight 2, valid, external
*2 3561 2150 226 31
198.32.136.12 from 198.32.136.12 (204.70.7.57)
Origin IGP, weight 5, valid, external, best
The line with the * in front of it is the AS-PATH. That is, the exact Autonomous Systems that this route has passed through before you received it. In this case, it looks something like:
*1 Sprint CerfNet Caltech
*2 MCI CSUnet LosNettos Caltech
Note how the end point is Caltech on both sides. Caltech is dual homed with CERFnet, and Los Nettos. This is what NSP's on the other side of both your providers will see when they check their BGP entries. There are a few situations where this is not the case, however.
If you get an assignment from an upstream provider, they are most likely proxy-aggregating an entire /16 or /15 prefix. This means that while they will show your prefix in their tables, the announcement will be suppressed in favor of their own, larger block (or shorter prefix, depending on how you look at it).
This has a curious effect on traffic flow. A Cisco will always prefer a more specific route, that is, a longer prefix over a shorter one. As a general rule, this means that most traffic headed to you will prefer the link from the provider that did NOT assign you the blocks, unless the provider that assigned the blocks has allowed your specific network announcement to pass through their summarization (done with either a suppress-map statement or an unsuppress-map statement).
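The longest-prefix rule is easy to see in a few lines of Python. This is only a sketch of the lookup rule, not of how a router stores its table; the /24 and the route labels are made up for illustration, and the /16 is borrowed from the aggregate-address example used elsewhere in this tutorial.

```python
import ipaddress

def best_route(destination, routes):
    """Return the most specific (longest-prefix) route that covers destination."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in r[0]]
    return max(candidates, key=lambda r: r[0].prefixlen) if candidates else None

# Hypothetical view from a remote network: the assigning provider announces
# only its aggregate, while the other provider passes the specific along.
routes = [
    (ipaddress.ip_network("207.200.0.0/16"), "via assigning provider (aggregate)"),
    (ipaddress.ip_network("207.200.50.0/24"), "via second provider (specific)"),
]

chosen = best_route("207.200.50.1", routes)
print(chosen[0], chosen[1])
```

Traffic for 207.200.50.1 follows the /24 even though the /16 also covers it, which is why the link to the provider that did not assign the block tends to carry the inbound load.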
The drawback in this design is that if you have a third link to the Internet, *and* you inject those routes into this router, you will send those routes to both Sprint and Net99. To prevent this, simply add a line to the first as-path access-list to the effect of 'ip as-path access-list 1 deny _174_'. (This example uses PSI.) Unfortunately, if you just type this into the config, it will be added after the permit .* statement, which is not what you want, so you'll have to 'no ip as-path access-list 1', then re-add the old deny statements, put in your new deny statements, and add the permit line last. If you think you will forget this step when you add that third link, then skip this and use inclusive filtering. It's better that you don't make any mistakes!
Your new list would look something like:
ip as-path access-list 1 deny _4388_
ip as-path access-list 1 deny _1795_
ip as-path access-list 1 deny _174_
ip as-path access-list 1 permit .*
The other model is inclusive. Inclusive actually looks simpler. This statement will not redistribute anything with an AS-PATH. Period. Since your local routes don't have a path, it will forward those.
ip as-path access-list 1 permit ^$
That's it. The empty path is permitted, and the implicit deny at the end of the list takes care of everything else. However, if you have a customer with their own AS (ASN 6000 in the example below), you will probably want to advertise their routes, so you'll need to "include" them in your access-list, hence the "inclusiveness" of the design.
ip as-path access-list 1 permit ^$
ip as-path access-list 1 permit ^6000_
This is a little more work when you add customers that run BGP, but you may not do this very often if you do not sell to resellers, and do not sell to any exceptionally large institutions that need to be dual-homed. You should generally not run BGP with any customers that are not dual-homed unless they have an exceptionally good reason.
One advantage is that it allows you to see what your customers are sending you before you redistribute their routes, allowing you more protection.
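To compare the two models side by side, here is a rough Python emulation of an as-path access-list. It is a sketch under stated assumptions, not Cisco's matcher: it assumes "_" stands for a delimiter (space, comma, brace, parenthesis) or the start or end of the path, that the first matching entry decides, and that an unmatched path is denied. The inclusive list is modeled as permitting the empty path, which is what the description above calls for. The helper names are invented for this example.

```python
import re

# Assumption: Cisco's "_" matches a delimiter or the start/end of the AS-PATH.
DELIM = r"(?:^|$|[ ,{}()])"

def to_python_regex(cisco_pattern):
    """Translate a Cisco as-path pattern into an equivalent Python regex."""
    return re.compile(cisco_pattern.replace("_", DELIM))

def permitted(as_path, acl):
    """First matching entry decides; no match at all means implicit deny."""
    for action, pattern in acl:
        if to_python_regex(pattern).search(as_path):
            return action == "permit"
    return False

exclusive = [("deny", "_4388_"), ("deny", "_1795_"), ("permit", ".*")]
inclusive = [("permit", "^$"), ("permit", "^6000_")]

for path in ["", "4388 3561", "174 701", "6000 6100"]:
    print(repr(path), permitted(path, exclusive), permitted(path, inclusive))
```

Note the "174 701" row: the exclusive list lets a third provider's routes through until you remember to add a deny for it, while the inclusive list drops them without any further work.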
The first thing you have to do is figure out the syntax for access-lists. This is non-trivial, but if you know what a netmask is, and turn it inside out, you should be able to stumble through it.
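"Turning a netmask inside out" just means flipping every bit, so that the 1 bits mark the positions the access-list should ignore. A small Python sketch of that conversion follows; the function name is made up for this example.

```python
import ipaddress

def wildcard_for(netmask):
    """Invert every bit of a dotted-quad netmask to get the access-list wildcard."""
    mask = int(ipaddress.IPv4Address(netmask))
    return str(ipaddress.IPv4Address(mask ^ 0xFFFFFFFF))

print(wildcard_for("255.255.255.0"))   # 0.0.0.255
print(wildcard_for("255.255.0.0"))     # 0.0.255.255
print(wildcard_for("255.255.248.0"))   # 0.0.7.255
```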
If your goal is to provide strict outbound filtering, set up an access-list that contains only the networks that you have. Add them to the access-list with permit statements, and deny everything else. Set your peer to use the access-list by doing a:
neigh 144.228.xx.xx distribute-list 50 out
This example will use your access-list 50 in your config.
That's what the ultra-paranoid mode looks like, but if you are doing lots of business (which we hope you are, that way you'll be back for more tutorials), then you may not want the above option. There are still reasons to use an access-list, however.
If you have an 'aggregate-address 207.200.0.0 255.255.0.0 summary-only' in your bgp section, then in order to announce that route, you need to actually have a route within it in the bgp table. This will occur either by receiving a route in that prefix from another bgp peer, or by redistributing one or more protocols into bgp. If you have an ethernet address in this block, you may be accomplishing this by simply adding a 'redistribute connected' in your bgp section, which will redistribute *all* connected networks.
In the above cisco config, assuming an ethernet address of 207.200.50.1, this will successfully "kick-start" the bgp announcement, but it will also redistribute 204.157.1.xx and 144.228.xx.xx as originating from your own autonomous system! Most providers are using some intelligent filtering to prevent this from actually causing problems, but it doesn't hurt to filter these out, and it's real easy besides.
neighbor 204.157.1.xx remote-as 4388
neighbor 204.157.1.xx filter-list 1 out
neighbor 204.157.1.xx distribute-list 50 out
neighbor 144.228.xx.xx remote-as 1795
neighbor 144.228.xx.xx filter-list 1 out
neighbor 144.228.xx.xx distribute-list 50 out
ip as-path access-list 1 deny _4388_
ip as-path access-list 1 deny _1795_
ip as-path access-list 1 permit .*
access-list 50 deny 204.157.1.0 0.0.0.255
access-list 50 deny 144.228.0.0 0.0.255.255
access-list 50 permit 0.0.0.0 255.255.255.0
This will have the effect of not redistributing any routes in 204.157.1.*, a Net99 Class C just for serial links, and the Sprint Class B, 144.228.*.*, which is used for their entire backbone. I have also included the exclusive as-path filters so as not to redistribute bgp obtained routes to the other peer, merely for consistency's sake.
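If you want to check what access-list 50 will do before you type it in, the matching rule can be sketched in Python. This is a simplified model, not the IOS implementation: it assumes a standard access-list compares only the bits where the wildcard is 0, that the first match decides, and that the list ends in an implicit deny. The function names and the test addresses other than those quoted above are invented for the example.

```python
import ipaddress

def ip(dotted):
    """Dotted quad to 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))

def acl_permits(network, acl):
    """First matching entry wins; bits set in the wildcard are ignored."""
    addr = ip(network)
    for action, base, wildcard in acl:
        care = ~ip(wildcard) & 0xFFFFFFFF
        if (addr & care) == (ip(base) & care):
            return action == "permit"
    return False  # implicit deny at the end of every access-list

access_list_50 = [
    ("deny", "204.157.1.0", "0.0.0.255"),
    ("deny", "144.228.0.0", "0.0.255.255"),
    ("permit", "0.0.0.0", "255.255.255.0"),
]

for net in ["204.157.1.0", "144.228.0.0", "207.200.0.0"]:
    print(net, "permit" if acl_permits(net, access_list_50) else "deny")
```

Under this model the final permit line, exactly as written above, matches any network number whose last octet is zero, so 207.200.0.0 gets through while the two provider blocks do not.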
Comments on the load balancing:
In this scenario, we are only weighting three NSP's, plus two by default, and the rest by pot luck (an American term meaning "whatever we get"). Here's how the Cisco determines which of the routes it has received via BGP it should prefer for traffic. Comments are in square brackets. One important note here is that weights only affect the router you apply them to, whereas local preferences affect every router in your network that you run BGP to (assuming you are using the same ASN). For this reason, be careful with local preferences. I don't believe this bit of information is in the Cisco documentation, despite looking several times. I had to learn it the hard way. A more detailed look at the use of route-maps, weights, and local preferences may be found in the document "A Cisco Implementation of route-maps for the NSP."
NOTE: Parts of this are taken directly from the Cisco manual.
The BGP process selects a single autonomous system path to use and to pass along to other BGP-speaking routers. Cisco's BGP implementation has a reasonable set of factory defaults that can be overridden by administrative weights. The algorithm for path selection is as follows:
The AS-PATH length comparison is key. Since we don't weight Sprint or Net99 routes (a match clause would automatically match everything we hear from them, not just their routes and their customers' routes), we rely on the weights and local preferences being equivalent, so that the AS-PATH will automatically be shortest to get to the respective provider and their downstream customers.
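The interplay of weight, local preference, and AS-PATH length can be sketched in Python using the two paths from the 'show ip bgp' output earlier. This is a deliberately simplified model that looks at only those three attributes, in that order, and ignores every other tie-breaker. The class and function names are made up, and the local preference of 100 is an assumption (it is the Cisco default; the output above does not show it).

```python
from dataclasses import dataclass

@dataclass
class Path:
    label: str
    as_path: tuple
    weight: int = 0
    local_pref: int = 100  # assumed default, not shown in the output above

def preference_key(path):
    # Higher weight first, then higher local preference, then shorter AS-PATH.
    return (-path.weight, -path.local_pref, len(path.as_path))

def best_path(paths):
    return min(paths, key=preference_key)

paths = [
    Path("#1 via Sprint", (1239, 1740, 31), weight=2),
    Path("#2 via MCI", (3561, 2150, 226, 31), weight=5),
]
print(best_path(paths).label)

# With the weights removed, the shorter AS-PATH decides instead.
unweighted = [Path(p.label, p.as_path) for p in paths]
print(best_path(unweighted).label)
```

The first result agrees with "best #2" in the router output: the higher weight wins even though that path is one AS longer. The second shows what the unweighted Sprint and Net99 routes rely on.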
I hope you have found this tutorial useful. If you have any questions, or comments, please email them to dsiegel@rtd.com, as I am always looking for ways to expand this document. Who knows, maybe it will be a chapter in my next book. ;-)
Copyright (c) 1995 Dave Siegel
Copyright (c) 1995 RTD Systems & Networking, Inc.
Redistribution of this document in whole or in part without explicit permission is strictly prohibited.
ii. License
i. Intro
I. Concepts
   A. The Routing table vs. the BGP table
   B. Redistribution
   C. Controlling advertisements either by controlling redistributed IGP's, or filtering outbound BGP announcements.
   D. BGP characteristical redistribution
   E. Selection of routes -- weight, local-pref, AS-PATH
   F. CIDR & Aggregation
      1. HowTo
      2. redistribution
II. distribute-lists
III. filter-lists
IV. Pro's and Con's of using filter-lists vs. distribute-lists
V. Miscellaneous Options
   A. BGP Communities
   B. next-hop-self
VI. Sample configuration
VII. Glossary
ii. License
This document is intended to be distributed by NAP/MAE Project Coordinators. Distribution under these circumstances is provided by Dave Siegel and RTD Systems & Networking, Inc. free of charge. If these circumstances are not met, the document may be purchased for $100 by emailing dsiegel@rtd.com.
[Note: I got informal permission to redistribute this document, although David warned me that it is still under development and incomplete].
i. Introduction
When participating in a public exchange point, it is important to be aware of all the aspects involved in the exchange of routes and to be sure that they have the desired effects on routing. There are a few concepts in higher levels of the Internet known as peering and transit, and just peering. The distinction between the two is quite important.
When you purchase a T1 from a national provider, you expect to be able to point a default route at them, and have your routes announced such that you will have full Internet connectivity despite the fact that you maintain only a single Internet connection. This is known as transit. Another version of this is where you receive full routing tables from this provider, and you announce your own routes via BGP, which would be more appropriately referred to as peering and transit.
If you were to simply "peer" with another provider, be they national or local, you are getting something quite different. First off, the term peer is used in two ways, the first of which is simply a technical term referring to a BGP neighbor, or a router you have an established BGP [peering] session with. Unlike many IGP's which broadcast their routing information, BGP uses a tcp connection in order to exchange routing information, and each connection must be explicitly configured from both sides. Technically speaking, a BGP peer refers to the single point where the session is set up, whereas in a broader sense, it refers to the entirety of both networks being peered, as well as a difference in the way the relationship between both companies is viewed. Peering only implies that each network has access and permission to access the other network and their clients. Defaulting to a peer is considered impolite, and is technically considered stealing [bandwidth].
If the financial impact was not obvious at this point, peering is usually given for free, whereas transit is sold for a fee. Despite the fact that peering is almost always given away for free, it is many times not in the best interest of another provider to peer with you. This may be because the reachability of your network is of little concern to them, while theirs may be of great concern to you. While paying for this privilege is certainly a possibility, the usual course is to not obtain peering at all. An example of this relationship would be a typical Mom&Pop ISP attempting to peer with Sprintlink. This displays an uneven relationship where the smaller network is in every position to be a client, and there is no reason why the larger entity should give away service.
The notes below explain configuration of a Cisco router for the purpose of integrating local peering sessions (local ISP's -- competitors) with a transit provider (Sprint, MCI, UUnet, Net99, or other).
I. Concepts
When dealing with BGP, there are two distinct routing tables that you should be aware of: the main routing table and the BGP table. The routing table consists of the chosen best route to reach a network for any routing protocol, be that RIP, OSPF, IGRP, static, connected, or BGP derived networks. The BGP table consists of routes received only via BGP, and may contain several different ways to reach the same network (henceforth referred to as a prefix). As updates are received via BGP, the best route to a prefix may change, and this change will be reflected in the main routing table. You can analyze the different routing tables by using the commands 'show ip route' and 'show ip bgp'.
As these tables are logically separate, it follows that a route being in the main routing table does not mean that it will be in the BGP table, and if it is not, it will not be advertised via BGP. In order for the desired routes to appear in the BGP table, they must be redistributed into BGP. This is performed by selecting an IGP to be redistributed. It is not uncommon to redistribute RIP or OSPF into BGP; however, this is considered quite dangerous, as it does not afford a great deal of control. It is much safer to redistribute static routes, as they must be manually entered into the router in order to be advertised. Routes from IGP's can be received from many places, such as customers (though usually by accident). If you make a typo in your configuration, at the very least you should be able to blame it on yourself, and not have to make excuses for your clients. This can get exceptionally embarrassing, particularly if you were to, say, accidentally break routing to netscape, or some other equally known or important network, for parts of the Internet.
Redistribution is done in the router bgp <asn> section of your router config, and implemented using the commands 'redistribute static', 'redistribute rip', or 'redistribute ospf <process number>', for example.
If you insist on redistributing an IGP, you will at the very least wish to filter your announcements after they have been inserted into the BGP table, but before they are advertised to a peer. This can be done by using distribute-lists. A distribute-list consists of a statement after the bgp peer statement, such as 'neighbor <neighbor ip address> distribute-list 30 out' where 30 is the number of an access-list which restricts the ranges of IP addresses that are distributed. Note that having this list does not mean that you will announce these ranges of IP addresses, only that you will *permit* prefixes in this range to be announced. If the prefix is not in the BGP table it will not be advertised. This may be a result of the prefix either not being in the routing table or not being redistributed, or both. Distribute-lists are discussed in more detail in section II.
A characteristic of BGP is that all routes received from an external peer (EBGP) (a neighbor with an Autonomous System Number ('ASN') different from your own) are advertised to other EBGP peers and iBGP peers (a neighbor with the same ASN, or internal). If you have a distribute-list in place for all peers, you should be impervious to this behavior; however, you can also filter the information received from other peers through the use of a filter-list, which is discussed in section III.
In order to predict how your router will behave when you receive several routes for the same prefix, there is a complex set of relationships you should understand, and you should know a bit about BGP itself.
Unlike RIP, which computes a metric or distance based on the number of routers the announcement traverses on its way to get to you, BGP determines the distance based on the number of AS's the prefix passes on its way to you. Each AS may contain many router hops. It stores the ASN for each AS it passes through in the form of an AS-PATH. All else being equal, the prefix with the shortest AS-PATH will be the preferred route. Naturally, the Cisco lets you override this behaviour by using weights or preferences. These weights or preferences can either be applied to an entire peer by using a simple statement against the neighbor in the bgp section, such as 'neighbor <ip address> weight 50', or by matching regular expressions for paths, and using a route-map, such as 'neighbor <ip address> route-map ANS-pref in' where the route map would look like
route-map ANS-pref permit 10
 match as-path 55
 set weight 10
route-map ANS-pref permit 20
 match as-path 100
 set weight 5
ip as-path access-list 55 permit _690_
ip as-path access-list 100 permit .*
For the purposes of a typical NAP configuration, where you are connected to only a single NAP, weighting should not be necessary, and the shortest path to your peers will always be preferred given that the AS-PATH will only contain one hop, and anything heard from anywhere else will contain at least one additional hop. If this is not the behaviour you are seeing, then you either have another provider advertising a route they shouldn't, or you have received the route via an IGP (by accident, presumably). It is generally a good idea to passive-interface your NAP interface for all IGP's you run, as you do not particularly care to advertise or receive routes in an unmanageable form.
Weights, local-preferences, and the use of route-maps are discussed extensively in another paper.
While CIDR has been around for several years now, it should probably be given some background in case you haven't worked with it extensively in the past. CIDR allows you to take consecutive prefixes (usually Class C's) and advertise them as a single prefix. The easiest way to "aggregate" a group of prefixes is to use the aggregate-address command in the bgp section. A simple 'agg <beginning network address> <supernet mask>' will take care of it. This probably won't perform as expected, though, because while your Cisco will announce the aggregate, it will also advertise the more specific routes. In order to fix this, add a "summary-only" switch to the end of the aggregate-address command.
An interesting side effect of adding this command is that you will see this prefix show up in the BGP table as being routed to Null0, but it may not be advertised. This is one of a few exceptions where a prefix showing up in the BGP table will not be advertised to its peers. What is needed is what I refer to as a way to "kick start" the announcement. You must have a route inside that CIDR block in your main table, and redistributed into BGP. I typically take the entire aggregate route, and statically route it to Null0. This has at least one nice side effect, which is that you can guarantee that you will be announcing this aggregate to all of your peers as long as your router and link to them are still up, regardless of anything happening inside your network, such as router reloads, down circuits, or IGP screwups. Your neighbors will appreciate that you do not send them frequent BGP updates, known as flaps. Frequent flaps can cause your AS to be dampened, but even worse, flaps have a horrid effect on the Internet, as they chew router CPU, and can even bring a router to its knees, forcing a reload, if route-flaps are received in quick enough succession. Statically routing large blocks of network allocations to Null0 is often referred to as "nailing down" announcements.
As a side note, if you need help figuring out what the supernets are for chunks of IP addresses, the aggis perl script works quite nicely, and can be obtained from ftp://ftp.rtd.com/pub/networking/aggis. Its use is simple: "aggis -d 205.199.184 8", where 8 specifies the number of networks you want in the block, and 205.199.184 is the starting point.
8 Class C nets starting at 205.199.184 can be represented as:
205.199.184/21(255.255.248) ( 8 nets: 205.199.184 - 205.199.191 )
To configure this in your router, you would use the line:
agg 205.199.184.0 255.255.248.0 summ
and then to nail it down:
ip route 205.199.184.0 255.255.248.0 null0
NOTE: Please don't try this on your router. I used an RTD block as an example. ;-)
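If you don't have aggis handy, the same arithmetic can be sketched with Python's standard ipaddress module. This is not a port of aggis, just an illustration of the supernet calculation; the function name and its arguments are made up for this example.

```python
import ipaddress

def aggregate(first_class_c, count):
    """Collapse `count` consecutive /24 networks starting at first_class_c."""
    start = ipaddress.ip_network(first_class_c + ".0/24")
    nets = [
        ipaddress.ip_network((int(start.network_address) + i * 256, 24))
        for i in range(count)
    ]
    return list(ipaddress.collapse_addresses(nets))

for net in aggregate("205.199.184", 8):
    # Prints the network, the supernet mask for aggregate-address, and the prefix length.
    print(net.network_address, net.netmask, f"/{net.prefixlen}")
```

If the starting network is not aligned on a boundary that fits the block (try starting at 205.199.185), the result is several smaller prefixes rather than one, which is a sign that a single aggregate-address line will not cover the range.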
II. distribute-lists
III. filter-lists
IV. Pros and Cons for distribute-lists vs. filter-lists.
I personally like using a combination of both; however, they can be used effectively by themselves. The choice depends a lot on what kind of clients you typically provide services to. For many ISP's, a distribute-list should be sufficient, and provide good protection against accidental redistribution. Distribute-lists do not tend to work well if you have clients that run BGP with you, as you have to manually update your access-list each time they want to announce a new network. While this is nice from a security standpoint, they may not appreciate it a great deal. If you obtain a number of smaller blocks, you also may find yourself updating frequently, in which case it will get annoying, particularly when you forget to update your access-list, and spend half the day debugging the wrong problem.
Filter lists can be fairly simple to implement, particularly if you have no BGP speaking clients, or very few of them. They are safe as long as you are careful and aware of how your local network impacts your announcements.
V. Misc. Options
A. Communities
B. next-hop-self
VI. Sample configurations
router bgp 101
 no sync
 aggregate-address 215.199.32.0 255.255.224.0 summary-only
 neighbor 198.32.230.10 remote-as 200
 neighbor 198.32.230.10 distribute-list 2 out
 neighbor 198.32.230.11 remote-as 300
 neighbor 198.32.230.11 distribute-list 2 out
 redistribute static
access-list 2 permit 215.199.32.0 0.0.255.255
ip route 215.199.32.0 255.255.224.0 null0
VII. Glossary
aggregate: To combine something, in this case, routing advertisements
AS: Autonomous System. Usually the same as a network, though many larger networks are separated into several Autonomous Systems, and utilize different ASN's.
ASN: Autonomous System Number
BGP: Border Gateway Protocol
CIDR: Classless Inter-Domain Routing
Cisco: If you don't know what a Cisco is, you're reading the wrong tutorial.
iBGP: Internal Peering session
IGP: Interior Gateway Protocol, such as RIP, OSPF, or IGRP (static counts, too)
EBGP: External Peering session
EGP: Exterior Gateway Protocol, such as EGP or BGP.
peer: a bgp configuration entry that establishes another router as one you will exchange routes with.
prefix: A generic classification referring to a network or group of networks that appear as a single network entry.