
Craft, activity and play ideas

Rezo Frolov

Extra Quality Download Satisfactory V0.5.2.1 OnLine


Abstract: Existing peer-to-peer systems rely on overlay network protocols for object storage and retrieval and message routing. These overlay protocols can be broadly classified as structured and unstructured: structured overlays impose constraints on the network topology for efficient object discovery, while unstructured overlays organize nodes in a random graph topology that is arguably more resilient to peer population transiency. There is an ongoing discussion on the pros and cons of both approaches. This paper contributes to the discussion a multiple-site, measurement-based study of two operational and widely-deployed file-sharing systems. The two protocols are evaluated in terms of resilience, message overhead, and query performance. We validate our findings and further extend our conclusions through detailed analysis and simulation experiments.

1 Introduction

Peer-to-peer Internet applications for data sharing have gained in popularity over the last few years to become one of today's main sources of Internet traffic [31,12]. Their peer-to-peer approach has been proposed as the underlying model for a wide variety of applications, from storage systems and cooperative content distribution to Web caching and communication infrastructures. Existing peer-to-peer systems rely on overlay network protocols for object storage/retrieval and message routing. These overlay protocols can be classified broadly as either structured or unstructured based on the constraints imposed on how peers are organized and where stored objects are kept. The research community continues to debate the pros and cons of these alternative approaches [5]. This paper contributes to this discussion the first multi-site, measurement-based study of two operational and widely deployed P2P file-sharing systems.

Most P2P systems in use today [8,13] adopt fully distributed and largely unstructured overlays.
In such unstructured systems there are few constraints on the overlay construction and data placement: peers set up overlay connections to a (mostly) arbitrary set of other peers they know, and shared objects can be placed at any node in the system. While the resulting random overlay structures and data distributions may provide high resilience to the degrees of transiency (i.e., churn) found in peer populations, they limit clients to nearly "blind" searches, using either flooding or random walks to cover a large number of peers.

Structured, or DHT (Distributed Hash Table)-based protocols [28,33,36,25], on the other hand, reduce the cost of searches by constraining both the overlay structure and the placement of data: data objects and nodes are assigned unique identifiers or keys, and queries are routed based on the searched object keys to the node responsible for keeping the object (or a pointer to it). Although the resulting overlay provides efficient support for exact-match queries (normally in O(log n) steps), this may come at a hefty price in terms of churn resilience and the systems' ability to exploit node heterogeneity and efficiently support complex queries.

This paper reports on a detailed, measurement-based study of two operational file-sharing systems: the unstructured Gnutella [8] network and the structured Overnet [23] network. In a closely related effort, Castro et al. [5] present a simulation-based, detailed comparison of both approaches using traces of Gnutella node arrivals and departures [30]. Our study complements their work, focusing on the characterization, not comparison, of two operational instances of these approaches in terms of resilience, query and control message overhead, query performance, and load balancing.

Some highlights of our measurement results include:

  • Both systems are efficient in terms of control traffic (bandwidth) overhead under churn. In particular, Overnet peers have surprisingly small demands on bandwidth.

  • While both systems offer good performance for exact-match queries of popular objects, Overnet surprisingly yields almost twice the success rate of Gnutella (97.4% vs. 53.2%) when querying for a set of shared objects extracted from a Gnutella client.

  • Both systems support fast keyword searches. Flooding in Gnutella guarantees fast query replies, especially for highly popular keywords, while Overnet successfully handles keyword searches by leveraging its DHT structure.

  • Overnet does an excellent job at balancing search load; even peers responsible for the most popular keywords consume only 1.5x more bandwidth than the average peer.

We validate our findings and further extend our conclusions (Sections 7 and 8) through additional measurements as well as detailed analysis and simulation experiments. The measurement and characterization of the two large, operational P2P systems presented in this paper will shed light on the advantages/disadvantages of each overlay approach and provide useful insights for the design and implementation of new overlay systems.

After providing some background on unstructured and structured P2P networks in general and on the Gnutella and Overnet systems in particular, we describe our measurement goals and methodology in Section 3. Sections 4-6 present and analyze our measurement results from both systems. Section 9 discusses related work. We conclude in Section 10.

2 Background

This section gives a brief overview of general unstructured and structured P2P networks and the deployed systems measured in our study: Gnutella and Overnet.

2.1 The Gnutella Protocol

In unstructured peer-to-peer systems, the overlay graph is highly randomized and difficult to characterize. There are no specific requirements for the placement of data objects (or pointers to them), which are spread across arbitrary peers in the network. Given this random placement of objects in the network, such systems use flooding or random walks to ensure a query covers a sufficiently large number of peers. Gnutella [8] is one of the most popular unstructured P2P file-sharing systems. Its overlay maintenance messages include ping, pong and bye: pings are used to discover hosts on the network, pongs are replies to pings and contain information about the responding peer and other peers it knows about, and byes are optional messages that announce the upcoming closing of a connection. For query/search, early versions of Gnutella employ a simple flooding strategy, where a query is propagated to all neighbors within a certain number of hops.
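
As a rough illustration of hop-limited flooding, the following minimal sketch propagates a query breadth-first over a small overlay. The function name `flood`, the `overlay` graph, and the rule that peers ignore a query they have already seen are assumptions made for this example; they are not taken from the Gnutella specification or from the measured clients.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Return {peer: hops} for every peer reached by a query flooded from origin.

    graph maps each peer to the list of its overlay neighbors (hypothetical data).
    ttl is the maximum number of hops a query may travel.
    """
    reached = {origin: 0}
    queue = deque([origin])
    while queue:
        peer = queue.popleft()
        hops = reached[peer]
        if hops == ttl:
            continue  # hop budget exhausted: this peer does not forward the query
        for neighbor in graph[peer]:
            if neighbor not in reached:  # assumed: duplicate queries are dropped
                reached[neighbor] = hops + 1
                queue.append(neighbor)
    return reached


# Hypothetical six-peer overlay used only for illustration.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}

print(flood(overlay, "A", 2))  # peers E and F lie beyond two hops and are not reached
```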
This maximum number of hops, or time-to-live (TTL), is intended to limit query-related traffic.

Two generations of the Gnutella protocols have been made public: the "flat" Gnutella V0.4 [7], and the newer loosely-structured Gnutella V0.6 [14]. Gnutella V0.6 attempts to improve query efficiency and reduce control traffic overhead through a two-level hierarchy that distinguishes between superpeers/ultrapeers and leaf-peers. In this version, the core of the network consists of high-capacity superpeers that connect to other superpeers and leaf-peers; the second layer is made of low-capacity (leaf-) peers that perform few, if any, overlay maintenance and query-related tasks.

2.2 The Overnet/Kademlia Protocol

Structured P2P systems, in contrast, introduce much tighter control on overlay structuring, message routing, and object placement. Each peer is assigned a unique hash ID and typically maintains a routing table containing O(log n) entries, where n is the total number of peers in the system. Certain requirements (or invariants) must be maintained for each routing table entry at each peer; for example, the location of a data object (or its pointer) is a function of an object's hash value and a peer's ID. Such structure enables DHT-based systems to locate an object within a logarithmic number of steps, using O(log n) query messages. Overnet [23] is one of the few widely-deployed DHT-based file-sharing systems. Because it is a closed-source protocol, details about Overnet's implementation are scarce, and few third-party Overnet clients exist. Nevertheless, some of these clients, such as MLDonkey [21], and libraries like KadC [11] provide opportunities for learning about the Overnet protocol.

Overnet relies on Kademlia [20] as its underlying DHT protocol. Similar to other DHTs, Kademlia assigns a 160-bit hash ID to each participating peer, and computes an equal-length hash key for each data object based on the SHA-1 hash of the content.
(key, value) pairs are placed on peers with IDs close to the key, where "closeness" is determined by the XOR of two hash keys; i.e., given two hash identifiers, a and b, their distance is defined by the bitwise exclusive or (XOR): d(a, b) = a ⊕ b. In addition, each peer builds a routing table that consists of up to log2(n) buckets, with the i-th bucket B_i containing IDs of peers that share an i-bit long prefix. In a 4-bit ID space, for instance, peer 0011 stores pointers to peers whose IDs begin with 1, 01, 000, and 0010 for its buckets B_0, B_1, B_2, and B_3, respectively (Fig. 1). Compared to other DHT routing tables, the placement of peer entries in Kademlia buckets is quite flexible. For example, the bucket B_0 for peer 0011 can contain any peers having an ID starting with 1.

Figure 1: Routing table of peer 0011 in a 4-digit hash space.

Kademlia supports efficient peer lookup for the k closest peers for a given hash key. The procedure is performed in an iterative manner, where the peer initiating a lookup chooses the α closest nodes to the target hash key from the appropriate buckets and sends them FIND_NODE RPCs. Queried peers reply with peer IDs that are closer to the target key. This process is repeated, with the initiator sending FIND_NODE RPCs to nodes it has learned about from previous RPCs, until it finds the k closest peers. The XOR metric and the routing bucket implementation guarantee a consistent O(log n) upper bound for the hash key lookup procedure in Kademlia. The Kademlia protocol is discussed in detail in [20].

Overnet builds a file-sharing P2P network with an overlay organization and message routing protocol based on Kademlia. Overnet assigns each peer and object a 128-bit ID based on an MD4 hash. Object search largely follows the procedure described in the previous paragraph with some modifications.
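
The XOR metric and the prefix-based buckets described above can be sketched in a few lines. This is a minimal illustration that treats IDs as plain integers in the 4-bit example space; the function names (`xor_distance`, `bucket_index`, `closest_peers`) are hypothetical, and nothing here is taken from Overnet's closed-source implementation.

```python
def xor_distance(a: int, b: int) -> int:
    """Distance between two IDs under the XOR metric."""
    return a ^ b


def bucket_index(own_id: int, other_id: int, id_bits: int) -> int:
    """Index of the bucket in which own_id would store other_id.

    The index equals the number of leading bits the two IDs share, so
    bucket 0 holds peers that differ from own_id in the very first bit.
    """
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a peer does not store itself in a bucket")
    return id_bits - distance.bit_length()


def closest_peers(known_ids, target: int, count: int):
    """Return the count known IDs closest to target under the XOR metric."""
    return sorted(known_ids, key=lambda peer: xor_distance(peer, target))[:count]


# The 4-bit example from the text: peer 0011 and one peer per bucket.
own = 0b0011
for other in (0b1000, 0b0100, 0b0000, 0b0010):
    print(format(other, "04b"), "-> bucket", bucket_index(own, other, 4))

# One selection step of a lookup for key 0001 among four known peers.
print([format(p, "04b") for p in closest_peers([0b1000, 0b0100, 0b0000, 0b0010], 0b0001, 2)])
```

An iterative lookup would repeat the selection step, querying the chosen peers and merging the IDs they return, until the set of closest peers stops changing.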
We will introduce additional details on Overnet's search mechanism as we present and analyze its query performance in Section 5.

3 Measurement Goals and Methodology

Our study focuses on the characterization of two operational instances of the unstructured (Gnutella) and structured (Overnet) approaches to P2P networks in terms of churn resilience (Section 4.1), query and control message overhead (Sections 6 and 4.2), query performance (Sections 5.1 and 5.2), and load balancing (Section 6.1). A fair head-to-head comparison of the two deployed systems would be impossible, as one cannot control key parameters of the systems such as the number of active peers, the content, and the query workload.

We employed a combination of passive and active techniques to carry out our measurement study. For Gnutella, we modified Mutella-0.4.3 [22], an open-source Gnutella client. Mutella is a command-based client that conforms to Gnutella specifications V0.4 and V0.6. Our measurements of Overnet are based on a modified MLDonkey [21] client, an open-source P2P client written in Objective Caml that supports multiple P2P systems, including Overnet. Modifications to both clients included extra code for parameter adjustment, probing and accounting, among others. None of these modifications affected the outcome of the collected metrics.

Our modified Gnutella and Overnet clients, each performing a similar batch of experiments, were instantiated at four different locations around the world (Evanston, USA; Zurich, Switzerland; Paris, France; and Beijing, China) and run concurrently to identify potential geographical biases and factor out time-of-day effects from our measurements. All experiments were conducted from April 1st to the 30th, 2005. For brevity, unless otherwise stated, the data presented in the following sections as well as the associated discussions are based on clients placed behind a DSL connection in Evanston, Illinois.
Measurements and analysis from the remaining three sites yield similar results and will be briefly discussed in Section 7.

4 Churn and Control Traffic Overhead

The transiency of peer populations (churn) and its implications for P2P systems have recently attracted the attention of the research community. A good indication of churn is a peer's session length: the time between when the peer joins a network and when it subsequently leaves. Note that a single peer could have multiple sessions during its lifetime by repeatedly joining and leaving the network. We performed measurements of session length for peers in both the Gnutella and Overnet networks, and studied the level of churn of these two systems.

In the context of file-sharing P2P systems, the level of replication, the effectiveness of caches, and the spread and satisfaction rate of queries will all be affected by how dynamic the peer population is [1,3,6,15,27]. For P2P networks in general, control traffic overhead is also a function of the level of churn. Control traffic refers to protocol-specific messages sent between peers for overlay maintenance, including peers joining/leaving the overlay, updating routing tables or neighbor sets, and so on. It does not include any user-generated traffic such as query requests and replies. In this section, we study the control traffic overhead for both networks and discuss our findings.

4.1 Level of Churn

We performed session-length measurements of Gnutella by modifying the Mutella client [22]. We first collected a large number of (IP, port) tuples for Gnutella peers by examining ping/pong messages that our Mutella client received. From the set of all the (IP, port) tuples, we probed a randomly selected subset to collect data representative of peers' session-length distribution for the Gnutella network. While performing the measurement, our client periodically (every 20 minutes) tried to initiate a Gnutella-specific connection handshake with each peer in the list.
The receiving peer at the probed IP and port, if active, either accepts or refuses the connection request (indicating its "BUSY" status). Our session-length measurement for Gnutella lasted for 7 days and captured approximately 600,000 individual peer sessions.

Session-length probing for Overnet was conducted using our modified Overnet (MLDonkey) client [21]. Each peer in Overnet is assigned a unique 128-bit hash ID that remains invariant across sessions. To search for a particular user by hash ID, Overnet provides the OvernetSearch message type. Peers connect using an OvernetConnect message; when a peer receives an OvernetConnect message, it responds with an OvernetConnectReply, which contains 20 other peers' IDs known by the replying peer. To begin the session-length measurement, we collected the hash IDs of 40,000 Overnet peers by sending OvernetConnect messages and examining the IDs contained in the corresponding reply messages. We then periodically probed these peers to determine whether they were still online. Since it is possible for a peer to use different (IP, port) pairs for different sessions, we rely on OvernetSearch messages to iteratively search for and detect peers that start new sessions using different (IP, port) tuples. As in the case of Gnutella, the session-length measurement for Overnet also lasted 7 days. During this time we continuously probed 40,000 distinct peers and measured over 200,000 individual sessions.

Figure 2: Peers' session length in Gnutella.

Figures 2 and 3 give the Complementary Cumulative Distribution Function (CCDF) of peers' session lengths for Gnutella and Overnet, respectively. Figure 2 shows that peers in the Gnutella network show a medium degree of churn: 50% of all peers have a session length smaller than 4,300 seconds, and 80% have session lengths smaller than 13,400 seconds (about 4 hours). Only 2.5% of the session lengths are longer than one day. The median session length of an Overnet peer (Fig.
3) is around 8,100 seconds; 80% of the peers have sessions that last less than 29,700 seconds, and 2.7% of all session lengths last more than a day. Overall, the Overnet network has a measurably lower, but still similar, level of churn when compared to the Gnutella network. In the next section, we analyze the impact of these levels of churn on control traffic overhead for each of these systems.

Figure 3: Peers' session length in Overnet.

4.2 Control Traffic Overhead

In this section, we present and discuss a three-day measurement of the control traffic for Gnutella and Overnet. To better illustrate changes in bandwidth demands at finer time resolution, each figure in this section shows a representative measurement window of 10,000 seconds taken from a client behind a DSL connection in Evanston, Illinois. Each data point corresponds to the average bandwidth demand over 100 seconds. Data on bandwidth demand for other measurement periods and measurement sites produced similar results.

The measurement of the Gnutella network was done for Gnutella V0.6 using our modified client, which can act either as a leaf peer or as an ultrapeer. The modified client records all control-related messages that it generates in addition to those messages originating from other peers and routed through it. The majority of control-related messages in Gnutella are pings and pongs, along with small percentages of other types of control messages such as those used to repair routing tables. We opted for Gnutella V0.6 as this is the most common version in the Gnutella network.

Figure 4: Bandwidth consumption of control messages for the Gnutella client.

Gnutella uses ultrapeers to exploit peers' heterogeneity in attributes such as bandwidth capacity and CPU speed, thereby making the system more scalable and efficient.
Ultrapeers typically connect to a larger number of neighbors than do leaf peers and they are assigned more responsibility for responding to other peers' messages, thus consuming several times more bandwidth for control messages than leaf peers. Figure 4 illustrates this for a leaf peer connected to no more than 4 ultrapeers, and an ultrapeer connected to a maximum of 5 ultrapeer neighbors and 8 leaf-peer children. As would be expected, while a leaf peer typically only consumes around 200 Bytes/second for control messages, an ultrapeer normally needs to contribute 5 to 6 times more bandwidth for control traffic (between 800 and 1,400 Bytes/second). Despite this high relative difference between peer types, the bandwidth consumption for an ultrapeer is still reasonably low and never exceeds 2,000 Bytes/second in our measurement.

Overall, a Gnutella peer does not consume a large amount of bandwidth for control-related messages, as would be expected given the loose organization of its overlay infrastructure. Peers joining and leaving the Gnutella network have little impact on other peers or on the placement of shared data objects, and thus do not result in significant control traffic.

Figure 5: Bandwidth consumption of control messages for the Overnet client.

Figure 5 shows the control traffic overhead for our modified Overnet client. Contrary to common belief, we found that Overnet clients consume surprisingly little bandwidth: only around 100 to 180 Bytes/second. The control message overhead for a peer is determined by a number of factors, such as the peer's number of neighbors, the peer's (and its neighbors') probing intervals, the size of the control message, etc. It is thus difficult to directly compare control overhead across different protocols. Nevertheless, the reader should consider that while the measured Gnutella client has a significantly shorter probing interval (10 seconds), it also limits the number of neighbors to 13 (in the ultrapeer case).
The Overnet client, on the other hand, often has over 300 neighbor peers in its buckets, which are probed at 1,800-second intervals.

Although a structured (DHT-based) system has strict rules for neighbor selection

