
Performance Question ...


njones
08-24-2009, 02:16 PM
My questions are best expressed as a hypothetical situation:

A system includes a "bridge" connected to the internet, and 50 "worker" nodes. The bridge needs to:
- Listen for new "worker" nodes "logging in"
- Gather status information from up to 50 nodes every 10 seconds
- Receive infrequent event-driven priority status updates from "worker" node(s)
- Send infrequent event-driven commands to the nodes

After reading numerous posts on this forum, I have reached the following conclusion - correct me if I'm wrong.

Option A: The "bridge" node could manage bandwidth by scheduling unicast "poll" calls to each "worker" node, leaving enough bandwidth for event-driven unicast messages.

Option B: Each "worker" node could have a "reporting interval" configured by the "bridge" node. This will save the bandwidth of the "poll" messages, leaving more bandwidth for event-driven messages, but could result in collisions as the "worker" nodes would not be synchronized.

Option C: The "bridge" node could multicast a "poll" using a predefined multicast group number. All "worker" nodes could respond which would result in collisions, but the network would eventually sort it out. This will save the bandwidth of unicast "poll" messages, but may result in some lost bandwidth due to collisions.
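
To put a number on Option A: 50 workers polled once per 10 seconds leaves one poll slot every 200 ms. Here is a minimal sketch in plain Python (not SNAPpy) of spreading the polls evenly over the period. The function name `poll_schedule` and the worker names are made up for illustration, and a real script would also need to leave headroom for the event-driven traffic.

```python
def poll_schedule(node_ids, period_s=10.0):
    """Return (offset_seconds, node_id) pairs that spread polls evenly over one period."""
    slot = period_s / len(node_ids)  # time budget per worker
    return [(round(i * slot, 3), node) for i, node in enumerate(node_ids)]


# Hypothetical worker names, 50 of them as in the example system.
workers = ["worker%02d" % n for n in range(1, 51)]
schedule = poll_schedule(workers)

print(schedule[0])   # (0.0, 'worker01')
print(schedule[1])   # (0.2, 'worker02')
print(schedule[-1])  # (9.8, 'worker50')
```

The same arithmetic could be used for Option B, with the bridge handing each worker a different offset inside the reporting interval so that the unsynchronized reports are less likely to pile up.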

Q1: In all cases it is possible for multiple "worker" nodes to attempt to send a message to the "bridge" at the same time, for example one poll response and one or more event-driven messages. Presumably the first message received invokes a SNAPpy script in the "bridge", but what happens if another message is received while the script is executing? Does the SNAP stack include buffers for received messages? How many messages can be "buffered" while a previous one is executing? Assuming SNAPpy script execution can keep up with the network, I suppose only 2 RX buffers are required. So, subject to reasonably efficient scripts, can SNAPpy keep up with the network?

Q2: Assuming the "bridge" cannot keep up, what happens to the "worker" nodes whose messages cannot be received due to full bridge buffers? Retry? Does the calling application script get an error response so it can implement an application level retry? I'd hate to have to add an application level acknowledgement to catch these errors.

Q3: Mostly out of curiosity, if a node hears a retransmission of a multicast it has already heard and retransmitted, is it smart enough to ignore it the second time? If not, adding this capability would reduce the severity of a packet storm in a small network where all nodes hear each other.

Q4: In my example of 50 nodes, if SNAP has a route table size of 9 (I think I read that somewhere here), then presumably every message will result in route discovery - which is a multicast and has a very high bandwidth cost. Can I sacrifice some available memory space and increase the routing table size?

kbanks
08-24-2009, 09:03 PM
Option A: The "bridge" node could manage bandwidth by scheduling unicast "poll" calls to each "worker" node, leaving enough bandwidth for event-driven unicast messages.

Yes

Option B: Each "worker" node could have a "reporting interval" configured by the "bridge" node. This will save the bandwidth of the "poll" messages, leaving more bandwidth for event-driven messages, but could result in collisions as the "worker" nodes would not be synchronized.

Yes

Option C: The "bridge" node could multicast a "poll" using a predefined multicast group number. All "worker" nodes could respond which would result in collisions, but the network would eventually sort it out. This will save the bandwidth of unicast "poll" messages, but may result in some lost bandwidth due to collisions.

Not quite, in particular where you say "but the network would eventually sort it out". If enough collisions occur, all retries could be exhausted before the message gets through. (The number of retries is configurable, but finite.)

Q1: In all cases it is possible for multiple "worker" nodes to attempt to send a message to the "bridge" at the same time, for example one poll response and one or more event-driven messages. Presumably the first message received invokes a SNAPpy script in the "bridge", but what happens if another message is received while the script is executing? Does the SNAP stack include buffers for received messages?

Yes we have multiple buffers (for all types of incoming data, not just "data from the radio").

How many messages can be "buffered" while a previous one is executing?

This varies between builds, depending on how much RAM is used for other things.

Assuming SNAPpy script execution can keep up with the network, I suppose only 2 RX buffers are required.

We use more than two.

So, subject to reasonably efficient scripts, can SNAPpy keep up with the network?

SNAPpy can keep up with traffic that is itself generated by SNAPpy. I suspect a fast PC could outrun the little SNAP nodes.

Q2: Assuming the "bridge" cannot keep up, what happens to the "worker" nodes whose messages cannot be received due to full bridge buffers? Retry?

Yes, a special type of retry. The configurable retry parameter is the "no response" retry. However, in the situation you are describing, the receiving node "hears" the packet just fine, it just has no place to put it.

We use a special handshake packet to say "I heard you fine, but have no room, send it to me again in a bit".

So, in this situation, the packet(s) will eventually get through.
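
A toy model of that handshake, in plain Python rather than SNAPpy, may help picture it. Everything here is an illustration: the names `Receiver`, `offer`, `process_one` and `send_all`, the "ACK"/"BUSY" strings, and the buffer count of 2 are all assumptions, and the real packet format and timing are not described in this thread.

```python
from collections import deque


class Receiver:
    """Node with a fixed number of receive buffers (the count is hypothetical)."""

    def __init__(self, buffers):
        self.queue = deque()
        self.buffers = buffers

    def offer(self, packet):
        """Return 'ACK' if the packet was buffered, 'BUSY' if heard but no buffer is free."""
        if len(self.queue) >= self.buffers:
            return "BUSY"
        self.queue.append(packet)
        return "ACK"

    def process_one(self):
        """Simulate the script finishing one message, which frees a buffer."""
        return self.queue.popleft() if self.queue else None


def send_all(receiver, packets):
    """Offer packets in order, re-offering after each BUSY reply.

    Returns the packets in the order the receiver processed them, plus the BUSY count.
    """
    pending = deque(packets)
    processed = []
    busy_replies = 0
    while pending:
        if receiver.offer(pending[0]) == "ACK":
            pending.popleft()
        else:
            busy_replies += 1
            # "Send it to me again in a bit": the receiver drains one message meanwhile.
            processed.append(receiver.process_one())
    while receiver.queue:
        processed.append(receiver.process_one())
    return processed, busy_replies


processed, busy_replies = send_all(Receiver(buffers=2), ["p1", "p2", "p3", "p4", "p5"])
print(processed)     # ['p1', 'p2', 'p3', 'p4', 'p5']
print(busy_replies)  # 3
```

The point of the sketch is only that a "busy" reply delays a packet instead of dropping it, so nothing is lost as long as the receiver keeps draining its buffers.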

This by the way is how end-to-end flow control works. Try accepting 19200 baud data at one end of a data mode pair, and spitting it out at 1200 baud at the other end. As long as you turn on flow control (and the originating source honors it), no characters will be dropped.

Does the calling application script get an error response so it can implement an application level retry? I'd hate to have to add an application level acknowledgement to catch these errors.

You can guess the answer to this one. As just described, SNAP handles this for you.

Q3: Mostly out of curiosity, if a node hears a retransmission of a multicast it has already heard and retransmitted, is it smart enough to ignore it the second time? If not, adding this capability would reduce the severity of a packet storm in a small network where all nodes hear each other.

It's smart enough, but does not always have sufficient memory to remember ALL of the recent multicasts (yet another LIFO data structure within the node). (This was discussed in another post recently, you might try a forum search).
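
As a rough illustration of "smart enough, but with limited memory", here is a plain Python sketch of a bounded "recently seen" list. It is not SNAP's actual data structure: the `SeenCache` name, the `(source, sequence)` key, the capacity of 2, and the rule that the oldest entry is forgotten first are all assumptions made for the example.

```python
from collections import deque


class SeenCache:
    """Remembers only the last `capacity` multicasts it has seen."""

    def __init__(self, capacity):
        # Assumption: when full, the oldest remembered entry is dropped.
        self.recent = deque(maxlen=capacity)

    def is_new(self, source, seq):
        """Return True if this multicast should be handled, False if it is a known duplicate."""
        key = (source, seq)
        if key in self.recent:
            return False
        self.recent.append(key)
        return True


cache = SeenCache(capacity=2)
results = [
    cache.is_new("A", 1),  # first time heard
    cache.is_new("A", 1),  # retransmission, still remembered
    cache.is_new("B", 1),
    cache.is_new("C", 1),  # pushes ("A", 1) out of the cache
    cache.is_new("A", 1),  # forgotten, so it looks new again
]
print(results)  # [True, False, True, True, True]
```

The last result shows the limitation described above: once enough other multicasts have gone by, an old one can no longer be recognised as a duplicate.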

Q4: In my example of 50 nodes, if SNAP has a route table size of 9 (I think I read that somewhere here), then presumably every message will result in route discovery - which is a multicast and has a very high bandwidth cost. Can I sacrifice some available memory space and increase the routing table size?

In all current builds it's 10 (not 9), and no you cannot increase it.

There is very little dynamic memory usage within these nodes. In fact, until the 2.2 "dynamic string buffers" feature, there was NO dynamic allocation.

You can do things to help like "focusing" on one node at a time. In other words, don't do


rpc(nodeA,'func1')
rpc(nodeB,'func1')
rpc(nodeC,'func1')

rpc(nodeA,'func2')
rpc(nodeB,'func2')
rpc(nodeC,'func2')

rpc(nodeA,'func3')
rpc(nodeB,'func3')
rpc(nodeC,'func3')


Instead do


rpc(nodeA,'func1')
rpc(nodeA,'func2')
rpc(nodeA,'func3')

rpc(nodeB,'func1')
rpc(nodeB,'func2')
rpc(nodeB,'func3')

rpc(nodeC,'func1')
rpc(nodeC,'func2')
rpc(nodeC,'func3')


In other words, use the route while you've got it, don't wait and have to rediscover it again.
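
To see why the ordering matters once there are more nodes than route table entries, here is a small simulation in plain Python (not SNAPpy). It assumes a 10-entry table and a least-recently-used discard rule; the LRU rule is purely an assumption for illustration, since the real discard algorithm is not described here, and route timeouts and reverse routes are ignored.

```python
from collections import OrderedDict


def count_discoveries(calls, table_size=10):
    """Count how many calls need a route discovery, given a small LRU route table."""
    table = OrderedDict()
    discoveries = 0
    for node in calls:
        if node in table:
            table.move_to_end(node)        # route is known, mark it as recently used
        else:
            discoveries += 1               # route unknown, discovery needed
            if len(table) >= table_size:
                table.popitem(last=False)  # discard the least recently used route
            table[node] = True
    return discoveries


nodes = list(range(50))
funcs = ["func1", "func2", "func3"]

interleaved = [n for f in funcs for n in nodes]  # func1 on every node, then func2, ...
focused = [n for n in nodes for f in funcs]      # all three funcs on one node at a time

print(count_discoveries(interleaved))  # 150
print(count_discoveries(focused))      # 50
```

Under these assumptions the interleaved order rediscovers a route for every single call, while the focused order pays for one discovery per node.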

njones
08-25-2009, 08:29 AM
Thank you for your very complete answer.

I have concluded that I should use unsolicited, scheduled messages to send status from each "worker" to the "bridge". That would only use one routing table entry in each "worker", and none in the "bridge" (no callback) - right?

If I limit the bridge-to-worker traffic to event-driven traffic, then routes would still have to be discovered, but much less frequently and only once for the first message in a burst to a particular worker.

kbanks
08-25-2009, 11:31 AM
That would only use one routing table entry in each "worker", and none in the "bridge" (no callback) - right?

No, two route table entries will be filled in (the "reverse route" back to the node is also automatically discovered).

BUT, SNAP 2.2 uses a smarter "route discard" algorithm than 2.1 did. This lets the "bridge" discard the "up to Portal" route less often than 2.1 did.

njones
08-25-2009, 12:21 PM
Thanks for the response, I'm not certain what you meant by:

No, two route table entries will be filled in (the "reverse route" back to the node is also automatically discovered).
Are both route table entries created in the originating node, since it triggered the unicast message and subsequent route discovery, or is it one table entry in each of the unicast target and the originator?

I'm hoping that the routing table burden is placed entirely on the unicast originator, and that each unicast message contains both a "route to the target" and "route back to the originator" for confirmation (ACK). The callback API is new, but it seems to me that the route used for callbacks could bypass the routing table and use the "route back to the originator" from the initial RPC message, and would not require route discovery or updating the target's routing table - unless the return route had failed in the meantime.

This would be advantageous to my particular case (many-to-one unicast messaging), I would use two route table entries in the unicast originators, and none in the unicast target. And, other than periodic route re-discovery, would not burn bandwidth with constantly repeated route discovery traffic.

If my assumption is not true, I'll have to deal with the reduced throughput, but please take note of this as a suggestion for future optimization.

kbanks
08-25-2009, 03:58 PM
You are obviously familiar with some other routing protocols, so I think the following facts will clear things up for you. (I'm trying to avoid a long post that potentially muddies the water further).

1) An individual "Route Table entry" in SNAP is unidirectional. It tells how to get to some other node, but says nothing about how they would get back to you.

2) Some of your comments made me think you were thinking in terms of Source Routing. SNAP does not use this, it is all about "next hops".

njones
08-26-2009, 09:54 AM
Sorry to get in deep like this, but I need to be very clear about routing table updates so I can design a reliable, deterministic system.

If you have a whitepaper (or patent) you can direct me to, I'll be happy to read up on this.

1) Does the recipient of a unicast message update its routing table as a result of just receiving the message? (It is obvious that the recipient of the unicast RPC will have to update its routing tables if it subsequently executes a callback, which is itself a unicast RPC message).

2) Does each node along the routing path update its routing table?

Clearly understanding the conditions that update routing tables will help me avoid unnecessarily destroying routing information and wasting bandwidth on route discovery.

All I really need to know is if my "bridge" never makes any unicast or broadcast RPC calls (including no use of callbacks), and is only the target of unicast RPC calls from 50 other nodes: does the routing table in the "bridge" get updated as a result of receiving the RPC calls, and will there be ongoing route re-discovery as a result of 50 nodes sending status updates regularly?

kbanks
08-26-2009, 12:13 PM
Sorry to get in deep like this, but I need to be very clear about routing table updates so I can design a reliable, deterministic system.

If you have a whitepaper (or patent) you can direct me to, I'll be happy to read up on this.

We currently have no white papers or application notes on this topic.

Although there are several patents pending on various aspects of SNAP/SNAPpy, none pertain directly to our Mesh Routing.

SNAP's Mesh Routing is proprietary, but is similar in spirit to DYMO (do a web search for draft-ietf-manet-dymo). Understanding the process that DYMO goes through will help you better understand what SNAP is doing for you behind the scenes.

1) Does the recipient of a unicast message update its routing table as a result of just receiving the message? (It is obvious that the recipient of the unicast RPC will have to update its routing tables if it subsequently executes a callback, which is itself a unicast RPC message).

Route Discovery precedes unicast message transfer. In other words, if a node is the recipient of a unicast message, the routing tables have already been updated.

2) Does each node along the routing path update its routing table?

Each node along the route will already have updated its routing table (as a result of the route discovery process), before the actual data packet gets sent along that route.

It is true that some "timestamp" information gets updated in the table entries by the passage of the data packet, but I think you have been asking specifically about "route table additions".

Clearly understanding the conditions that update routing tables will help me avoid unnecessarily destroying routing information and wasting bandwidth on route discovery.

All I really need to know is if my "bridge" never makes any unicast or broadcast RPC calls (including no use of callbacks), and is only the target of unicast RPC calls from 50 other nodes: does the routing table in the "bridge" get updated as a result of receiving the RPC calls,...

Yes, as each node learns a route to the bridge, the bridge automatically learns a route back to them.

... and will there be ongoing route re-discovery as a result of 50 nodes sending status updates regularly.

Not by the bridge (you said it was never sending anything). The nodes may have to rediscover routes, depending on your physical network topology (are there any nodes that are also acting as routers for other nodes, or are all nodes only "one hop away"?).

It also makes a difference if you are talking to the bridge, or through the bridge (in other words, through the bridge to get to Portal or some other SNAPconnect client).

In 2.1, the route "up" to Portal could get discarded to make room for routes "down" to network nodes. One of the 2.2 changes (please look in the SNAP 2.2 Release Notes elsewhere on this forum) was to "notice" who the remote node was trying to get to (through the bridge), and to be sure to keep that route too.

So, in 2.0/2.1 some unnecessary discoveries could take place, but we enhanced that code for 2.2.

njones
08-26-2009, 01:33 PM
I completely understand that discovery is triggered if a unicast message is pending and an existing route is not already known, or has gone stale - and that the discovery process happens before the unicast message is transmitted.

But I remain unclear about one point:

I think you said: as part of the route discovery process, the routing table of the sender, recipient and all "hop" nodes along the path, get updated.

But you also said that there would not be any ongoing route discovery even with 50 nodes sending messages to the bridge (P.S. the bridge is the endpoint of the messages, they do not pass through)

This seems to imply that although the bridge node's routing table gets updated by the route discovery, this table entry is not used to "receive" a unicast message. It is only required if the bridge needs to send a message back (like a callback).

It also seems to imply that a remote node can still send a unicast to the bridge as long as the remote node still has a good routing path to the bridge, even if the bridge's routing table may have been modified by intervening route discoveries by other remote nodes.

Is this correct?

kbanks
08-26-2009, 04:43 PM
I completely understand that discovery is triggered if a unicast message is pending and an existing route is not already known, or has gone stale - and that the discovery process happens before the unicast message is transmitted.

Remember "or has gone stale", we will talk about it further down in this post.

But I remain unclear about one point:

I think you said: as part of the route discovery process, the routing table of the sender, recipient and all "hop" nodes along the path, get updated.

But you also said that there would not be any ongoing route discovery even with 50 nodes sending messages to the bridge (P.S. the bridge is the endpoint of the messages, they do not pass through)

Ongoing Route Discovery could occur due to routes timing out (stale routes), unless you are disabling route timeouts (I don't recall if you said you were doing that or not). FYI, you do this by setting Mesh Routing Max Delete Timeout to 0.

This seems to imply that although the bridge node's routing table gets updated by the route discovery, this table entry is not used to "receive" a unicast message. It is only required if the bridge needs to send a message back (like a callback).

True

It also seems to imply that a remote node can still send a unicast to the bridge as long as the remote node still has a good routing path to the bridge, even if the bridge's routing table may have been modified by intervening route discoveries by other remote nodes.

Is this correct?

It is, but your exact wording still has me concerned you have the wrong mental picture of what a Route Table entry looks like within each node.

Assume four nodes A-D arranged like A B C D and further assume each one can only "hear" his adjacent neighbors. So, for example, to get from A to D is 3 hops.

If A needs to send to D, and successful route discovery takes place, then A winds up knowing that to get to D, the "next hop" is B. B winds up with an entry that tells him that to get to D, his "next hop" is C.

In other words, the composite knowledge of the full route is distributed between the nodes along the way.
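
The A B C D example can be written down as per-node tables in plain Python (not SNAPpy). The dictionary layout and the `path` helper are made up for illustration. The forward entries for A and B are the ones described above; C's entry for D and the reverse entries toward A are filled in on the assumption that they follow the same "next hop" pattern, since the reverse route is learned automatically.

```python
# Each node only knows: destination -> next hop. No node stores the whole route.
tables = {
    "A": {"D": "B"},
    "B": {"D": "C", "A": "A"},
    "C": {"D": "D", "A": "B"},
    "D": {"A": "C"},
}


def path(tables, src, dst):
    """Follow next-hop entries node by node and return the hops actually taken."""
    hops = [src]
    node = src
    while node != dst:
        node = tables[node][dst]  # a missing entry here would mean route discovery is needed
        hops.append(node)
    return hops


print(path(tables, "A", "D"))  # ['A', 'B', 'C', 'D']
print(path(tables, "D", "A"))  # ['D', 'C', 'B', 'A']
```

Note that A's table never mentions C at all, which is the difference from source routing: the full route only exists as the combination of everybody's entries.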

Maybe you already "get this", but when you talk about nodes at the ends of a route "using their route" it makes me unsure.

njones
08-26-2009, 09:13 PM
Thanks for the explanation, you were right, I had assumed that the originating node knew the full route.

Thanks for patiently answering my questions.