Derecho  0.9
Distributed systems toolkit for RDMA
derecho::RestartLeaderState Class Reference

#include <restart_state.hpp>


Public Member Functions

 RestartLeaderState (std::unique_ptr< View > _curr_view, RestartState &restart_state, const SubgroupInfo &subgroup_info, const node_id_t my_id)
 
void await_quorum (tcp::connection_listener &server_socket)
 Waits for nodes to rejoin at this node, updating the last known View and RaggedTrim (and corresponding longest-log information) as each node connects, until there is a quorum of nodes from the last known View and a new View can be installed that is adequately provisioned.

bool has_restart_quorum ()
 Checks to see whether the leader has achieved a restart quorum, which may involve recomputing the restart view if the minimum number of nodes have rejoined.

bool resend_view_until_quorum_lost ()
 Repeatedly attempts to send a new restart view, recomputing it on each failure, until either there is no longer a restart quorum or the view was sent successfully to everyone.

int64_t send_restart_view ()
 Sends the currently-computed restart view, the current ragged trim, the current location of the longest logs (the "shard leaders"), and the DerechoParams to all members who are currently ready to restart.

void send_abort ()
 Sends an Abort message to all nodes that have previously been sent the restart View, indicating that they must go back to waiting for a new View.

int64_t send_prepare ()
 Sends a Prepare message to all members who are currently ready to restart; this checks for failures one more time before committing.

void send_commit ()
 Sends a Commit message to all members of the restart view, then closes the TCP sockets connected to them.

const View & get_curr_view () const
 Read the curr_view (last known view) managed by RestartLeaderState.

const View & get_restart_view () const
 Read the current restart view managed by RestartLeaderState.
 
std::unique_ptr< View > take_restart_view ()
 Remove and return the restart view managed by RestartLeaderState; this transfers ownership back to the caller (ViewManager).
 
void print_longest_logs () const
 

Static Public Member Functions

static std::unique_ptr< View > make_next_view (const std::unique_ptr< View > &curr_view, const std::vector< node_id_t > &joiner_ids, const std::vector< std::tuple< ip_addr_t, uint16_t, uint16_t, uint16_t, uint16_t >> &joiner_ips_and_ports)
 Constructs the next view from the current view and a list of joining nodes, by ID and IP address.
 
static bool contains_at_least_one_member_per_subgroup (std::set< node_id_t > rejoined_node_ids, const View &last_view)
 

Static Public Attributes

static const int RESTART_LEADER_TIMEOUT = 2000
 

Private Member Functions

void receive_joiner_logs (const node_id_t &joiner_id, tcp::socket &client_socket)
 Helper method for await_quorum that processes the logged View and RaggedTrims from a single rejoining node.

bool compute_restart_view ()
 Recomputes the restart view based on the current set of nodes that have rejoined (in waiting_join_sockets and rejoined_node_ids).

std::unique_ptr< View > update_curr_and_next_restart_view ()
 Updates curr_view and makes a new next_view based on the current set of rejoining nodes during total restart.
 

Private Attributes

std::unique_ptr< View > curr_view
 Takes ownership of ViewManager's curr_view pointer, because await_quorum() might replace curr_view with a newer view discovered on a restarting node.

RestartState & restart_state
 Mutable reference to RestartState, since this class needs to update the restart state stored in ViewManager.

const SubgroupInfo & subgroup_info

std::unique_ptr< View > restart_view

std::map< node_id_t, tcp::socket > waiting_join_sockets

std::map< node_id_t, std::tuple< ip_addr_t, uint16_t, uint16_t, uint16_t, uint16_t > > rejoined_node_ips_and_ports

std::set< node_id_t > members_sent_restart_view

std::set< node_id_t > rejoined_node_ids

std::set< node_id_t > last_known_view_members
 
std::vector< std::vector< persistent::version_t > > longest_log_versions
 
std::vector< std::vector< int64_t > > nodes_with_longest_log
 
const node_id_t my_id
 

Detailed Description

Definition at line 80 of file restart_state.hpp.

Constructor & Destructor Documentation

◆ RestartLeaderState()

derecho::RestartLeaderState::RestartLeaderState ( std::unique_ptr< View > _curr_view,
RestartState & restart_state,
const SubgroupInfo & subgroup_info,
const node_id_t  my_id 
)

Definition at line 74 of file restart_state.cpp.

Member Function Documentation

◆ await_quorum()

void derecho::RestartLeaderState::await_quorum ( tcp::connection_listener & server_socket)

Waits for nodes to rejoin at this node, updating the last known View and RaggedTrim (and corresponding longest-log information) as each node connects, until there is a quorum of nodes from the last known View and a new View can be installed that is adequately provisioned.

Parameters
server_socket: The TCP socket to listen for rejoining nodes on

Definition at line 103 of file restart_state.cpp.

◆ compute_restart_view()

bool derecho::RestartLeaderState::compute_restart_view ( )
private

Recomputes the restart view based on the current set of nodes that have rejoined (in waiting_join_sockets and rejoined_node_ids).

This just ties together update_curr_and_next_restart_view and make_subgroup_maps.

Returns
True if the restart view would be adequate, false if it would be inadequate.

Definition at line 251 of file restart_state.cpp.

◆ contains_at_least_one_member_per_subgroup()

bool derecho::RestartLeaderState::contains_at_least_one_member_per_subgroup ( std::set< node_id_t > rejoined_node_ids,
const View & last_view 
)
static
Returns
true if the set of node IDs includes at least one member of each subgroup in the given View.

Definition at line 500 of file restart_state.cpp.
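
The coverage condition described above can be illustrated with a minimal sketch. It does not use Derecho's View class: the subgroup membership is passed as a plain vector of member lists, and the names `covers_every_subgroup` and `subgroups`, as well as the `node_id_t` alias, are assumptions made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Assumption: a stand-in alias; the real node_id_t is defined by Derecho.
using node_id_t = std::uint32_t;

// Hypothetical helper: returns true if every subgroup (given here as a plain
// list of member IDs) has at least one member in the rejoined set.
bool covers_every_subgroup(const std::set<node_id_t>& rejoined,
                           const std::vector<std::vector<node_id_t>>& subgroups) {
    return std::all_of(subgroups.begin(), subgroups.end(),
                       [&rejoined](const std::vector<node_id_t>& members) {
                           // A subgroup is covered if any of its members rejoined.
                           return std::any_of(members.begin(), members.end(),
                                              [&rejoined](node_id_t id) {
                                                  return rejoined.count(id) > 0;
                                              });
                       });
}
```

With subgroups {1, 2} and {3, 4}, the rejoined set {1, 4} covers both, while {1, 2} leaves the second subgroup without any member.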

◆ get_curr_view()

const View& derecho::RestartLeaderState::get_curr_view ( ) const
inline

Read the curr_view (last known view) managed by RestartLeaderState.

Only used for debugging.

Definition at line 185 of file restart_state.hpp.

◆ get_restart_view()

const View& derecho::RestartLeaderState::get_restart_view ( ) const
inline

Read the current restart view managed by RestartLeaderState.

Definition at line 187 of file restart_state.hpp.

◆ has_restart_quorum()

bool derecho::RestartLeaderState::has_restart_quorum ( )

Checks to see whether the leader has achieved a restart quorum, which may involve recomputing the restart view if the minimum number of nodes have rejoined.

Returns
True if there is a restart quorum, false if there is not

Definition at line 157 of file restart_state.cpp.
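
This page does not state the quorum threshold. As an illustration only, the sketch below assumes a quorum means a strict majority of the last known view's members have rejoined; the function name `has_majority` and the `node_id_t` alias are hypothetical, and the real check may include further conditions, such as the adequacy of the recomputed restart view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Assumption: a stand-in alias; the real node_id_t is defined by Derecho.
using node_id_t = std::uint32_t;

// Hypothetical majority test: counts how many members of the last known view
// appear in the rejoined set and compares against floor(n / 2) + 1.
bool has_majority(const std::set<node_id_t>& rejoined,
                  const std::set<node_id_t>& last_known_members) {
    std::size_t present = 0;
    for(node_id_t id : last_known_members) {
        present += rejoined.count(id);  // count() is 0 or 1 for std::set
    }
    return present >= last_known_members.size() / 2 + 1;
}
```

Nodes that rejoin but were not in the last known view (ID 9 in a view of {1, 2, 3, 4, 5}, for example) do not count toward the threshold in this sketch.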

◆ make_next_view()

std::unique_ptr< View > derecho::RestartLeaderState::make_next_view ( const std::unique_ptr< View > &  curr_view,
const std::vector< node_id_t > &  joiner_ids,
const std::vector< std::tuple< ip_addr_t, uint16_t, uint16_t, uint16_t, uint16_t >> &  joiner_ips_and_ports 
)
static

Constructs the next view from the current view and a list of joining nodes, by ID and IP address.

This is slightly different from the standard ViewManager::make_next_view because it gets explicit inputs rather than examining the SST, and assumes that all nodes marked failed in curr_view will be removed (instead of removing only the "accepted changes").

Parameters
curr_view: The current view, including the list of failed members to remove
joiner_ids: The list of joining node IDs
joiner_ips_and_ports: The list of IP addresses and ports for the joining nodes
logger
Returns
A View object for the next view

Definition at line 439 of file restart_state.cpp.
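
The rule described above (drop every member marked failed, then add the joiners) can be sketched with a simplified stand-in for View. The `SimpleView` struct, the `next_view` function, the incremented view ID, and the choice to append joiners at the end are all assumptions for illustration; the real View carries much more state, including addresses and ports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: a stand-in alias; the real node_id_t is defined by Derecho.
using node_id_t = std::uint32_t;

// Hypothetical, heavily reduced stand-in for derecho::View.
struct SimpleView {
    int vid;                         // view ID (assumed to increase by one)
    std::vector<node_id_t> members;  // member IDs in rank order
    std::vector<bool> failed;        // failed[i] marks members[i] as failed
};

// Builds the next view from explicit inputs: every member marked failed is
// removed, surviving members keep their order, and joiners are appended.
SimpleView next_view(const SimpleView& curr, const std::vector<node_id_t>& joiner_ids) {
    assert(curr.members.size() == curr.failed.size());
    SimpleView next{curr.vid + 1, {}, {}};
    for(std::size_t i = 0; i < curr.members.size(); ++i) {
        if(!curr.failed[i]) {
            next.members.push_back(curr.members[i]);
        }
    }
    next.members.insert(next.members.end(), joiner_ids.begin(), joiner_ids.end());
    // Nobody is marked failed in a freshly built view.
    next.failed.assign(next.members.size(), false);
    return next;
}
```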

◆ print_longest_logs()

void derecho::RestartLeaderState::print_longest_logs ( ) const

Definition at line 398 of file restart_state.cpp.

◆ receive_joiner_logs()

void derecho::RestartLeaderState::receive_joiner_logs ( const node_id_t & joiner_id,
tcp::socket & client_socket 
)
private

Helper method for await_quorum that processes the logged View and RaggedTrims from a single rejoining node.

This may update curr_view or logged_ragged_trim if the joiner has newer information.

Parameters
joiner_id: The ID of the rejoining node
client_socket: The TCP socket connected to the rejoining node

Definition at line 173 of file restart_state.cpp.

◆ resend_view_until_quorum_lost()

bool derecho::RestartLeaderState::resend_view_until_quorum_lost ( )

Repeatedly attempts to send a new restart view, recomputing it on each failure, until either there is no longer a restart quorum or the view was sent successfully to everyone.

Returns
True if the view was sent successfully, false if the quorum was lost

Definition at line 328 of file restart_state.cpp.
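
The retry behaviour described above can be sketched as a loop over two callbacks. This is not the library's implementation: `resend_until_quorum_lost`, `send_view`, and `recompute_without` are hypothetical names, and the sketch assumes the send step reports -1 on success or the ID of a failed node, matching the convention documented for send_restart_view().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical retry loop.
//   send_view          returns -1 on success, otherwise the ID of a failed node.
//   recompute_without  removes the failed node, recomputes the view, and
//                      returns whether a restart quorum still exists.
bool resend_until_quorum_lost(const std::function<std::int64_t()>& send_view,
                              const std::function<bool(std::int64_t)>& recompute_without) {
    for(;;) {
        const std::int64_t failed_node = send_view();
        if(failed_node == -1) {
            return true;  // the view reached everyone
        }
        if(!recompute_without(failed_node)) {
            return false;  // quorum lost; the caller must wait for more nodes
        }
    }
}
```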

◆ send_abort()

void derecho::RestartLeaderState::send_abort ( )

Sends an Abort message to all nodes that have previously been sent the restart View, indicating that they must go back to waiting for a new View.

Definition at line 349 of file restart_state.cpp.

◆ send_commit()

void derecho::RestartLeaderState::send_commit ( )

Sends a Commit message to all members of the restart view, then closes the TCP sockets connected to them.

Definition at line 389 of file restart_state.cpp.

◆ send_prepare()

int64_t derecho::RestartLeaderState::send_prepare ( )

Sends a Prepare message to all members who are currently ready to restart; this checks for failures one more time before committing.

Returns
-1 if all sends were successful; otherwise, the ID of a node to which sending the Prepare message failed.

Definition at line 356 of file restart_state.cpp.
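
One plausible way for a caller to combine the view, Prepare, Commit, and Abort steps is sketched below. The ordering is an assumption inferred from the method descriptions on this page, not a copy of ViewManager's logic. `MockLeader` and `drive_restart` are hypothetical; only the method names mirror this class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the leader. It records the calls it receives and
// replays scripted results for send_prepare().
struct MockLeader {
    std::vector<std::int64_t> prepare_results;  // scripted send_prepare() results
    std::size_t next_prepare = 0;
    bool quorum = true;  // what resend_view_until_quorum_lost() reports
    std::vector<std::string> trace;

    bool resend_view_until_quorum_lost() {
        trace.push_back("view");
        return quorum;
    }
    std::int64_t send_prepare() {
        trace.push_back("prepare");
        return prepare_results.at(next_prepare++);
    }
    void send_abort() { trace.push_back("abort"); }
    void send_commit() { trace.push_back("commit"); }
};

// Assumed driver: send the view, then Prepare; commit only if Prepare reports
// no failure (-1), otherwise abort and try again with a new view.
// Returns true if the restart committed, false if the quorum was lost.
template <typename Leader>
bool drive_restart(Leader& leader) {
    while(leader.resend_view_until_quorum_lost()) {
        if(leader.send_prepare() == -1) {
            leader.send_commit();
            return true;
        }
        leader.send_abort();
    }
    return false;
}
```

What happens after the quorum is lost (for example, returning to await_quorum()) is left to the caller and is outside this sketch.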

◆ send_restart_view()

int64_t derecho::RestartLeaderState::send_restart_view ( )

Sends the currently-computed restart view, the current ragged trim, the current location of the longest logs (the "shard leaders"), and the DerechoParams to all members who are currently ready to restart.

Returns
-1 if all sends were successful; otherwise, the ID of a node to which sending the View failed.

Definition at line 257 of file restart_state.cpp.

◆ take_restart_view()

std::unique_ptr<View> derecho::RestartLeaderState::take_restart_view ( )
inline

Remove and return the restart view managed by RestartLeaderState; this transfers ownership back to the caller (ViewManager).

Definition at line 190 of file restart_state.hpp.
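
The ownership transfer can be illustrated with a small stand-alone sketch. `View` is reduced here to a hypothetical one-field struct, and `ViewHolder` and `has_restart_view()` are invented for the example. The point is only that moving a std::unique_ptr out of a member leaves that member null, so the getter must not be called after the view has been taken.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, reduced stand-in for derecho::View.
struct View {
    int vid;
};

// Hypothetical holder mirroring the getter and take accessors on this page.
class ViewHolder {
    std::unique_ptr<View> restart_view;

public:
    explicit ViewHolder(std::unique_ptr<View> view) : restart_view(std::move(view)) {}

    bool has_restart_view() const { return restart_view != nullptr; }

    // Read-only access; only valid while the holder still owns the view.
    const View& get_restart_view() const {
        assert(restart_view != nullptr);
        return *restart_view;
    }

    // Moves the view out; afterwards restart_view is guaranteed to be null.
    std::unique_ptr<View> take_restart_view() { return std::move(restart_view); }
};
```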

◆ update_curr_and_next_restart_view()

std::unique_ptr< View > derecho::RestartLeaderState::update_curr_and_next_restart_view ( )
private

Updates curr_view and makes a new next_view based on the current set of rejoining nodes during total restart.

Returns
The next view that will be installed if the restart continues at this point

Definition at line 410 of file restart_state.cpp.

Member Data Documentation

◆ curr_view

std::unique_ptr<View> derecho::RestartLeaderState::curr_view
private

Takes ownership of ViewManager's curr_view pointer, because await_quorum() might replace curr_view with a newer view discovered on a restarting node.

Definition at line 85 of file restart_state.hpp.

◆ last_known_view_members

std::set<node_id_t> derecho::RestartLeaderState::last_known_view_members
private

Definition at line 96 of file restart_state.hpp.

◆ longest_log_versions

std::vector<std::vector<persistent::version_t> > derecho::RestartLeaderState::longest_log_versions
private

Definition at line 97 of file restart_state.hpp.

◆ members_sent_restart_view

std::set<node_id_t> derecho::RestartLeaderState::members_sent_restart_view
private

Definition at line 94 of file restart_state.hpp.

◆ my_id

const node_id_t derecho::RestartLeaderState::my_id
private

Definition at line 99 of file restart_state.hpp.

◆ nodes_with_longest_log

std::vector<std::vector<int64_t> > derecho::RestartLeaderState::nodes_with_longest_log
private

Definition at line 98 of file restart_state.hpp.

◆ rejoined_node_ids

std::set<node_id_t> derecho::RestartLeaderState::rejoined_node_ids
private

Definition at line 95 of file restart_state.hpp.

◆ rejoined_node_ips_and_ports

std::map<node_id_t, std::tuple<ip_addr_t, uint16_t, uint16_t, uint16_t, uint16_t> > derecho::RestartLeaderState::rejoined_node_ips_and_ports
private

Definition at line 93 of file restart_state.hpp.

◆ RESTART_LEADER_TIMEOUT

const int derecho::RestartLeaderState::RESTART_LEADER_TIMEOUT = 2000
static

Definition at line 127 of file restart_state.hpp.

◆ restart_state

RestartState& derecho::RestartLeaderState::restart_state
private

Mutable reference to RestartState, since this class needs to update the restart state stored in ViewManager.

Definition at line 88 of file restart_state.hpp.

◆ restart_view

std::unique_ptr<View> derecho::RestartLeaderState::restart_view
private

Definition at line 91 of file restart_state.hpp.

◆ subgroup_info

const SubgroupInfo& derecho::RestartLeaderState::subgroup_info
private

Definition at line 89 of file restart_state.hpp.

◆ waiting_join_sockets

std::map<node_id_t, tcp::socket> derecho::RestartLeaderState::waiting_join_sockets
private

Definition at line 92 of file restart_state.hpp.


The documentation for this class was generated from the following files: