Robust and Flexible Reward Modeling for LLM Alignment