Abstract: Vision-language models such as CLIP have boosted the performance of open-vocabulary object detection, where the detector is trained on base categories but required to detect novel categories ...
Abstract: Soft object manipulation poses significant challenges for robots, requiring effective techniques for state representation and manipulation policy learning. State representation involves ...