Image manipulation has long been an important task. With recent advances in AI-based image manipulation in computer vision, many methods for editing 2D images have been proposed and proven effective. It is natural to extend such manipulation to the 3D domain, where success could benefit the multimedia industry as well as areas such as VR and gaming. This critical task has become feasible with the success of diffusion models (DMs) and text models, which make it possible to generate or edit an image from textual input alone. DMs can also be used to generate 3D models conditioned on text input, which shows the possibility of guiding 3D generation using 2D models.