Yirany / UniMM-Chat

Dataset Card for Dataset Name

Dataset Description

An open-source, knowledge-intensive, and multi-round multimodal dialogue data powered by Turbo APIs.

UniMM-Chat leverage complementary annotations from different VL datasets. It is constructed for visual instruction tuning and expected to be helpful in improving models’ capabilities to solve different tasks without harming their generation ability.

Dataset Structure

Each line in the downloaded data file is a json dict containing the image name and dialogue data in a list format. Below is an example line. Each image_name is corresponding to the f'{image_name}.jpg' image file in COCO images.

{
  "image_name": "000000000285",
  "data": [
      {"from": "human", "value: <image>\nHow many people are on the field?\n"},
      {"from": "gpt", "value": "There are three people on the field. One player is holding a bat, standing on the dirt within the playing area, while another player, the catcher, is holding out his mitt. The third person is not easily visible."},
      {"from": "human", "value": "What sport is being played in this picture?\n"},
      {"from": "gpt", "value": "Baseball is being played in this picture. You can see the players holding a bat and catchers mitt, and there is a baseball field in the background, which is typical of a baseball game."},
      {"from": "human", "value": "Are there many seats available in the game?\n"},
      {"from": "gpt", "value": "No, there arent many seats available in the game. The stands can be seen filled with a lot of people. This implies that the game is highly anticipated and widely attended."},
    ...
   ]
}