
Convert arrays of strings to arrays of integers in a dataframe column

I'm trying to convert arrays of strings in a dataframe column into arrays of the integer IDs associated with each string.

That's because I need to map each home's list of rooms to their ids, as shown below.

This is the JSON I have to map from:

[
   {
      "id": 1,
      "name": "dining room"
   },
   {
      "id": 2,
      "name": "living room"
   },
   {
      "id": 3,
      "name": "guest room"
   },
   {
      "id": 4,
      "name": "bathroom"
   },
   {
      "id": 5,
      "name": "game room"
   },
   {
      "id": 6,
      "name": "kitchen"
   },
   {
      "id": 7,
      "name": "storage room"
   },
   {
      "id": 8,
      "name": "bedroom"
   },
   {
      "id": 9,
      "name": "family room"
   }
]

This is the dataframe I have:

index     home_rooms             
0         [dining room, living room, bathroom]                     
1         [guest room, kitchen, game room] 
2         [storage room, family room, bedroom] 
3         [dining room, living room, bathroom] 
4         [guest room, kitchen, game room]
5         [storage room, family room, bedroom] 
6         [dining room, living room, bathroom] 
7         [guest room, kitchen, game room]
8         [storage room, family room, bedroom]

This is the dataframe I need:

index     home_rooms             
0         [1, 2, 4]                     
1         [3, 6, 5] 
2         [7, 9, 8] 
3         [1, 2, 4]
4         [3, 6, 5]
5         [7, 9, 8] 
6         [1, 2, 4] 
7         [3, 6, 5]
8         [7, 9, 8]

Is there any way to do this?

Thanks in advance.
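Before mapping, it may be worth checking that every room name in the dataframe has an id in the JSON, so typos surface early instead of as KeyErrors or NaN later. A minimal sketch, assuming the JSON is held in a string (named l_str here) and using only a two-row slice of the dataframe, with a hypothetical "dinig room" typo planted in row 0:

```python
import json

import pandas as pd

# The question's JSON, assumed to be available as a string
l_str = """[
  {"id": 1, "name": "dining room"}, {"id": 2, "name": "living room"},
  {"id": 3, "name": "guest room"}, {"id": 4, "name": "bathroom"},
  {"id": 5, "name": "game room"}, {"id": 6, "name": "kitchen"},
  {"id": 7, "name": "storage room"}, {"id": 8, "name": "bedroom"},
  {"id": 9, "name": "family room"}
]"""

# Build a name -> id lookup with the standard library's json module
name_to_id = {room["name"]: room["id"] for room in json.loads(l_str)}

# Two rows of the question's dataframe; "dinig room" is a hypothetical typo
df = pd.DataFrame({
    "index": [0, 1],
    "home_rooms": [["dinig room", "living room", "bathroom"],
                   ["guest room", "kitchen", "game room"]],
})

# Every distinct room name in the column that has no id in the lookup
unknown = sorted({name for rooms in df["home_rooms"] for name in rooms} - set(name_to_id))
print(unknown)  # ['dinig room']
```

An empty result means every name can be mapped safely.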

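Let's call the JSON string l_str. Load it into a dataframe df_map, build a dictionary d of name: id pairs from df_map, then use itemgetter and a list comprehension to build the list of ids for each index: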

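import pandas as pd
from io import StringIO
from operator import itemgetter

# Wrap the string in StringIO: newer pandas deprecates passing literal JSON to read_json
df_map = pd.read_json(StringIO(l_str))
d = dict(zip(df_map.name, df_map.id))
# itemgetter(*x)(d) returns a tuple of ids when x has 2+ names (a bare id for exactly one)
df['home_rooms'] = [list(itemgetter(*x)(d)) for x in df.home_rooms]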

Out[415]:
   index home_rooms
0      0  [1, 2, 4]
1      1  [3, 6, 5]
2      2  [7, 9, 8]
3      3  [1, 2, 4]
4      4  [3, 6, 5]
5      5  [7, 9, 8]
6      6  [1, 2, 4]
7      7  [3, 6, 5]
8      8  [7, 9, 8]
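One caveat with itemgetter: for a single-room list, itemgetter(*x)(d) returns a bare integer, so list() raises TypeError, and for an empty list itemgetter() itself raises TypeError. A minimal sketch of a plain dict lookup per name that sidesteps both cases, assuming d is the same name-to-id dictionary; the single-room and empty rows below are hypothetical:

```python
import pandas as pd

# name -> id lookup, equivalent to d = dict(zip(df_map.name, df_map.id)) above
d = {"dining room": 1, "living room": 2, "guest room": 3, "bathroom": 4,
     "game room": 5, "kitchen": 6, "storage room": 7, "bedroom": 8,
     "family room": 9}

df = pd.DataFrame({
    "index": [0, 1, 2],
    "home_rooms": [["dining room", "living room", "bathroom"],
                   ["kitchen"],  # hypothetical single-room home
                   []],          # hypothetical home with no rooms listed
})

# Look up each name individually: the result is always a list (even for 0 or 1
# names), and an unknown name still raises KeyError, as with itemgetter
df["home_rooms"] = [[d[name] for name in rooms] for rooms in df["home_rooms"]]
print(df["home_rooms"].tolist())  # [[1, 2, 4], [6], []]
```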

Try:

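import pandas as pd
from io import StringIO

mapper = pd.read_json(StringIO(jsonstr)).set_index('name')['id']
# explode() gives one row per room; replace() fixes a 'dinig room' typo if the data contains one
df_out = df.explode('home_rooms').replace('dinig room', 'dining room')
df_out['home_rooms'] = df_out['home_rooms'].map(mapper)
df_out.groupby('index').agg(list).reset_index()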

Output:

   index home_rooms
0      0  [1, 2, 4]
1      1  [3, 6, 5]
2      2  [7, 9, 8]
3      3  [1, 2, 4]
4      4  [3, 6, 5]
5      5  [7, 9, 8]
6      6  [1, 2, 4]
7      7  [3, 6, 5]
8      8  [7, 9, 8]
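The groupby('index') step relies on the dataframe having a column literally named index. A minimal variant that groups on the index labels that explode() repeats instead, assuming the dataframe's index is unique (mapper is hard-coded here for self-containment):

```python
import pandas as pd

# name -> id lookup as a Series, like `mapper` in the answer above
mapper = pd.Series({"dining room": 1, "living room": 2, "guest room": 3,
                    "bathroom": 4, "game room": 5, "kitchen": 6,
                    "storage room": 7, "bedroom": 8, "family room": 9})

df = pd.DataFrame({"home_rooms": [["dining room", "living room", "bathroom"],
                                  ["guest room", "kitchen", "game room"]]})

# explode() repeats each row's index label once per list element
exploded = df["home_rooms"].explode()
# map() replaces each name with its id; unknown names would become NaN
ids = exploded.map(mapper)
# Group on the repeated index labels to rebuild one list per original row;
# assignment aligns the result back onto df by index
df["home_rooms"] = ids.groupby(level=0).agg(list)
print(df["home_rooms"].tolist())  # [[1, 2, 4], [3, 6, 5]]
```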


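Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original post's URL. For any questions, contact: yoyou2525@163.com.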
