Extract the value of one dictionary key in pandas based on another key in the same dictionary

Question

R guy here.

I have this mess in a Pandas column, data['crew']:

array(["[{'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production', 'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor', 'profile_path': None}, {'credit_id': '56407fa89251417055000b58', 'department': 'Sound', 'gender': 0, 'id': 6745, 'job': 'Music Editor', 'name': 'Richard Henderson', 'profile_path': None}, {'credit_id': '5789212392514135d60025fd', 'department': 'Production', 'gender': 2, 'id': 9250, 'job': 'Executive In Charge Of Production', 'name': 'Jeffrey Stott', 'profile_path': None}, {'credit_id': '57892074c3a36835fa002886', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 23783, 'job': 'Makeup Artist', 'name': 'Heather Plott', 'profile_path': None}

It goes on for a while. Each new dict starts with a credit_id field. One cell can hold several dicts arranged in an array.

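Suppose I want the names of all Casting directors, as in the first entry. I need to check the job entry in each dict and, if it is Casting, take whatever is in the name field and store it in the data['crew'] column of the data frame.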
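For a single cell, the check described above can be sketched in plain Python. This assumes the cell is a string in Python-literal syntax (single quotes, None), which is why ast.literal_eval is used rather than json.loads; the cell value below is a hypothetical two-entry excerpt of the sample.

```python
import ast

# Hypothetical excerpt of one data['crew'] cell, trimmed to two entries
cell = ("[{'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production', "
        "'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor', 'profile_path': None}, "
        "{'credit_id': '56407fa89251417055000b58', 'department': 'Sound', "
        "'gender': 0, 'id': 6745, 'job': 'Music Editor', 'name': 'Richard Henderson', 'profile_path': None}]")

# The cell is text, so parse it into a real list of dicts first
crew = ast.literal_eval(cell)

# Keep the 'name' of every dict whose 'job' is 'Casting'
casting_names = [d['name'] for d in crew if d['job'] == 'Casting']
print(casting_names)  # ['Terri Taylor']
```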

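I tried several strategies, then backed off to something simple. Running the following shuts me down, so I can't even access a simple field. How can I do this in Pandas?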

for row in data.head().iterrows():
    if row['crew'].job == 'Casting':
        print(row['crew'])

Edit: error message

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-138-aa6183fdf7ac> in <module>()
      1 for row in data.head().iterrows():
----> 2     if row['crew'].job == 'Casting':
      3         print(row['crew'])

TypeError: tuple indices must be integers or slices, not str
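The traceback follows from what iterrows() yields: one (index, Series) tuple per row, so row['crew'] indexes a tuple with a string. A minimal sketch on a hypothetical one-row frame also shows that, even after unpacking, the cell is still a plain string, so .job would not work on it either:

```python
import pandas as pd

# Hypothetical one-row frame shaped like data['crew']
data = pd.DataFrame({'crew': ["[{'job': 'Casting', 'name': 'Terri Taylor'}]"]})

row = next(data.iterrows())   # each item is an (index, Series) tuple
print(type(row))              # <class 'tuple'>

index, series = row           # unpack the pair instead of indexing it with 'crew'
cell = series['crew']
print(type(cell))             # <class 'str'>: still text, not a list of dicts
```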

Edit: the code I used to get the array of dicts (strings?) in the first place.

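import ast

def convert_JSON(data_as_string):
    # The cells hold Python-literal strings (single quotes, None), so parse with literal_eval
    try:
        dict_representation = ast.literal_eval(data_as_string)
        return dict_representation
    except (ValueError, SyntaxError):
        # Missing values (NaN) or malformed strings become an empty list
        return []

# Keep only the Casting names (no '' placeholders), sort them, and join them into one string per row
data["crew"] = data["crew"].map(
    lambda x: sorted(d['name'] for d in convert_JSON(x) if d['job'] == 'Casting')
).map(lambda x: ','.join(map(str, x)))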
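A column-wide alternative, sketched under the assumption of pandas 0.25 or later (for Series.explode) and a hypothetical two-movie frame, is to parse every cell, give each crew dict its own row, filter on job and then join the names back per movie. The output column name casting is made up for illustration; the code above overwrites crew instead.

```python
import ast
import pandas as pd

# Hypothetical frame: each 'crew' cell is a string holding a list of dicts
data = pd.DataFrame({'crew': [
    "[{'job': 'Casting', 'name': 'Terri Taylor'}, {'job': 'Sound', 'name': 'Richard Henderson'}]",
    "[{'job': 'Makeup Artist', 'name': 'Heather Plott'}]",
]})

# 1. Parse each string into a Python list of dicts
parsed = data['crew'].map(ast.literal_eval)

# 2. One row per dict (the movie index is repeated); empty lists become NaN and are dropped
exploded = parsed.explode().dropna()

# 3. Turn the dicts into columns, keeping the movie index
crew = pd.DataFrame(exploded.tolist(), index=exploded.index)

# 4. Filter on job, join the names per movie, and fill movies without a match with ''
casting = crew.loc[crew['job'] == 'Casting', 'name']
data['casting'] = casting.groupby(level=0).agg(', '.join).reindex(data.index, fill_value='')
print(data['casting'].tolist())  # ['Terri Taylor', '']
```

Compared with mapping a Python function over each cell, this keeps the filtering step as ordinary boolean indexing on a flat DataFrame.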

Answer 1

To create a DataFrame from the sample data, write:

import pandas as pd

df = pd.DataFrame(data=[
  { 'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production',
    'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor',
    'profile_path': None},
  { 'credit_id': '56407fa89251417055000b58', 'department': 'Sound',
    'gender': 0, 'id': 6745, 'job': 'Music Editor',
    'name': 'Richard Henderson', 'profile_path': None},
  { 'credit_id': '5789212392514135d60025fd', 'department': 'Production',
    'gender': 2, 'id': 9250, 'job': 'Executive In Charge Of Production',
    'name': 'Jeffrey Stott', 'profile_path': None},
  { 'credit_id': '57892074c3a36835fa002886', 'department': 'Costume & Make-Up',
    'gender': 0, 'id': 23783, 'job': 'Makeup Artist',
    'name': 'Heather Plott', 'profile_path': None}])

Then you can get the data with a single instruction:

df[df.job == 'Casting'].name

The result is:

0    Terri Taylor
Name: name, dtype: object

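The result above is a Pandas Series holding the names found. Here 0 is the index value of the matching record, and Terri Taylor is the name of the Casting Director (the only one in the data).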
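The one-liner works in two steps: df.job == 'Casting' builds a boolean Series with one True/False per row, and indexing with it keeps only the True rows. A small sketch, rebuilding only the job and name columns of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'job': ['Casting', 'Music Editor', 'Executive In Charge Of Production', 'Makeup Artist'],
    'name': ['Terri Taylor', 'Richard Henderson', 'Jeffrey Stott', 'Heather Plott'],
})

mask = df.job == 'Casting'      # boolean Series aligned with df's rows
print(mask.tolist())            # [True, False, False, False]

# Same selection with .loc: row mask plus column label, no attribute access needed
names = df.loc[mask, 'name']
print(names.tolist())           # ['Terri Taylor']
```

Using .loc with the column label avoids relying on attribute access, which only works for column names that are valid identifiers and do not collide with existing attributes.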

Edit

If you only need a list (rather than a Series), type:

df[df.job == 'Casting'].name.tolist()

The result is ['Terri Taylor'], just a list.

Both solutions rely on vectorized boolean indexing, so I expect them to be faster than a plain loop based on iterrows().

To compare execution times, you can also try another solution:

df.query("job == 'Casting'").name.tolist()
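To check the speed claim yourself, a rough timeit sketch could look like this. The enlarged frame (the four sample rows repeated 250 times) is an assumption for illustration, and timings depend on the machine and pandas version, so none are quoted here.

```python
import timeit
import pandas as pd

# Hypothetical 1,000-row frame: the four sample rows repeated 250 times
df = pd.DataFrame({
    'job': ['Casting', 'Music Editor', 'Executive In Charge Of Production', 'Makeup Artist'] * 250,
    'name': ['Terri Taylor', 'Richard Henderson', 'Jeffrey Stott', 'Heather Plott'] * 250,
})

def with_mask():
    return df[df.job == 'Casting'].name.tolist()

def with_query():
    return df.query("job == 'Casting'").name.tolist()

def with_iterrows():
    return [row['name'] for _, row in df.iterrows() if row['job'] == 'Casting']

# All three return the same list; only the running time differs
for func in (with_mask, with_query, with_iterrows):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```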

==========

As for your code:

On each iteration, iterrows() returns a pair containing:

the index label of the current row,
a Series with the contents of this row.

So your loop should look like:

for row in df.iterrows():
    # row is an (index, Series) pair; row[1] holds the row's data
    if row[1].job == 'Casting':
        print(row[1]['name'])

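You cannot write row[1].name, because .name on that row object returns the row's index value (the built-in name attribute takes precedence over the column called name), so use row[1]['name'] instead.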
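Unpacking the pair in the for statement avoids the row[1] indexing, and itertuples() is an alternative that yields real named tuples, with the index in the Index field. A sketch on a hypothetical two-row frame with a non-default index, which makes the row.name pitfall visible:

```python
import pandas as pd

# Hypothetical frame; index 10, 11 so the index label differs from the column value
df = pd.DataFrame({'job': ['Casting', 'Sound'],
                   'name': ['Terri Taylor', 'Richard Henderson']},
                  index=[10, 11])

# iterrows(): unpack (index, Series); row.name is the index label, row['name'] the column
for idx, row in df.iterrows():
    print(idx, row.name, row['name'])

# itertuples(): named tuples with fields Index, job, name
casting = [t.name for t in df.itertuples() if t.job == 'Casting']
print(casting)  # ['Terri Taylor']
```

The pandas documentation describes itertuples() as generally faster than iterrows(), and it does not convert each row into a Series.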

Solution 1
1 2019-04-24 16:01:40
