How to extract data from lists as strings, and select data by value, in pandas?

Question

I have a dataframe like this:

col1              col2
[abc, bcd, dog]   [[.4], [.5], [.9]]
[cat, bcd, def]   [[.9], [.5], [.4]]

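The numbers in the col2 lists describe the elements of col1, matched by list index position. So ".4" in col2 describes "abc" in col1.

I want to create 2 new columns: one that extracts only the elements of col1 whose value in col2 is >= .9, and another that holds the matching number from col2; so ".9" for both rows.

Result: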

col3     col4
[dog]   .9
[cat]   .9

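I think a route that removes the nested lists from col2 would be fine. But that's harder than it sounds. I've been trying for an hour to remove those brackets.

Attempts: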

spec_chars3 = ["[","]"]

for char in spec_chars3: # didn't work, turned everything to nan
    df1['avg_jaro_company_word_scores'] = df1['avg_jaro_company_word_scores'].str.replace(char, '')

df.col2.str.strip('[]') #didn't work b/c the nested list is still in a list, not a string

I don't even know how to extract the list index number and use it to filter col1.

Answer 1

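Based on the explanation at the end of the question, it appears both columns are str type and need to be converted to list type.
- Use .applymap with ast.literal_eval.
- If only one column is str type, use df[col] = df[col].apply(literal_eval)
The lists in each column must then be expanded into rows with pandas.DataFrame.explode.
- As RichieV clarified in a comment:
  - The explode inside the lambda needs no arguments, because it is applied to each column in turn, and .apply collects all of the outputs into a dataframe.
  - The outer explode converts the values from lists to scalars (i.e. [0.4] to 0.4).
Once the values are on separate rows, use Boolean indexing to select the data in the desired range.
If you want to combine df with df_new, use df.join(df_new, rsuffix='_extracted')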
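To see the string-to-list conversion step on its own, here is a minimal sketch that uses only the standard library. The two sample strings are assumed values modeled on the question's first row; they are not taken from the asker's real data.

```python
from ast import literal_eval

# Assumed sample cells, as they might look if the lists were stored as text
names_text = "['abc', 'bcd', 'dog']"
scores_text = "[[.4], [.5], [.9]]"

# literal_eval parses a string holding a Python literal into the real object
names = literal_eval(names_text)
scores = literal_eval(scores_text)

# The values are now real lists, so they can be indexed by position
print(type(names_text).__name__, "->", type(names).__name__)
print(names[2], scores[2][0])
```

Note that literal_eval only accepts valid Python literals. A string such as "[abc, bcd, dog]" with unquoted names would raise an error instead of producing a list, so this step assumes the stored text is quoted as in the sample above.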

import pandas as pd
from ast import literal_eval

# setup the test data: this data is lists
# data = {'c1': [['abc', 'bcd', 'dog'], ['cat', 'bcd', 'def']], 'c2': [[[.4], [.5], [.9]], [[.9], [.5], [.4]]]}

# setup the test data: this data is strings
data = {'c1': ["['abc', 'bcd', 'dog', 'cat']", "['cat', 'bcd', 'def']"], 'c2': ["[[.4], [.5], [.9], [1.0]]", "[[.9], [.5], [.4]]"]}

# create the dataframe
df = pd.DataFrame(data)

# the description leads me to think the data is columns of strings, not lists
# convert the columns from string type to list type
# the following line is only required if the columns are strings
# note: pandas 2.1+ deprecates applymap in favor of DataFrame.map
df = df.applymap(literal_eval)

# explode the lists in each column
df_new = df.apply(lambda x: x.explode()).explode('c2')

# use Boolean Indexing to select the desired data
df_new = df_new[df_new['c2'] >= 0.9]

# display(df_new)
    c1   c2
0  dog  0.9
0  cat  1.0
1  cat  0.9
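The join suggested in the notes of this answer could look like the sketch below. To keep the example short, df_new is a hand-built stand-in for the filtered result (one match per row, as in the question's original data), not the output of the code above.

```python
import pandas as pd

# Original data, already holding real lists
df = pd.DataFrame({
    "c1": [["abc", "bcd", "dog"], ["cat", "bcd", "def"]],
    "c2": [[[0.4], [0.5], [0.9]], [[0.9], [0.5], [0.4]]],
})

# Hand-built stand-in for the exploded and filtered result (assumed values)
df_new = pd.DataFrame({"c1": ["dog", "cat"], "c2": [0.9, 0.9]}, index=[0, 1])

# Join on the shared index; rsuffix renames the overlapping right-hand columns
combined = df.join(df_new, rsuffix="_extracted")
print(combined.columns.tolist())
```

Because the join matches on index labels, a row of df with several values at or above the threshold would appear once per match in the combined frame.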

Answer 2

You can use list comprehensions to fill the new columns based on your condition.

df['col3'] = [
    [value for value, score in zip(c1, c2) if score[0] >= 0.9]
    for c1, c2 in zip(df['col1'], df['col2'])
]
df['col4'] = [
    [score[0] for score in c2 if score[0] >= 0.9]
    for c2 in df['col2']
]

Output

              col1                   col2   col3   col4
0  [abc, bcd, dog]  [[0.4], [0.5], [0.9]]  [dog]  [0.9]
1  [cat, bcd, def]  [[0.9], [0.5], [0.4]]  [cat]  [0.9]
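The output above holds col4 as a one-element list, while the question asked for a plain number. One possible variation, sketched here on plain Python lists rather than a dataframe, returns the first matching score as a scalar. The helper name select_at_or_above and its threshold parameter are hypothetical, introduced only for this illustration.

```python
def select_at_or_above(names, scores, threshold=0.9):
    """Return (matching names, first matching score or None) for one row."""
    # Pair each name with the single number inside its nested score list
    pairs = [
        (name, score[0])
        for name, score in zip(names, scores)
        if score[0] >= threshold
    ]
    matched_names = [name for name, _ in pairs]
    first_score = pairs[0][1] if pairs else None
    return matched_names, first_score


# Rows modeled on the question's sample data
rows = [
    (["abc", "bcd", "dog"], [[0.4], [0.5], [0.9]]),
    (["cat", "bcd", "def"], [[0.9], [0.5], [0.4]]),
]
results = [select_at_or_above(names, scores) for names, scores in rows]
print(results)
```

The first item of each result would go into col3 and the second into col4. Taking only the first match is a design choice that suits the sample data; a row with several qualifying scores would need a different rule, such as keeping the maximum.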

Answer 1
2 2020-09-24 21:55:17

Answer 2
1 accepted 2020-09-24 21:55:30
