How to remove strings from each list in every row in pandas

Question

Suppose I have a column in pandas where every row holds a list of strings:

Class	Student
One	[Adam, Kanye, Alice Stocks, Joseph Matthew]
Two	[Justin Bieber, Selena Gomez]

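I want to remove every name that is longer than 8 characters from each class.

So the resulting table would be:

Class	Student
One	Adam, Kanye

Most of the data would disappear, because only Adam and Kanye satisfy the condition len(StudentName)<8.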

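I tried to come up with an .apply filter myself, but the code seems to run on each character rather than on each word. Can someone point out where I went wrong?

Here is the code: [[y for y in x if not len(y)>=8] for x in df['Student']]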

Answer 1

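Check the code below. It seems you never defined what to split on: each cell is a single string rather than a list, so iterating over it goes character by character.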

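import pandas as pd
df = pd.DataFrame({'Class': ['One', 'Two'],
                   'Student': ['[Adam, Kanye, Alice Stocks, Joseph Matthew]', '[Justin Bieber, Selena Gomez]']})
# Strip the brackets (the pattern is a regex, so pass regex=True; pandas 2.0+ treats it literally by default), then split on ', '.
students = df['Student'].str.replace(r"\[|\]", '', regex=True).str.split(', ')
# Keep only names shorter than 8 characters and join them back into one string.
df['Filtered_Student'] = students.apply(lambda x: ','.join([i for i in x if len(i) < 8]))
# Show only the classes that still have at least one name.
df[df['Filtered_Student'] != '']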
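To make the character-level behaviour concrete, here is a minimal sketch. It assumes, as this answer does, that each cell is one string that only looks like a list; `df_str`, `chars`, `first_chars`, `names` and `short_names` are illustrative names, not part of the original post.

```python
import pandas as pd

# Assumed data: every cell is a single string, not a Python list.
df_str = pd.DataFrame({
    "Class": ["One", "Two"],
    "Student": ["[Adam, Kanye, Alice Stocks, Joseph Matthew]",
                "[Justin Bieber, Selena Gomez]"],
})

# Iterating over a str yields single characters. Each has length 1,
# so the comprehension from the question keeps every one of them.
chars = [[y for y in x if not len(y) >= 8] for x in df_str["Student"]]
first_chars = chars[0][:5]

# Stripping the brackets and splitting on ", " produces real lists of names,
# after which the same kind of comprehension filters whole names.
names = df_str["Student"].str.strip("[]").str.split(", ")
short_names = [[y for y in x if len(y) < 8] for x in names]
```

With this data, `first_chars` holds the bracket and the first letters of "Adam", while `short_names` holds only the names shorter than 8 characters for each class.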


Answer 2

IIUC, this can be done in a one-liner with np.where:

import pandas as pd
import numpy as np

df = pd.DataFrame( {'Class': ['One', 'Two'], 'Student': [['Adam', 'Kanye', 'Alice Stocks', 'Joseph Matthew'], ['Justin Bieber', 'Selena Gomez']]})

df.explode('Student').iloc[np.where(df.explode('Student').Student.str.len() <= 8)].groupby('Class').agg(list).reset_index()

Output:

  Class        Student
0   One  [Adam, Kanye]
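The same idea can probably be written without np.where and without calling explode twice. The following is a sketch of that variant, not the answerer's own code; `exploded`, `mask` and `result` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Class": ["One", "Two"],
    "Student": [["Adam", "Kanye", "Alice Stocks", "Joseph Matthew"],
                ["Justin Bieber", "Selena Gomez"]],
})

# One row per (Class, name) pair.
exploded = df.explode("Student")

# Boolean mask: True where the name has at most 8 characters.
mask = exploded["Student"].str.len() <= 8

# Keep the matching rows, then collect the names back into one list per class.
# to_numpy() sidesteps index alignment, since explode repeats index labels.
result = (
    exploded.loc[mask.to_numpy()]
    .groupby("Class")["Student"]
    .agg(list)
    .reset_index()
)
```

Classes whose names are all filtered out never reach the groupby, so they drop out of `result` on their own, which matches the table asked for in the question.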

Answer 3

# If they're not actually lists, but strings:
if isinstance(df.Student[0], str):
    df.Student = df.Student.str[1:-1].str.split(', ')

# Apply your filtering logic:
df.Student = df.Student.apply(lambda s: [x for x in s if len(x)<8])

Output:

  Class        Student
0   One  [Adam, Kanye]
1   Two             []
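The output above still contains class Two with an empty list, whereas the question's expected table drops it. A possible follow-up step, sketched here under the assumption that the column already holds the filtered lists (`non_empty` is an illustrative name):

```python
import pandas as pd

# Assumed state of the frame after the filtering step above.
df = pd.DataFrame({
    "Class": ["One", "Two"],
    "Student": [["Adam", "Kanye"], []],
})

# Keep only the rows whose list still contains at least one name.
non_empty = df[df["Student"].map(len) > 0].copy()

# Optional: join each list into one string, as in the question's table.
non_empty["Student"] = non_empty["Student"].map(", ".join)
```

Whether to join the names into a string or leave them as lists depends on what later code expects; lists are easier to filter again.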


3 solutions

Solution 1
1 2022-07-27 15:49:54

Solution 2
1 2022-07-27 15:56:05

Solution 3
1 2022-07-27 16:12:40
