Extract names from a pandas DataFrame column using a predefined list of strings

Question

I have a pandas DataFrame as shown below. It has about one million rows.

name = ['Jake','Matt', 'Henry']

             A
1    Jake Hill
2    Matt Dawn
3    Matt King
4  White Henry
5    Hyde Jake

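I want to iterate over the list and the df['A'] column and return only the first names. For example, the final DataFrame should look like this: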

       A
1   Jake
2   Matt
3   Matt
4  Henry
5   Jake

Thanks in advance. I'm new to Python, so I'm still figuring out the simplest way to do this.

Answer 1

You have a list of names to match and a Series of names to check. A regular expression with str.extract works here:

df.A.str.extract(r'({})'.format('|'.join(name)))

       0
0   Jake
1   Matt
2   Matt
3  Henry
4   Jake
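The pattern above matches a listed name anywhere in the string, so a surname such as "Jakes" would also produce "Jake". A minimal sketch of a stricter variant, assuming you only want whole-word matches: wrap the alternation in `\b` word boundaries and escape each name with `re.escape`. The extra rows "Jakes Hill" and "Dwayne John" are made up for illustration.

```python
import re

import pandas as pd

name = ['Jake', 'Matt', 'Henry']
# sample data; the last two rows are hypothetical edge cases
df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'White Henry',
                         'Hyde Jake', 'Jakes Hill', 'Dwayne John']})

# \b...\b restricts matches to whole words; re.escape guards against
# names that contain regex metacharacters
pattern = r'\b({})\b'.format('|'.join(map(re.escape, name)))

# expand=False returns a Series; rows without a match become NaN
first = df['A'].str.extract(pattern, expand=False)
print(first)
```

If unmatched rows should keep their original value instead of NaN, `first.fillna(df['A'])` fills them back in.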

Answer 2

Here is one way to do it:

import pandas as pd

first_name = ['Jake','Matt', 'Henry']

df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde']})

# keep the first word if it is a listed first name, otherwise keep the full string
df['B'] = df['A'].str.split().apply(lambda x: x[0] if x[0] in first_name else ' '.join(x))

You get:

             A      B
0    Jake Hill   Jake
1    Matt Dawn   Matt
2    Matt King   Matt
3  Henry White  Henry
4    Jake Hyde   Jake
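This answer checks only the first word, so with the question's original ordering ("White Henry", "Hyde Jake") those rows would be returned unchanged. A minimal sketch, assuming a listed name may appear in any position: scan every word and return the first one that is in the list. The helper name `pick_first_name` is hypothetical.

```python
import pandas as pd

first_name = ['Jake', 'Matt', 'Henry']
df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'White Henry', 'Hyde Jake']})

def pick_first_name(words):
    # hypothetical helper: return the first word found in first_name,
    # otherwise rebuild and keep the full original string
    for w in words:
        if w in first_name:
            return w
    return ' '.join(words)

df['B'] = df['A'].str.split().apply(pick_first_name)
print(df)
```

For a long name list, converting `first_name` to a `set` makes each membership test constant time on average.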

Answer 3

You can do this:

import pandas as pd

first_name = ['Jake','Matt', 'Henry']

df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White','Jake Hyde','Dwayne John']})

def func(x):
    # return the first listed name contained in x (substring test); otherwise keep x unchanged
    for k in first_name:
        if k in x:
            return k
    return x

df['A'] = df['A'].apply(func)

Output:

            A
0           Jake
1           Matt
2           Matt
3          Henry
4           Jake
5    Dwayne John
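Note that `k in x` is a substring test, so a value such as "Jakeson Hill" would also return "Jake". A minimal sketch, assuming only whole words should count: keep the same loop over `first_name` but test membership in `x.split()` instead of in the raw string. The sample rows are made up for illustration.

```python
import pandas as pd

first_name = ['Jake', 'Matt', 'Henry']
# hypothetical sample; 'Jakeson Hill' contains 'Jake' only as a substring
df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Jakeson Hill', 'Dwayne John']})

def func(x):
    words = x.split()
    # whole-word test: 'Jake' in ['Jakeson', 'Hill'] is False,
    # whereas the substring test 'Jake' in 'Jakeson Hill' is True
    for k in first_name:
        if k in words:
            return k
    return x

df['A'] = df['A'].apply(func)
print(df)
```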

Answer 4

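import pandas as pd

name = ['Jake','Matt', 'Henry']
df = pd.read_csv("file.csv")

# fill NaN values in case there are any
df.fillna(0, inplace = True)
# take a word shared with the name list; "Not Found" for NaN rows or rows with no listed name
df["First Name"] = df.A.apply(lambda x: next(iter(set(x.split(" ")) & set(name)), "Not Found") if x != 0 else "Not Found")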

Output:

             A First Name
0    Jake Hill       Jake
1    Matt Dawn       Matt
2    Matt King       Matt
3  Henry White      Henry
4    Hyde Jake       Jake
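To try this approach without `file.csv`, here is a sketch that builds the frame in memory (the rows, including the missing value, are made up for illustration) and checks for missing values with `pd.isna` instead of filling them with 0. `next(iter(...), 'Not Found')` returns a default when a row shares no word with the list. The helper name `find_first_name` is hypothetical. Because sets are unordered, a row containing two listed names would yield an arbitrary one of them.

```python
import numpy as np
import pandas as pd

name = ['Jake', 'Matt', 'Henry']
name_set = set(name)
# hypothetical in-memory data standing in for file.csv
df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White',
                         'Hyde Jake', 'Dwayne John', np.nan]})

def find_first_name(x):
    # missing values have nothing to look up
    if pd.isna(x):
        return 'Not Found'
    # any word shared with the name list, or the default if there is none
    return next(iter(set(x.split()) & name_set), 'Not Found')

df['First Name'] = df['A'].apply(find_first_name)
print(df)
```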

Answer 5

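Update to my earlier answer: I now understand that you want to replace the values. That can be done as shown below: split column A, take the first word, and keep it if it appears in `name`, using the apply method with a lambda.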

DataFrame structure:

df
             A
0    Jake Hill
1    Matt Dawn
2    Matt King
3  Henry White
4    Jake Hyde

Your name variable:

>>> name
['Jake', 'Matt', 'Henry']

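To get the final dataset you need:

The parameter n limits the number of splits, so each value is split at most once, into the first word and the rest.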

df['A'] = df['A'].str.split(n=1).apply(lambda x: x[0] if x[0] in name else ' '.join(x))

>>> print(df)
       A
0   Jake
1   Matt
2   Matt
3  Henry
4   Jake

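If you are not tied to taking the names from the variable and the end goal is just to get the first word from each row of the DataFrame, it is even simpler: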

>>> df
             A
0    Jake Hill
1    Matt Dawn
2    Matt King
3  Henry White
4    Jake Hyde


>>> df['A'].str.split(n=1, expand=True)[0]
0     Jake
1     Matt
2     Matt
3    Henry
4     Jake
Name: 0, dtype: object

Or, if you want to replace column A in place:

df['A'] = df['A'].str.split(n=1, expand=True)[0]
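A vectorized sketch of the name-list version, assuming rows whose first word is not in `name` should keep their full value: take the first word with `str.split(n=1).str[0]`, then use `Series.where` with `isin` to choose between it and the original string. This avoids calling a Python lambda per row, which may help at around a million rows. The "Dwayne John" row is made up for illustration.

```python
import pandas as pd

name = ['Jake', 'Matt', 'Henry']
df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White',
                         'Jake Hyde', 'Dwayne John']})

# first whitespace-separated word of each row
first = df['A'].str.split(n=1).str[0]
# keep the first word where it is a listed name, otherwise the original value
df['A'] = first.where(first.isin(name), df['A'])
print(df)
```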

Answer 6

Try:

A_final = df['A'].str.split(' ', n=1).str.get(0)

A_final then holds the first word of each row, which solves your problem.
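For reference, `.str.get(0)` works on the list-valued result of `str.split` without `expand=True`. With `expand=True` the result is a DataFrame, which has no `.str` accessor, so you select column `0` instead. A small sketch comparing the two, using a few rows from the question:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Henry White']})

# list-valued Series: .str.get(0) picks the first element of each list
via_get = df['A'].str.split(' ', n=1).str.get(0)
# expand=True gives a DataFrame with columns 0 and 1; select column 0
via_expand = df['A'].str.split(' ', n=1, expand=True)[0]

print(via_get.tolist())
print(via_expand.tolist())
```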

Answer 7

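This method is not fooled by last names that contain one of the first-name strings, such as "Matten" or "Jakes". If both the first and the last name are found in the first-name list, e.g. "Matt Henry", they are combined (shown as "MattHenry" in the output DataFrame).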

import numpy as np
import pandas as pd

# split the name strings into columns as new dataframe
df1 = df.A.str.split(' ', expand=True)
# Keep the first names in the new dataframe and fill the rest with
# empty strings, then sum the df1 column string values to make a new array
names_result = np.where(df1.isin(name), df1, '').sum(axis=1)
# find the array indexes where no first names were found
no_match_idx = np.where(names_result == '')[0]
# fill the no first name index locations with original dataframe values
names_result[no_match_idx] = df.A.values[no_match_idx]
# make a dataframe using the results
df_out = pd.DataFrame(names_result, columns=['A'])

# to find names with a first and last name that are both found in the
# first names list:
# df_out['dups'] = df1.isin(name).sum(axis=1) > 1
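A small sketch of the commented-out duplicate check, on made-up rows chosen to cover a surname containing a first name ("Matt Jakes"), a row with two listed names ("Matt Henry"), and a row with no listed name. Because `isin` compares whole cells, "Jakes" does not count as a hit.

```python
import pandas as pd

name = ['Jake', 'Matt', 'Henry']
# hypothetical sample rows
df = pd.DataFrame({'A': ['Jake Hill', 'Matt Jakes', 'Matt Henry', 'Dwayne John']})

df1 = df.A.str.split(' ', expand=True)
# number of words per row that exactly equal a listed first name
hits = df1.isin(name).sum(axis=1)
# True where both the first and the last name are in the list
dups = hits > 1
print(pd.DataFrame({'A': df.A, 'hits': hits, 'dups': dups}))
```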

7 solutions

Solution 1: 3 votes, 2018-11-20 05:59:16

Solution 2: 2 votes, 2018-11-20 05:37:24

Solution 3: 2 votes, accepted, 2018-11-20 05:37:45

Solution 4: 0 votes, 2018-11-20 05:40:14

Solution 5: 0 votes, 2018-11-20 05:44:29

Solution 6: 0 votes, 2018-11-20 06:01:46

Solution 7: 0 votes, 2018-11-21 02:00:20
