Apply a transformation and concatenate multiple columns of an existing DataFrame to form a new DataFrame in pandas

Question

Suppose I have a DataFrame like this:

import pandas as pd

df1 = pd.DataFrame({
    'A' : ['foo ', 'b,ar', 'fo...o', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B' : ['one', 'one', 'two', 'three','two', 'two', 'one', 'three'],
})

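I want to create a new DataFrame, df2, that is the concatenation of columns 'A' and 'B' of df1, with every value uppercased. This is a toy example; in my real use case I may have more than just columns 'A' and 'B', so I want the list of columns to be a variable (that is, the column names can change).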

import re

def tokenize(s):
    # replaces comma with space; removes non-alphanumeric chars; etc.
    return re.sub(r'[^0-9a-zA-Z\s]+', '', re.sub(r'[,]+', ' ', s)).lower().split()

df2 = pd.DataFrame() # create a new dataframe; not sure if I'm doing this right
cols_to_concat = ['A','B'] # there can be more than two columns in this list
for col in cols_to_concat:
    df2 = df1[col].apply(tokenize).apply(str.upper)
print(df2)
# here, I'd like the df2 to have just ONE column whose rows are 'FOOONE', 'BARONE', 'FOOTWO', 'BARTHREE','FOOTWO', 'BARTWO','FOOONE','FOOTHREE',...

Answer 1

Concise version

list_o_cols = ['A', 'B']

df1[list_o_cols].sum(1).str.upper()

0      FOOONE
1      BARONE
2      FOOTWO
3    BARTHREE
4      FOOTWO
5      BARTWO
6      FOOONE
7    FOOTHREE
dtype: object

df2 = df1[list_o_cols].sum(1).str.upper().str.replace('O', '').to_frame('col_name')
df2

   col_name
0       FNE
1     BARNE
2       FTW
3  BARTHREE
4       FTW
5     BARTW
6       FNE
7    FTHREE
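
The outputs above appear to assume clean values in column 'A'. With the sample df1 from the question, rows 0 to 2 would still carry the trailing space, the comma and the dots, because sum only concatenates the strings. Below is a minimal sketch of one way to clean each column before concatenating. The helper name clean_and_concat and the choice to strip every non-alphanumeric character are assumptions for illustration, not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['foo ', 'b,ar', 'fo...o', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
})

def clean_and_concat(df, cols, name='col_name'):
    # Hypothetical helper: drop every non-alphanumeric character in each
    # selected column, uppercase it, then join the columns row by row.
    cleaned = df[cols].apply(
        lambda s: s.str.replace(r'[^0-9a-zA-Z]+', '', regex=True).str.upper()
    )
    return cleaned.agg(''.join, axis=1).to_frame(name)

list_o_cols = ['A', 'B']  # may hold more than two column names
df2 = clean_and_concat(df1, list_o_cols)
print(df2)
```

Because the column list is passed in as an argument, the same call should work for any number of string columns.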

Answer 2

ConcatCol = ['A', 'B']

df2 = pd.DataFrame(df1[ConcatCol].apply(lambda x: ''.join(x.str.upper()), axis=1), columns=['Col'])

Based on your comment, if you want to concatenate first and then apply your function, you can apply it after the lambda:

import re

# your function
def tokenize(s):
    # replaces comma with space; removes non-alphanumeric chars; etc.
    return re.sub(r'[^0-9a-zA-Z\s]+', '', re.sub(r'[,]+', ' ', s)).lower().split()

ConcatCol = ['A', 'B']

df2 = pd.DataFrame(df1[ConcatCol].apply(lambda x:  ''.join(x), axis=1).apply(tokenize), columns=['Col'])

       Col
0   [foo, one]
1   [b, arone]
2   [footwo]
3   [barthree]
4   [footwo]
5   [bartwo]
6   [fooone]
7   [foothree]

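Applying your function first and then concatenating gives a slightly different answer, because your function uses split and so returns lists. In the end, you just need sum to combine the lists: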

def tokenize(s):
    # replaces comma with space; removes non-alphanumeric chars; etc.
    return re.sub(r'[^0-9a-zA-Z\s]+', '', re.sub(r'[,]+', ' ', s)).lower().split()

ConcatCol = ['A', 'B']

df2 = pd.DataFrame(df1[ConcatCol].apply(lambda x: (x.apply(tokenize))).sum(axis=1), columns=['Col'])

       Col
0   [foo, one]
1   [b, ar, one]
2   [foo, two]
3   [bar, three]
4   [foo, two]
5   [bar, two]
6   [foo, one]
7   [foo, three]
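
The question asks for a single column of uppercase strings rather than token lists. Below is a minimal sketch that reuses tokenize on every cell, joins the resulting tokens and uppercases them. The helper name tokens_to_upper and the choice to join without a separator are assumptions for illustration:

```python
import re
import pandas as pd

def tokenize(s):
    # Same function as in the question, written with raw-string patterns.
    return re.sub(r'[^0-9a-zA-Z\s]+', '', re.sub(r'[,]+', ' ', s)).lower().split()

def tokens_to_upper(row):
    # Hypothetical helper: tokenize every cell of the row, glue all tokens
    # together without a separator, and uppercase the result.
    return ''.join(tok for cell in row for tok in tokenize(cell)).upper()

df1 = pd.DataFrame({
    'A': ['foo ', 'b,ar', 'fo...o', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
})

ConcatCol = ['A', 'B']  # may hold more than two column names
df2 = df1[ConcatCol].apply(tokens_to_upper, axis=1).to_frame('Col')
print(df2)
```

This runs row by row in Python, so it is likely slower than vectorized string methods on large frames, but it keeps the asker's tokenize logic unchanged.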

Answer 1: 2 votes, 2018-08-23 21:04:18
Answer 2: 1 vote, accepted, 2018-08-23 21:02:35
