
How to extract a part of the string to another column

I have a column containing data like the following.

Dummy data:

import pandas as pd

df = pd.DataFrame(["Lyreco A-Type small 2i",
"Lyreco C-Type small 4i",
"Lyreco N-Part medium",
"Lyreco AKG MT 4i small",
"Lyreco AKG/ N-Type medium 4i",
"Lyreco C-Type medium 2i",
"Lyreco C-Type/ SNU medium 2i",
"Lyreco K-part small 4i",
"Lyreco K-Part medium",
"Lyreco SNU small 2i",
"Lyreco C-Part large 2i",
"Lyreco N-Type large 4i"], columns=["Column_1"])

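I want to create an additional column that strips the data down and gives the required part of the string for each row (see below). The extracted column should look like this: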

Column_1                      Column_2
Lyreco A-Type small 2i         A-Type
Lyreco C-Type small 4i         C-Type
Lyreco N-Part medium           N-Part
Lyreco AKG MT 4i small         AKG MT
Lyreco AKG/ N-Type medium 4i   AKG/ N-Type
Lyreco C-Type medium 2i        C-Type
Lyreco C-Type/ SNU medium 2i   C-Type/ SNU       
Lyreco K-part small 4i         K-part
Lyreco K-Part medium           K-Part
Lyreco SNU small 2i            SNU
Lyreco C-Part large 2i         C-Part
Lyreco N-Type large 4i         N-Type

How can I extract Column_2 from the first column? Any leads would be helpful.

You might find that the following logic works for your data:

df["Column_2"] = df["Column_1"].str.extract(r'\w+ (\S+(?: \S+)*) \b(?:small|medium|large)\b')

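The pattern above skips the first word, then captures everything from the second word up to (but not including) one of the keywords small, medium, or large.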
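One caveat worth checking: in the row "Lyreco AKG MT 4i small" the size keyword comes last, so the pattern above captures "AKG MT 4i" rather than "AKG MT". The sketch below compares it with a hypothetical variant (not part of the original answer) that also stops at a token of the form digits plus `i`. That form is an assumption drawn only from the sample rows (`2i`, `4i`).

```python
import pandas as pd

df = pd.DataFrame(
    {"Column_1": [
        "Lyreco A-Type small 2i",
        "Lyreco AKG MT 4i small",
        "Lyreco AKG/ N-Type medium 4i",
        "Lyreco C-Type/ SNU medium 2i",
        "Lyreco SNU small 2i",
    ]}
)

# Pattern from the answer: capture everything before the size keyword.
original = r'\w+ (\S+(?: \S+)*) \b(?:small|medium|large)\b'

# Hypothetical variant: lazily capture after the first word and stop
# before the first size keyword OR a "<digits>i" token (assumed format).
variant = r'^\w+ (.+?)(?= (?:small|medium|large|\d+i)\b)'

# expand=False returns a Series instead of a one-column DataFrame.
df["original"] = df["Column_1"].str.extract(original, expand=False)
df["variant"] = df["Column_1"].str.extract(variant, expand=False)

print(df)
```

If the real data contains other trailing tokens, the alternation in the lookahead would need to be extended accordingly.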

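Looking at the samples you posted, it is enough to split the column value and return the "middle" items. You can write a simple function to encapsulate that logic and apply it to the dataframe.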

import pandas as pd
from math import floor

df = pd.DataFrame(
    {'Columns_1':
     ["Lyreco A-Type small 2i",
      "Lyreco C-Type small 4i",
      "Lyreco N-Part medium", 
      "Lyreco AKG MT 4i small",
      "Lyreco AKG/ N-Type medium 4i",
      "Lyreco C-Type medium 2i",
      "Lyreco C-Type/ SNU medium 2i",
      "Lyreco K-part small 4i",
      "Lyreco K-Part medium", 
      "Lyreco SNU small 2i",
      "Lyreco C-Part large 2i",
      "Lyreco N-Type large 4i"
     ]
    }
)


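def f(row):
    # Split the value into whitespace-separated tokens.
    blocks = row['Columns_1'].split()
    # With 4 tokens or fewer keep only the second token;
    # otherwise keep tokens 1 through len(blocks) // 2.
    mid_index = 1 if len(blocks) <= 4 else floor(len(blocks)/2)
    return ' '.join(blocks[1:mid_index+1])

# Apply f row by row (axis=1).
df['Columns_2'] = df.apply(f, axis=1)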

print(df)

Output:

                       Columns_1    Columns_2
0         Lyreco A-Type small 2i       A-Type
1         Lyreco C-Type small 4i       C-Type
2           Lyreco N-Part medium       N-Part
3         Lyreco AKG MT 4i small       AKG MT
4   Lyreco AKG/ N-Type medium 4i  AKG/ N-Type
5        Lyreco C-Type medium 2i       C-Type
6   Lyreco C-Type/ SNU medium 2i  C-Type/ SNU
7         Lyreco K-part small 4i       K-part
8           Lyreco K-Part medium       K-Part
9            Lyreco SNU small 2i          SNU
10        Lyreco C-Part large 2i       C-Part
11        Lyreco N-Type large 4i       N-Type

A simpler option, which keeps only the second whitespace-separated word of each value:

df.columns = ['column_1']

df["column_2"] = [col.split(" ")[1] for col in df.column_1]
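
Note that taking index 1 after splitting returns a single word, so multi-word parts such as "AKG/ N-Type" are truncated. A minimal sketch makes this visible, using pandas' vectorized string accessor as an alternative to the list comprehension (the variable names here are illustrative, not from the original answer):

```python
import pandas as pd

s = pd.Series(["Lyreco A-Type small 2i", "Lyreco AKG/ N-Type medium 4i"])

# Split on whitespace and take the second token of each value.
second_word = s.str.split().str[1]

print(second_word.tolist())  # ['A-Type', 'AKG/']
```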
