How to extract part of a string in a Pandas column and make a new column
I have a column containing data like the following.

Dummy data:
```python
import pandas as pd

# Name the column "Column_1" so it matches the expected output below
df = pd.DataFrame({"Column_1": ["Lyreco A-Type small 2i",
                                "Lyreco C-Type small 4i",
                                "Lyreco N-Part medium",
                                "Lyreco AKG MT 4i small",
                                "Lyreco AKG/ N-Type medium 4i",
                                "Lyreco C-Type medium 2i",
                                "Lyreco C-Type/ SNU medium 2i",
                                "Lyreco K-part small 4i",
                                "Lyreco K-Part medium",
                                "Lyreco SNU small 2i",
                                "Lyreco C-Part large 2i",
                                "Lyreco N-Type large 4i"]})
```
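I want to create an additional column that strips the data and gives the desired part of the string in each row (see below). The extracted column should look like this: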
| Column_1 | Column_2 |
|---|---|
| Lyreco A-Type small 2i | A-Type |
| Lyreco C-Type small 4i | C-Type |
| Lyreco N-Part medium | N-Part |
| Lyreco AKG MT 4i small | AKG MT |
| Lyreco AKG/ N-Type medium 4i | AKG/ N-Type |
| Lyreco C-Type medium 2i | C-Type |
| Lyreco C-Type/ SNU medium 2i | C-Type/ SNU |
| Lyreco K-part small 4i | K-part |
| Lyreco K-Part medium | K-Part |
| Lyreco SNU small 2i | SNU |
| Lyreco C-Part large 2i | C-Part |
| Lyreco N-Type large 4i | N-Type |
How can I extract Column_2 from the first column? Any leads would be helpful.
You may find that the following logic works for your data:
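```python
# expand=False returns a Series (single capture group) instead of a one-column DataFrame
df["Column_2"] = df["Column_1"].str.extract(r'\w+ (\S+(?: \S+)*) \b(?:small|medium|large)\b', expand=False)
```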
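The pattern above matches from the second word until it reaches the `small`, `medium` or `large` keyword. Note that for `Lyreco AKG MT 4i small` it captures `AKG MT 4i`, because the size keyword comes after the `4i` token. Here is a working regex demo.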
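One possible adjustment, sketched under the assumption that count tokens always look like a number followed by `i` (`2i`, `4i`): anchor at the start, capture the product code lazily with `.+?`, and allow an optional count token before the size keyword. This is a variant of the pattern above tried on a few sample rows, not a verified drop-in for the full dataset.

```python
import pandas as pd

df = pd.DataFrame({"Column_1": [
    "Lyreco A-Type small 2i",
    "Lyreco N-Part medium",
    "Lyreco AKG MT 4i small",
    "Lyreco AKG/ N-Type medium 4i",
    "Lyreco C-Type/ SNU medium 2i",
]})

# Skip the brand word, lazily capture the product code, then allow an
# optional count token (assumed shape: digits followed by "i") before the size.
pattern = r"^\w+ (.+?)(?: \d+i)? (?:small|medium|large)\b"
df["Column_2"] = df["Column_1"].str.extract(pattern, expand=False)
print(df["Column_2"].tolist())
# ['A-Type', 'N-Part', 'AKG MT', 'AKG/ N-Type', 'C-Type/ SNU']
```

Because the group is lazy, it stops at the shortest prefix that is followed by an optional count token and a size word, which is what lets `AKG MT 4i small` yield `AKG MT`.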
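Looking at the sample you posted, it is enough to split the column values and return the "middle" items. You can write a simple function that encapsulates the logic and apply it to the dataframe.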
```python
from math import floor

import pandas as pd

df = pd.DataFrame(
    {'Columns_1':
        ["Lyreco A-Type small 2i",
         "Lyreco C-Type small 4i",
         "Lyreco N-Part medium",
         "Lyreco AKG MT 4i small",
         "Lyreco AKG/ N-Type medium 4i",
         "Lyreco C-Type medium 2i",
         "Lyreco C-Type/ SNU medium 2i",
         "Lyreco K-part small 4i",
         "Lyreco K-Part medium",
         "Lyreco SNU small 2i",
         "Lyreco C-Part large 2i",
         "Lyreco N-Type large 4i"
         ]
     }
)

def f(row):
    # Split the description into whitespace-separated words
    blocks = row['Columns_1'].split()
    # Up to 4 words: the code is the single word after the brand;
    # longer rows: take words 1..len//2 (e.g. "AKG MT", "AKG/ N-Type")
    mid_index = 1 if len(blocks) <= 4 else floor(len(blocks)/2)
    return ' '.join(blocks[1:mid_index+1])

df['Columns_2'] = df.apply(f, axis=1)
print(df)
```
Output:

```
                       Columns_1    Columns_2
 0        Lyreco A-Type small 2i       A-Type
 1        Lyreco C-Type small 4i       C-Type
 2          Lyreco N-Part medium       N-Part
 3        Lyreco AKG MT 4i small       AKG MT
 4  Lyreco AKG/ N-Type medium 4i  AKG/ N-Type
 5       Lyreco C-Type medium 2i       C-Type
 6  Lyreco C-Type/ SNU medium 2i  C-Type/ SNU
 7        Lyreco K-part small 4i       K-part
 8          Lyreco K-Part medium       K-Part
 9           Lyreco SNU small 2i          SNU
10        Lyreco C-Part large 2i       C-Part
11        Lyreco N-Type large 4i       N-Type
```
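The middle-index rule relies on the product code filling the first half of each row, which holds for this sample. A position-independent alternative is to drop the brand word and filter out the size and count tokens. This sketch assumes the size words are exactly `small`, `medium` and `large` and that count tokens look like `2i`/`4i`; `product_code`, `SIZES` and `COUNT` are hypothetical names chosen for the example.

```python
import re

import pandas as pd

df = pd.DataFrame({"Columns_1": [
    "Lyreco A-Type small 2i",
    "Lyreco N-Part medium",
    "Lyreco AKG MT 4i small",
    "Lyreco AKG/ N-Type medium 4i",
    "Lyreco K-part small 4i",
]})

SIZES = {"small", "medium", "large"}  # assumed closed set of size words
COUNT = re.compile(r"\d+i")           # assumed shape of count tokens, e.g. "4i"

def product_code(text):
    # Drop the brand (first word), then discard size and count tokens
    tokens = text.split()[1:]
    kept = [t for t in tokens if t not in SIZES and not COUNT.fullmatch(t)]
    return " ".join(kept)

df["Columns_2"] = df["Columns_1"].map(product_code)
print(df["Columns_2"].tolist())
# ['A-Type', 'N-Part', 'AKG MT', 'AKG/ N-Type', 'K-part']
```

The trade-off is that any new size word or count format must be added to `SIZES` or `COUNT`, whereas the middle-index rule needs no vocabulary but depends on word positions.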
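```python
# Rename the single column, then keep the second space-separated word of each value.
# Only one word is kept, so multi-word codes such as "AKG MT" are truncated.
df.columns = ['column_1']
df["column_2"] = [col.split(" ")[1] for col in df.column_1]
```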
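The same second-word extraction can be written with pandas' vectorised string methods: `Series.str.split()` turns each value into a list of words and `.str[1]` picks the element at position 1. A minimal sketch on a few sample rows:

```python
import pandas as pd

df = pd.DataFrame({"column_1": [
    "Lyreco A-Type small 2i",
    "Lyreco AKG MT 4i small",
    "Lyreco C-Type/ SNU medium 2i",
]})

# Split on whitespace, then take the second word of each row
df["column_2"] = df["column_1"].str.split().str[1]
print(df["column_2"].tolist())
# ['A-Type', 'AKG', 'C-Type/']
```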