
Python: remove everything after a specific string and loop through all rows in multiple columns in a dataframe

I have a file full of URL paths like the one below, spread across 4 columns of a dataframe that I am trying to clean up:

Path1 = ["https://contentspace.global.xxx.com/teams/Australia/WA/Documents/Forms/AllItems.aspx?\
RootFolder=%2Fteams%2FAustralia%2FWA%2FDocuments%2FIn%20Scope&FolderCTID\
=0x012000EDE8B08D50FC3741A5206CD23377AB75&View=%7B287FFF9E%2DD60C%2D4401%2D9ECD%2DC402524F1D4A%7D"]

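I want to remove everything after a specific string, which I have defined as "string1", and I want to loop through all 4 columns of the dataframe defined as "df_MasterData":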

string1 = "&FolderCTID"
import pandas as pd 

df_MasterData = pd.read_excel(FN_MasterData)

cols = ['Column_A', 'Column_B', 'Column_C', 'Column_D']

for i in cols:  

    # Objective: Replace "&FolderCTID", delete all string after
    string1 = "&FolderCTID"

    # Method 1
    df_MasterData[i] = df_MasterData[i].str.split(string1).str[0]
    
    # Method 2
    df_MasterData[i] = df_MasterData[i].str.split(string1).str[1].str.strip()
    
    # Method 3
    df_MasterData[i] = df_MasterData[i].str.split(string1)[:-1]

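I have searched and googled and found similar solutions, but none of them worked.

Can anyone shed some light on this? Any help is appreciated.

Added below are some example rows of these URLs from Column A and Column B: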

Column_A = ['https://contentspace.global.xxx.com/teams/Australia/NSW/Documents/Forms/AllItems.aspx?\
RootFolder=%2Fteams%2FAustralia%2FNSW%2FDocuments%2FIn%20Scope%2FA%20I%20TOPPER%20GROUP&FolderCTID=\
0x01200016BC4CE0C21A6645950C100F37A60ABD&View=%7B64F44840%2D04FE%2D4341%2D9FAC%2D902BB54E7F10%7D',\
'https://contentspace.global.xxx.com/teams/Australia/Victoria/Documents/Forms/AllItems.aspx?RootFolder\
=%2Fteams%2FAustralia%2FVictoria%2FDocuments%2FIn%20Scope&FolderCTID=0x0120006984C27BA03D394D9E2E95FB\
893593F9&View=%7B3276A351%2D18C1%2D4D32%2DADFF%2D54158B504FCC%7D']

Column_B = ['https://contentspace.global.xxx.com/teams/Australia/WA/Documents/Forms/AllItems.aspx?\
RootFolder=%2Fteams%2FAustralia%2FWA%2FDocuments%2FIn%20Scope&FolderCTID=0x012000EDE8B08D50FC3741A5\
206CD23377AB75&View=%7B287FFF9E%2DD60C%2D4401%2D9ECD%2DC402524F1D4A%7D',\
'https://contentspace.global.xxx.com/teams/Australia/QLD/Documents/Forms/AllItems.aspx?RootFolder=%\
2Fteams%2FAustralia%2FQLD%2FDocuments%2FIn%20Scope%2FAACO%20GROUP&FolderCTID=0x012000E689A6C1960E8\
648A90E6EC3BD899B1A&View=%7B6176AC45%2DC34C%2D4F7C%2D9027%2DDAEAD1391BFC%7D']

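This is what I would do.

First declare a variable with the target columns. Then use stack() and str.split to get the target output. Finally, unstack and reapply the output to the original df.

cols_to_slice = ['Column_A','Column_B','Column_C','Column_D']
string1 = "&FolderCTID"

# piece 0 is the text before string1; piece 1 would be the text after it
df[cols_to_slice].stack().str.split(string1,expand=True)[0].unstack(1)

If you want to replace these columns in your target df, just do:

df[cols_to_slice] = df[cols_to_slice].stack().str.split(string1,expand=True)[0].unstack(1)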
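
As a minimal, self-contained sketch of the stack / split / unstack round trip: the frame below is made-up placeholder data (shortened example.com URLs, only two columns), not the real file, and n=1 is an extra assumption so that only the first occurrence of the marker is split on.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataframe
df = pd.DataFrame({
    "Column_A": [
        "https://example.com/a?RootFolder=x&FolderCTID=0x01&View=1",
        "https://example.com/b?RootFolder=y&FolderCTID=0x02&View=2",
    ],
    "Column_B": [
        "https://example.com/c?RootFolder=z&FolderCTID=0x03&View=3",
        "https://example.com/d?RootFolder=w",  # no marker in this cell
    ],
})
cols_to_slice = ["Column_A", "Column_B"]
string1 = "&FolderCTID"

# One long Series indexed by (row label, column name)
stacked = df[cols_to_slice].stack()

# Split at the first occurrence of string1; column 0 holds the text before it
parts = stacked.str.split(string1, n=1, expand=True)

# Back to the original wide shape, then overwrite the target columns
df[cols_to_slice] = parts[0].unstack(1)

print(df)
```

Cells that do not contain the marker come through unchanged, because the split leaves the whole string in piece 0. Note that stack() drops missing values by default in many pandas versions, so rows with empty cells may need a separate check.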

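You should first get the index of the string using str.find:

indexes = len(string1) + df_MasterData[i].str.find(string1)
# This gives the position just past the end of string1 in each row
# If you don't want string1 itself in the result, use the one below instead
indexes = df_MasterData[i].str.find(string1)

Now do

# .str[:indexes] cannot take a Series as the stop, so slice each row at its own index.
# This assumes every cell is a string that contains string1 (find returns -1 otherwise).
df_MasterData[i] = [s[:k] for s, k in zip(df_MasterData[i], indexes)]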
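
The index-based idea can also be sketched end to end as below. This is only an illustration under assumptions: the frame holds made-up placeholder URLs, and cut_at is a hypothetical helper name, not a pandas function. It uses plain str.find per cell so that cells without the marker, and non-string cells such as empty Excel cells, are left unchanged.

```python
import pandas as pd

string1 = "&FolderCTID"


def cut_at(value, marker=string1):
    """Return value truncated just before marker; leave anything else as-is."""
    if not isinstance(value, str):
        return value              # e.g. NaN/None from an empty cell
    idx = value.find(marker)      # -1 when the marker is absent
    return value if idx == -1 else value[:idx]


# Hypothetical stand-in for df_MasterData (normally loaded with pd.read_excel)
df_MasterData = pd.DataFrame({
    "Column_A": [
        "https://example.com/a?RootFolder=x&FolderCTID=0x01&View=1",
        None,
    ],
    "Column_B": [
        "https://example.com/b?RootFolder=y",
        "https://example.com/c?RootFolder=z&FolderCTID=0x02",
    ],
})
cols = ["Column_A", "Column_B"]

# Apply the helper to every cell of each target column
for i in cols:
    df_MasterData[i] = df_MasterData[i].map(cut_at)

print(df_MasterData)
```

Going cell by cell is slower than the vectorised .str methods on very large frames, but it makes the handling of missing markers and missing values explicit.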

