Regular Expression search/replace on columns with python pandas
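Below is a small sample of the a.csv file I am trying to do some data manipulation on. Each "comment" column contains its own fields, separated by semicolons ("date;user;comment"). My goal is to prefix the user with "gp-".
Original: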
issue_key,summary,comment,comment,comment,comment,resolution
ABC-1234,summary1,"03/11/2021 12:18;user1;a text comment","03/10/2021 11:18;user2,a text comment",,,Unresolved
ABC-4321,summary2,"03/08/2021 12:10;user7;a text comment","03/10/2021 11:18;user5,a text comment",,,Unresolved
ABC-2214,summary3,"03/09/2021 12:20;user9;a text comment",,"03/10/2021 11:18;user3,a text comment",,Unresolved
I would like to transform it into:
issue_key,summary,comment,comment,comment,comment,resolution
ABC-1234,summary1,"03/11/2021 12:18;gp-user1;a text comment","03/10/2021 11:18;gp-user2,a text comment",,,Unresolved
ABC-4321,summary2,"03/08/2021 12:10;gp-user7;a text comment","03/10/2021 11:18;gp-user5,a text comment",,,Unresolved
ABC-2214,summary3,"03/09/2021 12:20;gp-user9;a text comment",,"03/10/2021 11:18;gp-user3,a text comment",,Unresolved
My code so far. I think I'm close:
with open(destination_filename) as f:
    orig_header = f.readline()
    orig_header = orig_header.split(",")
    orig_header[-1] = orig_header[-1].strip()
csv_data = pd.read_csv(destination_filename)
cols = csv_data.columns[csv_data.columns.str[:7]=='Comment']
csv_data[cols] = csv_data[cols].apply(lambda x: re.sub(r'(\d+\/\d+\/\d\d\d\d \d+:\d+);(\S+);(.*)', r'\1;gp-\2;\3', str(x)))
csv_data.to_csv(f"{destination_filename}", index = False, header=orig_header)
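For reference, the regex idea in the attempt above might be adapted along the following lines. This is only a sketch: the helper name `add_gp_prefix` and the simplified pattern are assumptions made for illustration, not part of the original code. The pattern inserts the prefix after the first semicolon, so it also covers the sample cells where the user is followed by a comma rather than a semicolon.

```python
import re

# Capture everything up to and including the first ';' (the date field),
# so that the prefix can be inserted directly before the user name.
FIRST_FIELD = re.compile(r'^([^;]*;)')

def add_gp_prefix(value):
    """Return value with 'gp-' inserted before the user field, if there is one."""
    if not isinstance(value, str):
        return value  # e.g. NaN, which pandas uses for an empty cell
    return FIRST_FIELD.sub(r'\g<1>gp-', value, count=1)

print(add_gp_prefix("03/11/2021 12:18;user1;a text comment"))
print(add_gp_prefix("03/10/2021 11:18;user2,a text comment"))
```

In pandas, a function like this could be applied cell by cell, for example with `csv_data[cols] = csv_data[cols].apply(lambda col: col.map(add_gp_prefix))`. In the original attempt, `apply` hands the lambda a whole column, so `str(x)` is the printed form of the column rather than a single cell. Note also that the sample header uses lowercase `comment`, which the `'Comment'` comparison would not match, and that `read_csv` renames duplicate column names (`comment`, `comment.1`, and so on) by default.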
One approach is to use the built-in csv library. It can also be used to process each comment field as a ;-delimited csv row.
For example:
import io
import csv

def replace_user(entry):
    if len(entry):
        values = next(csv.reader(io.StringIO(entry, newline=''), delimiter=';'))
        values[1] = f'gp-{values[1]}'
        entry = ';'.join(values)
    return entry

with open('input.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    csv_output.writerow(next(csv_input))    # copy the header

    for row in csv_input:
        row[2:6] = [replace_user(v) for v in row[2:6]]
        csv_output.writerow(row)
This gives you an output.csv containing:
issue_key,summary,comment,comment,comment,comment,resolution
ABC-1234,summary1,03/11/2021 12:18;gp-user1;a text comment,"03/10/2021 11:18;gp-user2,a text comment",,,Unresolved
ABC-4321,summary2,03/08/2021 12:10;gp-user7;a text comment,"03/10/2021 11:18;gp-user5,a text comment",,,Unresolved
ABC-2214,summary3,03/09/2021 12:20;gp-user9;a text comment,,"03/10/2021 11:18;gp-user3,a text comment",,Unresolved
If the comments can also contain quotes or newlines, an additional csv.writer() can be used in place of join().
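A minimal sketch of that variation is shown below. It assumes the same `replace_user()` role as in the example above. The in-memory buffer and the `len(values) > 1` guard are additions made for illustration.

```python
import csv
import io

def replace_user(entry):
    """Prefix the second ';'-separated field with 'gp-', re-quoting fields where needed."""
    if not entry:
        return entry
    # Parse the comment cell as a single ';'-delimited csv row
    values = next(csv.reader(io.StringIO(entry, newline=''), delimiter=';'))
    if len(values) > 1:
        values[1] = f'gp-{values[1]}'
    # Write the row back through csv.writer so that fields containing
    # ';', quotes or newlines are quoted again
    buffer = io.StringIO(newline='')
    csv.writer(buffer, delimiter=';').writerow(values)
    # Drop the line terminator that writerow() appends
    return buffer.getvalue().rstrip('\r\n')

print(replace_user('03/11/2021 12:18;user1;"a; b"'))
```

With a plain `';'.join(values)`, the quoted comment `"a; b"` would be written back without its quotes and would then look like two separate fields. Going through `csv.writer()` keeps it as one field.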