
How to replace a string that contains quotation marks in Python?

I have a CSV file whose third column contains HTML code.

import pandas as pd
import numpy as np
import csv 
import seaborn as sns
import re
import os
pd.set_option("display.max_rows",1000000000)
pd.set_option("display.max_columns",1000000000)

dirs = os.listdir('DataCollectionCA/')
for i in dirs:
    if os.path.splitext(i)[1] == ".csv":
        print(i)

dirss = 'DataCollectionCA/'

print("<div class=\"\"ContentGrid\"\">")

df = pd.read_csv(dirss+"7197409.csv")  # load the data
df_num = len(df)  # count the number of rows
print(df_num)
real_df_num = df_num+1
with open ('719740999999.csv', 'a' ,newline='', encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['互動作者','發表時間','互動內容'])

for post in range(1,real_df_num):
    with open (dirss+'7197409.csv', newline='', encoding="utf-8") as csvfile: 
        reader = csv.reader(csvfile)
        column0 = [row[0] for row in reader]
        for i, rows in enumerate(column0):
            if i == post:
                row000 = rows
    with open (dirss+'7197409.csv', newline='', encoding="utf-8") as csvfile: 
        reader = csv.reader(csvfile)
        column1 = [row[1] for row in reader]
        for j, rows in enumerate(column1):
            if j == post:
                row001 = rows
    with open (dirss+'7197409.csv', newline='', encoding="utf-8") as csvfile: 
        reader = csv.reader(csvfile)
        column2 = [row[2] for row in reader]
        for k, rows in enumerate(column2):
            if k == post:
                row002 = rows
    author = row000
    res_time = row001
    original_html_code = row002
    new_html_code_01 = original_html_code.replace('"<div class=""ContentGrid"">', " ")
    new_html_code_02 = new_html_code_01.replace('<br>', " ")
    print(new_html_code_02)
    print("======")
    with open ('719740999999.csv', 'a' ,newline='', encoding="utf-8") as csvfile: 
        writer = csv.writer(csvfile)
        writer.writerow([author,res_time,new_html_code_02])

I want to use Python to replace the following strings (they are fragments of HTML code):

"<div class=""ContentGrid"">

<img data-icons="":~("" src=""

"" onload=""DrawImage(this)"" width=""300"" height=""617"">

and so on.

I want to replace each of them with a blank. I tried the following code, but it failed:

new_html_code_02 = re.sub('<div class=\"\"ContentGrid\"\">', " ", new_html_code_01)
new_html_code_02 = re.sub('<div class=""ContentGrid"">', " ", new_html_code_01)

The new file still contains these strings, and I don't know how to fix it.

I'm not exactly sure what you want, but the second replacement statement you tried worked for me. You don't need to escape the quotes ( " ). If you only want to replace static expressions, you don't even need a regex; you can use the replace() method of Python's string type instead:

import re


html = '<div class=""ContentGrid"">' \
       '<img data-icons="":~("" src=""' \
       '"" onload=""DrawImage(this)"" width=""300"" height=""617"">'

new_html = re.sub('<div class=""ContentGrid"">', " ", html)
print(new_html)

new_html = html.replace('<div class=""ContentGrid"">', " ")
print(new_html)
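
One likely reason the replacement has no effect in the question's script (an assumption, since the actual file is not shown) is that the doubled quotes only exist in the file on disk. They are CSV escaping, and csv.reader turns each "" back into a single " when it parses a quoted field, so the string in memory never contains the doubled form. The sketch below uses a made-up one-line CSV (raw_csv) and a hypothetical file name (photo.png) to show this. It also shows that a fragment containing a regex metacharacter such as ( has to be passed through re.escape before it can be used with re.sub.

```python
import csv
import io
import re

# A made-up CSV line as it would be stored on disk: the HTML field is wrapped
# in quotes and every inner quote is doubled (standard CSV escaping).
raw_csv = 'author,time,"<div class=""ContentGrid"">hello<br>world"\r\n'

# csv.reader undoes that escaping, so the parsed field has single quotes.
row = next(csv.reader(io.StringIO(raw_csv)))
html = row[2]
print(html)

# Searching for the on-disk form matches nothing in the parsed string.
unchanged = html.replace('<div class=""ContentGrid"">', " ")
print(unchanged == html)

# Searching for the parsed form works.
cleaned = html.replace('<div class="ContentGrid">', " ").replace("<br>", " ")
print(cleaned)

# A fragment with "(" is not a valid regex on its own.
fragment = '<img data-icons=":~(" src="'
try:
    re.sub(fragment, " ", fragment)
    raised = False
except re.error:
    raised = True
print(raised)

# re.escape makes the fragment safe to use as a literal pattern.
stripped = re.sub(re.escape(fragment), " ", fragment + "photo.png")
print(stripped)
```

For fixed fragments like these, str.replace() is the simpler choice because it never interprets the search text as a pattern.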
