Append string to empty pandas column in a for loop

The code uses OCR to read text from the image URLs in the list 'url_list'. I am trying to append the output, a string 'txt', to an empty pandas column 'url_text'. However, the code does not append anything to the column 'url_text'.

df = pd.read_csv(r'path') # main dataframe

df['url_text'] = "" # create empty column that will later contain the text of the url_image
url_list = (df.iloc[:, 5]).tolist() # convert column with urls to a list 

print(url_list)

['https://pbs.twimg.com/media/ExwMPFDUYAEHKn0.jpg', 
'https://pbs.twimg.com/media/ExuBd4-WQAMgTTR.jpg', 
'https://pbs.twimg.com/media/ExuBd5BXMAU2-p_.jpg', 
' ',
'https://pbs.twimg.com/media/Ext0Np0WYAEUBXy.jpg', 
'https://pbs.twimg.com/media/ExsJrOtWUAMgVxk.jpg', 
'https://pbs.twimg.com/media/ExrGetoWUAEhOt0.jpg',
' ',
' ']
for img_url in url_list: # loop over all urls in list url_list
    try:
        img = io.imread(img_url) # convert image/url to cv2/numpy.ndarray format

        # Preprocessing of image
        gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        (h, w) = gry.shape[:2]
        gry = cv2.resize(gry, (w*3, h*3))
        thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

        txt = pytesseract.image_to_string(thr)  # read tweet image text

        df['url_text'].append(txt)

        print(txt)
    except: # ignore any errors; some rows do not contain a URL, which would otherwise make the loop fail
        pass

print(df)

I couldn't test this, but please try the following: build a list of the OCR results first, then add it to df as a new column. Here I convert the list to a DataFrame and concatenate it with the original df.

txtlst=[]
for img_url in url_list: # loop over all urls in list url_list
    try:
        img = io.imread(img_url) # convert image/url to cv2/numpy.ndarray format

        # Preprocessing of image
        gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        (h, w) = gry.shape[:2]
        gry = cv2.resize(gry, (w*3, h*3))
        thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

        txt = pytesseract.image_to_string(thr)  # read tweet image text
        txtlst.append(txt)


        print(txt)
    except Exception: # some rows do not contain a URL; store an empty string so the list stays aligned with the rows
        txtlst.append("")
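dftxt = pd.DataFrame({"url_text": txtlst})
# df already has an empty 'url_text' column from the setup above; drop it to avoid a duplicate column
df = pd.concat([df.drop(columns="url_text", errors="ignore"), dftxt], axis=1)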
print(df)
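
One caveat with the concat approach, shown as a small sketch with made-up data (the 'url' values and the index labels here are assumptions, not from the question): pd.concat with axis=1 aligns rows on the index, not on position. With the default index from read_csv this is fine, but if df has been filtered or re-indexed, the new column can end up in extra rows. Giving the new DataFrame the same index as df avoids that.

```python
import pandas as pd

# Made-up frame whose index is NOT the default 0, 1, 2
df = pd.DataFrame({"url": ["u1", "u2", "u3"]}, index=[10, 11, 12])
texts = ["a", "", "c"]  # one OCR result per row, "" where OCR failed

# New frame gets a default index (0, 1, 2), so the rows do not line up
misaligned = pd.concat([df, pd.DataFrame({"url_text": texts})], axis=1)
print(misaligned.shape)  # (6, 2): three extra rows filled with NaN

# Reusing df's index keeps each text next to its own row
aligned = pd.concat([df, pd.DataFrame({"url_text": texts}, index=df.index)], axis=1)
print(aligned.shape)  # (3, 2)
```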

As noted in the documentation for Series.append(), the append call only works between two Series. It also returns a new Series rather than modifying the original in place, so calling it with a string inside the loop cannot fill the column.
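
To see why the original loop fails silently, here is a minimal sketch with made-up data (the two 'url' values are placeholders). Passing a string to Series.append() raises an exception, and the bare except in the question swallows it, so neither the append nor the print(txt) after it has any visible effect. Which exception you get depends on the pandas version: typically a TypeError on older versions, and an AttributeError on pandas 2.0 and later, where Series.append() was removed.

```python
import pandas as pd

df = pd.DataFrame({"url": ["u1", "u2"]})  # placeholder data
df["url_text"] = ""

errors = []
for url in df["url"]:
    try:
        df["url_text"].append("some text")  # a plain string, not a Series
    except Exception as exc:
        # The question's bare `except: pass` hides this; record it instead
        errors.append(type(exc).__name__)

print(errors)                    # one exception name per iteration
print(df["url_text"].tolist())   # the column is still all empty strings
```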

A better approach is to create an empty list outside the loop, append each string to that list inside the loop, and then assign the list to the column with df["url_text"] = url_texts . This is also much faster at runtime than repeatedly appending two Series together.

url_texts = []

for ...:
    ...
    url_texts.append(url_text)

df["url_text"] = url_texts
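
A runnable sketch of the list-then-assign pattern, under a few assumptions: ocr_text is a hypothetical stand-in for the download and OCR step in the question, and the example.com URLs and the 'url' column name are made up. The important detail is that the list must end up with exactly one entry per row, so a placeholder is appended when a row has no usable URL.

```python
import pandas as pd


def ocr_text(url: str) -> str:
    """Hypothetical stand-in for the image download + pytesseract step."""
    if not url.strip():
        raise ValueError("no URL in this row")
    return f"text from {url}"


df = pd.DataFrame(
    {"url": ["https://example.com/a.jpg", " ", "https://example.com/b.jpg"]}
)

url_texts = []  # collects one result per row, in row order
for url in df["url"]:
    try:
        url_texts.append(ocr_text(url))
    except ValueError:
        url_texts.append("")  # placeholder keeps the list aligned with the rows

# Assigning a list works positionally; its length must equal len(df)
df["url_text"] = url_texts
print(df)
```

Catching a specific exception rather than using a bare except means unexpected failures still surface instead of being silently ignored.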
