
Iterate through a pandas df column to remove a string

I'm trying to solve this issue. Basically, my column 'review' has a bunch of junk HTML that was returned by soup. I am unsure how to remove it and have tried to iterate in various ways. How would you iterate through the df and replace these values? I would like them blank, but I'm using HELLO for testing.

for index, row in enumerate(df['review']):
    row = df.replace('<div class="text show-more__control">', 'HELLO', inplace=False)
    df['review'] = row

You can use regular string functions such as replace() through DataFrame["columname"].str.replace(). If you want to replace every value in the column with an empty string, you can just use DataFrame["columname"] = ''.
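As a minimal sketch of the .str.replace() suggestion, assuming a small made-up 'review' column in place of the real scraped data (the sample rows are hypothetical, the tag is the one from the question):

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped 'review' column.
df = pd.DataFrame({
    "review": [
        '<div class="text show-more__control">Great film',
        "No markup here",
    ]
})

# regex=False treats the pattern as a literal string, so the quotes and
# other punctuation in the tag need no escaping.
df["review"] = df["review"].str.replace(
    '<div class="text show-more__control">', "", regex=False
)

print(df["review"].tolist())  # ['Great film', 'No markup here']
```

The replacement runs over the whole column in one call, so no explicit loop is needed.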

Your approach is overkill: iterating over a dataframe with iterrows is slow, because it walks through every row (with all the columns for each one), and you really only want to apply the replacement to one column.

My advice would be to use a lambda function applied only to the column where you want the replacement:

df['column'] = df['column'].apply(lambda x: x.replace('replacethis', 'withthis'))
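One caveat with the one-liner: if the column contains missing values, x.replace raises an AttributeError on the non-string cells. A possible variant, assuming the tag from the question and a hypothetical helper named strip_tag, skips anything that is not a string:

```python
import pandas as pd

TAG = '<div class="text show-more__control">'

# Hypothetical sample data, including a missing value.
df = pd.DataFrame({"review": [TAG + "Loved it", None, "Plain text"]})

def strip_tag(value):
    # Only str objects have .replace(); leave missing or non-string cells as they are.
    if isinstance(value, str):
        return value.replace(TAG, "")
    return value

# Apply the helper to the single column that needs cleaning.
df["review"] = df["review"].apply(strip_tag)
print(df["review"].tolist())
```

A named function is used here instead of a lambda only to leave room for the type check; the idea is the same.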
