How to filter data from an xlsx file in Python based on keywords in a sentence?

Question

I scraped some online data using Twitter scraper. I know I can filter this fairly easily in Excel, and I did export the data to an xlsx file. But I want to filter using Python. I scraped tweets containing "Hurricane Dorian". Now I want to filter out everything that does not include the word "Bahamas". How would I do this?

Thank you!

from twitterscraper import query_tweets
import datetime as dt
import pandas as pd

begin_date = dt.date(2019, 7, 1)
end_date = dt.date(2019, 9, 9)

limit = 1000
lang = 'english'

tweets = query_tweets('Hurricane Dorian', begindate = begin_date, enddate = end_date, limit = limit, lang = lang)

df = pd.DataFrame(t.__dict__ for t in tweets)

# to_excel writes the file and returns None, so there is nothing to assign
df.to_excel(r'C:\Users\victo\Desktop\HurricaneData.xlsx', index=False, header=True)

Answer 1

You can use the str functions in pandas to filter. See the pandas help on indexing. Here's the specific answer (code) for your posted question:

from twitterscraper import query_tweets 
import datetime as dt 
import pandas as pd

begin_date = dt.date(2019, 7, 1) 
end_date = dt.date(2019, 9, 9)

limit = 1000 
lang = 'english'

tweets = query_tweets(
    'Hurricane Dorian', 
    begindate = begin_date, 
    enddate = end_date, 
    limit = limit, 
    lang = lang
)

# Convert to dataframe
df = pd.DataFrame(t.__dict__ for t in tweets)

# make a boolean mask
filt = df['text'].str.contains('Bahamas')

# compare the lengths of the dataframes
print(df.shape)
print(df.loc[filt].shape)

You can see the unfiltered df has 340 rows. Restricting it to rows whose text contains 'Bahamas' reduces it to 55 rows.

(340, 16)

(55, 16)

To keep only the rows where the mask is True, reassign df using the filter:

df = df.loc[filt]

Or you could assign the result to a new dataframe if you want to preserve the original raw data.
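
One caveat with the mask above: str.contains is case-sensitive by default and returns missing values for rows with no text, which cannot be used directly as a boolean index. Here is a minimal sketch of a more forgiving variant. The helper name filter_by_keyword and the sample tweets are made up for illustration; only the 'text' column name comes from the code above. If you start from the exported file rather than the scraper, the dataframe could presumably be loaded with pd.read_excel first.

```python
import pandas as pd


def filter_by_keyword(df, column, keyword):
    """Return a new dataframe with the rows whose `column` contains `keyword`.

    Hypothetical helper, not part of pandas or twitterscraper.
    """
    mask = df[column].str.contains(
        keyword,
        case=False,   # match 'Bahamas', 'bahamas', 'BAHAMAS', ...
        na=False,     # treat rows with missing text as non-matching
        regex=False,  # treat the keyword as plain text, not a regex
    )
    # .copy() keeps the original dataframe untouched
    return df.loc[mask].copy()


# Made-up sample data standing in for the scraped tweets
sample = pd.DataFrame({
    "text": [
        "Hurricane Dorian hits the Bahamas",
        "Dorian moves toward Florida",
        "Relief efforts in the BAHAMAS continue",
        None,
    ]
})

bahamas_only = filter_by_keyword(sample, "text", "bahamas")
print(bahamas_only.shape)
```

Returning a copy leaves the raw data in place, so you can still compare shapes before and after, or write the filtered result to a separate xlsx with to_excel.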
