Removing rows from a Python DataFrame using a conditional
I'm trying to remove rows I don't need after importing data from files and concatenating my list of DataFrames. Here is what my current DataFrame looks like:
Best Movie
0 Movie: Orphan
1 2.
2 Movie: Avatar
3 3.
4 Movie: Inglourious Basterds
... ...
2371 Movie: The Deep End of the Ocean
2372 49.
2373 Movie: Drop Dead Gorgeous
2374 50.
2375 Movie: Go
I need to remove all rows that contain just a number, so that the result looks like this:
Best Movie
0 Movie: Orphan
2 Movie: Avatar
4 Movie: Inglourious Basterds
... ...
2371 Movie: The Deep End of the Ocean
2373 Movie: Drop Dead Gorgeous
2375 Movie: Go
Thank you for your help.
One solution using str.match:
# Keep rows that are NOT just a number followed by a period (e.g. "2.")
mask = ~df["Best Movie"].str.match(r"^\s*\d+\.$")
res = df[mask]
print(res)
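A self-contained version of the str.match approach, using a small hypothetical sample in place of the concatenated data from the question:

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout: titles alternate with rank rows like "2."
df = pd.DataFrame({
    "Best Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3.", "Movie: Go"]
})

# str.match anchors at the start of each string; the pattern matches values made up of
# optional leading whitespace, one or more digits and a literal period.
mask = ~df["Best Movie"].str.match(r"^\s*\d+\.$")
res = df[mask]
print(res)
```

If the column can contain missing values, passing na=False to str.match keeps the mask boolean so that ~ does not fail on NaN.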
Output
Best Movie
0 Movie: Orphan
2 Movie: Avatar
4 Movie: Inglourious Basterds
5 Movie: The Deep End of the Ocean
7 Movie: Drop Dead Gorgeous
9 Movie: Go
UPDATE
To remove the "Movie:" prefix and reset the index, do:
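# drop=True discards the old index instead of adding it as an "index" column
res = df[mask].reset_index(drop=True)
# Also strip the space after "Movie:" so the titles have no leading blank
res = res["Best Movie"].str.replace(r"^\s*Movie:\s*", "", regex=True)
print(res)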
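The filtering, prefix removal and index reset can also be chained into one expression. This sketch assumes the same "Best Movie" column and uses hypothetical sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Best Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3.", "Movie: Go"]
})

titles = (
    df.loc[~df["Best Movie"].str.match(r"^\s*\d+\.$"), "Best Movie"]  # drop rank-only rows
    .str.replace(r"^\s*Movie:\s*", "", regex=True)  # strip "Movie:" and the spaces after it
    .reset_index(drop=True)  # renumber 0..n-1 without keeping the old index
)
print(titles)
```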
Output
0 Orphan
1 Avatar
2 Inglourious Basterds
3 The Deep End of the Ocean
4 Drop Dead Gorgeous
5 Go
Name: Best Movie, dtype: object
You can do:
df.loc[~df['Best Movie'].str.match(r'^\d+\.$')]
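In a pattern such as '^\d+.$' the unescaped . matches any single character, so a value like "12" would also count as a number row, whereas '^\d+\.$' requires a literal period. Writing the pattern as a raw string (r'...') also avoids Python's invalid-escape warning for \d. A quick check with Python's re module, which str.match applies to each value in the same start-anchored way:

```python
import re

loose = re.compile(r"^\d+.$")    # "." matches any character
strict = re.compile(r"^\d+\.$")  # "\." matches only a literal period

samples = ["2.", "12", "Movie: Go"]
results = {s: (bool(loose.match(s)), bool(strict.match(s))) for s in samples}

for s, (loose_hit, strict_hit) in results.items():
    print(repr(s), "loose:", loose_hit, "strict:", strict_hit)
```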
Sample input:
import pandas as pd
df = pd.DataFrame({
"Best_Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3."]
})
Apply pd.to_numeric with errors='coerce': rows that contain only a number are converted to floats, and all other rows become NaN.
df["nums"] = pd.to_numeric(df['Best_Movie'], errors='coerce')
Extract the rows that contain text (i.e. the rows marked as NaN):
df.loc[df.nums.isnull(), "Best_Movie"]
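The helper column is optional. This sketch, assuming the same "Best_Movie" column as the sample input, builds the mask directly from pd.to_numeric and keeps whole rows rather than a single column:

```python
import pandas as pd

df = pd.DataFrame({
    "Best_Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3."]
})

# errors="coerce" turns strings that cannot be parsed as numbers (the titles) into NaN;
# numeric-looking strings such as "2." become floats.
as_number = pd.to_numeric(df["Best_Movie"], errors="coerce")
titles_only = df[as_number.isna()]
print(titles_only)
```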
Sample output:
0 Movie: Orphan
2 Movie: Avatar
Name: Best_Movie, dtype: object
Try the following. In a regex, '|' means "or", so the joined pattern 0|1|2|...|9 matches any row that contains a digit, and ~ keeps the rest:
df[~df['Best Movie'].str.contains('|'.join(str(i) for i in range(10)))]
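Since the joined pattern matches a digit anywhere in the string, it behaves like r"[0-9]" and also removes titles that contain digits. A hedged alternative, assuming pandas 1.1 or later for Series.str.fullmatch, removes only rows made up entirely of a number with an optional trailing period; the "Movie: 2012" row is hypothetical, added to show the difference:

```python
import pandas as pd

df = pd.DataFrame({
    "Best Movie": ["Movie: Orphan", "2.", "Movie: 2012", "3."]
})

# Same idea as joining "0|1|...|9": drops every row containing any digit,
# including the "Movie: 2012" title.
any_digit = df[~df["Best Movie"].str.contains(r"[0-9]")]

# Drops only rows that consist entirely of digits plus an optional trailing period.
rank_rows = df["Best Movie"].str.fullmatch(r"\s*\d+\.?\s*")
number_only = df[~rank_rows]

print(any_digit)
print(number_only)
```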