How would I find the longest string per row in a data frame and print the row number if it exceeds a certain amount?
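I want to write a program that searches a data frame and, if any item in it is longer than 50 characters, prints the row number and asks whether I want to continue searching the data frame.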
threshold = 50
mask = (df.drop(columns=exclude, errors='ignore')
          .apply(lambda s: s.str.len().ge(threshold))
        )
out = df.loc[~mask.any(axis=1)]
I tried using this, but I don't want to drop the rows, only print the row numbers where a string is longer than 50 characters.
Input:
0 "Robert","20221019161921","London"
1 "Edward","20221019161921","London"
2 "Johnny","20221019161921","London"
3 "Insane string which is way too longggggggggggg","20221019161921","London"
Output:
Row 3 is above the 50-character limit.
I would also like the program to print the specific value or string that is too long.
You can use:
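exclude = []    # columns to skip, if any
threshold = 30
# True where a value's length is >= threshold (use .gt for strictly longer)
mask = (df.drop(columns=exclude, errors='ignore')
          .apply(lambda s: s.str.len().ge(threshold))
        )
# rows with at least one over-long value
s = mask.any(axis=1)
for idx in s[s].index:
    print(f'row {idx} is above the {threshold}-character limit.')
    s2 = mask.loc[idx]
    # print only the over-long values in this row
    for string in df.loc[idx, s2.reindex(df.columns, fill_value=False)]:
        print(string)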
Output:
row 3 is above the 30-character limit.
"Insane string which is way too longggggggggggg","20221019161921","London"
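For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the input is split into three string columns; the column names name, timestamp and city are made up for illustration and are not from the question. It also computes the longest length per row, which the title asks about.

```python
import pandas as pd

# Hypothetical sample frame; the column names are assumptions.
df = pd.DataFrame({
    "name": ["Robert", "Edward", "Johnny",
             "Insane string which is way too longggggggggggg"],
    "timestamp": ["20221019161921"] * 4,
    "city": ["London"] * 4,
})

threshold = 30

# Cast to str so non-string columns do not break the .str accessor.
lengths = df.astype(str).apply(lambda s: s.str.len())

# Length of the longest value in each row.
longest = lengths.max(axis=1)

# True where a value is at least `threshold` characters long.
mask = lengths.ge(threshold)
s = mask.any(axis=1)

report = []
for idx in s[s].index:
    long_values = df.loc[idx, mask.loc[idx]].tolist()
    report.append((idx, long_values))
    print(f"row {idx} is above the {threshold}-character limit.")
    for value in long_values:
        print(value)
```

With the frame split into columns like this, only the over-long name is printed for row 3, not the whole line.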
s:
0 False
1 False
2 False
3 True
dtype: bool
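The question also asks to be prompted before the search continues, which the code above does not do. One possible sketch follows; the function name scan_long_values and its ask parameter are hypothetical, not part of pandas. It uses .gt so that only values strictly longer than the threshold are reported, and it assumes the index labels are unique.

```python
import pandas as pd


def scan_long_values(df, threshold=50, exclude=(), ask=input):
    """Report rows holding a value longer than `threshold` characters.

    After each reported row, `ask` is called with a prompt; any answer
    other than "y" stops the scan. Returns the row labels reported.
    """
    lengths = (df.drop(columns=list(exclude), errors="ignore")
                 .astype(str)
                 .apply(lambda s: s.str.len()))
    mask = lengths.gt(threshold)      # strictly longer than threshold
    flagged = mask.any(axis=1)

    reported = []
    for idx in flagged[flagged].index:
        print(f"Row {idx} is above the {threshold}-character limit.")
        row = mask.loc[idx]
        for column in row[row].index:
            # show which column holds the over-long value
            print(f"  {column}: {df.at[idx, column]}")
        reported.append(idx)
        if ask("Continue searching? [y/n] ").strip().lower() != "y":
            break
    return reported


# Non-interactive demo: a scripted answer stands in for the user.
sample = pd.DataFrame({
    "name": ["Robert", "x" * 60, "Johnny", "y" * 55],
    "city": ["London"] * 4,
})
rows = scan_long_values(sample, threshold=50, ask=lambda prompt: "y")
```

Calling scan_long_values(df) without the ask argument falls back to input(), so the prompt appears in the terminal after each reported row.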