
pandas query for string returns empty result

I have a DataFrame imported with read_csv, as in the sample below, but when I use query to filter the content I get an empty result.

The result I expect from query is rows 0 and 2.

(pandas v1.3.1, python v3.9)

df1 = pd.read_csv(r'C:\Users\Dorin\Desktop\folder_files\test_1.txt',
              encoding='utf-8',
              sep=';',
              names=["i_line", "f_path", "f_type", "f_hash"],
              dtype={'i_line': 'string', 'f_path': 'string', 'f_type': 'string', 'f_hash': 'string'},
              keep_default_na=False,
              na_values=['_'],
              index_col=False)

Output of print(df1):

  i_line        f_path f_type f_hash
0   i: 1   "content 1"      d    n/a
1   i: 2   "content 2"      f   1111
2   i: 3   "content 3"      d    n/a

Result of print(df1.query("f_hash == 'n/a'")):

Empty DataFrame
Columns: [i_line, f_path, f_type, f_hash]
Index: []

File content

[Image: screenshot of the file content]

In your file, the separator is not ";" but "; " (a semicolon followed by an optional space).

Thus your 'n/a' is in fact ' n/a', with a leading space, so the comparison with 'n/a' matches nothing.
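One way to confirm this is to look at the repr() of the values, since print(df1) hides leading whitespace. Here is a minimal sketch, assuming an in-memory sample that mimics the file (the sample text is made up for illustration and is not the actual file):

```python
import io
import pandas as pd

# Hypothetical sample: note the space after every ';'
sample = (
    'i: 1; "content 1"; d; n/a\n'
    'i: 2; "content 2"; f; 1111\n'
    'i: 3; "content 3"; d; n/a\n'
)

df = pd.read_csv(io.StringIO(sample),
                 sep=';',  # plain ';' keeps the space as part of the next field
                 names=["i_line", "f_path", "f_type", "f_hash"],
                 dtype="string",
                 keep_default_na=False,
                 na_values=['_'],
                 index_col=False)

# repr() makes the leading space visible
hash_values = df["f_hash"].tolist()
print([repr(v) for v in hash_values])
```

If the values print with a space inside the quotes, the query string has to match that space too, or the data has to be read differently.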

You have to change the separator in read_csv:

df1 = pd.read_csv('/tmp/t.csv',
              encoding='utf-8',
              sep='; ?',  # regex: ";" followed by an optional space
              engine='python',  # regex separators need the python engine; this avoids a ParserWarning
              names=["i_line", "f_path", "f_type", "f_hash"],
              dtype={'i_line': 'string', 'f_path': 'string', 'f_type': 'string', 'f_hash': 'string'},
              keep_default_na=False,
              na_values=['_'],
              index_col=False)
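As an alternative to a regex separator, read_csv also has a skipinitialspace parameter that skips spaces after the delimiter. This is a sketch only, again assuming a made-up in-memory sample in place of the real file:

```python
import io
import pandas as pd

# Hypothetical sample: note the space after every ';'
sample = (
    'i: 1; "content 1"; d; n/a\n'
    'i: 2; "content 2"; f; 1111\n'
    'i: 3; "content 3"; d; n/a\n'
)

fixed = pd.read_csv(io.StringIO(sample),
                    sep=';',
                    skipinitialspace=True,  # drop spaces that follow the delimiter
                    names=["i_line", "f_path", "f_type", "f_hash"],
                    dtype="string",
                    keep_default_na=False,
                    na_values=['_'],
                    index_col=False)

# The values no longer carry a leading space, so the original query matches
result = fixed.query("f_hash == 'n/a'")
print(result)
```

With a single-character separator the default parser can still be used, whereas a regex separator makes pandas use the python engine.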


 