Python pandas return value from other column
I have a file "specieslist.txt" which contains the following information:
Bacillus,genus
Borrelia,genus
Burkholderia,genus
Campylobacter,genus
Now I want Python to look up a value in the first column (in this example "Campylobacter") and return the corresponding value from the second column ("genus"). I wrote the following code:
import csv
import pandas as pd
species_import = 'Campylobacter'
df = pd.read_csv('specieslist.txt', header=None, names = ['species', 'level'] )
input = df.loc[df['species'] == species_import]
print (input['level'])
However, my code returns too much, while I only want "genus":
3 genus
Name: level, dtype: object
You can select the first value of the Series with iat:
species_import = 'Campylobacter'
out = df.loc[df['species'] == species_import, 'level'].iat[0]
#alternative
#out = df.loc[df['species'] == species_import, 'level'].values[0]
print (out)
genus
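To try this without the file on disk, here is a minimal sketch that reads the same four lines from an in-memory buffer. The io.StringIO buffer standing in for specieslist.txt and the name matches are assumptions made for illustration only.

```python
import io
import pandas as pd

# Stand-in for specieslist.txt (assumption: same four lines as in the question)
data = io.StringIO(
    "Bacillus,genus\nBorrelia,genus\nBurkholderia,genus\nCampylobacter,genus\n"
)
df = pd.read_csv(data, header=None, names=['species', 'level'])

# Selecting row mask and column together yields a Series, not a scalar
matches = df.loc[df['species'] == 'Campylobacter', 'level']

# .iat[0] takes the first element of that Series by position
out = matches.iat[0]
print(type(matches).__name__, type(out).__name__)  # Series str
```

The Series still carries its index label and dtype, which is why printing it shows more than the bare value.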
A better solution, which also works if no value is matched and an empty Series is returned (in that case it returns no match):
@jpp comment:
This solution is better only when you have a large Series and the matched value is expected to be near the top.
species_import = 'Campylobacter'
out = next(iter(df.loc[df['species'] == species_import, 'level']), 'no match')
print (out)
genus
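The example above only exercises the matching case. A small sketch of the non-matching case, using a made-up two-row frame instead of the file (the frame contents and variable names are assumptions for illustration):

```python
import pandas as pd

# Made-up stand-in for the data in the question
df = pd.DataFrame({'species': ['Bacillus', 'Campylobacter'],
                   'level': ['genus', 'genus']})

# No row matches 'aaa', so the selection is an empty Series
empty = df.loc[df['species'] == 'aaa', 'level']

# next() on an exhausted iterator returns the supplied default
fallback = next(iter(empty), 'no match')

# With a match, next() returns the first matching value
hit = next(iter(df.loc[df['species'] == 'Campylobacter', 'level']), 'no match')
print(fallback, hit)
```

Note that the boolean comparison still scans the whole column before next() is called, so this form mainly buys the default value rather than an early exit.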
EDIT:
Idea from comments, thanks @jpp:
def get_first_val(val):
    try:
        return df.loc[df['species'] == val, 'level'].iat[0]
    except IndexError:
        return 'no match'
print (get_first_val(species_import))
genus
print (get_first_val('aaa'))
no match
EDIT:
import numpy as np
df = pd.DataFrame({'species':['a'] * 10000 + ['b'], 'level':np.arange(10001)})
def get_first_val(val):
    try:
        return df.loc[df['species'] == val, 'level'].iat[0]
    except IndexError:
        return 'no match'
In [232]: %timeit next(iter(df.loc[df['species'] == 'a', 'level']), 'no match')
1.3 ms ± 33.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [233]: %timeit (get_first_val('a'))
1.1 ms ± 21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [235]: %timeit (get_first_val('b'))
1.48 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [236]: %timeit next(iter(df.loc[df['species'] == 'b', 'level']), 'no match')
1.24 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Performance of various methods, to demonstrate when it is useful to use next(...):
import numpy as np
import pandas as pd
n = 10**6
df = pd.DataFrame({'species': ['b']+['a']*n, 'level': np.arange(n+1)})
def get_first_val(val):
    try:
        return df.loc[df['species'] == val, 'level'].iat[0]
    except IndexError:
        return 'no match'
%timeit next(iter(df.loc[df['species'] == 'b', 'level']), 'no match') # 123 ms per loop
%timeit get_first_val('b') # 125 ms per loop
%timeit next(idx for idx, val in enumerate(df['species']) if val == 'b') # 20.3 µs per loop
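The last timing above returns the row position rather than the level. A sketch of the same short-circuiting idea that returns the level and falls back to a default could look like this; first_level_lazy is a hypothetical helper name, not part of pandas, and the frame is a smaller version of the benchmark data.

```python
import numpy as np
import pandas as pd

def first_level_lazy(frame, val, default='no match'):
    """Hypothetical helper: walk the rows in order and stop at the first match."""
    pairs = zip(frame['species'], frame['level'])
    # The generator is consumed only until the first hit; default covers no hit
    return next((level for species, level in pairs if species == val), default)

n = 1000
df = pd.DataFrame({'species': ['b'] + ['a'] * n, 'level': np.arange(n + 1)})

found = first_level_lazy(df, 'b')      # match is in the first row
missing = first_level_lazy(df, 'zzz')  # no match, whole column is walked
print(found, missing)
```

This is likely to pay off only when the match sits near the top; when the value is far down or absent, the Python-level loop visits every row and the vectorized comparison is probably the better choice.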
get
With pandas.Series.get, you can return either a scalar value if the 'species' is unique, or a pandas.Series if it is not unique.
f = df.set_index('species').level.get
f('Campylobacter')
'genus'
If the value is not in the data, you can provide a default:
f('X', 'Not In Data')
'Not In Data'
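A short sketch of the three possible outcomes, using a made-up frame in which 'Campylobacter' appears twice (the duplicate row with 'family' is invented purely for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'species': ['Bacillus', 'Campylobacter', 'Campylobacter'],
    'level': ['genus', 'genus', 'family'],  # second Campylobacter row is made up
})
f = df.set_index('species')['level'].get

unique_hit = f('Bacillus')       # label occurs once: a scalar
dup_hit = f('Campylobacter')     # label occurs twice: a Series with both values
missing = f('X', 'Not In Data')  # label absent: the supplied default

print(unique_hit)
print(list(dup_hit))
print(missing)
```

Callers that need a single value in every case would have to check whether the result is a Series before using it.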
We could also use dict.get and only return scalars. If the species is not unique, this will return the last one.
f = dict(zip(df.species, df.level)).get
If you want to return the first one, you can do that in a few ways:
f = dict(zip(df.species[::-1], df.level[::-1])).get
Or
f = df.drop_duplicates('species').pipe(
    lambda d: dict(zip(d.species, d.level)).get
)
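To see the difference between the three dictionary variants, here is a sketch on a made-up frame with a duplicated species (the 'family' row and the names last, first_rev and first_dedup are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'species': ['Campylobacter', 'Bacillus', 'Campylobacter'],
    'level': ['genus', 'genus', 'family'],  # duplicate row is made up
})

# Later keys overwrite earlier ones, so the last occurrence wins
last = dict(zip(df['species'], df['level'])).get

# Reversing both columns makes the first occurrence the final write
first_rev = dict(zip(df['species'][::-1], df['level'][::-1])).get

# drop_duplicates keeps the first occurrence by default
first_dedup = df.drop_duplicates('species').pipe(
    lambda d: dict(zip(d['species'], d['level'])).get
)

print(last('Campylobacter'), first_rev('Campylobacter'), first_dedup('Campylobacter'))
```

All three return None for a missing key unless a default is passed as the second argument, as with any dict.get.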
# Change the last line of your code to
print(input['level'].values)
# For an explanation, refer to the code below
import csv
import pandas as pd
species_import = 'Campylobacter'
df = pd.read_csv('specieslist.txt', header=None, names = ['species', 'level'] )
input = df['species'] == species_import # a boolean pandas Series (row mask)
print(type(df[input])) # a pandas DataFrame
print(type(df[input]['level'])) # a pandas Series
# To obtain the values from this Series:
print(df[input]['level'].values) # prints ['genus'], a NumPy array; append [0] for the bare string
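Note that .values gives a NumPy array rather than a bare string. A minimal sketch of the difference, on a made-up two-row frame in place of the file (the frame and the names mask, arr and single are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({'species': ['Bacillus', 'Campylobacter'],
                   'level': ['genus', 'genus']})

mask = df['species'] == 'Campylobacter'  # boolean Series
arr = df[mask]['level'].values           # one-element NumPy array

print(arr)     # the array, printed with brackets
print(arr[0])  # the bare string

# Series.item() returns the single element and raises ValueError otherwise
single = df[mask]['level'].item()
print(single)
```

Indexing with [0] raises IndexError on an empty result, so the no-match case still needs separate handling.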