How to pick part of a string in a column with multiple values
I have a .tsv DataFrame in which one column holds several values separated by commas. It looks like this:
Col1 Col2 Col3
1 star1 HIP1, KOI1, Gaia1 3.4
2 star2 HIP2, KOI2, Gaia2 4.3
3 star3 HIP3, KOI3, Gaia3 7.2
My objective is to keep only part of the string in column 2, so that each row holds just one of the comma-separated options. In this case, it would be the KOIs. It'd look like this:
Col1 Col2 Col3
1 star1 KOI1 3.4
2 star2 KOI2 4.3
3 star3 KOI3 7.2
Is there a way to do it, considering the numbers right after KOI(x) do not follow an ordinal order (as in the example)? I've tried using the str.lsplit() and split functions, but the code returns the message: 'StringMethods' object has no attribute 'lsplit'. This is what I tried:
for i in df['Col2']:
    df['Col2'][i] = df['Col2'].str.lsptrip(', K').str[0]
Once I had the value isolated, I would then have tried adding the missing 'K' back to the string, but I never got to that part.
You could use pd.Series.str.extract too:
df['Col2'] = df['Col2'].str.extract(r'.*, (K.*), .*', expand=False)
The same result with pd.Series.str.split, which takes the second comma-separated item:
df['Col2'] = df['Col2'].str.split(', ').str[1]
Output:
df
Col1 Col2 Col3
1 star1 KOI1 3.4
2 star2 KOI2 4.3
3 star3 KOI3 7.2
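Both approaches above rely on the KOI entry sitting in the middle of the list. If its position can vary, matching the identifier itself may be more robust. Below is a minimal sketch, assuming every identifier has the form KOI followed by digits (that pattern is an assumption, not something stated in the question). The sample frame is rebuilt by hand, with the order shuffled in two rows purely for illustration:

```python
import pandas as pd

# Sample frame rebuilt from the question. The shuffled order in rows 2 and 3
# is NOT in the original data; it is added only to show position-independence.
df = pd.DataFrame(
    {
        "Col1": ["star1", "star2", "star3"],
        "Col2": ["HIP1, KOI1, Gaia1", "KOI2, HIP2, Gaia2", "HIP3, Gaia3, KOI3"],
        "Col3": [3.4, 4.3, 7.2],
    },
    index=[1, 2, 3],
)

# Assumed pattern: the literal "KOI" followed by one or more digits.
# expand=False makes str.extract return a Series instead of a
# one-column DataFrame, so it can be assigned straight back to the column.
df["Col2"] = df["Col2"].str.extract(r"(KOI\d+)", expand=False)

print(df)
```

Rows without a matching identifier would end up as NaN, which makes missing KOI entries easy to spot. As for the original error, pandas string accessors provide split, rsplit and lstrip, but no lsplit, which is why the attempt raised an AttributeError.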