
How can I write a function in R that finds the word position of the first number in a string?

For example:

string1 <- "Hello I'd like to extract where the first 1010 is in this string"
#desired_output for string1
9
string2 <- "80111 is in this string"
#desired_output for string2
1
string3 <- "extract where the first 97865 is in this string"
#desired_output for string3
5

5 Answers
  • Here's one way to get the desired output:

    library(stringr)
    min(which(!is.na(suppressWarnings(as.numeric(str_split(string, " ", simplify = TRUE))))))
    
    

    Here's how it works:

    str_split(string, " ", simplify = TRUE) # converts your string to a vector/matrix, splitting at space
    as.numeric(...) # tries to convert each element to a number, returning NA when it fails
    suppressWarnings(...) # suppresses the warnings generated by as.numeric
    !is.na(...) # returns TRUE for the values that are not NA (i.e. the numbers)
    which(...) # returns the positions of the TRUE values
    min(...) # returns the first position
    
    

    Output:

    min(which(!is.na(suppressWarnings(as.numeric(str_split(string1, " ", simplify = TRUE))))))
    [1] 9
    min(which(!is.na(suppressWarnings(as.numeric(str_split(string2, " ", simplify = TRUE))))))
    [1] 1
    min(which(!is.na(suppressWarnings(as.numeric(str_split(string3, " ", simplify = TRUE))))))
    [1] 5
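
    One caveat: if no word in the string is a number, which() returns an empty vector and min() returns Inf with a warning. A minimal base R sketch of the same as.numeric() idea, assuming you would rather get NA in that case, could look like this (first_number_position is a hypothetical name, and strsplit() stands in for stringr::str_split()):

    ```r
    # Hypothetical helper: same as.numeric() test as above, but base R only.
    first_number_position <- function(string) {
      words <- strsplit(string, " ", fixed = TRUE)[[1]]
      # as.numeric() gives NA (plus a warning) for words that are not numbers
      is_number <- !is.na(suppressWarnings(as.numeric(words)))
      hits <- which(is_number)
      # min(integer(0)) would return Inf with a warning; return NA instead
      if (length(hits) == 0L) NA_integer_ else hits[1L]
    }

    first_number_position("Hello I'd like to extract where the first 1010 is in this string")
    # [1] 9
    first_number_position("no numbers here")
    # [1] NA
    ```

    Since hits is already sorted, hits[1L] gives the same answer as min(hits) whenever there is at least one match.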
    
    

  • I would just use grep and strsplit here, for a base R option:

    sapply(input, function(x) grep("\\d+", strsplit(x, " ")[[1]]))
    Hello I'd like to extract where the first 1010 is in this string
                                                                   9
                                             80111 is in this string
                                                                   1
                     extract where the first 97865 is in this string
                                                                   5
    
    

    Data:

    input <- c("Hello I'd like to extract where the first 1010 is in this string",
               "80111 is in this string",
               "extract where the first 97865 is in this string")
    
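
    One thing to watch: grep() returns every matching index, so if a string contains more than one number (or none), sapply() may return a list instead of a plain vector. A sketch that keeps only the first match per string and always returns one integer per input (NA when there is none), using the hypothetical name first_digit_word and vapply() in place of sapply():

    ```r
    input <- c("Hello I'd like to extract where the first 1010 is in this string",
               "80111 is in this string",
               "extract where the first 97865 is in this string")

    # Hypothetical helper: index of the first word containing a digit, per string
    first_digit_word <- function(x) {
      vapply(strsplit(x, " ", fixed = TRUE),
             # grep() returns every matching index; [1L] keeps the first, or NA if none
             function(words) grep("[0-9]", words)[1L],
             integer(1))
    }

    first_digit_word(c(input, "1 and 2", "none"))
    # [1]  9  1  5  1 NA
    ```

    Note that the pattern matches any word that contains a digit, such as "abc1", not only words that are entirely numeric.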
    

  • Try the following:

    library(stringr)
    position_first_number <- function(string) {
      # split on runs of whitespace, flag words containing a digit, return the first index
      min(which(str_detect(str_split(string, "\\s+", simplify = TRUE), "[0-9]+")))
    }
    
    

    With your example strings:

    > string1 <- "Hello I'd like to extract where the first 1010 is in this string"
    > position_first_number(string1)
    [1] 9
     
    > string2 <- "80111 is in this string"
    > position_first_number(string2)
    [1] 1
     
    > string3 <- "extract where the first 97865 is in this string"
    > position_first_number(string3)
    [1] 5
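
    Splitting on "\\s+" has one pitfall: if the string starts with whitespace, the split produces an empty string as its first element and every position shifts by one. A base R sketch that trims first (position_first_number_ws is a hypothetical name; grepl() and match() stand in for str_detect(), which() and min()):

    ```r
    # Hypothetical base R variant of position_first_number().
    position_first_number_ws <- function(string) {
      # Without trimws(), leading whitespace makes strsplit() return "" as the
      # first element, shifting every position by one.
      words <- strsplit(trimws(string), "\\s+")[[1]]
      # match(TRUE, ...) gives the index of the first word containing a digit, or NA
      match(TRUE, grepl("[0-9]+", words))
    }

    strsplit("  80111 is", "\\s+")[[1]]   # elements: "" "80111" "is"
    position_first_number_ws("  80111 is in this string")
    # [1] 1
    ```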
    
    

  • Here is a base R solution that uses rapply() with grep() to recurse over the result of strsplit(), so it handles a vector of strings.

    Note: swap " " and fixed = TRUE for "\\s+" and fixed = FALSE (the default) if you want to split the strings on any whitespace rather than on a literal space.

    rapply(strsplit(strings, " ", fixed = TRUE), function(x) grep("[0-9]+", x))
    [1] 9 1 5
    
    

    Data

    strings = c("Hello I'd like to extract where the first 1010 is in this string", 
                "80111 is in this string", "extract where the first 97865 is in this string")
    
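
    Because rapply() unlists its results by default, the output only lines up with the input when every string contains exactly one number: a string with two numbers contributes two positions, and a string with none contributes nothing. A sketch with made-up example strings, showing the effect and a hypothetical tweak that keeps only the first match per string:

    ```r
    strings <- c("a 1 and 2", "no digits", "80111 is in this string")
    split_words <- strsplit(strings, " ", fixed = TRUE)

    # As in the answer: all matches are flattened into one vector
    rapply(split_words, function(x) grep("[0-9]+", x))
    # [1] 2 4 1

    # Hypothetical tweak: first match per string, NA when there is none
    rapply(split_words, function(x) grep("[0-9]+", x)[1L])
    # [1]  2 NA  1
    ```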
    

  • Here is another approach. You can drop everything after the first digit of the first number and then count the spaces that remain. The position is the number of spaces plus one.

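    first_numeric_word <- function(x) {
      # keep everything up to and including the first digit of the first number
      x <- substr(x, 1L, regexpr("\\d+", x))
      # count the spaces before that digit; the word position is one more
      nchar(x) - nchar(gsub(" ", "", x)) + 1L
    }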
    
    

    Output

    > first_numeric_word(x)
    [1] 9 1 5
    
    

    Data

    x <- c(
      "Hello I'd like to extract where the first 1010 is in this string", 
      "80111 is in this string", 
      "extract where the first 97865 is in this string"
    )
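
    Two assumptions are built into this approach: words are separated by single spaces, and every string contains a digit. When there is no digit, regexpr() returns -1, substr() then yields an empty string, and the function reports position 1. A minimal sketch of a guarded variant (first_numeric_word_safe is a hypothetical name):

    ```r
    # Hypothetical variant of first_numeric_word() that returns NA when x has no digit.
    first_numeric_word_safe <- function(x) {
      # as.integer() strips the match.length attributes that regexpr() attaches
      pos <- as.integer(regexpr("[0-9]", x))   # -1 where x contains no digit
      prefix <- substr(x, 1L, pos)             # text up to and including the first digit
      n_spaces <- nchar(prefix) - nchar(gsub(" ", "", prefix, fixed = TRUE))
      ifelse(pos == -1L, NA_integer_, n_spaces + 1L)
    }

    first_numeric_word_safe(c("80111 is in this string", "no digits here"))
    # [1]  1 NA
    ```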
    
    
